Skip to content

Add reduce() list folding function#2435

Open
jrgemignani wants to merge 1 commit into
apache:masterfrom
jrgemignani:add-reduce-list-folding-function
Open

Add reduce() list folding function#2435
jrgemignani wants to merge 1 commit into
apache:masterfrom
jrgemignani:add-reduce-list-folding-function

Conversation

@jrgemignani

Copy link
Copy Markdown
Contributor

Implement the openCypher reduce(acc = init, var IN list | body) expression, which folds an arbitrary expression over a list, threading an accumulator across the elements in list order. This closes a long-standing gap (reduce() was previously unsupported) and works both at the SQL top level and inside a cypher() RETURN/WHERE.

Implementation

reduce() is desugared, at transform time, into a correlated scalar subquery over a new ordered aggregate rather than a new executor node, so no PostgreSQL core changes are required:

CASE WHEN list IS NULL THEN NULL
     ELSE COALESCE((SELECT ag_catalog.age_reduce(
                        <init>, '<serialized-body>'::text,
                        r.elem ORDER BY r.ord)
                    FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                   <init>)
END
  • A new cypher_reduce extensible node carries the accumulator/element names and the init/list/body expressions (grammar production, keyword, and the copy/out/read serialization plumbing).
  • The fold body is transformed against a throwaway two-column agtype namespace, its accumulator and element Var references are rewritten to PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a text argument.
  • age_reduce_transfn (a custom agtype aggregate transition function) deserializes and compiles the body once per group with ExecInitExpr, then evaluates it per element with ExecEvalExpr, rebinding the two params. The body is normalized to agtype at transform time so a boolean or other non-agtype result cannot be misread as a by-reference Datum.

Semantics

  • List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  • An empty list yields the initial value; a NULL list yields NULL.
  • The list and initial value may reference outer-query variables (e.g. reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference only the accumulator and element.
  • Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE, and element property-access bodies are all supported.
  • Outer-variable, query-parameter, nested-reduce, and aggregate references inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  • reduce is registered as a safe keyword so it remains usable as a property or map key, preserving backward compatibility.

Tests

Adds the age_reduce regression test (registered in the install SQL and the upgrade template so age_upgrade passes), covering: arithmetic/product/string folds; order sensitivity; empty/NULL list; NULL element and NULL init; list-building and CASE bodies; boolean and comparison bodies; element property access; multiple and nested (in list/init) reduce(); reduce() in boolean expressions, WHERE, and list comprehensions; folds over collected nodes and node list properties; the not-supported rejections; and reduce as a map key. All multi-row results are ordered. 38/38 installcheck pass.

Future work

The body restriction (accumulator and element only) is a property of the standalone expression evaluation and can be relaxed without core changes:

  • Allow loop-invariant outer-variable and cypher $parameter references in the body by capturing them as additional eager aggregate arguments bound to extra param slots.
  • Support a nested reduce() inside the body via an SPI-based evaluation fallback for subquery-bearing bodies. Aggregates inside the body remain intentionally unsupported, matching the openCypher specification.

Co-authored-by: Copilot copilot@github.com

modified: Makefile
modified: age--1.7.0--y.y.y.sql
new file: regress/expected/age_reduce.out
new file: regress/sql/age_reduce.sql
modified: sql/age_aggregate.sql
modified: src/backend/nodes/ag_nodes.c
modified: src/backend/nodes/cypher_copyfuncs.c
modified: src/backend/nodes/cypher_outfuncs.c
modified: src/backend/nodes/cypher_readfuncs.c
modified: src/backend/parser/cypher_analyze.c
modified: src/backend/parser/cypher_clause.c
modified: src/backend/parser/cypher_gram.y
modified: src/backend/utils/adt/agtype.c
modified: src/include/nodes/ag_nodes.h
modified: src/include/nodes/cypher_copyfuncs.h
modified: src/include/nodes/cypher_nodes.h
modified: src/include/nodes/cypher_outfuncs.h
modified: src/include/nodes/cypher_readfuncs.h
modified: src/include/parser/cypher_kwlist.h

Implement the openCypher reduce(acc = init, var IN list | body) expression,
which folds an arbitrary expression over a list, threading an accumulator
across the elements in list order. This closes a long-standing gap (reduce()
was previously unsupported) and works both at the SQL top level and inside a
cypher() RETURN/WHERE.

Implementation
--------------
reduce() is desugared, at transform time, into a correlated scalar subquery
over a new ordered aggregate rather than a new executor node, so no PostgreSQL
core changes are required:

    CASE WHEN list IS NULL THEN NULL
         ELSE COALESCE((SELECT ag_catalog.age_reduce(
                            <init>, '<serialized-body>'::text,
                            r.elem ORDER BY r.ord)
                        FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)),
                       <init>)
    END

  - A new cypher_reduce extensible node carries the accumulator/element names
    and the init/list/body expressions (grammar production, keyword, and the
    copy/out/read serialization plumbing).
  - The fold body is transformed against a throwaway two-column agtype
    namespace, its accumulator and element Var references are rewritten to
    PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a
    text argument.
  - age_reduce_transfn (a custom agtype aggregate transition function)
    deserializes and compiles the body once per group with ExecInitExpr, then
    evaluates it per element with ExecEvalExpr, rebinding the two params. The
    body is normalized to agtype at transform time so a boolean or other
    non-agtype result cannot be misread as a by-reference Datum.

Semantics
---------
  - List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY).
  - An empty list yields the initial value; a NULL list yields NULL.
  - The list and initial value may reference outer-query variables (e.g.
    reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference
    only the accumulator and element.
  - Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE,
    and element property-access bodies are all supported.
  - Outer-variable, query-parameter, nested-reduce, and aggregate references
    inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error.
  - reduce is registered as a safe keyword so it remains usable as a property
    or map key, preserving backward compatibility.

Tests
-----
Adds the age_reduce regression test (registered in the install SQL and the
upgrade template so age_upgrade passes), covering: arithmetic/product/string
folds; order sensitivity; empty/NULL list; NULL element and NULL init;
list-building and CASE bodies; boolean and comparison bodies; element property
access; multiple and nested (in list/init) reduce(); reduce() in boolean
expressions, WHERE, and list comprehensions; folds over collected nodes and
node list properties; the not-supported rejections; and reduce as a map key.
All multi-row results are ordered. 38/38 installcheck pass.

Future work
-----------
The body restriction (accumulator and element only) is a property of the
standalone expression evaluation and can be relaxed without core changes:
  - Allow loop-invariant outer-variable and cypher $parameter references in
    the body by capturing them as additional eager aggregate arguments bound
    to extra param slots.
  - Support a nested reduce() inside the body via an SPI-based evaluation
    fallback for subquery-bearing bodies.
Aggregates inside the body remain intentionally unsupported, matching the
openCypher specification.

Co-authored-by: Copilot <copilot@github.com>

modified:   Makefile
modified:   age--1.7.0--y.y.y.sql
new file:   regress/expected/age_reduce.out
new file:   regress/sql/age_reduce.sql
modified:   sql/age_aggregate.sql
modified:   src/backend/nodes/ag_nodes.c
modified:   src/backend/nodes/cypher_copyfuncs.c
modified:   src/backend/nodes/cypher_outfuncs.c
modified:   src/backend/nodes/cypher_readfuncs.c
modified:   src/backend/parser/cypher_analyze.c
modified:   src/backend/parser/cypher_clause.c
modified:   src/backend/parser/cypher_gram.y
modified:   src/backend/utils/adt/agtype.c
modified:   src/include/nodes/ag_nodes.h
modified:   src/include/nodes/cypher_copyfuncs.h
modified:   src/include/nodes/cypher_nodes.h
modified:   src/include/nodes/cypher_outfuncs.h
modified:   src/include/nodes/cypher_readfuncs.h
modified:   src/include/parser/cypher_kwlist.h

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds openCypher reduce(acc = init, var IN list | body) support by desugaring the expression at transform time into a correlated scalar subquery that folds over unnest(list) WITH ORDINALITY using a new ordered aggregate (ag_catalog.age_reduce) and a serialized standalone fold-body expression evaluated inside the aggregate transition function.

Changes:

  • Adds a new cypher_reduce ExtensibleNode plus grammar/keyword support for parsing reduce(...).
  • Implements transform_cypher_reduce() to rewrite reduce() into an ordered aggregate over unnest(... ) WITH ORDINALITY, serializing the fold body with accumulator/element rewritten to PARAM_EXEC slots.
  • Adds ag_catalog.age_reduce + age_reduce_transfn (C) and a comprehensive regression test suite (age_reduce).

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
Makefile Registers age_reduce in the regression test list.
age--1.7.0--y.y.y.sql Adds upgrade-time creation of age_reduce_transfn and age_reduce aggregate.
regress/expected/age_reduce.out Expected output for the new reduce() regression coverage.
regress/sql/age_reduce.sql New regression tests covering semantics, order, null/empty behavior, and unsupported constructs.
sql/age_aggregate.sql Defines age_reduce_transfn and age_reduce aggregate in install SQL.
src/backend/nodes/ag_nodes.c Registers cypher_reduce in node name/method tables.
src/backend/nodes/cypher_copyfuncs.c Adds copy support for cypher_reduce.
src/backend/nodes/cypher_outfuncs.c Adds serialization support for cypher_reduce.
src/backend/nodes/cypher_readfuncs.c Adds deserialization support for cypher_reduce.
src/backend/parser/cypher_analyze.c Extends raw-expression walker to traverse cypher_reduce fields.
src/backend/parser/cypher_clause.c Implements transform_cypher_reduce() desugaring and fold-body param rewrite/validation.
src/backend/parser/cypher_gram.y Adds reduce(...) grammar production and marks REDUCE as safe keyword for map keys.
src/backend/utils/adt/agtype.c Implements age_reduce_transfn (aggregate transition fn) that deserializes/compiles and evaluates the fold body.
src/include/nodes/ag_nodes.h Adds cypher_reduce_t tag.
src/include/nodes/cypher_copyfuncs.h Declares copy_cypher_reduce.
src/include/nodes/cypher_nodes.h Defines the cypher_reduce ExtensibleNode struct.
src/include/nodes/cypher_outfuncs.h Declares out_cypher_reduce.
src/include/nodes/cypher_readfuncs.h Declares read_cypher_reduce.
src/include/parser/cypher_kwlist.h Adds reduce keyword token (REDUCE).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +11662 to +11668
oldctx = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
rc = (reduce_eval_ctx *) palloc0(sizeof(reduce_eval_ctx));
body_txt = PG_GETARG_TEXT_PP(2);
body_str = text_to_cstring(body_txt);
body_node = (Node *) stringToNode(body_str);
rc->body_state = ExecInitExpr((Expr *) body_node, NULL);
rc->econtext = CreateStandaloneExprContext();
Comment on lines +2291 to +2295
agg->aggdirectargs = NIL;
agg->args = list_make4(makeTargetEntry((Expr *) init_node, 1, NULL, false),
makeTargetEntry((Expr *) body_const, 2, NULL, false),
makeTargetEntry((Expr *) elem_var, 3, NULL, false),
ord_te);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants