Restore contsel/contjoinsel for containment & key-existence operators (#2356)#2417
Conversation
9b27aec to
50beb10
Compare
|
Self-review caught a bug in the smoke test: Force-pushed amend |
50beb10 to
bd52fd1
Compare
There was a problem hiding this comment.
Pull request overview
Restores lightweight planner selectivity/joinsel functions (contsel/contjoinsel) for agtype containment and key-existence operators to prevent planner-time regressions caused by matchingsel evaluating operator functions against MCV statistics during planning.
Changes:
- Rebinds 10
agtypecontainment/key-existence operators tocontsel/contjoinselin the install SQL. - Adds upgrade SQL (
ALTER OPERATOR ... SET (RESTRICT, JOIN)) so existing installs flip bindings on extension update. - Introduces a regression test that pins the bindings via
pg_operator, plus updates expected outputs impacted by the resulting plan/order changes.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| sql/agtype_operators.sql | Switches containment operators’ RESTRICT/JOIN to contsel/contjoinsel. |
| sql/agtype_exists.sql | Switches key-existence operator overloads’ RESTRICT/JOIN to contsel/contjoinsel. |
| age--1.7.0--y.y.y.sql | Adds upgrade-time ALTER OPERATOR ... SET (RESTRICT, JOIN) for all affected operators. |
| regress/sql/containment_selectivity.sql | Adds regression coverage to pin operator bindings and smoke-test operator behavior. |
| regress/expected/containment_selectivity.out | Expected output for the new regression test. |
| regress/expected/cypher_match.out | Updates expected output for plan/order changes caused by new selectivity bindings. |
| regress/expected/cypher_vle.out | Updates expected output for row order changes (no ORDER BY) under the new plan. |
| Makefile | Registers containment_selectivity in REGRESS. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
bd52fd1 to
bff7be6
Compare
|
@crprashant Can you rebase this PR? There have been some significant updates to the master as of late |
bff7be6 to
d4842d0
Compare
|
Done — rebased onto current master (73d0705). Conflicts in \Makefile\ and \�ge--1.7.0--y.y.y.sql\ resolved (kept the new \�gtype_jsonb_cast\ test entry and jsonb cast functions that landed since, appended the contsel ALTER OPERATOR block after). Ready for re-review. |
b68dcb6 to
f090a09
Compare
|
Pushed a fix addressing the latest review. During the rebase, an automated merge garbled Fixes in the latest push (
All three earlier review items (deterministic |
|
@crprashant Opus has opinions. Could you address these before I merge it? Review summaryThanks for the detailed writeup and benchmarks, @crprashant — the planner-time problem is real and the The "core precedent" claim is inaccurateThe PR's Why states:
That isn't what core does. In
Sources (
So this isn't "restoring core precedent" — core deliberately keeps jsonb on This also surfaces an internal contradiction: the Notes for reviewers section already says "matchingsel does provide better estimates when good statistics exist; PostgreSQL core accepts the same trade-off for jsonb" — the opposite of the Why section. (Minor: there's no The mechanism (and thus the perf win) is genuineFor the record, the cost model checks out: One coverage gap
Bottom lineMechanically correct, complete operator coverage, real perf win. Happy to approve once:
Optional: a one-line note that very-low-selectivity containment predicates may get worse estimates under |
|
@crprashant Btw, this was what it suggests - WhyThe containment ( This PR rebinds those operators to the lightweight Note this is not the binding PostgreSQL core uses for |
…apache#2356) The containment (`@>`, `<@`, `@>>`, `<<@`) and key-existence (`?`, `?|`, `?&`) operators on `agtype` were bound to `matchingsel`/`matchingjoinsel` on the PG14+ source tree. `matchingsel` is built for pattern operators (LIKE/regex) and during planning invokes the operator's underlying function (`agtype_contains`) once per `pg_statistic` MCV. With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries. Rebind those operators to the lighter `contsel`/`contjoinsel` estimators, which return fixed selectivity constants without invoking the operator function during planning. This is a deliberate planning-speed vs. estimate-accuracy trade-off. Note it DIVERGES from PostgreSQL core, which keeps jsonb's `@>`, `<@`, `?`, `?|`, `?&` on `matchingsel`/`matchingjoinsel` (verified on REL_16/17/18_STABLE in `pg_operator.dat`); it is an AGE-specific choice favoring workloads where these operators appear in selective point lookups. A future improvement could add a custom `agtype` selectivity function that is both cheap and statistics-aware. Changes: * `sql/agtype_operators.sql`, `sql/agtype_exists.sql`: 10 operators flipped from `matchingsel`/`matchingjoinsel` to `contsel`/`contjoinsel`. * `age--1.7.0--y.y.y.sql`: appended `ALTER OPERATOR ... SET (RESTRICT, JOIN)` for all 10 operators so existing installs flip on `ALTER EXTENSION age UPDATE`. * `regress/sql/containment_selectivity.sql` (+ `expected/.out`): pin the bindings via `pg_operator`, plus a scoped "no leaked matchingsel" guard and functional smoke for all 10 operators. Adds an upgrade-path assertion that simulates a stale (pre-fix) install, replays the shipped `ALTER OPERATOR` block, and confirms every overload flips to `contsel`/`contjoinsel` (run in a rolled-back transaction). * `regress/expected/cypher_match.out`, `regress/expected/cypher_vle.out`: refresh expected to reflect new (and better) plan shapes that the lower-selectivity helper produces — `test_enable_containment` now picks Nested Loop + Index Only Scans over a Seq Scan/Hash Join, and two `MATCH p=...` and `show_list_use_vle` queries flip row order (queries had no `ORDER BY`; result set is unchanged, only ordering). * `Makefile`: register `containment_selectivity` in `REGRESS`. Validation: * Build: clean, `-Werror`. * Regression: 36/37 tests pass under `EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm"`. Only `age_upgrade` fails — pre-existing on master at 774e781 (verified by `git stash && installcheck`). * Reporter's exact methodology (LDBC-SNB-style snb_graph + pgbench on `bench_message_content`) reproduces the regression and the fix: | Metric | matchingsel | contsel | Delta | |----------------------------|-------------|---------|-------| | EXPLAIN planning time (ms) | 1.42 | 0.97 | -32% | | EXPLAIN execution time (ms)| 0.34 | 0.31 | ~equal| | pgbench TPS (8c x 30s) | 5247 | 7378 | +40.6%| Run with `default_statistics_target = 1000` to populate MCV lists, matching the reporter's analyzed-graph conditions. * Upgrade path: validated end-to-end during the benchmark — operator bindings were flipped from `matchingsel` -> `contsel` via the same `ALTER OPERATOR` statements the upgrade SQL ships, while operators remained functional throughout. Driver workflows (python/go/node/jdbc) intentionally not run: this PR only adjusts pg_operator selectivity metadata. There is no C, type, or protocol change that drivers could observe. Closes apache#2356.
f090a09 to
85bca9f
Compare
|
Thanks for the careful review — both points addressed in the latest push ( 1. Rationale corrected. You're right, the core-precedent claim was wrong — core keeps jsonb's 2. Upgrade-path assertion added.
The whole section runs in a |
Several VLE regression queries RETURN multiple rows without an ORDER BY,
so their row order depends on traversal/scan order and can vary between
runs and platforms. Add ORDER BY ASC to those queries (on the path,
edge-list, or graphid as appropriate) so the expected output is stable.
The queries are pinned by path (p), edge list (e), or graphid
(id(u)/id(v)/id(e[n])) depending on what each RETURN projects.
Full audit of cypher_vle: all 38 multi-row result blocks were checked.
After this change, every multi-row RETURN is deterministically ordered
except the two SELECT * FROM show_list_use_vle('list01') calls, which are
already deterministic because the function body orders its results with
RETURN v ORDER BY id(v) (added in apache#2417); their result blocks are
unchanged by this commit.
This is a test-only change (regress/sql/cypher_vle.sql and
regress/expected/cypher_vle.out); no extension C code or SQL is modified.
Row counts are unchanged (pure reordering).
All 37 regression tests pass (installcheck) on PostgreSQL 18.3.
Co-authored-by: GitHub Copilot <noreply@github.com>
modified: regress/expected/cypher_vle.out
modified: regress/sql/cypher_vle.sql

Restore contsel/contjoinsel for containment & key-existence operators
Closes #2356.
Why
The containment (
@>,<@,@>>,<<@) and key-existence (?,?|,?&) operators onagtypeare bound tomatchingsel/matchingjoinsel.matchingselis intended for pattern operators (LIKE/regex) — during planning it invokes the operator's underlying function (agtype_contains) once perpg_statisticMCV entry (and per histogram bin). With realistic statistics targets that produces a planner-time regression that dominates simple OLTP-style point queries.This PR rebinds those operators to the lightweight
contsel/contjoinselestimators, which return fixed selectivity constants without calling the operator function. This is a deliberate planning-speed vs. estimate-accuracy trade-off. Note this is not the binding PostgreSQL core uses forjsonb— core keeps jsonb's@>,<@,?,?|,?&onmatchingsel/matchingjoinsel(verified on REL_16/17/18_STABLE inpg_operator.dat). So this is an AGE-specific choice favoring workloads where these operators appear in selective point lookups, not a core-precedent restoration. A future improvement (tracked separately) could add a customagtypeselectivity function that is both cheap and statistics-aware.What
sql/agtype_operators.sql,sql/agtype_exists.sql: 10 operators flipped tocontsel/contjoinsel.age--1.7.0--y.y.y.sql: appendedALTER OPERATOR ... SET (RESTRICT, JOIN)for all 10 operators so existing installs flip onALTER EXTENSION age UPDATE.regress/sql/containment_selectivity.sql: new test that pins the bindings viapg_operator, a scoped no-leaked-matchingsel guard, functional smoke for all 10 operators, and an upgrade-path assertion that simulates a stale (pre-fix) install, replays the shippedALTER OPERATORblock, and confirms every overload flips tocontsel/contjoinsel(run in a rolled-back transaction).regress/expected/cypher_match.out,regress/expected/cypher_vle.out: refresh expected —test_enable_containmentnow picks Nested Loop + Index Only Scans over Seq Scan/Hash Join (a direct, better plan), and twoMATCH p=...andshow_list_use_vlequeries flip row order (they had noORDER BY, now given a stableORDER BY; result set unchanged).Makefile: register the new test inREGRESS.Validation
Regression suite
36/37 pass with
EXTRA_TESTS="pgvector fuzzystrmatch pg_trgm". Onlyage_upgradefails — pre-existing on master at774e781b(verified bygit stash && installcheckbaseline of 32/33 with the sameage_upgradefailure). The newcontainment_selectivitytest (including the upgrade-path assertion) passes on PG18.Reporter's exact methodology — reproduced
Reporter's three scripts (
generate_graph.sql,setup_func_for_workload.sql,workload_select.sql) used unchanged. Run on PG18 withdefault_statistics_target = 1000to populate MCV lists, matching the reporter's analyzed-graph conditions:The −32% planning-time delta lines up with the reporter's "~30% of execution time spent in agtype_contains during planning" observation; the +40.6% TPS gain matches the "severe TPS drop in OLTP-style workloads" they reported. The win comes from discarding stats-based estimation at plan time (
contselreturns a flat constant) — i.e. exactly the trade-off described above.Upgrade path
Validated end-to-end during the benchmark and pinned by the new regression assertion: operator bindings flip from
matchingsel→contselvia the sameALTER OPERATORstatements the upgrade SQL ships, while operators remain functional throughout.pg_operatorsnapshotsDriver workflows
Intentionally not run: this PR only adjusts
pg_operatorselectivity metadata. There is no C code, type, or wire-protocol change that python / go / node / JDBC drivers could observe.Notes for reviewers
matchingseldoes provide better estimates when good statistics exist on heavily-analyzedagtypecolumns, so very-low-selectivity containment predicates may get worse estimates undercontsel. The win is constant-time planning. PostgreSQL core keepsjsonbonmatchingsel, so this is a deliberate AGE-specific divergence, not a precedent restoration. A future improvement (out of scope here) would be a cheap, statistics-awareagtypeselectivity function; happy to file as a follow-up if there's interest.containment_selectivityregression test is intentionally precise so the diff is loud if anyone re-introducesmatchingselhere. The scoped guard catches future operator additions that forget the right helper, and the upgrade-path assertion catches a silent regression in the shippedALTER OPERATORblock.