Skip to content

fix(gs): reject whole-row references in debug SQL#3967

Open
davidleomay wants to merge 2 commits into
developfrom
fix/gs-debug-row-bypass
Open

fix(gs): reject whole-row references in debug SQL#3967
davidleomay wants to merge 2 commits into
developfrom
fix/gs-debug-row-bypass

Conversation

@davidleomay

Copy link
Copy Markdown
Member

Close a per-column-blocklist bypass on POST /v1/gs/debug. Any expression that yields a whole row from a table slips past pre-execution validation (the parser surfaces it as a wildcard or a bare alias) and the post-execution masker can't reach the inner fields (the row arrives as a tuple or a JSON blob).

Adds a recursive AST walker that rejects whole-row references for tables with any blocked columns. Covers WITH-clause CTEs and FROM-clause subqueries.

Regression coverage in gs.service.spec.ts — 13 bypass patterns blocked, one allow-case (table with no blocked columns) still passes.

tsc --noEmit clean. All 44 spec tests pass.

Per-column blocklist + post-execution masker can be bypassed by any
expression that yields a whole row from a table:

- `to_jsonb(t.*)` / `to_json(t)` / `row_to_json(t)` / `jsonb_agg(t)`
- `array_to_json(array_agg(t))`
- bare `SELECT t FROM ... t` (alias as row reference)
- `t::text` / `CAST(t AS text)`
- the same wrapped in a subquery or CTE

`columnList` surfaces these as either a wildcard `(.*)` (skipped by
`findBlockedColumnInQuery`) or as the alias/table name (not recognized
as a column on any table). Once the row reaches the executor as a tuple
or JSON blob the post-execution masker can no longer reach the inner
fields — it only walks top-level result keys.

Add a recursive AST walker that reports any whole-row reference (via
wildcard or via a bare identifier matching a FROM-clause alias / table
name) for tables that have any blocked columns. Walks WITH-clause CTEs
and FROM-clause subqueries too.

Regression coverage: 13 bypass patterns + one allow-case (`asset`, no
blocked columns).
Adversarial review of the first pass found three patterns that still let
row-shaped data past the masker:

1. Quoted/aliased table identifiers (`SELECT t.* FROM "user" t`) crashed
   the walker because `node.table` is `{ type, value }` for quoted refs,
   not a plain string. Normalize via `identifierName()` so both shapes
   resolve to the same alias key.

2. FROM-clause function expressions, e.g. `LATERAL to_jsonb(user_data)`,
   were never visited — the walker only inspected `stmt.columns`. Walk
   `stmt.from` items whose `expr` is a function/expression too, using
   the outer scope's row identifiers.

3. Chained CTEs (`WITH a AS (SELECT * FROM T), b AS (SELECT to_jsonb(a.*)
   FROM a) SELECT * FROM b`) bypassed the check because `a` is a CTE
   name, not a base-table name, so `tableHasBlockedColumns('a')` was
   false. Propagate CTE-name → base-table via a `cteSources` map walked
   alongside the AST recursion. `resolveSubqueryRowSource` also takes
   the map so it can recognize transitive forwarders.

Regression coverage: 3 new spec cases, all pass. Total 50 tests green.
davidleomay added a commit that referenced this pull request Jun 26, 2026
POST /gs/debug now takes a JSON description of the query (table + select
+ where + group/order/limit) instead of a SQL string. The service
translates the DTO into SQL with identifiers pulled from a per-table
column allowlist (DebugAllowedColumns) and values bound as parameters
via TypeORM. No raw SQL is parsed, walked, or interpolated.

Replaces the parser-walker approach (PR #3967 / commits 74085a9 +
1fc0be9) entirely. The walker had at least six documented bypasses
(simple-CASE operand never walked, aggregate FILTER/OVER ignored, LIMIT
substring scan beat by string literals and trailing line comments, LIMIT
cap bypassed by arithmetic expressions) — closed by construction here
because no parser is involved.

DTO (src/subdomains/generic/gs/dto/debug-query.dto.ts):
- select items: column | jsonb | aggregate, each with discriminator + regex/enum
- where tree: leaf | and | or | not, recursive with depth cap 5 / 50 predicates
- jsonb path: dot-separated, each segment regex-validated server-side
- limits: identifier regex, IN-list cap, string-value length cap

Service (gs.service.ts):
- ~660 lines of AST walkers + blocklist + post-execution masker deleted
- ~210 lines of structured emitter added
- defense-in-depth re-checks for the four DTO-only gates the security
  review flagged (`as` regex, `aggregate` enum, `op` enum, `direction`)

Allowlist (dto/gs.dto.ts):
- DebugAllowedColumns covers ~85 tables with explicit column lists
  including FK IDs (recommenderId, userDataId, etc.) so the script's
  --referral-* modes work end-to-end
- DebugBlockedCols / DebugBlockedSchemas / DebugDangerousFunctions removed
- node-sql-parser dependency no longer used by GsService

db-debug.sh:
- every predefined mode (--anomalies / --balance / --stats /
  --asset-history / --referral-chain / --referral-tree / default)
  migrated to JSON payloads built with `jq -n`
- custom-SQL pass-through removed (was the script's only raw-SQL surface)

Tests: 134 specs covering allowlist, jsonb, WHERE composition,
GROUP/ORDER/LIMIT, the eight query shapes db-debug.sh issues, SQL-
injection probes against every string field, and the four service-side
defense-in-depth gates.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant