Skip to content

feat(gs): replace raw-SQL /gs/debug with structured allowlist endpoint#3971

Draft
davidleomay wants to merge 7 commits into
developfrom
feat/gs-debug-structured
Draft

feat(gs): replace raw-SQL /gs/debug with structured allowlist endpoint#3971
davidleomay wants to merge 7 commits into
developfrom
feat/gs-debug-structured

Conversation

@davidleomay

Copy link
Copy Markdown
Member

Summary

  • POST /gs/debug now takes a structured JSON body (table + select + where + groupBy/orderBy/limit) instead of a SQL string. The service emits SQL via TypeORM with identifiers pulled from DebugAllowedColumns and values bound as parameters.
  • Replaces the parser-walker approach (fix(gs): reject whole-row references in debug SQL #3967, commits 74085a920 + 1fc0be9ed) entirely. The walker is gone; so are DebugBlockedCols, DebugBlockedSchemas, DebugDangerousFunctions, the post-execution masker, the node-sql-parser dependency in this module, and ~660 lines of recursive AST code.
  • db-debug.sh is migrated: every predefined mode (--anomalies, --balance, --stats, --asset-history, --referral-chain, --referral-tree, default) now sends JSON. The custom-SQL pass-through is dropped.

Why this design

The previous approach was "parse the SQL, walk the AST, reject anything that looks dangerous." The independent adversarial review on the row-bypass branch found six exploitable bypasses (simple-CASE operand never walked, aggregate FILTER/OVER ignored, LIMIT substring scan beat by string literals and trailing -- comments, LIMIT cap bypassed by LIMIT 9000+9000). Any one parser field the walker forgot opened a hole.

The structured DTO closes that surface by construction:

  • No SQL is accepted, parsed, or interpolated.
  • Identifiers come from server-side enums or per-table allowlist arrays.
  • Values flow through pg parameter binding.
  • Adding a new shape (CASE, window funcs, OR with nested NOT, …) requires explicit code in both the DTO schema and the emitter — not just a missing branch in a walker.

A second independent adversarial review on this branch found no exploitable bypass. It did flag four DTO-only gates (as/aggregate/op/direction) where the service trusted ValidationPipe without re-checking; this PR includes service-side re-checks for all four plus a regression test per check.

What's exposed

DebugAllowedColumns (in dto/gs.dto.ts) covers ~85 tables. Each entry lists the columns selectable / filterable / sortable on that table. PII / secrets / free-form text are not in the allowlist; safe columns (IDs, FK IDs, timestamps, enums, status discriminators, numeric amounts, public on-chain refs) are. log.message is the only column flagged for jsonb path access — needed for the financial-balance debugging the script does.

FK ID columns (recommenderId, recommendedId, userDataId, assetId, etc.) are included so the script's referral-chain and referral-tree walkers work end-to-end.

Allowed query language (after this PR)

  • one table from DebugAllowedColumns
  • select items: plain column / jsonb path / aggregate (count, sum, min, max, avg over an explicit column)
  • WHERE: tree of leaf / and / or / not, depth cap 5, predicate cap 50
  • leaf ops: = != < <= > >= IN NOT IN LIKE ILIKE IS NULL IS NOT NULL
  • GROUP BY allowlist columns or select aliases
  • ORDER BY allowlist columns or select aliases, ASC/DESC
  • LIMIT (cap 10 000) + optional OFFSET (cap 1 000 000)

Anything outside that — JOINs, subqueries, CTEs, UNION, INTO, window functions, CASE, arbitrary functions, jsonb on non-jsonb columns — is unrepresentable in the DTO.

Migration

db-debug.sh continues to work transparently for the existing user-facing flags. Direct API users who were posting {"sql": "..."} need to switch to the structured body — there is a curl example in the script's --help output.

Test plan

  • npx tsc --noEmit clean
  • npx jest src/subdomains/generic/gs/__tests__/gs.service.spec.ts — 134 / 134 pass (includes 8 db-debug.sh-shape smoke tests, full WhereOp coverage, SQL-injection probes against every string field, defense-in-depth regressions for the four hardened gates)
  • db-debug.sh --stats against a non-prod environment
  • db-debug.sh --anomalies 5 against a non-prod environment
  • db-debug.sh --referral-chain <id> against a non-prod environment
  • Manual curl with a body that should reject (PII column, unknown table, unsupported WhereOp) → 400
  • Independent reviewer eyeballs the allowlist for any column that shouldn't be exposed

Comment thread src/subdomains/generic/gs/dto/debug-query.dto.ts Fixed
Comment thread src/subdomains/generic/gs/__tests__/gs.service.spec.ts Fixed
POST /gs/debug now takes a JSON description of the query (table + select
+ where + group/order/limit) instead of a SQL string. The service
translates the DTO into SQL with identifiers pulled from a per-table
column allowlist (DebugAllowedColumns) and values bound as parameters
via TypeORM. No raw SQL is parsed, walked, or interpolated.

Replaces the parser-walker approach (PR #3967 / commits 74085a9 +
1fc0be9) entirely. The walker had at least six documented bypasses
(simple-CASE operand never walked, aggregate FILTER/OVER ignored, LIMIT
substring scan beat by string literals and trailing line comments, LIMIT
cap bypassed by arithmetic expressions) — closed by construction here
because no parser is involved.

DTO (src/subdomains/generic/gs/dto/debug-query.dto.ts):
- select items: column | jsonb | aggregate, each with discriminator + regex/enum
- where tree: leaf | and | or | not, recursive with depth cap 5 / 50 predicates
- jsonb path: dot-separated, each segment regex-validated server-side
- limits: identifier regex, IN-list cap, string-value length cap

Service (gs.service.ts):
- ~660 lines of AST walkers + blocklist + post-execution masker deleted
- ~210 lines of structured emitter added
- defense-in-depth re-checks for the four DTO-only gates the security
  review flagged (`as` regex, `aggregate` enum, `op` enum, `direction`)

Allowlist (dto/gs.dto.ts):
- DebugAllowedColumns covers ~85 tables with explicit column lists
  including FK IDs (recommenderId, userDataId, etc.) so the script's
  --referral-* modes work end-to-end
- DebugBlockedCols / DebugBlockedSchemas / DebugDangerousFunctions removed
- node-sql-parser dependency no longer used by GsService

db-debug.sh:
- every predefined mode (--anomalies / --balance / --stats /
  --asset-history / --referral-chain / --referral-tree / default)
  migrated to JSON payloads built with `jq -n`
- custom-SQL pass-through removed (was the script's only raw-SQL surface)

Tests: 134 specs covering allowlist, jsonb, WHERE composition,
GROUP/ORDER/LIMIT, the eight query shapes db-debug.sh issues, SQL-
injection probes against every string field, and the four service-side
defense-in-depth gates.
…ghtening

Adversarial review surfaced two real correctness bugs and several worth-fixing
defense-in-depth gaps. Allowlist auditors flagged a handful of columns that
slipped through. All fixed.

Service correctness:
- Prototype-chain table names (`__proto__`, `constructor`, …) used to return
  `Object.prototype` from the allowlist lookup, pass the truthy guard, then
  TypeError downstream — 500 instead of 400. Use `Object.hasOwn`.
- `{kind: 'jsonb'}` without `jsonbPath` used to TypeError in
  `defaultDebugSelectAlias`'s `split('.')` — 500 instead of 400. Push kind-
  specific required-field checks ahead of alias derivation.

Defense in depth:
- LIMIT also floored at 1 (was only upper-clamped). DTO `@Min(1)` is the
  primary check; the service now never emits `LIMIT 0` or `LIMIT -n` even if
  ValidationPipe is ever bypassed.
- OFFSET floored at 0 for the same reason.
- Audit log now redacts WHERE leaf `value` fields. Bound parameters keep PII
  out of the SQL string; the verbose-channel audit was undoing that by
  serializing the raw DTO. Structure (table/select/where/op) stays for
  forensics; values become `<scalar>` / `<array:N>`.

Allowlist:
- Removed: `payment_link_payment.note`, `trading_order.errorMessage`,
  `user.apiFilterCT`, `fiat_output.creditInstitution`, `fiat_output.info`,
  `liquidity_management_order.errorMessage`, `fiat.ibanCountryConfig`.
- Fixed: `bank_tx.subfamilyCode` → `subFamilyCode` (typo).
- Fixed: `liquidity_management_order.eagerId` → `pipelineId` (misparsed FK
  from `@ManyToOne({ eager: true })`).

DTO:
- Dropped unused `IsBoolean` / `IsNotEmpty` imports.

Tests: 141 pass (was 134 + 7 new). New regressions pin the prototype-key
behavior, the kind-without-required-field behavior, the limit floor, and
the WHERE-value redaction.
Three sibling scripts (compare-balance-logs.sh, sum-asset-balances.sh,
inspect-asset-balance.sh) still POSTed `{sql: ...}` to /gs/debug — they
would have silently 400'd after merge. Migrated each to the structured DTO
shape with `jq -n` payload builders and a `{keys, rows}` reshape to keep
the downstream jq filters unchanged.

Allowlist:
- Removed `payment_link_payment.deviceCommand` — `text` column for free-form
  merchant-supplied POS commands; same risk class as the already-removed
  `note`.
- Documented `setting.key` inclusion (knob NAMES revealed; `value` stays out).

Test coverage:
- Added regression specs for the two db-debug.sh shapes that were missing
  from the smoke-test suite: default (`SELECT id, name, blockchain FROM
  asset ...`) and the asset resolver used by `--asset-history Blockchain/Name`.
- Expanded the "rejects sensitive column" `it.each` from 9 pairs to ~130,
  covering every entry from the previous `DebugBlockedCols` plus the
  columns the auditors flagged this round. A future migration that
  accidentally surfaces e.g. `totpSecret`, `apiKeyCT`, or `recipientMail`
  now fails CI before it ships.

Test count: 268 (was 141).
Flagged by github-code-quality on commit e6e92fe. The test asserts on
`result.keys` directly; the captured spy handle was dead.
@davidleomay davidleomay force-pushed the feat/gs-debug-structured branch from 9a289db to bc76ab6 Compare June 26, 2026 08:37
Final adversarial review surfaced three actionable items and an allowlist
gap. All addressed.

- Audit log moved BEFORE emit/validate so a malformed/probing request is
  still recorded. Previously a DEBUG-role attacker could enumerate the
  column allowlist via 400s without leaving a trace in App Insights.
- New per-level cap `DebugQueryMaxAndOrChildren = 10` replaces the prior
  `@ArrayMaxSize(DebugQueryMaxPredicates = 50)` on `where.children`. With
  depth 5 the validator can no longer be made to walk 50^5 internal nodes
  on a single oversized body; combined with the leaf-count cap of 50 this
  bounds class-validator CPU work tightly.
- Dropped `node-sql-parser` from package.json + lockfile; no remaining
  src/ references.
- `support_note.userDataId` added — the FK was missing from the existing
  allowlist entry, which left the table debuggable but blind to ownership.

Tests: 273 (was 272). New regression pins that the audit fires even when
emit throws — a refactor that pushed it back below emit would let probing
become invisible again.
…SAGE

Two things, packaged together so CI is green and the data-leak finding is
closed in one commit.

1) Prettier reformat. CI's `format:check` step required the
   `DebugAllowedColumns` arrays to wrap (the single-line-per-table style I
   used was not prettier-compatible). Reformatted automatically; no
   semantic changes, just whitespace.

2) Final data-leak review surfaced a real (medium) cross-user audit-log
   leak: a DEBUG user can call /gs/debug/logs with
   `template: 'traces-by-message', messageFilter: 'Debug-query by 0xother'`
   and read another DEBUG user's full debug-query audit history (shape,
   tables, columns, ops — values are already redacted in the audit).
   The existing `ALL_TRACES` template already filtered the
   `LogQueryAuditPrefix`; mirror the same pattern for `TRACES_BY_MESSAGE`
   and `TRACES_BY_OPERATION`, and extend all three to also filter the new
   `DebugQueryAuditPrefix` constant.

   Service uses `DebugQueryAuditPrefix` for both the success-path and
   failure-path audit lines. New regression spec pins the
   service-emits-prefix and templates-filter-prefix invariants together —
   so a future refactor that desynchronizes them fails CI.

Tests: 274 (was 273).
Three items from the final 5-angle review round. All non-blocking on their
own, packaged together to keep the PR diff compact.

- DoS reviewer flagged that `DebugQueryMaxAndOrChildren = 10` combined with
  depth 5 lets class-validator instantiate up to 100k internal nodes before
  the service caps fire. Tightening to 5 brings worst case to 3125 — still
  far more branching than any realistic debug query needs.

- Integration reviewer flagged `scripts/sync-prod-logs.js` still posting
  `{sql: …}` to /gs/debug; it would 400 the moment this PR merges.
  Migrated to the structured DTO with a `{keys, rows}` → array-of-objects
  reshape so the rest of the script's `row.id` / `row.message` access
  stays unchanged.

- Updated `scripts/DEBUG-ENDPOINT.md` to describe the structured DTO,
  remove the stale raw-SQL examples, document the killswitch + denylist
  ops paths the auth reviewer's report covered, and refresh the
  troubleshooting section.

Tests: 274 (unchanged — the cap change moves a project-wide DoS knob, not
behavior covered by unit tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant