feat(gs): replace raw-SQL /gs/debug with structured allowlist endpoint by davidleomay · Pull Request #3971 · DFXswiss/api

davidleomay · 2026-06-25T12:31:49Z

Summary

POST /gs/debug now takes a structured JSON body (table + select + where + groupBy/orderBy/limit) instead of a SQL string. The service emits SQL via TypeORM with identifiers pulled from DebugAllowedColumns and values bound as parameters.
Replaces the parser-walker approach (fix(gs): reject whole-row references in debug SQL #3967, commits 74085a920 + 1fc0be9ed) entirely. The walker is gone; so are DebugBlockedCols, DebugBlockedSchemas, DebugDangerousFunctions, the post-execution masker, the node-sql-parser dependency in this module, and ~660 lines of recursive AST code.
db-debug.sh is migrated: every predefined mode (--anomalies, --balance, --stats, --asset-history, --referral-chain, --referral-tree, default) now sends JSON. The custom-SQL pass-through is dropped.

Why this design

The previous approach was "parse the SQL, walk the AST, reject anything that looks dangerous." The independent adversarial review on the row-bypass branch found six exploitable bypasses (simple-CASE operand never walked, aggregate FILTER/OVER ignored, LIMIT substring scan beat by string literals and trailing -- comments, LIMIT cap bypassed by LIMIT 9000+9000). Any one parser field the walker forgot opened a hole.

The structured DTO closes that surface by construction:

No SQL is accepted, parsed, or interpolated.
Identifiers come from server-side enums or per-table allowlist arrays.
Values flow through pg parameter binding.
Adding a new shape (CASE, window funcs, OR with nested NOT, …) requires explicit code in both the DTO schema and the emitter — not just a missing branch in a walker.

A second independent adversarial review on this branch found no exploitable bypass. It did flag four DTO-only gates (as/aggregate/op/direction) where the service trusted ValidationPipe without re-checking; this PR includes service-side re-checks for all four plus a regression test per check.

What's exposed

DebugAllowedColumns (in dto/gs.dto.ts) covers ~85 tables. Each entry lists the columns selectable / filterable / sortable on that table. PII / secrets / free-form text are not in the allowlist; safe columns (IDs, FK IDs, timestamps, enums, status discriminators, numeric amounts, public on-chain refs) are. log.message is the only column flagged for jsonb path access — needed for the financial-balance debugging the script does.

FK ID columns (recommenderId, recommendedId, userDataId, assetId, etc.) are included so the script's referral-chain and referral-tree walkers work end-to-end.

Allowed query language (after this PR)

one table from DebugAllowedColumns
select items: plain column / jsonb path / aggregate (count, sum, min, max, avg over an explicit column)
WHERE: tree of leaf / and / or / not, depth cap 5, predicate cap 50
leaf ops: = != < <= > >= IN NOT IN LIKE ILIKE IS NULL IS NOT NULL
GROUP BY allowlist columns or select aliases
ORDER BY allowlist columns or select aliases, ASC/DESC
LIMIT (cap 10 000) + optional OFFSET (cap 1 000 000)

Anything outside that — JOINs, subqueries, CTEs, UNION, INTO, window functions, CASE, arbitrary functions, jsonb on non-jsonb columns — is unrepresentable in the DTO.

Migration

db-debug.sh continues to work transparently for the existing user-facing flags. Direct API users who were posting {"sql": "..."} need to switch to the structured body — there is a curl example in the script's --help output.

Test plan

npx tsc --noEmit clean
npx jest src/subdomains/generic/gs/__tests__/gs.service.spec.ts — 134 / 134 pass (includes 8 db-debug.sh-shape smoke tests, full WhereOp coverage, SQL-injection probes against every string field, defense-in-depth regressions for the four hardened gates)
db-debug.sh --stats against a non-prod environment
db-debug.sh --anomalies 5 against a non-prod environment
db-debug.sh --referral-chain <id> against a non-prod environment
Manual curl with a body that should reject (PII column, unknown table, unsupported WhereOp) → 400
Independent reviewer eyeballs the allowlist for any column that shouldn't be exposed

POST /gs/debug now takes a JSON description of the query (table + select + where + group/order/limit) instead of a SQL string. The service translates the DTO into SQL with identifiers pulled from a per-table column allowlist (DebugAllowedColumns) and values bound as parameters via TypeORM. No raw SQL is parsed, walked, or interpolated. Replaces the parser-walker approach (PR #3967 / commits 74085a9 + 1fc0be9) entirely. The walker had at least six documented bypasses (simple-CASE operand never walked, aggregate FILTER/OVER ignored, LIMIT substring scan beat by string literals and trailing line comments, LIMIT cap bypassed by arithmetic expressions) — closed by construction here because no parser is involved. DTO (src/subdomains/generic/gs/dto/debug-query.dto.ts): - select items: column | jsonb | aggregate, each with discriminator + regex/enum - where tree: leaf | and | or | not, recursive with depth cap 5 / 50 predicates - jsonb path: dot-separated, each segment regex-validated server-side - limits: identifier regex, IN-list cap, string-value length cap Service (gs.service.ts): - ~660 lines of AST walkers + blocklist + post-execution masker deleted - ~210 lines of structured emitter added - defense-in-depth re-checks for the four DTO-only gates the security review flagged (`as` regex, `aggregate` enum, `op` enum, `direction`) Allowlist (dto/gs.dto.ts): - DebugAllowedColumns covers ~85 tables with explicit column lists including FK IDs (recommenderId, userDataId, etc.) so the script's --referral-* modes work end-to-end - DebugBlockedCols / DebugBlockedSchemas / DebugDangerousFunctions removed - node-sql-parser dependency no longer used by GsService db-debug.sh: - every predefined mode (--anomalies / --balance / --stats / --asset-history / --referral-chain / --referral-tree / default) migrated to JSON payloads built with `jq -n` - custom-SQL pass-through removed (was the script's only raw-SQL surface) Tests: 134 specs covering allowlist, jsonb, WHERE composition, GROUP/ORDER/LIMIT, the eight query shapes db-debug.sh issues, SQL- injection probes against every string field, and the four service-side defense-in-depth gates.

…ghtening Adversarial review surfaced two real correctness bugs and several worth-fixing defense-in-depth gaps. Allowlist auditors flagged a handful of columns that slipped through. All fixed. Service correctness: - Prototype-chain table names (`__proto__`, `constructor`, …) used to return `Object.prototype` from the allowlist lookup, pass the truthy guard, then TypeError downstream — 500 instead of 400. Use `Object.hasOwn`. - `{kind: 'jsonb'}` without `jsonbPath` used to TypeError in `defaultDebugSelectAlias`'s `split('.')` — 500 instead of 400. Push kind- specific required-field checks ahead of alias derivation. Defense in depth: - LIMIT also floored at 1 (was only upper-clamped). DTO `@Min(1)` is the primary check; the service now never emits `LIMIT 0` or `LIMIT -n` even if ValidationPipe is ever bypassed. - OFFSET floored at 0 for the same reason. - Audit log now redacts WHERE leaf `value` fields. Bound parameters keep PII out of the SQL string; the verbose-channel audit was undoing that by serializing the raw DTO. Structure (table/select/where/op) stays for forensics; values become `<scalar>` / `<array:N>`. Allowlist: - Removed: `payment_link_payment.note`, `trading_order.errorMessage`, `user.apiFilterCT`, `fiat_output.creditInstitution`, `fiat_output.info`, `liquidity_management_order.errorMessage`, `fiat.ibanCountryConfig`. - Fixed: `bank_tx.subfamilyCode` → `subFamilyCode` (typo). - Fixed: `liquidity_management_order.eagerId` → `pipelineId` (misparsed FK from `@ManyToOne({ eager: true })`). DTO: - Dropped unused `IsBoolean` / `IsNotEmpty` imports. Tests: 141 pass (was 134 + 7 new). New regressions pin the prototype-key behavior, the kind-without-required-field behavior, the limit floor, and the WHERE-value redaction.

Three sibling scripts (compare-balance-logs.sh, sum-asset-balances.sh, inspect-asset-balance.sh) still POSTed `{sql: ...}` to /gs/debug — they would have silently 400'd after merge. Migrated each to the structured DTO shape with `jq -n` payload builders and a `{keys, rows}` reshape to keep the downstream jq filters unchanged. Allowlist: - Removed `payment_link_payment.deviceCommand` — `text` column for free-form merchant-supplied POS commands; same risk class as the already-removed `note`. - Documented `setting.key` inclusion (knob NAMES revealed; `value` stays out). Test coverage: - Added regression specs for the two db-debug.sh shapes that were missing from the smoke-test suite: default (`SELECT id, name, blockchain FROM asset ...`) and the asset resolver used by `--asset-history Blockchain/Name`. - Expanded the "rejects sensitive column" `it.each` from 9 pairs to ~130, covering every entry from the previous `DebugBlockedCols` plus the columns the auditors flagged this round. A future migration that accidentally surfaces e.g. `totpSecret`, `apiKeyCT`, or `recipientMail` now fails CI before it ships. Test count: 268 (was 141).

Flagged by github-code-quality on commit e6e92fe. The test asserts on `result.keys` directly; the captured spy handle was dead.

Final adversarial review surfaced three actionable items and an allowlist gap. All addressed. - Audit log moved BEFORE emit/validate so a malformed/probing request is still recorded. Previously a DEBUG-role attacker could enumerate the column allowlist via 400s without leaving a trace in App Insights. - New per-level cap `DebugQueryMaxAndOrChildren = 10` replaces the prior `@ArrayMaxSize(DebugQueryMaxPredicates = 50)` on `where.children`. With depth 5 the validator can no longer be made to walk 50^5 internal nodes on a single oversized body; combined with the leaf-count cap of 50 this bounds class-validator CPU work tightly. - Dropped `node-sql-parser` from package.json + lockfile; no remaining src/ references. - `support_note.userDataId` added — the FK was missing from the existing allowlist entry, which left the table debuggable but blind to ownership. Tests: 273 (was 272). New regression pins that the audit fires even when emit throws — a refactor that pushed it back below emit would let probing become invisible again.

…SAGE Two things, packaged together so CI is green and the data-leak finding is closed in one commit. 1) Prettier reformat. CI's `format:check` step required the `DebugAllowedColumns` arrays to wrap (the single-line-per-table style I used was not prettier-compatible). Reformatted automatically; no semantic changes, just whitespace. 2) Final data-leak review surfaced a real (medium) cross-user audit-log leak: a DEBUG user can call /gs/debug/logs with `template: 'traces-by-message', messageFilter: 'Debug-query by 0xother'` and read another DEBUG user's full debug-query audit history (shape, tables, columns, ops — values are already redacted in the audit). The existing `ALL_TRACES` template already filtered the `LogQueryAuditPrefix`; mirror the same pattern for `TRACES_BY_MESSAGE` and `TRACES_BY_OPERATION`, and extend all three to also filter the new `DebugQueryAuditPrefix` constant. Service uses `DebugQueryAuditPrefix` for both the success-path and failure-path audit lines. New regression spec pins the service-emits-prefix and templates-filter-prefix invariants together — so a future refactor that desynchronizes them fails CI. Tests: 274 (was 273).

Three items from the final 5-angle review round. All non-blocking on their own, packaged together to keep the PR diff compact. - DoS reviewer flagged that `DebugQueryMaxAndOrChildren = 10` combined with depth 5 lets class-validator instantiate up to 100k internal nodes before the service caps fire. Tightening to 5 brings worst case to 3125 — still far more branching than any realistic debug query needs. - Integration reviewer flagged `scripts/sync-prod-logs.js` still posting `{sql: …}` to /gs/debug; it would 400 the moment this PR merges. Migrated to the structured DTO with a `{keys, rows}` → array-of-objects reshape so the rest of the script's `row.id` / `row.message` access stays unchanged. - Updated `scripts/DEBUG-ENDPOINT.md` to describe the structured DTO, remove the stale raw-SQL examples, document the killswitch + denylist ops paths the auth reviewer's report covered, and refresh the troubleshooting section. Tests: 274 (unchanged — the cap change moves a project-wide DoS knob, not behavior covered by unit tests).

github-code-quality Bot found potential problems Jun 25, 2026

View reviewed changes

Comment thread src/subdomains/generic/gs/dto/debug-query.dto.ts Fixed

Comment thread src/subdomains/generic/gs/__tests__/gs.service.spec.ts Fixed

davidleomay added 4 commits June 26, 2026 09:35

fix(gs): drop unused q capture in alias-collision test

bc76ab6

Flagged by github-code-quality on commit e6e92fe. The test asserts on `result.keys` directly; the captured spy handle was dead.

davidleomay force-pushed the feat/gs-debug-structured branch from 9a289db to bc76ab6 Compare June 26, 2026 08:37

davidleomay added 3 commits June 26, 2026 12:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(gs): replace raw-SQL /gs/debug with structured allowlist endpoint#3971

feat(gs): replace raw-SQL /gs/debug with structured allowlist endpoint#3971
davidleomay wants to merge 7 commits into
developfrom
feat/gs-debug-structured

davidleomay commented Jun 25, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

davidleomay commented Jun 25, 2026

Summary

Why this design

What's exposed

Allowed query language (after this PR)

Migration

Test plan

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant