design: analyzer is a pure graph provider — taint/slicing move to the frontend SDK #7

Open
rahlk wants to merge 1 commit into main from design/graph-provider-taint-frontend

rahlk commented Jul 2, 2026

Contributor

Establishes a design principle across the dataflow skillset, first instantiated in codeanalyzer-java (SDG at -a 3; SUMMARY-edge substrate as the next rung) and now amended onto the sibling level-3 epics (python#67, typescript#2, clang#2, go#3, rust#25).

The principle

The analyzer is a pure graph provider. Level 3 emits the dependence-graph substrate — program_graphs (CFG/PDG/SDG) with transitive HRB SUMMARY edges — and stops. Client analyses (taint, slicing, reachability) are frontend SDK queries over that graph, not analyzer features. The analyzer emits no taint_flows section, ingests no sources/sinks/sanitizers policy, and runs no slice.
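
One way to picture the boundary is as a check on the analyzer's output. The sketch below is illustrative only: `program_graphs` and `taint_flows` are the section names used in this PR, while the other forbidden key names and the `boundary_violations` helper are assumptions, not the analyzer's actual schema.

```python
# Hypothetical top-level keys; only `program_graphs` and `taint_flows`
# are named in this PR, the rest are assumed for illustration.
FORBIDDEN_SECTIONS = {"taint_flows", "sources", "sinks", "sanitizers", "slices"}


def boundary_violations(analysis: dict) -> list[str]:
    """List the ways an analyzer payload breaks the provider/client boundary."""
    problems = []
    if "program_graphs" not in analysis:
        problems.append("missing program_graphs")
    # Any client-analysis section in the payload is a violation.
    leaked = analysis.keys() & FORBIDDEN_SECTIONS
    problems.extend(f"unexpected {key}" for key in sorted(leaked))
    return problems


clean = {"program_graphs": {"cfg": [], "pdg": [], "sdg": []}}
leaky = {"program_graphs": {}, "taint_flows": []}
```

Under these assumptions, `boundary_violations(clean)` is empty and `boundary_violations(leaky)` reports the stray `taint_flows` section.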

SUMMARY edges are the exception that proves the rule: keyed on data dependence (not any taint policy), they're reusable across every config, so they stay analyzer-side — and they're exactly what make the frontend's queries context-sensitive.

Why: a taint result is keyed on a policy that changes far faster than the graph. Keeping the query (and its taint_flows output + model packs) in the SDK means a policy edit re-runs a cheap traversal instead of re-emitting the universal graph. This is Joern's factoring — the CPG stores the dependence substrate; reachableBy is a query, not materialized all-pairs taint edges.
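
A minimal sketch of that factoring, assuming a hypothetical adjacency-map encoding of the graph and made-up node names: the graph is loaded once, and each policy is just another set of arguments to a traversal. This version is plain reachability and deliberately ignores context sensitivity.

```python
from collections import deque


def taint_flows(successors, sources, sinks, sanitizers=frozenset()):
    """Return sorted (source, sink) pairs reachable without crossing a sanitizer."""
    flows = []
    for src in sorted(sources):
        seen, queue = {src}, deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks and node != src:
                flows.append((src, node))
            for nxt in successors.get(node, ()):
                # Sanitizer nodes cut the flow: never traverse through them.
                if nxt not in seen and nxt not in sanitizers:
                    seen.add(nxt)
                    queue.append(nxt)
    return sorted(flows)


# One graph, emitted once (hypothetical node names).
GRAPH = {
    "request.param": ["escape", "log.write"],
    "escape": ["db.query"],
    "env.read": ["db.query"],
}

# Two policies over the same graph; switching between them re-runs only the query.
strict = dict(sources={"request.param", "env.read"}, sinks={"db.query"},
              sanitizers={"escape"})
lax = dict(sources={"request.param"}, sinks={"db.query", "log.write"})
```

Calling `taint_flows(GRAPH, **strict)` and then `taint_flows(GRAPH, **lax)` yields different flows without touching `GRAPH`.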

Changes

codeanalyzer-backend (builds the graph only):

  • dataflow-graphs.md: new provider/client boundary; drop taint_flows from the emitted JSON; reframe "Client analyses" as frontend-owned; split the verification gates (SDG/SUMMARY stay; Slice/Taint become frontend gates).
  • dataflow-construction.md: Stage 8 is now CPG-only; slicing/taint explicitly not an analyzer stage.
  • dataflow-issue-template.md: retitled (drop "and taint analysis"), goals/PART 3/PR ladder/DoD rescoped to substrate + SUMMARY edges; taint/slicing called out as a separate SDK ladder.
  • SKILL.md: level-3 section states the boundary.

cldk-sdk-frontend (owns the queries):

  • SKILL.md: new "Client analyses are the SDK's job" section — slicing/taint over program_graphs, model packs as data, TaintFlow/slice-result models, facade methods, over-approximation surfacing.
  • sdk-testing.md: § 3b Slice/Taint (+ context-sensitivity) gates.
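
As a rough picture of "model packs as data" and a `TaintFlow` result model, the sketch below uses plain dataclasses. Only the names `TaintFlow` and "model pack" come from this PR; every field name, the JSON layout, and the `ModelPack` class are assumptions, and the real SDK models may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPack:
    """A taint policy loaded from data rather than compiled into the analyzer."""
    name: str
    sources: frozenset
    sinks: frozenset
    sanitizers: frozenset = frozenset()

    @classmethod
    def from_json(cls, text: str) -> "ModelPack":
        raw = json.loads(text)
        return cls(
            name=raw["name"],
            sources=frozenset(raw["sources"]),
            sinks=frozenset(raw["sinks"]),
            sanitizers=frozenset(raw.get("sanitizers", [])),
        )


@dataclass(frozen=True)
class TaintFlow:
    """One query result; hypothetical fields."""
    source: str
    sink: str
    path: tuple
    # Surfaced to callers so an over-approximate flow is never reported as exact.
    over_approximated: bool = False


pack = ModelPack.from_json(
    '{"name": "web", "sources": ["request.param"], "sinks": ["db.query"]}'
)
flow = TaintFlow("request.param", "db.query", ("request.param", "db.query"))
```

Keeping the pack a plain value means a policy edit is a data change on the SDK side, which is the point of the boundary.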

Reference instantiation: codellm-devkit/codeanalyzer-java#171 (decision #11) and #173 (the SUMMARY-edge substrate rung).

… frontend SDK

Establishes the provider/client boundary across the dataflow skillset:

- codeanalyzer-backend emits the graph substrate only — program_graphs
  (CFG/PDG/SDG) with transitive HRB SUMMARY edges — and never a taint_flows
  section, never a sources/sinks/sanitizers policy, never a slice. SUMMARY
  edges are the exception that proves the rule: keyed on data dependence (not
  any taint policy), they are reusable substrate and stay analyzer-side, and
  are what make frontend queries context-sensitive.
- cldk-sdk-frontend owns slicing and taint as reachability queries over the
  emitted graph: the SDK holds the model packs, produces taint_flows/slice
  results, and carries the Slice/Taint gates (sdk-testing.md § 3b).

Rationale: a taint result is keyed on a policy that changes far faster than
the graph; keeping the query in the SDK means a policy edit re-runs a cheap
traversal instead of re-emitting the universal graph. Mirrors Joern's
factoring (CPG stores the substrate; reachableBy is a query, not materialized
all-pairs taint edges). First instantiated in codeanalyzer-java (SDG at -a 3;
SUMMARY-edge substrate as the next rung); the sibling level-3 epics are
amended to match.

Touches: dataflow-graphs.md (contract + boundary + gates split),
dataflow-construction.md (Stage 8), dataflow-issue-template.md (goals, PART 3,
PR ladder, DoD, title), backend SKILL.md (level-3 section), frontend SKILL.md
(new Client analyses section) and sdk-testing.md (Slice/Taint gates).