Establishes the SDK side of the analyzer/SDK boundary now standard across the codeanalyzer-* family (cldk-forge PR #7; reference instantiation codellm-devkit/codeanalyzer-java#171). The analyzers are pure graph providers — at -a 3 they emit program_graphs (CFG/PDG/SDG with transitive HRB SUMMARY edges) and nothing more. Client analyses live here, in the SDK.
This issue tracks the shared, cross-language client-analysis engine: built once over the shared ProgramGraphs models and reused by every backend that emits level-3 graphs (Java, Python, C/C++, Go, Rust as they land). It is the destination for the slicing/taint work removed from the analyzer epics: codeanalyzer-python#67, codeanalyzer-clang#2, codeanalyzer-go#3, codeanalyzer-rust#25.
Scope
- Shared graph models (if not already present): ProgramGraphs / FunctionGraphs / GraphNode / GraphEdge / SDGEdge, validating analysis.json's program_graphs section. Modeled once, not per-language (the parity clause holds across analyzers).
- Backward/forward slicing as a reachability query: reverse reachability (forward reachability for a forward slice) over CDG ∪ DDG ∪ PARAM_* ∪ SUMMARY from a (signature, node_id) criterion, context-sensitive via the two-phase HRB traversal (ascend PARAM_IN/CALL, then descend PARAM_OUT; SUMMARY edges carry across calls without re-descending). Exact expected-set gate on a fixture.
- Taint as a labeled reachability query: seed at sources, propagate along dependence edges, block/flag at sanitizers on the path, report when a source label reaches a matching sink. Witness paths are reconstructed lazily as (signature, node_id) chains with the model id per hop.
- Sources/sinks/sanitizers/library models as data: a JSON spec validated against a JSON Schema, with precedence built-in pack < config file < caller-supplied. Per-language model packs (e.g. Python flask/os/subprocess; C libc; Go net/http/os/exec) ship as data alongside the shared engine.
- Client-result models: TaintFlow / slice-result ({ source, sink, rule, sanitized, path }). These are SDK models, not analyzer output.
- Facade methods on the query surface: get_backward_slice(...), get_taint_flows(spec=...), etc.
- Surface graph over-approximations in results rather than absorbing them: ENTRY-anchored PARAM_IN (argument arity collapsed until the analyzer ships per-argument PARAM nodes, e.g. codeanalyzer-java#173), missing SUMMARY edges before that analyzer PR lands, heap flows only under the analyzer's heap-dependence mode.
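
For illustration, the two-phase traversal named in the slicing bullet could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the SDK implementation: edges are assumed to be plain (src, dst, kind) triples meaning "dst depends on src", the edge-kind strings mirror the names used above, and `backward_slice`, `index_preds`, `_walk`, the toy `EDGES` graph and its node ids are all hypothetical.

```python
from collections import defaultdict

Node = tuple[str, int]  # (signature, node_id); assumed node identity


def index_preds(edges):
    """Index incoming edges: dst -> [(src, kind), ...]."""
    preds = defaultdict(list)
    for src, dst, kind in edges:
        preds[dst].append((src, kind))
    return preds


def _walk(preds, seeds, skip):
    """Reverse reachability from seeds, ignoring edge kinds listed in skip."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        node = stack.pop()
        for src, kind in preds.get(node, ()):
            if kind not in skip and src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


def backward_slice(edges, criterion):
    preds = index_preds(edges)
    # Phase 1: ascend toward callers; never enter a callee via PARAM_OUT.
    # SUMMARY edges step over each call site instead.
    phase1 = _walk(preds, {criterion}, skip={"PARAM_OUT"})
    # Phase 2: descend into callees; never re-ascend via PARAM_IN / CALL.
    return _walk(preds, phase1, skip={"PARAM_IN", "CALL"})


# Toy SDG (hypothetical): main() calls id(p) twice; the criterion uses the
# result of the second call only.
M, F = "main()", "id(p)"
EDGES = [
    ((M, 1), (M, 4), "DDG"),      # a = src()   -> actual-in of call 1
    ((M, 2), (M, 7), "DDG"),      # b = 1       -> actual-in of call 2
    ((M, 3), (M, 4), "CDG"), ((M, 3), (M, 5), "CDG"),   # call site 1
    ((M, 6), (M, 7), "CDG"), ((M, 6), (M, 8), "CDG"),   # call site 2
    ((M, 4), (M, 5), "SUMMARY"),  # actual-in -> actual-out, call 1
    ((M, 7), (M, 8), "SUMMARY"),  # actual-in -> actual-out, call 2
    ((M, 8), (M, 9), "DDG"),      # y -> use(y), the criterion
    ((M, 3), (F, 0), "CALL"), ((M, 6), (F, 0), "CALL"),
    ((M, 4), (F, 1), "PARAM_IN"), ((M, 7), (F, 1), "PARAM_IN"),
    ((F, 0), (F, 1), "CDG"), ((F, 0), (F, 2), "CDG"),
    ((F, 1), (F, 2), "DDG"),      # return p
    ((F, 2), (M, 5), "PARAM_OUT"), ((F, 2), (M, 8), "PARAM_OUT"),
]

slice_nodes = backward_slice(EDGES, (M, 9))
naive = _walk(index_preds(EDGES), {(M, 9)}, skip=set())
```

On this toy graph the two-phase result keeps the second call site, `b`, and the body of `id(p)`, while the plain reachability walk in `naive` also pulls in `a` and the first call site by re-ascending through PARAM_IN. An exact expected-set assertion on `slice_nodes` is the kind of check the fixture gate describes.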
Gates (frontend)
- Slice: backward slice of a named fixture variable equals the hand-computed node set (exact, not "non-empty"), and does not contain callee-internal nodes a naive phase-1 walk would leak (proves SUMMARY edges are used).
- Taint: one known source→sink flow is found; the same flow with a sanitizer interposed is reported as sanitized; the witness path names every hop.
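
To make the taint gate concrete, here is a rough sketch of taint as labeled reachability with a sanitizer flag and lazily rebuilt witness paths. It is only a sketch under assumptions: the spec is shown as an in-memory dict already resolved to (signature, node_id) nodes rather than the JSON spec itself, dependence edges are plain (src, dst) pairs, and `find_taint_flows`, `_witness`, the model ids and the toy graph are hypothetical. Only the TaintFlow field names come from the scope list above.

```python
from collections import deque
from dataclasses import dataclass

Node = tuple[str, int]  # (signature, node_id)


@dataclass(frozen=True)
class TaintFlow:
    source: Node
    sink: Node
    rule: str        # id of the sink model that matched (assumed meaning)
    sanitized: bool
    path: tuple      # ((signature, node_id), model_id or None) per hop


def _witness(parent, state, model_of):
    """Rebuild the hop chain from parent pointers, only when a flow is reported."""
    hops = []
    while state is not None:
        node = state[0]
        hops.append((node, model_of.get(node)))
        state = parent[state]
    return tuple(reversed(hops))


def find_taint_flows(edges, spec):
    succs = {}
    for src, dst in edges:
        succs.setdefault(src, []).append(dst)
    model_of = {m["node"]: m["id"]
                for kind in ("sources", "sanitizers", "sinks") for m in spec[kind]}
    sanitizers = {m["node"]: m for m in spec["sanitizers"]}
    sinks = {m["node"]: m for m in spec["sinks"]}
    flows = []
    for source in spec["sources"]:
        start = (source["node"], False)   # state = (node, sanitized so far)
        parent = {start: None}
        queue = deque([start])
        while queue:
            state = queue.popleft()
            node, clean = state
            sink = sinks.get(node)
            if sink and sink["label"] == source["label"]:
                flows.append(TaintFlow(source["node"], node, sink["id"], clean,
                                       _witness(parent, state, model_of)))
                continue
            for nxt in succs.get(node, []):
                san = sanitizers.get(nxt)
                hit = san is not None and san["label"] == source["label"]
                nxt_state = (nxt, clean or hit)  # flag the path, do not drop it
                if nxt_state not in parent:
                    parent[nxt_state] = state
                    queue.append(nxt_state)
    return flows


# Toy fixture (hypothetical): the same flow, raw and with a sanitizer interposed.
R, Q = "raw()", "quoted()"
EDGES = [((R, 1), (R, 2)), ((R, 2), (R, 3)), ((Q, 1), (Q, 2)), ((Q, 2), (Q, 3))]
SPEC = {
    "sources": [{"id": "py.flask.request.args", "label": "user_input", "node": (R, 1)},
                {"id": "py.flask.request.args", "label": "user_input", "node": (Q, 1)}],
    "sanitizers": [{"id": "py.shlex.quote", "label": "user_input", "node": (Q, 2)}],
    "sinks": [{"id": "py.os.system", "label": "user_input", "node": (R, 3)},
              {"id": "py.os.system", "label": "user_input", "node": (Q, 3)}],
}
flows = find_taint_flows(EDGES, SPEC)
```

The sketch flags sanitized paths rather than blocking them, so both flows are reported and the gate can assert on the `sanitized` field and on the model id carried by each hop of `path`.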
Contract references: cldk-forge cldk-sdk-frontend (SKILL.md § Client analyses, sdk-testing.md § 3b). Related: #228 (Java backend adoption of -a 3, whose ask #6 this generalizes).