Follow-up on #171 (full SDG at -a 3). Substrate enhancement, analyzer-side — not a client analysis (see #171 decision #11: taint/slicing are frontend queries; this only enriches the graph they query).
Problem
The level-3 SDG emitted today stitches PDGs at call sites with CALL / PARAM_IN / PARAM_OUT edges, but has no SUMMARY edges — the Horwitz–Reps–Binkley transitive-flow edges. Two consequences for any frontend reachability query (taint, slicing) over the graph:
- Context-insensitive. Plain reachability over DDG ∪ PARAM_* admits unrealizable paths: taint can enter a callee from call site A and exit at call site B. HRB SUMMARY edges + the two-phase (up-then-down) traversal are what restore context sensitivity without the frontend re-descending into callees.
- Expensive queries. Without summaries, every frontend slice/taint walk re-traverses callee bodies at each call site. A SUMMARY edge lets the walk short-circuit across a call.
SUMMARY edges are keyed on data dependence, not any source/sink policy, so they are reusable across every taint config and belong in the analyzer graph (unlike taint_flows, which never will — #171 decision #11).
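To make the unrealizable-path problem concrete, here is a minimal sketch on a toy graph. It assumes a hypothetical identity callee `id(x)` invoked from two call sites A and B, with edges written as `(src, dst, kind)` triples; the node names and the triple layout are illustrative only and are not the `sdg_edges` schema. It compares plain backward reachability with the two-phase HRB traversal.

```python
from collections import defaultdict

# Toy SDG: one callee id(x) { return x; } called from sites A and B.
EDGES = [
    ("A.actual_in", "id.formal_in", "PARAM_IN"),
    ("B.actual_in", "id.formal_in", "PARAM_IN"),
    ("id.formal_in", "id.formal_out", "DDG"),
    ("id.formal_out", "A.actual_out", "PARAM_OUT"),
    ("id.formal_out", "B.actual_out", "PARAM_OUT"),
    ("A.actual_in", "A.actual_out", "SUMMARY"),
    ("B.actual_in", "B.actual_out", "SUMMARY"),
]

def backward(edges, seeds, skip):
    """Backward reachability from `seeds`, ignoring edge kinds in `skip`."""
    preds = defaultdict(list)
    for src, dst, kind in edges:
        if kind not in skip:
            preds[dst].append(src)
    seen, stack = set(seeds), list(seeds)
    while stack:
        node = stack.pop()
        for pred in preds[node]:
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen

def naive_slice(edges, criterion):
    """Plain reachability over DDG and PARAM_* edges (no summaries)."""
    return backward(edges, {criterion}, skip={"SUMMARY"})

def two_phase_slice(edges, criterion):
    """HRB-style backward slice.

    Phase 1 never descends into callees (skips PARAM_OUT) and relies on
    SUMMARY edges to cross calls. Phase 2 starts from everything phase 1
    found and never ascends to callers (skips PARAM_IN and CALL).
    """
    phase1 = backward(edges, {criterion}, skip={"PARAM_OUT"})
    return backward(edges, phase1, skip={"PARAM_IN", "CALL"})

naive = naive_slice(EDGES, "B.actual_out")
precise = two_phase_slice(EDGES, "B.actual_out")
print(sorted(naive - precise))  # nodes reached only via an unrealizable path
```

On this toy graph the naive slice from B's actual-out pulls in A's actual-in, while the two-phase slice keeps only B's actual nodes and the callee's formals.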
Prerequisite: per-argument PARAM nodes
The current encoding collapses PARAM_IN/PARAM_OUT to the callee ENTRY/EXIT node (argument arity is lost — taint into any argument taints all parameters). SUMMARY edges are actual-in → actual-out at the same call site, so they need distinct per-argument actual/formal nodes:
- actual-in node per argument at the call site; actual-out per return value / mutated out-param;
- formal-in per parameter at the callee; formal-out per returned/mutated value.
WALA's slicer already exposes these — ParamCaller/ParamCallee/NormalReturnCaller/HeapParamCaller statements carry the value number and call index we currently discard when anchoring at ENTRY/EXIT. So this is a refinement of the existing statementNode() mapping, not new analysis.
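The difference between the collapsed and the per-argument encoding can be sketched as follows. This is an illustration, not the `statementNode()` implementation: the `Stmt` record, the `node_id` string format, and the inclusion of `NormalReturnCallee` as the formal-out counterpart are assumptions made for the example, and the statement kinds are represented by class-name strings only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Stmt:
    """Hypothetical stand-in for a slicer statement record."""
    kind: str                           # slicer statement class name
    method: str                         # signature of the enclosing method
    value_number: Optional[int] = None  # SSA value number, if any
    call_index: Optional[int] = None    # call instruction index, caller side

# Assumed mapping from statement class to the proposed node kinds.
NODE_KIND = {
    "ParamCaller": "actual_in",
    "HeapParamCaller": "actual_in",
    "NormalReturnCaller": "actual_out",
    "ParamCallee": "formal_in",
    "NormalReturnCallee": "formal_out",
}

def collapsed_node(stmt):
    """Collapsed encoding: every *_in anchors at ENTRY, every *_out at EXIT."""
    anchor = "ENTRY" if NODE_KIND[stmt.kind].endswith("_in") else "EXIT"
    return (stmt.method, anchor)

def per_argument_node(stmt):
    """Per-argument encoding: keep the call index and value number."""
    parts = [NODE_KIND[stmt.kind]]
    if stmt.call_index is not None:
        parts.append(f"call{stmt.call_index}")
    if stmt.value_number is not None:
        parts.append(f"v{stmt.value_number}")
    return (stmt.method, ":".join(parts))

# Two formal parameters of the same (hypothetical) callee.
p1 = Stmt("ParamCallee", "Util.join(String,String)", value_number=1)
p2 = Stmt("ParamCallee", "Util.join(String,String)", value_number=2)
print(collapsed_node(p1) == collapsed_node(p2))        # same ENTRY anchor
print(per_argument_node(p1) == per_argument_node(p2))  # distinct formal-ins
```

With the collapsed key both parameters land on the same `(signature, ENTRY)` node, which is how arity gets lost; the per-argument key keeps them apart.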
Scope
PR A — per-argument PARAM nodes. Refine SystemDependencyGraph.statementNode() + the sdg_edges emission so PARAM_IN/PARAM_OUT reference per-argument (signature, node_id) actual/formal nodes instead of collapsing to ENTRY/EXIT. Record the new node kinds (actual_in, actual_out, formal_in, formal_out) in .claude/SCHEMA_DECISIONS.md. Gate: PARAM_IN/PARAM_OUT arity matches the callee's parameter count on the fixture; no dangling endpoints.
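The PR A gate could be checked along these lines. The graph layout below (a `nodes` list with `signature`, `node_id`, `kind`, and `sdg_edges` entries with `kind`, `src`, `dst`) is an assumed shape for illustration; the real JSON emitted by the analyzer may differ, and the callee signature is hypothetical.

```python
def dangling_endpoints(graph):
    """Return edges whose src or dst is not a declared node."""
    known = {(n["signature"], n["node_id"]) for n in graph["nodes"]}
    return [
        e for e in graph["sdg_edges"]
        if tuple(e["src"]) not in known or tuple(e["dst"]) not in known
    ]

def formal_in_arity(graph, callee):
    """Count distinct formal_in nodes of `callee` targeted by PARAM_IN edges."""
    kinds = {(n["signature"], n["node_id"]): n["kind"] for n in graph["nodes"]}
    targets = {
        tuple(e["dst"])
        for e in graph["sdg_edges"]
        if e["kind"] == "PARAM_IN"
        and e["dst"][0] == callee
        and kinds.get(tuple(e["dst"])) == "formal_in"
    }
    return len(targets)

# Hypothetical two-argument callee and one call site.
CALLEE = "Util.join(String,String)"
GRAPH = {
    "nodes": [
        {"signature": "Main.run()", "node_id": "actual_in:call3:v5", "kind": "actual_in"},
        {"signature": "Main.run()", "node_id": "actual_in:call3:v6", "kind": "actual_in"},
        {"signature": CALLEE, "node_id": "formal_in:v1", "kind": "formal_in"},
        {"signature": CALLEE, "node_id": "formal_in:v2", "kind": "formal_in"},
    ],
    "sdg_edges": [
        {"kind": "PARAM_IN",
         "src": ["Main.run()", "actual_in:call3:v5"],
         "dst": [CALLEE, "formal_in:v1"]},
        {"kind": "PARAM_IN",
         "src": ["Main.run()", "actual_in:call3:v6"],
         "dst": [CALLEE, "formal_in:v2"]},
    ],
}

assert formal_in_arity(GRAPH, CALLEE) == 2  # matches the two declared parameters
assert dangling_endpoints(GRAPH) == []
```

Under the collapsed encoding the same check would report an arity of 0 or 1 for this callee, so it doubles as a regression test for the refinement.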
PR B — SUMMARY edges. Emit SUMMARY edges (actual-in → actual-out at a call site) into sdg_edges from the transitive intraprocedural flow. Two viable sources, decide with a spike:
- read them off WALA's slicer, which computes HRB summaries lazily during a slice (cheapest if the API exposes them cleanly), or
- compute them ourselves as a bottom-up composition over the SCC-condensation DAG of the call graph (Tarjan → wavefront), per the skillset's dataflow-construction.md Stage 6–7.
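For the second option, a rough sketch of the bottom-up computation is shown below. It is a simplification under stated assumptions: each method is a hypothetical dict with a parameter count, intraprocedural dependence edges, and call sites; only the return value is modelled as formal-out (no mutated out-params, no heap); and SCCs are processed one at a time in callee-first order rather than as parallel wavefronts. The three methods are made up for the example.

```python
from collections import defaultdict

def tarjan_sccs(graph):
    """SCCs of `graph` (node -> successors), emitted callees-first."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)
    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def reaches(edges, src, dst):
    """True if `dst` is reachable from `src` along `edges`."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    seen, stack = {src}, [src]
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        for m in succ[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return False

def method_summary(method, summaries):
    """Parameter indices that flow to the return, using callee summaries."""
    edges = list(method["edges"])
    for site, callee in method["calls"].items():
        for j in summaries.get(callee, ()):
            edges.append((("ain", site, j), ("aout", site)))  # SUMMARY edge
    return {i for i in range(method["params"]) if reaches(edges, ("in", i), "out")}

def compute_summaries(methods):
    call_graph = {m: sorted(set(d["calls"].values())) for m, d in methods.items()}
    summaries = {m: set() for m in methods}
    for scc in tarjan_sccs(call_graph):
        changed = True
        while changed:  # fixpoint inside a (possibly recursive) SCC
            changed = False
            for m in scc:
                if m not in methods:  # external callee: no body, no summary
                    continue
                new = method_summary(methods[m], summaries)
                if new != summaries[m]:
                    summaries[m], changed = new, True
    return summaries

METHODS = {
    # pick(x, y) { return x; }
    "pick": {"params": 2, "edges": [(("in", 0), "out")], "calls": {}},
    # wrap(a, b) { return pick(a, b); }
    "wrap": {"params": 2, "calls": {"s1": "pick"},
             "edges": [(("in", 0), ("ain", "s1", 0)), (("in", 1), ("ain", "s1", 1)),
                       (("aout", "s1"), "out")]},
    # loop(n, acc) { return n == 0 ? acc : loop(n, acc); }
    "loop": {"params": 2, "calls": {"s1": "loop"},
             "edges": [(("in", 1), "out"), (("in", 0), ("ain", "s1", 0)),
                       (("in", 1), ("ain", "s1", 1)), (("aout", "s1"), "out")]},
}
print(compute_summaries(METHODS))
```

Summaries only grow during the fixpoint, so each SCC terminates after at most one iteration per parameter; that bound is the kind of termination argument the out-of-scope note about k-limiting refers to.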
Gate (dataflow-graphs.md § Verification gates): at least one SUMMARY edge for a known transitive flow in the fixture (getName() result flows to helloString()'s return via the concat); a backward slice using summaries equals the hand-computed node set; no dangling endpoints; program_graphs still validates against the SDK models.
Explicitly out of scope
- Taint/slicing clients themselves — frontend (python-sdk#228).
- k-limiting / interprocedural fixpoint tuning beyond what termination requires (only relevant if we compute summaries ourselves rather than reading WALA's).
- CPG projection of the new node kinds into Neo4j — a separate follow-up once the JSON shape is settled.
Fixture
call-graph-test already exercises a two-hop value flow (getName() → helloString() return). Add one multi-argument callee so per-argument PARAM arity and a SUMMARY edge through a specific argument are both assertable.
Parent: #171. Precision posture unchanged (sound-leaning, over-approximate). -a 1/-a 2 unaffected.