Level-3 substrate: per-argument PARAM nodes + HRB SUMMARY edges (context-sensitive frontend queries) #173

Description

@rahlk

Follow-up on #171 (full SDG at -a 3). Substrate enhancement, analyzer-side — not a client analysis (see #171 decision #11: taint/slicing are frontend queries; this only enriches the graph they query).

Problem

The level-3 SDG emitted today stitches PDGs at call sites with CALL / PARAM_IN / PARAM_OUT edges, but has no SUMMARY edges — the Horwitz–Reps–Binkley transitive-flow edges. Two consequences for any frontend reachability query (taint, slicing) over the graph:

  1. Context-insensitive. Plain reachability over DDG ∪ PARAM_* admits unrealizable paths: taint can enter a callee from call site A and exit at call site B. HRB SUMMARY edges + the two-phase (up-then-down) traversal are what restore context sensitivity without the frontend re-descending into callees.
  2. Expensive queries. Without summaries, every frontend slice/taint walk re-traverses callee bodies at each call site. A SUMMARY edge lets the walk short-circuit across a call.
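
Both consequences can be seen on a toy graph. The sketch below is illustrative only: the node names, the `ident` callee, and the edge-list shape are hypothetical and not the analyzer's real output. It compares plain backward reachability with the two-phase HRB traversal, with the SUMMARY edges written in by hand.

```python
from collections import defaultdict

# Hypothetical toy SDG: one callee `ident(p)` that returns p, called at sites A and B.
# Edge kinds follow the issue; node names are made up for illustration.
EDGES = [
    ("A.actual_in", "ident.formal_in", "PARAM_IN"),
    ("B.actual_in", "ident.formal_in", "PARAM_IN"),
    ("ident.formal_in", "ident.formal_out", "DDG"),
    ("ident.formal_out", "A.actual_out", "PARAM_OUT"),
    ("ident.formal_out", "B.actual_out", "PARAM_OUT"),
    ("A.actual_in", "A.actual_out", "SUMMARY"),
    ("B.actual_in", "B.actual_out", "SUMMARY"),
]

def backward_reach(starts, skip_kinds):
    """Nodes that reach `starts`, walking edges backward and ignoring `skip_kinds`."""
    preds = defaultdict(list)
    for src, dst, kind in EDGES:
        if kind not in skip_kinds:
            preds[dst].append(src)
    seen, stack = set(starts), list(starts)
    while stack:
        node = stack.pop()
        for pred in preds[node]:
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen

def plain_slice(criterion):
    # Context-insensitive: every edge except SUMMARY, as in a graph without summaries.
    return backward_reach({criterion}, skip_kinds={"SUMMARY"})

def two_phase_slice(criterion):
    # Phase 1 (up): stay at this level or ascend to callers; never enter a callee.
    phase1 = backward_reach({criterion}, skip_kinds={"PARAM_OUT"})
    # Phase 2 (down): descend into callees; never climb back out to a caller.
    return backward_reach(phase1, skip_kinds={"PARAM_IN", "CALL"})

print(sorted(plain_slice("A.actual_out")))
print(sorted(two_phase_slice("A.actual_out")))
```

The plain slice of `A.actual_out` pulls in `B.actual_in` through the shared callee body, which is the unrealizable path. The two-phase slice crosses the call at site A through the SUMMARY edge in phase 1 and only enters `ident` in phase 2, so site B never appears.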

SUMMARY edges are keyed on data dependence, not any source/sink policy, so they are reusable across every taint config and belong in the analyzer graph (unlike taint_flows, which never will — #171 decision #11).

Prerequisite: per-argument PARAM nodes

The current encoding collapses PARAM_IN/PARAM_OUT to the callee ENTRY/EXIT node (argument arity is lost — taint into any argument taints all parameters). SUMMARY edges are actual-in → actual-out at the same call site, so they need distinct per-argument actual/formal nodes:

  • actual-in node per argument at the call site; actual-out per return value / mutated out-param;
  • formal-in per parameter at the callee; formal-out per returned/mutated value.
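
As a rough illustration of the target shape (not the actual schema), the sketch below models per-argument nodes and the PARAM_IN edges between them. `ParamNode`, its fields, and the method signatures are hypothetical names chosen for the example; the real node identity is whatever `(signature, node_id)` scheme the analyzer settles on.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ParamNode:
    signature: str                    # owning method: caller for actual_*, callee for formal_*
    kind: str                         # "actual_in" | "actual_out" | "formal_in" | "formal_out"
    index: int                        # argument / parameter position
    call_index: Optional[int] = None  # call site for actual_* nodes; None for formal_* nodes

def param_in_edges(caller, callee, call_index, arity):
    """One PARAM_IN edge per argument: actual_in[i] -> formal_in[i]."""
    return [
        (ParamNode(caller, "actual_in", i, call_index),
         ParamNode(callee, "formal_in", i))
        for i in range(arity)
    ]

def tainted_formals(edges, tainted_index):
    """Formal-in nodes reached when only argument `tainted_index` is tainted."""
    return {dst for src, dst in edges if src.index == tainted_index}

# Hypothetical two-argument callee invoked at call index 7.
edges = param_in_edges("Main.main()", "Util.join(String,String)", 7, 2)
print(tainted_formals(edges, 1))
```

With distinct formal-in nodes, taint on argument 1 lands on parameter 1 only, rather than on every parameter as it does when all PARAM_IN edges end at ENTRY.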

WALA's slicer already exposes these — ParamCaller/ParamCallee/NormalReturnCaller/HeapParamCaller statements carry the value number and call index we currently discard when anchoring at ENTRY/EXIT. So this is a refinement of the existing statementNode() mapping, not new analysis.

Scope

PR A — per-argument PARAM nodes. Refine SystemDependencyGraph.statementNode() + the sdg_edges emission so PARAM_IN/PARAM_OUT reference per-argument (signature, node_id) actual/formal nodes instead of collapsing to ENTRY/EXIT. Record the new node kinds (actual_in, actual_out, formal_in, formal_out) in .claude/SCHEMA_DECISIONS.md. Gate: PARAM_IN/PARAM_OUT arity matches the callee's parameter count on the fixture; no dangling endpoints.
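
The PR A gate could be checked along these lines. This is a sketch under assumptions: the dict shapes for nodes and edges, the `index` field, and the sample signatures are invented for the example and are not the real `sdg_edges` JSON. Whether the receiver counts toward arity is left open.

```python
def check_param_gate(nodes, edges, param_counts):
    """Return a list of gate violations; an empty list means the gate passes.

    nodes: {(signature, node_id): {"kind": str, "index": int}}  (assumed shape)
    edges: [{"src": key, "dst": key, "kind": str}]              (assumed shape)
    param_counts: {callee_signature: expected parameter count}
    """
    problems = []
    seen_formals = {}
    for edge in edges:
        # Every endpoint must resolve to an emitted node.
        dangling = [end for end in ("src", "dst") if edge[end] not in nodes]
        for end in dangling:
            problems.append(f"dangling {end} on {edge['kind']}: {edge[end]}")
        if dangling or edge["kind"] != "PARAM_IN":
            continue
        callee_sig, _ = edge["dst"]
        target = nodes[edge["dst"]]
        if target["kind"] != "formal_in":
            problems.append(f"PARAM_IN does not end at a formal_in: {edge['dst']}")
            continue
        seen_formals.setdefault(callee_sig, set()).add(target["index"])
    # Arity: distinct formal_in positions reached must match the parameter count.
    for sig, expected in param_counts.items():
        got = len(seen_formals.get(sig, set()))
        if got != expected:
            problems.append(f"{sig}: {got} formal_in nodes, expected {expected}")
    return problems

# Hypothetical fixture data: a two-argument callee with per-argument nodes.
CALLER, CALLEE = "Main.main()", "Util.join(String,String)"
nodes = {
    (CALLER, 1): {"kind": "actual_in", "index": 0},
    (CALLER, 2): {"kind": "actual_in", "index": 1},
    (CALLEE, 1): {"kind": "formal_in", "index": 0},
    (CALLEE, 2): {"kind": "formal_in", "index": 1},
}
edges = [
    {"src": (CALLER, 1), "dst": (CALLEE, 1), "kind": "PARAM_IN"},
    {"src": (CALLER, 2), "dst": (CALLEE, 2), "kind": "PARAM_IN"},
]
print(check_param_gate(nodes, edges, {CALLEE: 2}))
```

A graph that still anchors PARAM_IN at ENTRY would fail both the kind check and the arity check, so the same function doubles as a regression test for the collapsed encoding.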

PR B — SUMMARY edges. Emit SUMMARY edges (actual-in → actual-out at a call site) into sdg_edges from the transitive intraprocedural flow. Two viable sources, decide with a spike:

  • read them off WALA's slicer, which computes HRB summaries lazily during a slice (cheapest if the API exposes them cleanly), or
  • compute them ourselves as a bottom-up composition over the SCC-condensation DAG of the call graph (Tarjan → wavefront), per the skillset's dataflow-construction.md Stage 6–7.

Gate (dataflow-graphs.md § Verification gates): at least one SUMMARY edge for a known transitive flow in the fixture (getName() result flows to helloString()'s return via the concat); a backward slice using summaries equals the hand-computed node set; no dangling endpoints; program_graphs still validates against the SDK models.
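
For the spike, the "compute them ourselves" option can be prototyped with a naive fixpoint before committing to the SCC wavefront. The sketch below is that simpler stand-in, not the Tarjan-ordered algorithm and not WALA's implementation. The input shapes and the `wrap`/`greet`/`main` example are hypothetical. It adds a SUMMARY edge at a call site whenever the matching formal-in reaches the matching formal-out over intraprocedural edges plus the summaries found so far.

```python
from collections import defaultdict

def reach(succ, start):
    """Forward reachability from `start` over the successor map `succ`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def compute_summaries(intra_edges, formals, call_sites):
    """Naive fixpoint: iterate until no new (actual_in, actual_out) pair appears.

    intra_edges: [(src, dst)] edges inside a single procedure (no PARAM_* edges)
    formals:     {proc: (formal_in_nodes, formal_out_nodes)}
    call_sites:  [{"callee": proc, "ins": {formal_in: actual_in},
                   "outs": {formal_out: actual_out}}]
    """
    summaries = set()
    changed = True
    while changed:
        changed = False
        succ = defaultdict(set)
        for src, dst in list(intra_edges) + list(summaries):
            succ[src].add(dst)
        for proc, (f_ins, f_outs) in formals.items():
            for f_in in f_ins:
                reached = reach(succ, f_in)
                for f_out in (o for o in f_outs if o in reached):
                    for site in (s for s in call_sites if s["callee"] == proc):
                        edge = (site["ins"][f_in], site["outs"][f_out])
                        if edge not in summaries:
                            summaries.add(edge)
                            changed = True
    return summaries

# Hypothetical program: wrap(s, pad) returns s; greet(name) returns "Hello " + wrap(name, 0);
# main calls greet(x).
intra_edges = [
    ("wrap.formal_in0", "wrap.formal_out"),
    ("greet.formal_in0", "greet@wrap.actual_in0"),
    ("greet@wrap.actual_out", "greet.concat"),
    ("greet.concat", "greet.formal_out"),
]
formals = {
    "wrap": (["wrap.formal_in0", "wrap.formal_in1"], ["wrap.formal_out"]),
    "greet": (["greet.formal_in0"], ["greet.formal_out"]),
}
call_sites = [
    {"callee": "wrap",
     "ins": {"wrap.formal_in0": "greet@wrap.actual_in0",
             "wrap.formal_in1": "greet@wrap.actual_in1"},
     "outs": {"wrap.formal_out": "greet@wrap.actual_out"}},
    {"callee": "greet",
     "ins": {"greet.formal_in0": "main@greet.actual_in0"},
     "outs": {"greet.formal_out": "main@greet.actual_out"}},
]
print(sorted(compute_summaries(intra_edges, formals, call_sites)))
```

The summary at `main`'s call to `greet` only appears once the summary for the nested `wrap` call exists, which is the dependency a bottom-up order over the SCC-condensation DAG would exploit to process non-recursive callees once. The loop terminates because the summary set only grows and is bounded by the finite number of actual-in/actual-out pairs. No summary starts at `greet@wrap.actual_in1`, since `pad` never reaches the return value.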

Explicitly out of scope

  • Taint/slicing clients themselves — frontend (python-sdk#228).
  • k-limiting / interprocedural fixpoint tuning beyond what termination requires (only relevant if we compute summaries ourselves rather than reading WALA's).
  • CPG projection of the new node kinds into Neo4j — a separate follow-up once the JSON shape is settled.

Fixture

call-graph-test already exercises a two-hop value flow (getName() → helloString() return). Add one multi-argument callee so per-argument PARAM arity and a SUMMARY edge through a specific argument are both assertable.

Parent: #171. Precision posture unchanged (sound-leaning, over-approximate). -a 1/-a 2 unaffected.
