Java backend: adopt codeanalyzer analysis level 3 (full SDG) — default to -a 3, model program_graphs, adapt SCIP indexing #228

Description

@rahlk

Context

codellm-devkit/codeanalyzer-java#171 adds analysis level 3 to the Java analyzer: the full system dependency graph (control + data dependence) from WALA's slicer. At -a 3 the analyzer emits two new analysis.json sections:

  • system_dependency_graph: method-level dependence edges. This section already validates against the existing JApplication.system_dependency_graph: List[JGraphEdges] model with zero model changes (verified against this repo's models on the analyzer's call-graph-test fixture).
  • program_graphs: statement-level graphs per the CLDK level-3 dataflow contract. Each callable gets a cfg and a pdg whose nodes are keyed by (signature, node_id) (ENTRY = 0, SSA instructions in order, EXIT = last), and cross-function sdg_edges (CALL, PARAM_IN, PARAM_OUT) connect callables. The section carries its own schema_version. Example:
"program_graphs": {
  "schema_version": "1.0.0",
  "data_dependence": "no-heap",
  "functions": {
    "org.example.User.helloString()": {
      "signature": "helloString()",
      "type_declaration": "org.example.User",
      "file_path": "...",
      "cfg": { "nodes": [ { "id": 0, "kind": "entry", "start_line": -1, "end_line": -1 }, ... ],
               "edges": [ { "source": 1, "target": 2, "kind": "fallthrough" }, ... ] },
      "pdg": { "edges": [ { "source": 0, "target": 1, "type": "CDG", "label": "CONTROL_DEP" }, ... ] }
    }
  },
  "sdg_edges": [
    { "source": { "signature": "org.example.User.helloString()", "node": 1 },
      "target": { "signature": "org.example.User.log()", "node": 0 },
      "type": "CALL", "label": "CONTROL_DEP" }
  ]
}
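
To illustrate the (signature, node_id) keying, here is a minimal Python sketch that indexes cfg nodes and resolves sdg_edges endpoints against that index. It assumes the key of each entry under "functions" is the same fully qualified signature that sdg_edges refer to, as the example suggests. The sample payload, the helper names index_nodes and resolve_sdg_edges, and the node kinds other than "entry" are hypothetical.

```python
from typing import Any, Dict, List, Tuple

NodeKey = Tuple[str, int]  # (signature, node_id)

# Hypothetical, trimmed-down payload shaped like the example above.
# Node kinds other than "entry" are placeholders, not taken from the analyzer.
program_graphs: Dict[str, Any] = {
    "schema_version": "1.0.0",
    "functions": {
        "org.example.User.helloString()": {
            "cfg": {"nodes": [{"id": 0, "kind": "entry"},
                              {"id": 1, "kind": "instruction"},
                              {"id": 2, "kind": "exit"}]},
        },
        "org.example.User.log()": {
            "cfg": {"nodes": [{"id": 0, "kind": "entry"},
                              {"id": 1, "kind": "exit"}]},
        },
    },
    "sdg_edges": [
        {"source": {"signature": "org.example.User.helloString()", "node": 1},
         "target": {"signature": "org.example.User.log()", "node": 0},
         "type": "CALL", "label": "CONTROL_DEP"},
    ],
}


def index_nodes(pg: Dict[str, Any]) -> Dict[NodeKey, Dict[str, Any]]:
    """Map every cfg node to its (signature, node_id) key."""
    index: Dict[NodeKey, Dict[str, Any]] = {}
    for signature, graphs in pg.get("functions", {}).items():
        for node in graphs.get("cfg", {}).get("nodes", []):
            index[(signature, node["id"])] = node
    return index


def resolve_sdg_edges(pg: Dict[str, Any]) -> List[Tuple[NodeKey, NodeKey, str]]:
    """Return (source_key, target_key, type) for edges whose endpoints both exist."""
    index = index_nodes(pg)
    resolved = []
    for edge in pg.get("sdg_edges", []):
        src = (edge["source"]["signature"], edge["source"]["node"])
        dst = (edge["target"]["signature"], edge["target"]["node"])
        if src in index and dst in index:
            resolved.append((src, dst, edge["type"]))
    return resolved


for src, dst, edge_type in resolve_sdg_edges(program_graphs):
    print(edge_type, src, "->", dst)
```

Edges that point at an unknown node are skipped here rather than raising; a stricter consumer might prefer to fail on them.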

Asks

  1. Default the Java backend to analysis level 3, dialing down on request: JCodeanalyzer should invoke the binary with --analysis-level=3 by default and honor an explicit lower analysis_level (symbol table / call graph) when the caller asks for less.
  2. Make get_system_dependency_graph() real. It currently warns "System dependency graph is not yet implemented. Returning the call graph instead." — it should return the actual system_dependency_graph edges (and keep the call-graph fallback only for old analysis files).
  3. Model program_graphs once, shared across languages per the level-3 parity clause: ProgramGraphs, FunctionGraphs, GraphNode, GraphEdge, SDGEdge — not per-language copies. (Language analyzers may add node/edge kinds additively.)
  4. Adapt the SCIP indexing to the new schema so statement-level nodes/edges from program_graphs participate in indexing alongside the symbol table and call graph.
  5. Pin the minimum codeanalyzer-java version that emits level 3 once it is released.
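
As a starting point for ask 3, the shared models could look roughly like the sketch below. The class names come from the ask; everything else is an assumption: the field layout is inferred from the JSON example in the Context section, plain dataclasses stand in for whatever model base the SDK actually uses, and load_program_graphs is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class GraphNode:
    id: int
    kind: str
    start_line: int = -1
    end_line: int = -1


@dataclass
class GraphEdge:
    source: int
    target: int
    kind: str = ""   # set on cfg edges, e.g. "fallthrough"
    type: str = ""   # set on pdg edges, e.g. "CDG"
    label: str = ""  # set on pdg edges, e.g. "CONTROL_DEP"


@dataclass
class SDGEdge:
    source: Tuple[str, int]  # (signature, node_id)
    target: Tuple[str, int]
    type: str                # CALL, PARAM_IN or PARAM_OUT
    label: str = ""


@dataclass
class FunctionGraphs:
    signature: str
    type_declaration: str
    file_path: str
    cfg_nodes: List[GraphNode] = field(default_factory=list)
    cfg_edges: List[GraphEdge] = field(default_factory=list)
    pdg_edges: List[GraphEdge] = field(default_factory=list)


@dataclass
class ProgramGraphs:
    schema_version: str
    data_dependence: str = ""
    functions: Dict[str, FunctionGraphs] = field(default_factory=dict)
    sdg_edges: List[SDGEdge] = field(default_factory=list)


def _edge(raw: Dict[str, Any]) -> GraphEdge:
    # Unknown keys are ignored so analyzers can add fields additively.
    return GraphEdge(raw["source"], raw["target"], raw.get("kind", ""),
                     raw.get("type", ""), raw.get("label", ""))


def _endpoint(raw: Dict[str, Any]) -> Tuple[str, int]:
    return (raw["signature"], raw["node"])


def load_program_graphs(raw: Dict[str, Any]) -> ProgramGraphs:
    """Build the models from a parsed program_graphs section."""
    functions = {}
    for key, fn in raw.get("functions", {}).items():
        cfg, pdg = fn.get("cfg", {}), fn.get("pdg", {})
        functions[key] = FunctionGraphs(
            signature=fn["signature"],
            type_declaration=fn.get("type_declaration", ""),
            file_path=fn.get("file_path", ""),
            cfg_nodes=[GraphNode(n["id"], n["kind"], n.get("start_line", -1),
                                 n.get("end_line", -1)) for n in cfg.get("nodes", [])],
            cfg_edges=[_edge(e) for e in cfg.get("edges", [])],
            pdg_edges=[_edge(e) for e in pdg.get("edges", [])],
        )
    sdg = [SDGEdge(_endpoint(e["source"]), _endpoint(e["target"]),
                   e["type"], e.get("label", "")) for e in raw.get("sdg_edges", [])]
    return ProgramGraphs(raw["schema_version"], raw.get("data_dependence", ""),
                         functions, sdg)
```

Keeping kind, type and label as plain strings rather than closed enums is one way to let language analyzers add node and edge kinds additively without a model change.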

Notes

  • -a 1 / -a 2 output is unchanged (verified byte-identical on the fixture), so defaulting to 3 is purely additive from the SDK's perspective — but level 3 is slower (WALA slicer), which is the reason for the dial-down knob.
  • SUMMARY edges and taint/slicing clients are analyzer-side follow-ups, out of scope here.
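
The dial-down knob could be sketched as below. This is only an illustration: analysis_level_args is a hypothetical helper, not existing JCodeanalyzer code, and the mapping of level 1 to the symbol table and level 2 to the call graph is inferred from the wording of this issue. Only the --analysis-level flag itself is taken from the text above.

```python
from typing import List, Optional

DEFAULT_ANALYSIS_LEVEL = 3       # full SDG, the proposed default
SUPPORTED_LEVELS = (1, 2, 3)     # assumed: 1 = symbol table, 2 = call graph, 3 = SDG


def analysis_level_args(analysis_level: Optional[int] = None) -> List[str]:
    """Return the CLI argument for the requested level, defaulting to 3."""
    level = DEFAULT_ANALYSIS_LEVEL if analysis_level is None else analysis_level
    if level not in SUPPORTED_LEVELS:
        raise ValueError(
            f"analysis_level must be one of {SUPPORTED_LEVELS}, got {level!r}"
        )
    return [f"--analysis-level={level}"]


print(analysis_level_args())     # caller did not ask for less
print(analysis_level_args(1))    # caller explicitly dialed down
```

Treating None as "use the default" keeps an explicit lower level distinguishable from an unset one, so callers who want the faster levels have to opt in.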
