Why
CLDK backends are written in the language they analyze — the analyzer lives in that language's
own ecosystem to reach its best tooling (Java→JVM+WALA, Python→Python+Jedi, TS→Node+ts-morph, Go→Go).
The initial codeanalyzer-clang (v0.1.0) was bootstrapped in Python via the libclang bindings
(clang.cindex) as a one-time exception to move fast; the correct long-term backend for C/C++ is a
native C++ binary built on Clang LibTooling / the full Clang AST.
This issue tracks that rewrite. (Being taken on as a hands-on exercise by @rahlk.)
Target architecture
- Runtime: native clang toolchain; a self-contained binary (the standard packaging rule —
SDK users need no runtime installed).
- Structural + resolution (Tier 1): full Clang AST via LibTooling / RecursiveASTVisitor /
ASTMatchers + Sema, stronger than libclang's cursor API for template instantiation and
overload/virtual resolution.
- Framework backend (Tier 2, optional): LLVM-IR + SVF/Phasar points-to (already scaffolded as
a stub in the Python version under semantic_analysis/svf/).
- Build/deps: compile_commands.json compilation database (as today).
- Packaging: per-target native build matrix (LLVM can't cross-compile cleanly) → GitHub Release
binaries + a Homebrew formula reusing those assets; the SDK invokes it as a subprocess (not
in-process — that changes the SDK facade from the Python-only in-process path to the subprocess
path).
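The subprocess path named in the Packaging bullet could look roughly like this on the SDK side. It is a minimal sketch, not the SDK's actual facade: the function name run_analyzer, the timeout, and the assumption that the binary prints its JSON document on stdout are all illustrative. A Python one-liner stands in for the native binary so the example runs without it.

```python
import json
import subprocess
import sys


def run_analyzer(cmd, timeout=600):
    """Run an analyzer command as a child process and parse its stdout as JSON.

    Assumption: the analyzer prints its JSON document on stdout and exits 0
    on success. The real binary may write to a file instead.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(
            f"analyzer failed ({proc.returncode}): {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)


# Hypothetical stand-in for the native binary: prints an empty
# ClangApplication-shaped document.
fake_analyzer = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"symbol_table": {}, "call_graph": []}))',
]
result = run_analyzer(fake_analyzer)
```

Keeping the command as a plain argument list means the facade does not care whether the executable came from a GitHub Release asset or a Homebrew install, as long as it is on the path the SDK resolves.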
Contract to preserve (must not drift)
The rewrite must emit output byte-identical to the analysis.json contract that the Python version
already produces and validates against:
- Root ClangApplication { symbol_table: {path: ClangModule}, call_graph: [ClangCallEdge] },
identity-only edges, snake_case keys.
- One signatureOf() canonicalizer producing the same human-readable, overload-disambiguated,
fully-qualified ids (e.g. app::Point::add(int)), from cursor.canonical on both sides.
- Every field and node kind in .claude/SCHEMA_DECISIONS.md (record_kind, enums, typedefs, macros,
virtual/pure_virtual/const/static/variadic flags, access specifier, templates, namespaces, USR tag,
callsite flags).
- The same Neo4j schema contract (neo4j/schema.py → --emit schema), enforced by a conformance test.
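One way to guard this contract during the rewrite is to compare the two backends' output byte for byte and to confirm that every identity-only edge resolves to a declared signature. The sketch below is an assumption-laden illustration: the field names functions (inside each module) and source / target (on each edge) are hypothetical, and the real names are whatever .claude/SCHEMA_DECISIONS.md defines.

```python
def declared_ids(app):
    """Collect every signature id declared in the symbol table.

    Assumption: each module keeps its callables in a "functions" mapping
    keyed by signature id. This key is hypothetical.
    """
    ids = set()
    for module in app["symbol_table"].values():
        ids.update(module.get("functions", {}))
    return ids


def dangling_edges(app):
    """Return edges whose endpoints are not declared in the symbol table.

    Assumption: edges carry only "source" and "target" ids (hypothetical names).
    """
    known = declared_ids(app)
    return [
        edge
        for edge in app["call_graph"]
        if edge["source"] not in known or edge["target"] not in known
    ]


def byte_identical(expected: bytes, actual: bytes) -> bool:
    """The contract is byte-level, so compare raw bytes, not parsed JSON."""
    return expected == actual


# Tiny hand-written document: the second edge points at an undeclared id.
app = {
    "symbol_table": {
        "src/point.cpp": {
            "functions": {"app::Point::add(int)": {}, "app::main()": {}}
        }
    },
    "call_graph": [
        {"source": "app::main()", "target": "app::Point::add(int)"},
        {"source": "app::main()", "target": "app::Point::scale(double)"},
    ],
}
bad = dangling_edges(app)
```

Two documents that parse to the same JSON can still differ in key order or whitespace, which is why the comparison is made on bytes and not on parsed objects.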
Definition of done
- … testdata/fixture/ (symbol-table, call-graph, caching, flag-validation, Neo4j conformance).
- … ClangApplication model, no dangling edges.
- … release.yml.
- … from in-process to subprocess (coordinate with cldk-sdk-frontend).
Reference
- The Python implementation is the behavioral spec + fixture to match (commit b83d3f5).
- Skill guidance: cldk-forge:codeanalyzer-backend, tooling-menu.md § "C++ (the clang/libclang
path)" and § "Packaging".