A C/C++ static-analysis toolkit — the CLDK backend that emits a canonical symbol table and call graph, as analysis.json or a Neo4j property graph, using LLVM/Clang.
canclang is a static analyzer for C and C++ built on LLVM/Clang
via libclang. It produces the canonical CodeLLM-DevKit (CLDK)
analysis.json — a symbol table plus a call graph — and can project that same analysis into a
Neo4j property graph. It is the C/C++ backend behind
CLDK, mirroring its
Python (canpy),
TypeScript (cants), and
Java siblings — so output-shape parity with
them is a first-class concern.
Because libclang is the Clang front end, structural parsing and type/overload/virtual-dispatch
resolution come from one tool: the level-1 call graph is resolved directly from the Clang AST
(cursor.referenced), not a shallow name match.
- Features
- Architecture & Tooling
- Installation
- Usage
- Analysis levels
- Output targets
- Output schema
- SDK integration
- Development
- License
- Symbol table: translation units, classes/structs/unions, methods, free functions, constructors/destructors, fields, globals, enums, typedefs, macros, `#include`s, and doc comments, with precise source spans and rich C/C++ flags (`virtual`, `pure_virtual`, `const`, `static`, `inline`, `variadic`, storage class, access specifier, templates, namespaces).
- Call graph: resolved directly from the Clang AST as identity-only edges whose endpoints are real symbol-table signatures, with `provenance=["clang"]`. Constructors, member/virtual dispatch, and cross-file (out-of-line) method definitions are handled; indirect (function-pointer) calls are flagged and skipped.
- Neo4j output: project the analysis into a labeled property graph, either a self-contained `graph.cypher` snapshot or an incremental push to a live database over Bolt.
- Versioned schema: a machine-readable, version-stamped Neo4j schema contract (`--emit schema`).
- Caching: a content-hash per-file cache under `.codeanalyzer-clang/`, so re-analysis only touches what changed.
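
The content-hash caching idea in the last bullet can be sketched roughly as below. This is a minimal illustration, not the analyzer's actual cache: the `index.json` file name, the index layout, and the `analyze` callback are assumptions made for the example; only the `.codeanalyzer-clang/` directory name comes from the text above.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = ".codeanalyzer-clang"  # directory name taken from the README
INDEX_NAME = "index.json"          # hypothetical file name for this sketch


def content_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def analyze_with_cache(root: Path, files, analyze):
    """Re-run `analyze` only for files whose content hash changed.

    `analyze` is a hypothetical callback (Path -> JSON-serializable result).
    Returns (results, reanalyzed), where `reanalyzed` lists the relative
    paths that missed the cache on this run.
    """
    index_path = root / CACHE_DIR / INDEX_NAME
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    results, reanalyzed = {}, []
    for rel in files:
        digest = content_hash(root / rel)
        entry = index.get(rel)
        if entry is None or entry["hash"] != digest:
            # Cache miss: the file is new or its bytes changed.
            entry = {"hash": digest, "result": analyze(root / rel)}
            index[rel] = entry
            reanalyzed.append(rel)
        results[rel] = entry["result"]
    index_path.parent.mkdir(parents=True, exist_ok=True)
    index_path.write_text(json.dumps(index))
    return results, reanalyzed
```

Keying on file content rather than modification time means a `touch` or a fresh checkout does not invalidate the cache, which is the usual reason for choosing a content hash.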
These are the load-bearing backend decisions for this analyzer (see
.claude/SCHEMA_DECISIONS.md for the full node-by-node schema rationale):

    codeanalyzer-clang: architecture & tooling
      depth:           level 1 (symbol table + libclang resolver call graph); level 2 (SVF) stubbed
      runtime:         Python (libclang bindings), invoked in-process by the SDK
      structural:      libclang / Clang AST (clang.cindex)
      resolution:      libclang / Clang AST, the SAME tool (cursor.referenced resolves callees,
                       incl. C++ overloads and virtual dispatch)
      framework (L2):  LLVM-IR + SVF points-to; OFF by default, behind --svf, stubbed in this build
      build/deps:      optional compile_commands.json compilation database (accurate include paths/flags);
                       degrades to a language default (-x c/-x c++ -std=…) when absent
      packaging:       pip package `codeanalyzer-clang`, invoked in-process (the Python-analyzer
                       exception to the self-contained-binary rule); `pip install libclang` bundles
                       the native lib, so SDK users need no separate LLVM
      extra nodes:     struct/union (record_kind), enum, typedef, macro, namespace (as signature scope)

Rationale for the non-default choices. The self-contained-binary packaging rule is waived for
one reason: this analyzer is written in Python (mirroring codeanalyzer-python), so it ships as a
pip package invoked in-process — no subprocess, no cross-compiled binary. The heavy level-2 backend
is SVF (Andersen/Steensgaard points-to over LLVM bitcode), which is stronger than RTA and is the
one new-language case (like Java/WALA) with a true heavyweight builder available; it is scaffolded
but stubbed, since level 1 is the default depth.
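
The `build/deps` row above describes a fallback: use `compile_commands.json` when present, otherwise degrade to a language default. A minimal sketch of that decision follows. It is not the analyzer's code: the function name `flags_for`, the suffix-to-language mapping, and the argument filtering are assumptions, and the standard is left as an optional parameter because the default is not spelled out above. The `directory`, `file`, `command`, and `arguments` keys are those of the Clang JSON compilation database format.

```python
import json
import shlex
from pathlib import Path


def flags_for(source: Path, build_dir=None, std=None):
    """Return compiler flags for `source` (hypothetical helper).

    Uses compile_commands.json in `build_dir` when it has an entry for the
    file; otherwise falls back to a language default (-x c / -x c++).
    """
    if build_dir is not None:
        db_path = Path(build_dir) / "compile_commands.json"
        if db_path.exists():
            for entry in json.loads(db_path.read_text()):
                entry_file = Path(entry["directory"]) / entry["file"]
                if entry_file.resolve() == source.resolve():
                    # An entry carries either an argument list or a shell string.
                    args = entry.get("arguments") or shlex.split(entry["command"])
                    # Drop the compiler executable and the source-file argument.
                    return [a for a in args[1:] if a != entry["file"]]
    # Assumption for this sketch: only ".c" is treated as C, everything else as C++.
    lang = "c" if source.suffix == ".c" else "c++"
    flags = ["-x", lang]
    if std:
        flags.append(f"-std={std}")
    return flags
```

A real implementation would likely also strip output-related arguments such as `-c` and `-o` before handing the flags to libclang; the sketch keeps them to stay short.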
- Python ≥ 3.9 (tested on 3.14).
- libclang, the native Clang library. The `libclang` PyPI wheel (a declared dependency) bundles it, so `pip install codeanalyzer-clang` is normally self-sufficient. If you prefer a system LLVM:
  - macOS: `brew install llvm`
  - Debian/Ubuntu: `apt-get install libclang-dev`
- (optional) a `compile_commands.json` compilation database for accurate include paths; generate it with `cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON …` or Bear.
- (optional, `--emit neo4j --neo4j-uri`) the `neo4j` driver: `pip install 'codeanalyzer-clang[neo4j]'`.

    # from PyPI
    pip install codeanalyzer-clang
    # with the live Neo4j push extra:
    pip install 'codeanalyzer-clang[neo4j]'

    # via Homebrew
    brew tap codellm-devkit/tap
    brew install codeanalyzer-clang

    # from source
    git clone https://github.com/codellm-devkit/codeanalyzer-clang.git
    cd codeanalyzer-clang
    python -m venv .venv && . .venv/bin/activate
    pip install -e ".[dev,neo4j]"
    canclang --version

    # symbol table only (level 1, default), JSON to a temp dir
    canclang -i /path/to/project -o /tmp/out -a 1
    # symbol table + resolver call graph (level 2)
    canclang -i /path/to/project -o /tmp/out -a 2 -c ~/.cldk/clang-cache
    # single-file incremental analysis, compact JSON to stdout
    canclang -i /path/to/project --file-name src/main.cpp
    # with an accurate compilation database
    canclang -i /path/to/project --compile-commands build -a 2 -o /tmp/out
    # project into a Neo4j graph snapshot
    canclang -i /path/to/project -a 2 --emit neo4j -o /tmp/out   # writes graph.cypher

    Usage: canclang [OPTIONS]
      Static analysis for C and C++ using LLVM/Clang (libclang).

    Options:
      --version                        Show the canclang version and exit.
      -i, --input PATH                 Path to the C/C++ project root (not required for --emit schema).
      -o, --output PATH                Output directory for artifacts. Omit to print compact JSON to stdout.
      -f, --format [json|msgpack]      Output format for --emit json (default: json).
      --emit [json|neo4j|schema]       Output target (default: json).
      -a, --analysis-level INT 1..2    1 = symbol table only; 2 = + libclang resolver call graph.
      --svf / --no-svf                 Add the heavy level-2 SVF points-to call graph (stubbed).
      -t, --target-files PATH          Restrict analysis to these files (relative to --input).
      --file-name PATH                 Analyze only this single file (relative to --input).
      --skip-tests / --include-tests   Skip test trees (default: skip).
      --compile-commands PATH          Directory containing compile_commands.json (auto-detected).
      --std TEXT                       Override the C/C++ standard (e.g. c11, c++20).
      --eager / --lazy                 Force a clean rebuild vs reuse cache (default: lazy).
      -c, --cache-dir PATH             Directory for the analysis cache.
      --clear-cache / --keep-cache     Clear cache after analysis (default: keep).
      --app-name TEXT                  :ClangApplication anchor name (default: input dir name).
      --neo4j-uri TEXT                 Live Bolt push target; omit to write graph.cypher. [env: NEO4J_URI]
      --neo4j-user TEXT                Neo4j username. [env: NEO4J_USERNAME]
      --neo4j-password TEXT            Neo4j password. [env: NEO4J_PASSWORD]
      --neo4j-database TEXT            Neo4j database name. [env: NEO4J_DATABASE]
      -v                               Increase verbosity: -v, -vv.
      --help                           Show this message and exit.

    # emit the static Neo4j schema contract (no project needed)
    canclang --emit schema -o /tmp/out            # writes schema.json
    # push into a live Neo4j incrementally
    NEO4J_URI=bolt://localhost:7687 NEO4J_PASSWORD=secret \
      canclang -i /path/to/project -a 2 --emit neo4j

- Level 1 (`-a 1`, default): the symbol table only. Call sites are recorded on each callable with `callee_signature == null`; `call_graph` is `[]`.
- Level 2 (`-a 2`): adds the resolver-based call graph. The same Clang AST resolves each call site, backfills `callee_signature` in place, and emits identity-only edges. Still cheap (the resolver is already loaded).
- `--svf`: the heavy, framework-based backend (LLVM-IR + SVF points-to). Stubbed in this build: the seams (`semantic_analysis/svf/`) exist and the flag is wired, but no extra edges are produced yet. The resolver-based edges are unaffected.
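
The step from level 1 to level 2 (resolve each call site, backfill `callee_signature` in place, emit identity-only edges) can be pictured with the sketch below. It is an illustration under assumptions, not the analyzer's implementation: the `call_sites` key, the `name` field, and the `resolve` callback are hypothetical stand-ins (the real resolver reads the Clang AST), while `callee_signature`, `source`, `target`, and `provenance` are the names used in this README.

```python
def build_call_graph(callables, resolve):
    """Backfill callee_signature in place and return identity-only edges.

    `callables` maps a signature to {"call_sites": [...]} (hypothetical layout).
    `resolve` stands in for the Clang AST lookup: it takes a call site and
    returns the callee's signature, or None when it cannot be resolved.
    """
    edges = []
    for caller, node in callables.items():
        for site in node["call_sites"]:
            callee = resolve(site)
            if callee is None or callee not in callables:
                # Unresolved (e.g. indirect) or not in the symbol table:
                # callee_signature stays None and no edge is emitted.
                continue
            site["callee_signature"] = callee
            edges.append({"source": caller, "target": callee, "provenance": ["clang"]})
    return edges


# Level-1 shape: call sites recorded, nothing resolved yet.
callables = {
    "app::run()": {"call_sites": [
        {"name": "add", "callee_signature": None},
        {"name": "fp", "callee_signature": None},  # function-pointer call
    ]},
    "app::Point::add(int)": {"call_sites": []},
}
lookup = {"add": "app::Point::add(int)"}
call_graph = build_call_graph(callables, lambda site: lookup.get(site["name"]))
```

After the call, the first call site carries `app::Point::add(int)`, the function-pointer site is still `None`, and `call_graph` holds a single edge.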
- `analysis.json` (default): the canonical symbol table + call graph. Written to `-o`, or printed as compact JSON to stdout when `-o` is omitted (the SDK facade contract).
- Neo4j graph (`--emit neo4j`): a self-contained `graph.cypher` snapshot, or a live Bolt push with `--neo4j-uri`. An alternative projection of the same in-memory IR, not an ingestion of the JSON.
- Schema contract (`--emit schema`): the machine-readable, version-stamped Neo4j schema (`schema.json`).
The output validates against the CLDK ClangApplication contract:
{ symbol_table: { <relative file path>: ClangModule }, call_graph: [ClangCallEdge], ... }, with
identity-only edges whose source/target byte-match a real ClangCallable.signature. Signatures
are human-readable, fully-qualified, and overload-disambiguated (app::Point::add(int)); one
signature_of() canonicalizer produces every id. See .claude/SCHEMA_DECISIONS.md for the
field-by-field contract and the C/C++-specific extensions.
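
The identity-only property described above (every edge endpoint byte-matches a real signature) is easy to check from the consumer side. The sketch below is one possible check, not part of the analyzer. Because the internal layout of a `ClangModule` is not given here, it simply gathers every string-valued `signature` field found anywhere under `symbol_table`, which may over-approximate the set of callables; the helper names are hypothetical.

```python
def collect_signatures(node, found=None):
    """Recursively gather every string-valued "signature" field under `node`."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        sig = node.get("signature")
        if isinstance(sig, str):
            found.add(sig)
        for value in node.values():
            collect_signatures(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_signatures(item, found)
    return found


def dangling_edges(analysis):
    """Return call-graph edges whose source or target matches no signature."""
    known = collect_signatures(analysis["symbol_table"])
    return [
        edge for edge in analysis["call_graph"]
        if edge["source"] not in known or edge["target"] not in known
    ]
```

With a real run, `analysis` would be the result of `json.load` on the emitted `analysis.json`; an empty list from `dangling_edges` means every edge endpoint resolves to a signature in the symbol table.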
The CLDK SDKs bind this analyzer — in the Python SDK via CLDK.clang(project_path=...) (with the
legacy CLDK(language="clang").analysis(...) shim), and the other SDKs as they come online. Because
this analyzer is a Python package, the Python SDK invokes it in-process (imports Codeanalyzer /
AnalysisOptions, calls .analyze(), gets the ClangApplication back with no JSON round-trip). The
SDK wiring is done by the cldk-sdk-frontend skill.

    pip install -e ".[dev,neo4j]"
    pytest    # symbol-table, call-graph, caching, CLI, and Neo4j-conformance gates
    canclang -i testdata/fixture -a 2 -o /tmp/out && cat /tmp/out/analysis.json

The analyzer is a modular package mirroring codeanalyzer-python: a delegating `core.analyze()`, a node-kind-split `syntactic_analysis/symbol_table_builder.py`, an isolated `semantic_analysis/svf/` level-2 subpackage, a pluggable `analysis/` (pass registry) + `frameworks/` (entrypoint-finder base) layer, and the `neo4j/` projection.
Apache-2.0. See LICENSE.