codeanalyzer-clang (`canclang`)

A C/C++ static-analysis toolkit — the CLDK backend that emits a canonical symbol table and call graph, as analysis.json or a Neo4j property graph, using LLVM/Clang.

canclang is a static analyzer for C and C++ built on LLVM/Clang via libclang. It produces the canonical CodeLLM-DevKit (CLDK) analysis.json — a symbol table plus a call graph — and can project that same analysis into a Neo4j property graph. It is the C/C++ backend behind CLDK, mirroring its Python (canpy), TypeScript (cants), and Java siblings — so output-shape parity with them is a first-class concern.

Because libclang exposes the Clang front end, one tool handles both structural parsing and type, overload, and virtual-dispatch resolution: the resolver call graph is built directly from the Clang AST (cursor.referenced) rather than by shallow name matching.

Features

Symbol table — translation units, classes/structs/unions, methods, free functions, constructors/destructors, fields, globals, enums, typedefs, macros, #includes, and doc comments, with precise source spans and rich C/C++ flags (virtual, pure_virtual, const, static, inline, variadic, storage class, access specifier, templates, namespaces).
Call graph — resolved directly from the Clang AST: identity-only edges whose endpoints are real symbol-table signatures, with provenance=["clang"]. Constructors, member/virtual dispatch, and cross-file (out-of-line) method definitions are handled; indirect (function-pointer) calls are flagged and skipped.
Neo4j output — project the analysis into a labeled property graph: a self-contained graph.cypher snapshot, or an incremental push to a live database over Bolt.
Versioned schema — a machine-readable, version-stamped Neo4j schema contract (--emit schema).
Caching — a content-hash per-file cache under .codeanalyzer-clang/, so re-analysis only touches what changed.
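The content-hash cache boils down to one idea: hash each file's bytes and re-analyze only the files whose hash changed since the last run. A minimal sketch, assuming a single JSON manifest (`hashes.json`, a hypothetical name) under the cache directory; canclang's real on-disk layout under .codeanalyzer-clang/ may differ, and `changed_files` is an illustrative helper, not part of the package.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical set of suffixes treated as C/C++ sources for this sketch.
SOURCE_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".hpp"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, cache_dir: Path) -> list:
    """Return relative paths whose content hash differs from the cached manifest,
    then rewrite the manifest with the current hashes."""
    manifest_path = cache_dir / "hashes.json"  # hypothetical manifest name
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for src in sorted(root.rglob("*")):
        if src.suffix not in SOURCE_SUFFIXES or not src.is_file():
            continue
        rel = src.relative_to(root).as_posix()
        new[rel] = file_digest(src)
        if old.get(rel) != new[rel]:
            changed.append(rel)  # new or modified since the last run
    cache_dir.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(new, sort_keys=True))
    return changed
```

On the first run every source file is reported; on an unchanged rerun the list is empty, so only edited files pay the parsing cost.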

Architecture & Tooling

These are the load-bearing backend decisions for this analyzer (see .claude/SCHEMA_DECISIONS.md for the full node-by-node schema rationale):

codeanalyzer-clang — architecture & tooling
  depth:          level 1 — symbol table + libclang resolver call graph; level 2 (SVF) stubbed
  runtime:        Python (libclang bindings), invoked in-process by the SDK
  structural:     libclang / Clang AST (clang.cindex)
  resolution:     libclang / Clang AST — SAME tool (cursor.referenced resolves callees,
                  incl. C++ overloads and virtual dispatch)
  framework (L2): LLVM-IR + SVF points-to — OFF by default, behind --svf, stubbed in this build
  build/deps:     optional compile_commands.json compilation database (accurate include paths/flags);
                  degrades to a language default (-x c/-x c++ -std=…) when absent
  packaging:      pip package `codeanalyzer-clang`, invoked in-process (the Python-analyzer
                  exception to the self-contained-binary rule); `pip install libclang` bundles
                  the native lib, so SDK users need no separate LLVM
  extra nodes:    struct/union (record_kind), enum, typedef, macro, namespace (as signature scope)

Rationale for the non-default choices. The self-contained-binary packaging rule is waived for one reason: this analyzer is written in Python (mirroring codeanalyzer-python), so it ships as a pip package that the SDK invokes in-process, with no subprocess and no cross-compiled binary. The heavyweight level-2 backend is SVF (Andersen/Steensgaard points-to analysis over LLVM bitcode), which is more precise than RTA. Among the newer language backends, C/C++ is the one case (like Java with WALA) where a true heavyweight call-graph builder is available. It is scaffolded but stubbed, because level 1 is the default depth.

Installation

Prerequisites

Python ≥ 3.9 (tested on 3.14).
libclang — the native Clang library. The libclang PyPI wheel (a declared dependency) bundles it, so pip install codeanalyzer-clang is normally self-sufficient. If you prefer a system LLVM:
- macOS: brew install llvm
- Debian/Ubuntu: apt-get install libclang-dev
(optional) a compile_commands.json compilation database for accurate include paths — generate it with cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON … or Bear.
(optional, --emit neo4j --neo4j-uri) the neo4j driver: pip install 'codeanalyzer-clang[neo4j]'.
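When a compile_commands.json is present, per-file include paths and flags come from it; otherwise the analyzer degrades to a language default. The sketch below shows one way to implement that lookup, assuming entries use the standard `directory`, `file`, and `command` or `arguments` keys of the Clang JSON compilation database format. `clang_args_for` is a hypothetical helper. Treating every non-`.c` file (including `.h`) as C++, and appending `-std=` only when an explicit --std is given, are assumptions of this sketch; the built-in default standard is not specified here.

```python
import json
import shlex
from pathlib import Path
from typing import Iterable, Optional


def _strip_output_flags(args: Iterable[str]) -> list:
    """Drop '-c' and '-o <file>', which only matter to the real compiler."""
    out, skip_next = [], False
    for arg in args:
        if skip_next:
            skip_next = False
        elif arg == "-o":
            skip_next = True
        elif arg != "-c":
            out.append(arg)
    return out


def clang_args_for(source: Path, compile_commands_dir: Optional[Path] = None,
                   std: Optional[str] = None) -> list:
    """Return libclang parse arguments for one source file."""
    source = source.resolve()
    args = None
    if compile_commands_dir is not None:
        db_path = Path(compile_commands_dir) / "compile_commands.json"
        if db_path.is_file():
            for entry in json.loads(db_path.read_text()):
                directory = Path(entry["directory"])
                if (directory / entry["file"]).resolve() != source:
                    continue
                argv = entry.get("arguments") or shlex.split(entry["command"])
                # Skip the compiler executable and the source file itself.
                args = _strip_output_flags(a for a in argv[1:] if a != entry["file"])
                break
    if args is None:
        # No database entry: fall back to a bare language default.
        args = ["-x", "c" if source.suffix == ".c" else "c++"]
    if std is not None:
        # An explicit --std replaces any standard taken from the database.
        args = [a for a in args if not a.startswith("-std=")] + [f"-std={std}"]
    return args
```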

Install via pip (PyPI)

pip install codeanalyzer-clang
# with the live Neo4j push extra:
pip install 'codeanalyzer-clang[neo4j]'

Install via Homebrew

brew tap codellm-devkit/tap
brew install codeanalyzer-clang

Build from source

git clone https://github.com/codellm-devkit/codeanalyzer-clang.git
cd codeanalyzer-clang
python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev,neo4j]"
canclang --version

Usage

# symbol table only (level 1, default), JSON to a temp dir
canclang -i /path/to/project -o /tmp/out -a 1

# symbol table + resolver call graph (level 2)
canclang -i /path/to/project -o /tmp/out -a 2 -c ~/.cldk/clang-cache

# single-file incremental analysis, compact JSON to stdout
canclang -i /path/to/project --file-name src/main.cpp

# with an accurate compilation database
canclang -i /path/to/project --compile-commands build -a 2 -o /tmp/out

# project into a Neo4j graph snapshot
canclang -i /path/to/project -a 2 --emit neo4j -o /tmp/out    # writes graph.cypher
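Because omitting -o prints compact JSON to stdout, a caller that cannot import the package in-process can shell out to canclang and parse the result. A minimal sketch, assuming canclang is on PATH; `canclang_argv` and `run_canclang` are hypothetical helper names, and only flags from the Options list are used.

```python
import json
import subprocess
from typing import Optional


def canclang_argv(project: str, level: int = 1, file_name: Optional[str] = None) -> list:
    """Build a canclang command line that prints compact JSON to stdout (no -o)."""
    if level not in (1, 2):
        raise ValueError("analysis level must be 1 or 2")
    argv = ["canclang", "-i", project, "-a", str(level)]
    if file_name is not None:
        argv += ["--file-name", file_name]  # single-file analysis, relative to -i
    return argv


def run_canclang(project: str, level: int = 1, file_name: Optional[str] = None) -> dict:
    """Run canclang and parse its stdout as analysis JSON."""
    proc = subprocess.run(canclang_argv(project, level, file_name),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

For example, `run_canclang("/path/to/project", level=2)["call_graph"]` would return the resolver edges, while the same call with `level=1` should yield an empty list.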

Options

Usage: canclang [OPTIONS]

  Static analysis for C and C++ using LLVM/Clang (libclang).

Options:
  --version                       Show the canclang version and exit.
  -i, --input PATH                Path to the C/C++ project root (not required for --emit schema).
  -o, --output PATH               Output directory for artifacts. Omit to print compact JSON to stdout.
  -f, --format [json|msgpack]     Output format for --emit json (default: json).
  --emit [json|neo4j|schema]      Output target (default: json).
  -a, --analysis-level INT 1..2   1 = symbol table only; 2 = + libclang resolver call graph.
  --svf / --no-svf                Add the heavy level-2 SVF points-to call graph (stubbed).
  -t, --target-files PATH         Restrict analysis to these files (relative to --input).
  --file-name PATH                Analyze only this single file (relative to --input).
  --skip-tests / --include-tests  Skip test trees (default: skip).
  --compile-commands PATH         Directory containing compile_commands.json (auto-detected).
  --std TEXT                      Override the C/C++ standard (e.g. c11, c++20).
  --eager / --lazy                Force a clean rebuild vs reuse cache (default: lazy).
  -c, --cache-dir PATH            Directory for the analysis cache.
  --clear-cache / --keep-cache    Clear cache after analysis (default: keep).
  --app-name TEXT                 :ClangApplication anchor name (default: input dir name).
  --neo4j-uri TEXT                Live Bolt push target; omit to write graph.cypher. [env: NEO4J_URI]
  --neo4j-user TEXT               Neo4j username. [env: NEO4J_USERNAME]
  --neo4j-password TEXT           Neo4j password. [env: NEO4J_PASSWORD]
  --neo4j-database TEXT           Neo4j database name. [env: NEO4J_DATABASE]
  -v                              Increase verbosity: -v, -vv.
  --help                          Show this message and exit.

Examples

# emit the static Neo4j schema contract (no project needed)
canclang --emit schema -o /tmp/out          # writes schema.json

# push into a live Neo4j incrementally
NEO4J_URI=bolt://localhost:7687 NEO4J_PASSWORD=secret \
  canclang -i /path/to/project -a 2 --emit neo4j
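Each Neo4j option can come from its flag or from the listed environment variable, and omitting the URI selects the graph.cypher snapshot instead of a live push. The sketch below models that resolution, assuming (as in click-style CLIs) that an explicit flag wins over the environment; `neo4j_settings` and `neo4j_target` are hypothetical helpers, not canclang's code.

```python
import os
from typing import Mapping, Optional

# Environment variables named in the Options list.
NEO4J_ENV = {
    "uri": "NEO4J_URI",
    "user": "NEO4J_USERNAME",
    "password": "NEO4J_PASSWORD",
    "database": "NEO4J_DATABASE",
}


def neo4j_settings(cli: Mapping[str, Optional[str]],
                   env: Mapping[str, str] = os.environ) -> dict:
    """Resolve each setting: explicit flag first (assumed), then environment."""
    settings = {}
    for key, var in NEO4J_ENV.items():
        value = cli.get(key)
        settings[key] = value if value is not None else env.get(var)
    return settings


def neo4j_target(settings: dict) -> str:
    """No URI means write a graph.cypher snapshot; otherwise push over Bolt."""
    return "live Bolt push" if settings["uri"] else "graph.cypher"
```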

Analysis levels

Level 1 (-a 1, default) — the symbol table only. Call sites are recorded on each callable with callee_signature == null; call_graph is [].
Level 2 (-a 2) — also the resolver-based call graph: the same Clang AST resolves each call site, backfills callee_signature in place, and emits identity-only edges. Still cheap (the resolver is already loaded).
--svf: the heavy, framework-based backend (LLVM-IR + SVF points-to). Stubbed in this build: the seams (semantic_analysis/svf/) exist and the flag is wired, but no extra edges are produced yet. The resolver edges produced by -a 2 are unaffected.
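The level-2 backfill step can be modeled in a few lines. A minimal sketch, assuming simplified data classes (`CallSite`, `CallableNode`, and the `callee_name` field are hypothetical) and a `resolve` callback that stands in for the Clang AST lookup via cursor.referenced. It illustrates the in-place callee_signature backfill and the identity-only edges with provenance=["clang"]; it is not canclang's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CallSite:
    callee_name: str                        # hypothetical: callee as spelled in source
    callee_signature: Optional[str] = None  # stays None (null) at level 1


@dataclass
class CallableNode:
    signature: str
    call_sites: list = field(default_factory=list)


def backfill_call_graph(callables: list,
                        resolve: Callable[[CallSite], Optional[str]]) -> list:
    """Resolve call sites, backfill callee_signature in place, return edges."""
    known = {c.signature for c in callables}
    edges, seen = [], set()
    for caller in callables:
        for site in caller.call_sites:
            target = resolve(site)
            if target is None or target not in known:
                continue  # unresolved or indirect (function-pointer) call: skipped
            site.callee_signature = target
            # Assumed: one edge per (caller, callee) pair.
            if (caller.signature, target) not in seen:
                seen.add((caller.signature, target))
                edges.append({"source": caller.signature, "target": target,
                              "provenance": ["clang"]})
    return edges
```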

Output targets

analysis.json (default) — the canonical symbol table + call graph. Written to -o, or printed as compact JSON to stdout when -o is omitted (the SDK facade contract).
Neo4j graph (--emit neo4j) — a self-contained graph.cypher snapshot, or a live Bolt push with --neo4j-uri. An alternative projection of the same in-memory IR, not an ingestion of the JSON.
Schema contract (--emit schema) — the machine-readable, version-stamped Neo4j schema (schema.json).
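A graph.cypher snapshot is a sequence of idempotent Cypher statements. The sketch below shows the general pattern, using MERGE so that replaying the file does not duplicate nodes or relationships. Only the :ClangApplication label comes from this README; the :ClangCallable label and the CONTAINS and CALLS relationship types are hypothetical placeholders, and the authoritative labels are whatever --emit schema reports.

```python
def cypher_str(value: str) -> str:
    """Quote a Python string as a Cypher single-quoted literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def graph_cypher(app_name: str, signatures: list, edges: list) -> str:
    """Render callables and call edges as MERGE statements (hypothetical schema)."""
    app = cypher_str(app_name)
    lines = [f"MERGE (a:ClangApplication {{name: {app}}});"]
    for sig in sorted(set(signatures)):
        lines.append(
            f"MATCH (a:ClangApplication {{name: {app}}}) "
            f"MERGE (c:ClangCallable {{signature: {cypher_str(sig)}}}) "
            f"MERGE (a)-[:CONTAINS]->(c);"
        )
    for e in edges:
        lines.append(
            f"MATCH (s:ClangCallable {{signature: {cypher_str(e['source'])}}}), "
            f"(t:ClangCallable {{signature: {cypher_str(e['target'])}}}) "
            f"MERGE (s)-[:CALLS]->(t);"
        )
    return "\n".join(lines) + "\n"
```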

Output schema

The output validates against the CLDK ClangApplication contract: { symbol_table: { <relative file path>: ClangModule }, call_graph: [ClangCallEdge], ... }, with identity-only edges whose source/target byte-match a real ClangCallable.signature. Signatures are human-readable, fully-qualified, and overload-disambiguated (app::Point::add(int)); one signature_of() canonicalizer produces every id. See .claude/SCHEMA_DECISIONS.md for the field-by-field contract and the C/C++-specific extensions.
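The identity rule can be checked mechanically: every edge endpoint must equal, byte for byte, some callable signature. The sketch below pairs a simplified signature builder with that check. `make_signature` only imitates the documented `app::Point::add(int)` format and is not the real signature_of(); the ", " separator for multiple parameters is an assumption, and `dangling_edges` is a hypothetical helper.

```python
from typing import Iterable


def make_signature(scopes: list, name: str, param_types: list) -> str:
    """Join namespace/class scopes with '::' and append the parameter types."""
    return "::".join([*scopes, name]) + "(" + ", ".join(param_types) + ")"


def dangling_edges(signatures: Iterable[str], call_graph: list) -> list:
    """Return edges whose source or target is not an exact signature match."""
    known = set(signatures)
    return [e for e in call_graph
            if e["source"] not in known or e["target"] not in known]
```

Including parameter types in the id is what keeps overloads such as `add(int)` and `add(double)` distinct; an empty result from `dangling_edges` means the call graph satisfies the identity contract.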

SDK integration

The CLDK SDKs bind this analyzer — in the Python SDK via CLDK.clang(project_path=...) (with the legacy CLDK(language="clang").analysis(...) shim), and the other SDKs as they come online. Because this analyzer is a Python package, the Python SDK invokes it in-process (imports Codeanalyzer / AnalysisOptions, calls .analyze(), gets the ClangApplication back with no JSON round-trip). The SDK wiring is done by the cldk-sdk-frontend skill.

Development

pip install -e ".[dev,neo4j]"
pytest                       # symbol-table, call-graph, caching, CLI, and Neo4j-conformance gates
canclang -i testdata/fixture -a 2 -o /tmp/out && cat /tmp/out/analysis.json

The analyzer is a modular package mirroring codeanalyzer-python: a delegating core.analyze(), a node-kind-split syntactic_analysis/symbol_table_builder.py, an isolated semantic_analysis/svf/ level-2 subpackage, a pluggable analysis/ (pass registry) + frameworks/ (entrypoint-finder base) layer, and the neo4j/ projection.
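A pass registry of this kind can be pictured as a name-to-function table that plug-in analyses join through a decorator. This is a hypothetical sketch: `PASSES`, `register_pass`, `fan_out`, and `run_passes` are illustrative names, not the package's API, and each pass here simply reads the analysis.json shape `{symbol_table, call_graph}`.

```python
from typing import Callable, Dict, Optional

# Registry of analysis passes: name -> function(analysis dict) -> result.
PASSES: Dict[str, Callable[[dict], dict]] = {}


def register_pass(name: str):
    """Decorator that adds a pass to the registry under a unique name."""
    def decorator(fn):
        if name in PASSES:
            raise ValueError(f"pass {name!r} already registered")
        PASSES[name] = fn
        return fn
    return decorator


@register_pass("fan_out")
def fan_out(app: dict) -> dict:
    """Count outgoing call edges per caller signature."""
    counts: Dict[str, int] = {}
    for edge in app["call_graph"]:
        counts[edge["source"]] = counts.get(edge["source"], 0) + 1
    return counts


def run_passes(app: dict, names: Optional[list] = None) -> dict:
    """Run the selected passes (default: all, in name order)."""
    selected = names if names is not None else sorted(PASSES)
    return {name: PASSES[name](app) for name in selected}
```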

License

Apache-2.0. See LICENSE.
