
[FEATURE] Architecture analysis metrics — modularity, fan-in/out, inheritance, change coupling, hotspots #61

Description

@Wolfvin

Summary

Add 7 architecture analysis metrics from Emerge: Louvain modularity (community detection), TF-IDF keyword extraction, whitespace complexity (a fast proxy for cyclomatic complexity), fan-in/fan-out, an inheritance graph, change coupling (from git history), and code churn with author diversity. In addition, port architectural layer detection from UnderstandAnything (UA).

Worker consensus (2 reports — mostly Emerge, with UA additions)

| Worker | Source | ID | Contribution |
|---|---|---|---|
| Emerge | update!/CodeLens_Upgrade_Issues_from_Emerge.md | CL-024 | Louvain modularity: python-louvain library, 5x optimization runs, resolution 1.5. 3 commands: modularity, clusters, ball-of-mud. Runs on the dependency, call, inheritance, and complete graphs. |
| Emerge | same file | CL-025 | TF-IDF semantic keyword extraction per file: scikit-learn TfidfVectorizer, 12 language-specific stopword sets. 2 commands: keywords, semantic-search. |
| Emerge | same file | CL-026 | Whitespace complexity metric: counts indentation per line, 10-100x faster than cyclomatic complexity. --quick mode skips AST parsing. |
| Emerge | same file | CL-027 | Fan-in / fan-out graph metrics: avg_fan_in, avg_fan_out, max_fan_in_name, max_fan_out_name. New command fan-in-out. Integrate with impact. |
| Emerge | same file | CL-028 | Inheritance graph: extend parsers to extract each class and its parent. New graph type inheritance_graph. 2 commands: inheritance, god-class (>20 children or >5 depth). |
| Emerge | same file | CL-029 | Change coupling graph (git history): PyDriller traverses commit history; files committed together are coupled. 3 commands: change-coupling, shotgun-surgery, coupled-with. |
| Emerge | same file | CL-030 | Git code churn and author diversity: extend ownership_engine.py with multi-commit churn. 2 commands: hotspot, bus-factor. |
| UnderstandAnything | update!/CodeLens_vs_UnderstandAnything_Upgrade_Analysis.md | U2 | Architectural layer detection: 9 layers (API, Service, Data, UI, Middleware, External, Background, Utility, Test) via a directory-path heuristic. New command layers. Update summary + impact. |

Proposed scope (P2, 6-10 weeks total — can be split across multiple PRs)

Each metric is independent and can ship separately:

Metric 1 — Whitespace complexity (P2, 3 days, quick win)

  • Copy emerge/metrics/whitespace/whitespace.py (81 LOC, MIT)
  • Add --metric ws|cyclomatic|cognitive|all to complexity command
  • Quick mode: codelens complexity --quick --top 20 skips AST parsing; target <5s for 5000 files
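A minimal sketch of the whitespace metric, assuming the score is simply the sum of indentation levels over non-blank lines (Emerge's whitespace.py may weight lines differently). `whitespace_complexity` and `rank_files` are hypothetical names. No AST is built, which is what keeps the --quick path cheap.

```python
def whitespace_complexity(source: str, indent_width: int = 4) -> float:
    """Sum of indentation levels over non-blank lines (assumed formula)."""
    total = 0.0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no nesting information
        expanded = line.expandtabs(indent_width)  # a leading tab counts as one level
        leading = len(expanded) - len(expanded.lstrip(" "))
        total += leading / indent_width
    return total


def rank_files(sources: dict[str, str], top: int = 20) -> list[tuple[str, float]]:
    """Return the `top` files with the highest whitespace complexity."""
    scores = [(path, whitespace_complexity(text)) for path, text in sources.items()]
    scores.sort(key=lambda item: (-item[1], item[0]))  # ties broken by path
    return scores[:top]


example = "def f(x):\n    if x:\n        return 1\n    return 0\n"
print(whitespace_complexity(example))  # → 4.0 (levels 0 + 1 + 2 + 1)
```

Because comments and continuation lines are indented too, this stays a proxy: it tracks nesting depth, not branching.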

Metric 2 — Fan-in / fan-out (P1, 3 days, quick win)

  • Add calculate_fan_in_out() to callgraph_engine.py
  • New command codelens fan-in-out [workspace] [--name FN] [--top N]
  • Update impact_engine.py to include fan-in/out in output
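A sketch of what calculate_fan_in_out() could compute, assuming the call graph is available as a dict mapping each caller to the set of callees it invokes (the real callgraph_engine.py structure may differ). The four summary keys mirror CL-027; the per-function `fan_in` / `fan_out` maps are a hypothetical addition that could back `--name` and `--top`.

```python
def calculate_fan_in_out(call_graph: dict[str, set[str]]) -> dict:
    """Fan-in/fan-out metrics for a caller -> callees mapping (assumed shape)."""
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes |= callees  # include leaf functions that never call anything
    if not nodes:
        return {"avg_fan_in": 0.0, "avg_fan_out": 0.0,
                "max_fan_in_name": None, "max_fan_out_name": None,
                "fan_in": {}, "fan_out": {}}
    fan_out = {n: len(call_graph.get(n, ())) for n in nodes}
    fan_in = dict.fromkeys(nodes, 0)
    for callees in call_graph.values():
        for callee in callees:
            fan_in[callee] += 1
    return {
        "avg_fan_in": sum(fan_in.values()) / len(nodes),
        "avg_fan_out": sum(fan_out.values()) / len(nodes),
        # Ties are broken alphabetically so the output is deterministic.
        "max_fan_in_name": min(nodes, key=lambda n: (-fan_in[n], n)),
        "max_fan_out_name": min(nodes, key=lambda n: (-fan_out[n], n)),
        "fan_in": fan_in,
        "fan_out": fan_out,
    }
```

Averaged over the same node set, avg_fan_in and avg_fan_out are always equal (every call edge adds one to each side), so the max_* names and per-function values carry most of the signal for impact.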

Metric 3 — Louvain modularity (P2, 1 week)

  • Adapt emerge/metrics/modularity/modularity.py (188 LOC, MIT)
  • Add python-louvain dependency
  • 3 commands: modularity, clusters, ball-of-mud
  • MCP tools: codelens_modularity, codelens_clusters, codelens_ball_of_mud
  • Benchmark: 5000-file dependency graph <30s
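Louvain searches for the partition that maximizes modularity Q; in the proposal the partition itself would come from python-louvain, which works on undirected graphs, so dependency edges are assumed to be treated as undirected here. A dependency-free sketch of the Q score for a given partition, plus a hypothetical ball-of-mud rule (the 0.3 threshold is an illustrative assumption, not from Emerge):

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str]], partition: dict[str, int]) -> float:
    """Modularity Q of an undirected, unweighted graph under `partition`.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2, where m is the
    number of edges, L_c the edges inside c, and d_c the total degree of c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)      # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def is_ball_of_mud(q: float, threshold: float = 0.3) -> bool:
    """Hypothetical rule: weak community structure suggests a big ball of mud."""
    return q < threshold
```

For two triangles joined by a single edge, splitting them into their natural communities gives Q = 5/14 (about 0.36), while lumping everything into one community gives Q = 0.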

Metric 4 — TF-IDF keyword extraction (P2, 1 week)

  • Adapt emerge/metrics/tfidf/tfidf.py (118 LOC, MIT)
  • Add scikit-learn dependency
  • 2 commands: keywords, semantic-search
  • Cache at .codelens/keywords_cache.json
  • MCP tools: codelens_keywords, codelens_semantic_search
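For intuition about what the keywords command ranks, here is a dependency-free TF-IDF sketch. It uses the same smoothed IDF that scikit-learn's TfidfVectorizer applies by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, but skips the per-document L2 normalization, which does not change the ranking within a file. `top_keywords`, `tokenize`, and the tiny STOPWORDS set are hypothetical stand-ins for the 12 language-specific stopword sets.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword set; the proposal calls for 12 language-specific sets.
STOPWORDS = {"def", "return", "self", "import", "from", "class", "const", "function"}


def tokenize(text: str) -> list[str]:
    """Lower-case letter runs, so snake_case splits into words (camelCase does not)."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) > 2 and t not in STOPWORDS]


def top_keywords(docs: dict[str, str], k: int = 5) -> dict[str, list[str]]:
    """Top-k TF-IDF terms per file, using scikit-learn's default smoothed IDF."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: files containing each term
    result = {}
    for name, tf in counts.items():
        scores = {t: f * (math.log((1 + n) / (1 + df[t])) + 1) for t, f in tf.items()}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[name] = [term for term, _ in ranked[:k]]
    return result
```

Terms that appear in every file (like "token" below) get the minimum IDF of 1, so file-specific vocabulary rises to the top.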

Metric 5 — Inheritance graph (P2, 2-3 weeks)

  • Extend parsers (Python, JS, TS, TSX, Rust, Vue, Svelte + fallback for Java/Kotlin/Swift/C++/C#/PHP/Ruby) to extract class + parent
  • New graph type inheritance_graph in callgraph_engine.py
  • 2 commands: inheritance [workspace] [--class NAME] [--depth N], god-class [workspace]
  • MCP tools: codelens_inheritance, codelens_god_class
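Python is the simplest parser to sketch because the standard-library ast module already exposes each class's bases. A minimal sketch of the extraction plus the god-class rule from CL-028, assuming "children" means direct subclasses and "depth" means the longest chain of ancestors; both function names are hypothetical.

```python
import ast
from collections import defaultdict


def extract_python_inheritance(source: str) -> list[tuple[str, str]]:
    """Return (class, parent) pairs for every base listed in a class statement."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                edges.append((node.name, ast.unparse(base)))  # e.g. "mixins.M"
    return edges


def find_god_classes(edges, max_children=20, max_depth=5):
    """Flag classes with > max_children direct subclasses or depth > max_depth."""
    parents = defaultdict(list)
    children = defaultdict(set)
    for child, parent in edges:
        parents[child].append(parent)
        children[parent].add(child)
    classes = set(parents) | set(children)
    memo = {}

    def depth(cls, path=frozenset()):
        # Depth = length of the longest chain of ancestors above `cls`.
        if cls in memo:
            return memo[cls]
        if cls in path:
            return 0  # cycle guard for malformed input
        bases = parents.get(cls, [])
        d = 1 + max(depth(b, path | {cls}) for b in bases) if bases else 0
        memo[cls] = d
        return d

    return sorted(
        c for c in classes
        if len(children.get(c, ())) > max_children or depth(c) > max_depth
    )
```

Base names are kept as written (`mixins.M`), so resolving them to definitions across files would still be the job of the inheritance_graph builder.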

Metric 6 — Change coupling (P1, 1-2 weeks, high impact unique feature)

  • Add pydriller dependency
  • Adapt emerge/metrics/git/git.py (234 LOC)
  • 3 commands: change-coupling, shotgun-surgery, coupled-with <file>
  • Update impact_engine.py to include coupled files
  • Performance: 1000 commits <60s
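A minimal sketch of the co-change counting behind change-coupling and coupled-with, assuming each commit has already been reduced to the set of file paths it modified (for example from PyDriller's `Repository(path).traverse_commits()` and each commit's `modified_files`). The confidence score (co-changes divided by the target file's own change count), the `min_support` filter, and the `max_files` cutoff for bulk commits are hypothetical choices, not taken from Emerge's git.py.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits: list[set[str]], max_files: int = 50):
    """Count per-file changes and per-pair co-changes across commits."""
    changes = Counter()
    pairs = Counter()
    for files in commits:
        if len(files) > max_files:
            continue  # skip bulk commits (renames, formatting) that couple everything
        changes.update(files)
        pairs.update(combinations(sorted(files), 2))  # each unordered pair once
    return changes, pairs


def coupled_with(target: str, commits: list[set[str]], min_support: int = 2,
                 max_files: int = 50) -> list[tuple[str, int, float]]:
    """Files that change together with `target`: (file, co_changes, confidence)."""
    changes, pairs = change_coupling(commits, max_files)
    out = []
    for (a, b), co in pairs.items():
        if target not in (a, b) or co < min_support:
            continue
        other = b if a == target else a
        out.append((other, co, co / changes[target]))
    return sorted(out, key=lambda r: (-r[2], -r[1], r[0]))
```

Pair counting is quadratic in commit size, which is one reason a bulk-commit cutoff helps keep the 1000-commit budget.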

Metric 7 — Code churn & author diversity (P2, 1 week, depends on Metric 6)

  • Refactor ownership_engine.py to traverse git history (not just git blame)
  • 2 commands: hotspot, bus-factor
  • Extend ownership command output with code_churn_30d, code_churn_90d, number_authors, top_contributors
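A sketch of how the extended ownership fields could be derived once history is traversed, assuming churn means lines added plus lines deleted and that each touched file per commit becomes one record. `FileChange`, `ownership_stats`, and the bus-factor definition (the smallest number of authors who together wrote more than half of a file's churn) are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileChange:
    """One file touched by one commit (hypothetical record shape)."""
    path: str
    author: str
    when: datetime
    added: int
    deleted: int


def _churn_since(items, now, days):
    cutoff = now - timedelta(days=days)
    return sum(c.added + c.deleted for c in items if c.when >= cutoff)


def ownership_stats(changes, now, top_n=3):
    by_file = defaultdict(list)
    for change in changes:
        by_file[change.path].append(change)
    stats = {}
    for path, items in by_file.items():
        per_author = Counter()
        for c in items:
            per_author[c.author] += c.added + c.deleted
        stats[path] = {
            "code_churn_30d": _churn_since(items, now, 30),
            "code_churn_90d": _churn_since(items, now, 90),
            "number_authors": len(per_author),
            "top_contributors": [a for a, _ in per_author.most_common(top_n)],
            "churn_by_author": per_author,
        }
    return stats


def bus_factor(churn_by_author: Counter, share: float = 0.5) -> int:
    """Smallest number of authors who together wrote more than `share` of the churn."""
    total = sum(churn_by_author.values())
    covered = 0
    for count, (_, lines) in enumerate(churn_by_author.most_common(), start=1):
        covered += lines
        if covered > share * total:
            return count
    return 0  # no churn recorded
```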

Metric 8 — Architectural layer detection (P2, 1 week)

  • New scripts/layer_detector.py with 9 layer patterns (port from UA layer-detector.ts)
  • Directory-path heuristic, first-match-wins
  • New command codelens layers [workspace]
  • Update summary to include layer breakdown
  • Update impact to show affected layer
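A minimal sketch of a first-match-wins directory-path classifier. The directory names in LAYER_PATTERNS and their order are illustrative guesses, not the patterns in UA's layer-detector.ts; `detect_layer`, `layer_breakdown`, and the "Unclassified" bucket are hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePosixPath
from typing import Optional

# Ordered (layer, pattern) pairs; the first match wins, so order matters.
# These directory names are illustrative guesses, not UA's actual patterns.
LAYER_PATTERNS = [
    ("Test", r"(^|/)(tests?|__tests__|specs?)(/|$)"),
    ("API", r"(^|/)(api|routes?|controllers?|endpoints?)(/|$)"),
    ("Middleware", r"(^|/)(middlewares?|interceptors?)(/|$)"),
    ("Service", r"(^|/)(services?|usecases?|domain)(/|$)"),
    ("Data", r"(^|/)(models?|db|database|repositor(y|ies)|migrations?|schemas?)(/|$)"),
    ("UI", r"(^|/)(components?|views?|pages?|ui|screens?)(/|$)"),
    ("External", r"(^|/)(clients?|integrations?|external|adapters?)(/|$)"),
    ("Background", r"(^|/)(jobs?|workers?|tasks?|queues?|cron)(/|$)"),
    ("Utility", r"(^|/)(utils?|helpers?|lib|common|shared)(/|$)"),
]
_COMPILED = [(layer, re.compile(pattern)) for layer, pattern in LAYER_PATTERNS]


def detect_layer(path: str) -> Optional[str]:
    """Classify a file by its directory path only; None when nothing matches."""
    directory = str(PurePosixPath(path.replace("\\", "/").lower()).parent)
    for layer, pattern in _COMPILED:
        if pattern.search(directory):
            return layer
    return None


def layer_breakdown(paths) -> Counter:
    """Count files per layer, e.g. for the summary output."""
    return Counter(detect_layer(p) or "Unclassified" for p in paths)
```

Putting Test first means tests/api/test_users.py lands in Test rather than API; any other ordering is a policy decision the port would need to match against UA.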

Acceptance criteria

  • Each metric has 10+ test cases
  • codelens complexity --quick <5s for 5000 files
  • codelens modularity produces stable results across 5 runs (Louvain non-determinism controlled, e.g. with a fixed random seed)
  • codelens change-coupling correctly identifies files often committed together
  • codelens layers correctly classifies files into 9 architectural layers
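Community ids returned by Louvain are arbitrary, so the "stable across 5 runs" criterion is most naturally checked by comparing who is grouped with whom rather than the raw labels. A small sketch of such a check; the helper names are hypothetical and each run is assumed to be a dict mapping a file to its community id.

```python
from collections import defaultdict


def canonical_partition(partition: dict[str, int]) -> frozenset:
    """Drop the arbitrary community ids and keep only the groupings."""
    groups = defaultdict(set)
    for node, community in partition.items():
        groups[community].add(node)
    return frozenset(frozenset(members) for members in groups.values())


def is_stable(runs: list[dict[str, int]]) -> bool:
    """True when every run produced the same grouping, up to relabeling."""
    return len({canonical_partition(run) for run in runs}) <= 1
```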

Files

  • New: scripts/{ws_complexity,modularity,tfidf,change_coupling,hotspot,layer_detector}_engine.py, scripts/commands/{fan_in_out,modularity,clusters,ball_of_mud,keywords,semantic_search,inheritance,god_class,change_coupling,shotgun_surgery,coupled_with,hotspot,bus_factor,layers}.py
  • Update: scripts/{callgraph,ownership,impact,summary}_engine.py, scripts/{python,js_backend,ts_backend,tsx,rust,vue,svelte}_parser.py
