Summary
Add a persistent memory system so AI agents can record and recall project context across sessions. Three complementary layers: Serena-style markdown memory (mem:NAME references), the RepoAudit 3-tier registry (syntactic / semantic / report), and UnderstandAnything wiki/knowledge-base ingestion.
Worker consensus (4 reports — complementary, not competing)
| Worker | Source | Contribution |
| --- | --- | --- |
| Serena | update!/CodeLens_vs_Serena_Upgrade_Analysis.md S5 | Markdown memory at .codelens/memories/ (project) + ~/.codelens/memories/global/ (global). mem:NAME reference convention with referential integrity check + auto-fix. 6 memory commands. Read-only pattern protects global. |
| Serena | same file S6 | Onboarding process: auto-generate project_overview.md, project_structure.md, key_modules.md, conventions.md, suggested_commands.md on first run. |
| RepoAudit | update!/CodeLens_Upgrade_Issues_from_RepoAudit.md CL-041 | 3-tier memory: SyntacticMemory (AST-derived immutable, persisted .codelens/memory/syntactic.pkl), SemanticMemory (per-agent ephemeral, thread-safe), ReportMemory (immutable append-only findings, persisted .codelens/memory/report.json). Refactor registry.py to delegate. |
| UnderstandAnything | update!/CodeLens_vs_UnderstandAnything_Upgrade_Analysis.md U8 | Knowledge base analysis (Karpathy-pattern wiki): 5-phase pipeline DETECT → SCAN → ANALYZE (LLM) → MERGE → SAVE. Dispatches article-analyzer subagent per batch. Output knowledge-graph.json with kind: "knowledge". |
| OpenTaint | update!/CodeLens_vs_OpenTaint_Upgrade_Analysis.md E2 | Documentation pattern: update references/agent-integration.md with multi-skill orchestrator pattern, state tracking, resource limits. (Not a feature, just docs.) |
Proposed phased scope
Phase 1 — Serena-style markdown memory (P2, 2-3 weeks)
- Memory at .codelens/memories/ (project) + ~/.codelens/memories/global/ (global)
- Markdown files (human-readable, versionable)
- Topics: a / in a memory name maps to a subdirectory
- mem:NAME reference convention with referential integrity check (codelens memory check)
- Auto-fix (codelens memory check --fix)
- Reference rename propagation
- 6 commands: codelens memory write/read/edit/delete/rename/list, plus codelens memory check for integrity
- Read-only pattern (read_only_memory_patterns regex) protects global memory
- Seed memory_maintenance.md at first run describing the convention
- 6 memory commands exposed as MCP tools
- New files: scripts/memories/memory_manager.py, scripts/memories/memory_reference_analysis.py, scripts/commands/memory.py
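As a rough illustration of the name-to-path mapping and the integrity check above, here is a minimal sketch. The exact mem:NAME grammar is not specified in this issue, so the regex below (letters, digits, _, - and / separators) is an assumption, and memory_path and find_broken_references are hypothetical helper names, not Serena's API.

```python
import re
from pathlib import Path

# Assumed reference grammar: "mem:" followed by a slash-separated name.
MEM_REF = re.compile(r"\bmem:([A-Za-z0-9_\-]+(?:/[A-Za-z0-9_\-]+)*)")


def memory_path(root: Path, name: str) -> Path:
    """Map a memory name to its file; a "/" in the name becomes a subdirectory."""
    *topics, leaf = name.split("/")
    return root.joinpath(*topics, leaf + ".md")


def find_broken_references(root: Path) -> dict[str, list[str]]:
    """Return {memory name: sorted referenced names that have no backing file}."""
    broken: dict[str, list[str]] = {}
    for md_file in sorted(root.rglob("*.md")):
        name = md_file.relative_to(root).with_suffix("").as_posix()
        text = md_file.read_text(encoding="utf-8")
        missing = sorted(
            {ref for ref in MEM_REF.findall(text) if not memory_path(root, ref).is_file()}
        )
        if missing:
            broken[name] = missing
    return broken
```

A --fix mode could build on the same scan, for example by rewriting or removing each broken reference; which repair strategy to apply is left open here.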
Phase 2 — Onboarding process (P2, 1-2 weeks, depends on Phase 1)
- Detect first-time run via empty .codelens/memories/
- Auto-generate: project_overview.md (reuse detect), project_structure.md (reuse outline), key_modules.md (reuse entrypoints + api-map), conventions.md (reuse smell + complexity), suggested_commands.md
- --skip-onboarding and --re-onboard flags
- codelens init auto-triggers onboarding
- New files: scripts/onboarding_engine.py, scripts/commands/onboarding.py
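The first-run detection and flag handling above could look roughly like the sketch below. The function names and the skip/force parameters (standing in for --skip-onboarding and --re-onboard) are hypothetical, and the file bodies are placeholders: the real content would come from the existing detect, outline, entrypoints, api-map, smell and complexity commands.

```python
from pathlib import Path

ONBOARDING_FILES = (
    "project_overview.md",
    "project_structure.md",
    "key_modules.md",
    "conventions.md",
    "suggested_commands.md",
)


def needs_onboarding(memory_dir: Path) -> bool:
    """First run: the memory directory is missing or has no entries."""
    return not memory_dir.is_dir() or not any(memory_dir.iterdir())


def run_onboarding(memory_dir: Path, skip: bool = False, force: bool = False) -> list[Path]:
    """Write the five onboarding memories and return the files created.

    skip mirrors --skip-onboarding; force mirrors --re-onboard (assumed semantics).
    """
    if skip or not (force or needs_onboarding(memory_dir)):
        return []
    memory_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for filename in ONBOARDING_FILES:
        path = memory_dir / filename
        # Placeholder body only; generation from analysis commands is out of scope here.
        path.write_text(f"# {filename[:-3]}\n\n(to be generated)\n", encoding="utf-8")
        created.append(path)
    return created
```

One open question this sketch surfaces: if Phase 1 seeds memory_maintenance.md on first run, the "empty directory" test needs to run before that seed is written, or ignore it.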
Phase 3 — 3-tier registry refactor (P3, 2-3 weeks, optional — large refactor)
- Split monolithic registry.py (440 LOC) into 3 tiers:
  - SyntacticMemory: AST-derived immutable facts, persisted to .codelens/memory/syntactic.pkl
  - SemanticMemory: per-agent ephemeral, thread-safe with a threading.RLock() per field
  - ReportMemory: immutable append-only findings, persisted to .codelens/memory/report.json
- Refactor registry.py to delegate to SyntacticMemory (backward-compat API)
- Refactor ast_taint_engine.py, dataflow_engine.py, crossfile_taint_engine.py to use SemanticMemory
- Refactor output formatters to consume ReportMemory
- codelens migrate-memory script converts old .codelens/registry.json to 3-tier
- New files: scripts/memory/{syntactic,semantic,report}.py, scripts/commands/migrate_memory.py
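To make the locking and append-only ideas concrete, here is a minimal sketch of two of the three tiers. It is written from the description above, not from RepoAudit's code, and the method names (put, get, append, all) and the JSON-array file layout are assumptions.

```python
import json
import threading
from pathlib import Path
from typing import Any


class SemanticMemory:
    """Per-agent ephemeral facts, guarded by one RLock per field."""

    def __init__(self) -> None:
        self._fields: dict[str, dict[str, Any]] = {}
        self._locks: dict[str, Any] = {}
        self._guard = threading.Lock()  # protects lazy creation of per-field locks

    def _lock_for(self, field: str):
        with self._guard:
            if field not in self._locks:
                self._locks[field] = threading.RLock()
                self._fields[field] = {}
            return self._locks[field]

    def put(self, field: str, key: str, value: Any) -> None:
        with self._lock_for(field):
            self._fields[field][key] = value

    def get(self, field: str, key: str, default: Any = None) -> Any:
        with self._lock_for(field):
            return self._fields[field].get(key, default)


class ReportMemory:
    """Append-only findings persisted as a JSON array; no update or delete methods."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()
        self._findings: list[dict[str, Any]] = (
            json.loads(path.read_text(encoding="utf-8")) if path.is_file() else []
        )

    def append(self, finding: dict[str, Any]) -> None:
        with self._lock:
            self._findings.append(dict(finding))  # copy so callers cannot mutate it later
            self._path.parent.mkdir(parents=True, exist_ok=True)
            self._path.write_text(json.dumps(self._findings, indent=2), encoding="utf-8")

    def all(self) -> tuple[dict[str, Any], ...]:
        with self._lock:
            return tuple(dict(f) for f in self._findings)
```

Rewriting the whole JSON file on each append keeps the sketch short; a line-delimited format would avoid that cost if reports grow large.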
Phase 4 — Knowledge base wiki ingestion (P3, 3-4 weeks, depends on LLM integration issue)
- codelens knowledge [wiki-directory] command
- Detect index.md + multiple .md files with [[wikilink]] syntax
- 5-phase pipeline: DETECT (format) → SCAN (article/source/topic node + wikilink/category edge) → ANALYZE (LLM: entity/claim/source node + cites/contradicts/builds_on edge) → MERGE (dedupe + normalize + layer/tour) → SAVE (validate + meta.json)
- Dispatch article-analyzer subagent per batch (10-15 articles, up to 3 concurrent)
- Output knowledge-graph.json with kind: "knowledge" → dashboard uses force-directed layout
- New files: scripts/knowledge_base_parser.py, scripts/knowledge_graph_merger.py, scripts/agents/article_analyzer.md, scripts/commands/knowledge.py
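The SCAN step and the batched dispatch can be sketched without any LLM dependency. Everything below is illustrative: the node and edge dictionary keys are an assumed schema (only kind: "knowledge" comes from this issue), the batch size of 12 is an arbitrary pick inside the 10-15 range, and analyze is a caller-supplied stand-in for the article-analyzer subagent.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# Matches [[Target]], [[Target|alias]] and [[Target#Section]]; captures Target only.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def scan_wiki(wiki_dir: Path) -> dict:
    """SCAN: one article node per .md file, one wikilink edge per [[link]]."""
    nodes, edges = [], []
    for md_file in sorted(wiki_dir.rglob("*.md")):
        article = md_file.relative_to(wiki_dir).with_suffix("").as_posix()
        nodes.append({"id": article, "type": "article"})
        for target in WIKILINK.findall(md_file.read_text(encoding="utf-8")):
            edges.append({"source": article, "target": target.strip(), "type": "wikilink"})
    return {"kind": "knowledge", "nodes": nodes, "edges": edges}


def batches(items: list, size: int = 12) -> list[list]:
    """Split items into consecutive chunks of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def analyze_in_batches(
    articles: list, analyze: Callable[[list], object], size: int = 12, max_concurrent: int = 3
) -> list:
    """ANALYZE dispatch: run `analyze` per batch, at most `max_concurrent` at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(analyze, batches(articles, size)))
```

pool.map returns results in batch order, which keeps the later MERGE step deterministic even though batches finish at different times.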
Acceptance criteria
- codelens memory write/read/list works; mem:NAME references resolve
- codelens memory check detects broken references; --fix repairs them
- codelens init auto-generates 5 onboarding memory files
- codelens knowledge ingests a 50-article wiki and produces valid knowledge-graph.json
Relationship to #16
#16 (ADR via manage_adr MCP tool) is a narrower use case of this broader memory system. Once Phase 1 ships, ADR management can be implemented as a memory subdirectory (.codelens/memories/adr/) without a separate MCP tool.
License note
Serena is MIT, so memory_manager.py logic can be ported directly. RepoAudit is under the Purdue Non-Commercial license: take design influence only and reimplement from scratch. UnderstandAnything (UA) has no specified license: design only.