[FEATURE] Memory & knowledge system — Serena-style markdown memory + 3-tier registry + wiki ingestion

## Summary

Add a persistent memory system so AI agents can record and recall project context across sessions. It has three complementary layers: Serena-style markdown memory (`mem:` references), the RepoAudit 3-tier registry (syntactic / semantic / report), and UnderstandAnything wiki/knowledge-base ingestion.

## Worker consensus (4 reports — complementary, not competing)

| Worker | Source | Contribution |
|---|---|---|
| Serena | `update!/CodeLens_vs_Serena_Upgrade_Analysis.md` S5 | Markdown memory at `.codelens/memories/` (project) + `~/.codelens/memories/global/` (global). `mem:NAME` reference convention with referential integrity check + auto-fix. 6 memory commands. Read-only pattern protects global. |
| Serena | same file S6 | Onboarding process — auto-generate `project_overview.md`, `project_structure.md`, `key_modules.md`, `conventions.md`, `suggested_commands.md` on first run. |
| RepoAudit | `update!/CodeLens_Upgrade_Issues_from_RepoAudit.md` CL-041 | 3-tier memory: `SyntacticMemory` (AST-derived immutable, persisted `.codelens/memory/syntactic.pkl`), `SemanticMemory` (per-agent ephemeral, thread-safe), `ReportMemory` (immutable append-only findings, persisted `.codelens/memory/report.json`). Refactor `registry.py` to delegate. |
| UnderstandAnything | `update!/CodeLens_vs_UnderstandAnything_Upgrade_Analysis.md` U8 | Knowledge base analysis (Karpathy-pattern wiki) — 5-phase pipeline: DETECT → SCAN → ANALYZE (LLM) → MERGE → SAVE. Dispatches `article-analyzer` subagent per batch. Output `knowledge-graph.json` with `kind: "knowledge"`. |
| OpenTaint | `update!/CodeLens_vs_OpenTaint_Upgrade_Analysis.md` E2 | Documentation pattern — update `references/agent-integration.md` with multi-skill orchestrator pattern, state tracking, resource limits. (Not a feature — just docs.) |

## Proposed phased scope

**Phase 1 — Serena-style markdown memory (P2, 2-3 weeks)**
- Memory at `.codelens/memories/` (project) + `~/.codelens/memories/global/` (global)
- Markdown files (human-readable, versionable)
- Topic via `/` in name maps to subdirectory
- `mem:NAME` reference convention with referential integrity check (`codelens memory check`)
- Auto-fix (`codelens memory check --fix`)
- Reference rename propagation
- 6 memory commands (`codelens memory write/read/edit/delete/rename/list`) plus the `codelens memory check` integrity command
- Read-only pattern (`read_only_memory_patterns` regex) protects global memory
- Seed `memory_maintenance.md` on first run to describe the convention
- The 6 memory commands are also exposed as MCP tools
- New files: `scripts/memories/memory_manager.py`, `scripts/memories/memory_reference_analysis.py`, `scripts/commands/memory.py`
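
The referential integrity check in Phase 1 could look roughly like the sketch below. It is a minimal illustration, not Serena's implementation: the exact `mem:NAME` grammar (letters, digits, `_`, `-`, with `/` as the topic separator), the `.md` suffix mapping, and the function names are all assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path

# Assumed reference grammar: "mem:" followed by a name; "/" separates topics.
MEM_REF = re.compile(r"mem:([A-Za-z0-9_\-]+(?:/[A-Za-z0-9_\-]+)*)")


def memory_path(root: Path, name: str) -> Path:
    """Map a memory name such as 'arch/db' to <root>/arch/db.md."""
    return root / f"{name}.md"


def find_broken_references(root: Path) -> dict[str, list[str]]:
    """Return {memory name: [referenced names that have no backing file]}."""
    broken: dict[str, list[str]] = {}
    for md_file in sorted(root.rglob("*.md")):
        name = md_file.relative_to(root).with_suffix("").as_posix()
        text = md_file.read_text(encoding="utf-8")
        missing = sorted(
            {ref for ref in MEM_REF.findall(text)
             if not memory_path(root, ref).is_file()}
        )
        if missing:
            broken[name] = missing
    return broken


def demo() -> dict[str, list[str]]:
    """Build a throwaway memory tree with one valid and one dangling reference."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "arch").mkdir()
        (root / "arch" / "db.md").write_text("Schema notes.\n", encoding="utf-8")
        (root / "overview.md").write_text(
            "See mem:arch/db and mem:arch/cache for details.\n", encoding="utf-8"
        )
        return find_broken_references(root)


if __name__ == "__main__":
    print(demo())  # {'overview': ['arch/cache']}
```

A `--fix` mode would build on the same scan, for example by rewriting references after a rename; how dangling references with no obvious target should be repaired is left open here.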

**Phase 2 — Onboarding process (P2, 1-2 weeks, depends on Phase 1)**
- Detect first-time run via empty `.codelens/memories/`
- Auto-generate: `project_overview.md` (reuse `detect`), `project_structure.md` (reuse `outline`), `key_modules.md` (reuse `entrypoints` + `api-map`), `conventions.md` (reuse `smell` + `complexity`), `suggested_commands.md`
- `--skip-onboarding` and `--re-onboard` flags
- `codelens init` auto-triggers onboarding
- New files: `scripts/onboarding_engine.py`, `scripts/commands/onboarding.py`
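
As a rough sketch of the first-run detection above: the function names, the keyword arguments standing in for `--skip-onboarding` and `--re-onboard`, and the rule that skipping wins when both flags are given are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

# The five onboarding memories named in this phase.
ONBOARDING_FILES = (
    "project_overview.md",
    "project_structure.md",
    "key_modules.md",
    "conventions.md",
    "suggested_commands.md",
)


def needs_onboarding(memory_dir: Path, skip_onboarding: bool = False,
                     re_onboard: bool = False) -> bool:
    """Decide whether onboarding should run for this project."""
    if skip_onboarding:  # assumption: an explicit skip always wins
        return False
    if re_onboard:       # forced regeneration
        return True
    # First-time run: the directory is missing or has no entries.
    return not memory_dir.is_dir() or not any(memory_dir.iterdir())


def missing_onboarding_files(memory_dir: Path) -> list[str]:
    """List the onboarding memories that have not been generated yet."""
    return [name for name in ONBOARDING_FILES
            if not (memory_dir / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memories = Path(tmp) / ".codelens" / "memories"
        print(needs_onboarding(memories))  # True
```

One ordering detail to settle during implementation: if Phase 1 seeds `memory_maintenance.md` before this check runs, the directory is no longer empty, so the seed step would need to come after onboarding detection or be excluded from the emptiness test.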

**Phase 3 — 3-tier registry refactor (P3, 2-3 weeks, optional — large refactor)**
- Split monolithic `registry.py` (440 LOC) into 3 tiers:
  - `SyntacticMemory` — AST-derived immutable facts, persisted `.codelens/memory/syntactic.pkl`
  - `SemanticMemory` — per-agent ephemeral, thread-safe with `threading.RLock()` per-field
  - `ReportMemory` — immutable append-only findings, persisted `.codelens/memory/report.json`
- Refactor `registry.py` to delegate to `SyntacticMemory` (backward-compat API)
- Refactor `ast_taint_engine.py`, `dataflow_engine.py`, `crossfile_taint_engine.py` to use `SemanticMemory`
- Refactor output formatters to consume `ReportMemory`
- `codelens migrate-memory` script converts the old `.codelens/registry.json` to the 3-tier layout
- New files: `scripts/memory/{syntactic,semantic,report}.py`, `scripts/commands/migrate_memory.py`
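
The three tiers could be shaped roughly as follows. This is a from-scratch sketch under assumptions, not RepoAudit's code: the method names (`get`, `update`, `append`, `snapshot`), the `Finding` fields, and the lazily created per-field lock table are illustrative choices. Persistence to `syntactic.pkl` and `report.json` is omitted.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Callable, Mapping


class SyntacticMemory:
    """AST-derived facts, frozen after construction."""

    def __init__(self, facts: Mapping[str, Any]) -> None:
        # Copy, then wrap in a read-only view so callers cannot mutate it.
        self._facts = MappingProxyType(dict(facts))

    def get(self, key: str, default: Any = None) -> Any:
        return self._facts.get(key, default)


class SemanticMemory:
    """Per-agent scratch state guarded by one RLock per field."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}
        self._values = {}

    def _lock_for(self, field: str):
        with self._guard:
            lock = self._locks.get(field)
            if lock is None:
                lock = self._locks[field] = threading.RLock()
            return lock

    def update(self, field: str, fn: Callable[[Any], Any]) -> Any:
        """Atomically replace a field with fn(current value)."""
        with self._lock_for(field):
            self._values[field] = fn(self._values.get(field))
            return self._values[field]

    def get(self, field: str, default: Any = None) -> Any:
        with self._lock_for(field):
            return self._values.get(field, default)


@dataclass(frozen=True)
class Finding:
    rule: str
    file: str
    line: int


class ReportMemory:
    """Append-only store of immutable findings."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._findings = []

    def append(self, finding: Finding) -> None:
        with self._lock:
            self._findings.append(finding)

    def snapshot(self) -> tuple:
        with self._lock:
            return tuple(self._findings)


def demo_counter(workers: int = 8, per_worker: int = 1000) -> int:
    """Increment one field from several threads and return the final count."""
    mem = SemanticMemory()

    def work() -> None:
        for _ in range(per_worker):
            mem.update("hits", lambda v: (v or 0) + 1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.get("hits")
```

Holding the field's lock across the whole read-modify-write in `update` is what keeps concurrent increments from being lost; per-field locks let engines working on unrelated fields proceed without contending.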

**Phase 4 — Knowledge base wiki ingestion (P3, 3-4 weeks, depends on LLM integration issue)**
- `codelens knowledge [wiki-directory]` command
- Detect `index.md` + multiple `.md` files with `[[wikilink]]` syntax
- 5-phase pipeline: DETECT (format) → SCAN (article/source/topic node + wikilink/category edge) → ANALYZE (LLM: entity/claim/source node + cites/contradicts/builds_on edge) → MERGE (dedupe + normalize + layer/tour) → SAVE (validate + meta.json)
- Dispatch `article-analyzer` subagent per batch (10-15 articles, up to 3 concurrent)
- Output `knowledge-graph.json` with `kind: "knowledge"` → dashboard uses force-directed layout
- New files: `scripts/knowledge_base_parser.py`, `scripts/knowledge_graph_merger.py`, `scripts/agents/article_analyzer.md`, `scripts/commands/knowledge.py`
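
The SCAN step's wikilink extraction and the batched ANALYZE dispatch might be sketched as below. Several things here are assumptions: the `[[Target|alias]]` and `[[Target#section]]` forms (common wiki conventions, not confirmed for the target wikis), the batch size of 12 picked from the 10-15 range, and `analyze_batch`, which is a placeholder standing in for the `article-analyzer` subagent call.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Captures the link target; an optional "#section" and "|alias" are ignored.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def scan_wikilinks(articles: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source article, link target) edges found in article bodies."""
    edges = []
    for title, text in articles.items():
        for target in WIKILINK.findall(text):
            edges.append((title, target.strip()))
    return edges


def make_batches(items: list, size: int = 12) -> list[list]:
    """Split items into consecutive batches; the last one may be smaller."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def analyze_batch(batch: list[str]) -> dict:
    """Placeholder for the subagent call; returns a stub result."""
    return {"articles": list(batch), "entities": [], "claims": []}


def run_analysis(titles: list[str], batch_size: int = 12,
                 max_concurrent: int = 3) -> list[dict]:
    """Analyze batches with at most max_concurrent running at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map yields results in submission order.
        return list(pool.map(analyze_batch, make_batches(titles, batch_size)))


if __name__ == "__main__":
    wiki = {
        "index": "Start with [[Attention]] and [[Transformer|the transformer]].",
        "Attention": "Used by [[Transformer#Encoder]].",
    }
    print(scan_wikilinks(wiki))
```

With the 50-article wiki from the acceptance criteria and a batch size of 12, this yields five batches, the last holding two articles; whether a short trailing batch should be merged into the previous one is a design choice left open.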

## Acceptance criteria

- [ ] Phase 1: `codelens memory write/read/list` works; `mem:NAME` references resolve
- [ ] Phase 1: `codelens memory check` detects broken references; `--fix` repairs them
- [ ] Phase 2: first-time `codelens init` auto-generates 5 onboarding memory files
- [ ] Phase 3: 3-tier registry passes existing test suite (no regression)
- [ ] Phase 4: `codelens knowledge` ingests a 50-article wiki and produces valid `knowledge-graph.json`

## Relationship to #16

#16 (ADR via `manage_adr` MCP tool) is a narrower use case of this broader memory system. Once Phase 1 ships, ADR management can be implemented as a memory subdirectory (`.codelens/memories/adr/`) without a separate MCP tool.

## License note

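Serena is MIT-licensed, so `memory_manager.py` logic can be ported directly (retaining the MIT copyright notice). RepoAudit is under the Purdue Non-Commercial license: take design influence only and reimplement from scratch. UnderstandAnything (UA) does not specify a license: design influence only.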

