ctx — The memory your LLM pretends to have.

Knowledge store with weighted 4-way RRF retrieval, multi-tenant scope isolation, multi-dimensional cyclic temporal gravity, and autonomous cross-referencing. Built for AI workflows that need to remember.

What it does

ctx gives your LLM a persistent, searchable memory. Store knowledge blocks, query them with hybrid retrieval (semantic + bilingual fulltext + trigram), then rerank with multi-dimensional cyclic gravity: each temporal cycle (weekday, month, quarter, week, monthday, seasonal, daily) is scored as its own Gaussian field. Queries like "immer dienstags" ("always on Tuesdays") or "Weihnachten" ("Christmas") activate specific dimensions; "Meeting am Dienstag, Ergebnis am Mittwoch" ("meeting on Tuesday, result on Wednesday") still pulls the Wednesday block, just with a weaker score.
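The fusion step of the hybrid retrieval can be pictured as weighted reciprocal rank fusion. The following is a minimal sketch, not ctx's implementation: the channel weights are the ones listed in the Architecture diagram (0.45 / 0.25 / 0.20 / 0.10), while the constant k = 60, the type `channel`, and the function `fuseRRF` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// channel is one ranked result list plus its fusion weight (hypothetical type).
type channel struct {
	weight float64
	ranked []string // block ids, best first
}

// fuseRRF orders block ids by weighted reciprocal rank fusion:
// score(id) = sum over channels of weight / (k + rank), with rank starting at 1.
func fuseRRF(channels []channel, k float64) []string {
	scores := map[string]float64{}
	for _, ch := range channels {
		for i, id := range ch.ranked {
			scores[id] += ch.weight / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	const k = 60 // assumed constant, not taken from ctx
	fused := fuseRRF([]channel{
		{weight: 0.45, ranked: []string{"a", "b", "c"}}, // semantic
		{weight: 0.25, ranked: []string{"b", "a"}},      // EN fulltext
		{weight: 0.20, ranked: []string{"c", "b"}},      // DE fulltext
		{weight: 0.10, ranked: []string{"b"}},           // trigram
	}, k)
	fmt.Println(fused) // prints [b a c]
}
```

Block "b" wins here because it appears in all four channels, even though it is never ranked first by the heaviest one.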

Multiple anchors per block: every block carries dimensions from both its content (dates mentioned in text) AND its created_at timestamp. A block about "Meeting am Dienstag" ("meeting on Tuesday") written on a Friday gets weekday=2 (content anchor) AND weekday=5 (meta anchor). Both signals contribute independently: "immer dienstags" ("always on Tuesdays") queries find the content anchor; "Freitags-Arbeit" ("Friday work") finds the meta anchor. The same principle applies to monthday, seasonal, daily, etc.

Dream Mode runs as a continuous background loop — autonomously discovering relationships between blocks, marking outdated information, and promoting high-quality content. Supports a separate model for evaluation (e.g. a larger model for better causal/supersedes reasoning). Parallel workers (CTX_DREAM_PARALLELISM, default 1) with atomic FOR UPDATE SKIP LOCKED block-claim — safe under contention. Your knowledge base grows, self-organizes, and stays current.

How LLMs use ctx

ctx is designed to be the persistent memory layer for LLM agents. Five primitives, composable:

Use case | Tool | When
Retrieve prior knowledge before answering | ctx query "question" | Whenever the answer might depend on past sessions, project state, or stored decisions
Persist a new finding | ctx save <category> <title> - <content> | After non-obvious discoveries, architecture decisions, resolved bugs, config changes
Update an existing block | ctx save with same <category> <title> | category+title is the upsert key; re-saving replaces the block
Browse without LLM cost | ctx search [category] [query:text] | Listing, sanity-checking, lightweight lookups
Inspect a specific block | ctx get <block-id> | Following an id from query sources or another block

Categories (semantic, not enforced)

infrastructure, decisions, projects, reference, learnings, agent-briefing, index. Pick by intent: one fact per block, precise title, tags for cross-cutting. ~1-1.5k chars max — split, don't grow.

Access paths (in order of preference for LLM agents)

  1. MCP: claude.ai ctx server (Streamable HTTP transport). Tools: query, store, search, get, recent. JSON-schemas, no shell-quoting. Use this in Claude Code / claude.ai sessions.
  2. CLI: /usr/local/bin/ctx for shell pipelines, cron, scripts. Config in ~/.config/ctx/config.
  3. HTTP: POST /api/{query,store,search,manage} directly, as a fallback when MCP/CLI are unavailable.

Multi-Tenant Architecture

scope column on context_blocks (private | work | shared | additional tenant scopes), enforced via API-key home_scope. Each LLM/tenant key sees:

  • All blocks in its own scope
  • All blocks in shared (cross-tenant knowledge layer)
  • Nothing from other tenants' private scopes

API-key provisioning (v2.0.0+): ctx keys create <label> --home <scope>. --home is required; there is no implicit default. Scope names starting with _ are rejected (the underscore namespace is system-reserved; _global anchors the server-global settings identity in context_settings).
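The visibility and naming rules above reduce to two small predicates. This is a sketch under the stated rules only; `canRead` and `validScopeName` are hypothetical names, and the real enforcement lives server-side behind the API-key home_scope.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedScope is the cross-tenant knowledge layer every key may read.
const sharedScope = "shared"

// canRead reports whether a key with the given home scope sees a block:
// its own scope and the shared scope are visible, everything else is not.
func canRead(homeScope, blockScope string) bool {
	return blockScope == homeScope || blockScope == sharedScope
}

// validScopeName rejects empty names and the system-reserved
// underscore namespace (for example "_global").
func validScopeName(name string) bool {
	return name != "" && !strings.HasPrefix(name, "_")
}

func main() {
	fmt.Println(canRead("work", "work"))    // true
	fmt.Println(canRead("work", "shared"))  // true
	fmt.Println(canRead("work", "private")) // false
	fmt.Println(validScopeName("_global"))  // false
}
```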

Admin tier (BREAKING, migration 052)

Keys carry an is_admin flag (default false, no key is auto-promoted). The following /api/manage actions now require an admin key — BREAKING for previously-working non-admin keys: api-key-create, api-key-list, api-key-delete, mcp-client-create, mcp-client-list, mcp-client-delete, and dream-mode when mutating (reading the current mode stays open). Rationale: before this gate, ANY valid key of any home_scope could mint keys for arbitrary scopes — read access to foreign tenants — and the upcoming settings/secrets API must not inherit that model.

Admin bootstrap (one-time, host access required). Promote by id, never by label — label has no UNIQUE constraint and an UPDATE by label would escalate every same-named key, including inactive ones:

# 1. Inspect candidates:
docker exec -e PGPASSWORD="$CONTEXT_DB_PASSWORD" n8n-db-1 \
  psql -U "$CONTEXT_DB_USER" -d "$CONTEXT_DB" \
  -c "SELECT id, label, active, home_scope, is_admin FROM context_api_keys;"
# 2. Promote EXACTLY one key by id:
docker exec -e PGPASSWORD="$CONTEXT_DB_PASSWORD" n8n-db-1 \
  psql -U "$CONTEXT_DB_USER" -d "$CONTEXT_DB" \
  -c "UPDATE context_api_keys SET is_admin = true WHERE id = '<uuid>';"

Admin-key hygiene: the OAuth/MCP flow hands the API key ITSELF out as the bearer token — a key used as an MCP remote token circulates through claude.ai/Cloudflare and is stored in external connector storage. Create a dedicated admin key that is never used as an MCP/OAuth token; the claude.ai MCP key stays non-admin. Test/eval script keys stay non-admin too (least privilege).

Sealed secrets & break-glass

Provider credentials live AES-256-GCM-sealed in context_secrets (encrypted in Go — never via pgcrypto, the master key must not cross the SQL wire). The AAD binds each ciphertext to its name+scope row identity, so a ciphertext copied onto another row fails authentication. The secrets API/CLI waves activate the write paths; the crypto, the decrypt mode and the host script ship first so the recovery path exists before the first secret does.
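The AAD binding can be illustrated with Go's standard library. This is a sketch, not ctx's code: the AAD layout (name, a zero byte, scope), the nonce-prefixed ciphertext layout, and the names `rowAAD`, `seal`, and `open` are assumptions chosen for the example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// rowAAD binds a ciphertext to its row identity. The "name\x00scope" layout
// is an assumption for this sketch.
func rowAAD(name, scope string) []byte {
	return []byte(name + "\x00" + scope)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce || ciphertext || tag.
func seal(key []byte, name, scope string, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, rowAAD(name, scope)), nil
}

// open authenticates and decrypts a value produced by seal.
func open(key []byte, name, scope string, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed value shorter than nonce")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, rowAAD(name, scope))
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := seal(key, "provider_key", "work", []byte("s3cret"))
	if err != nil {
		panic(err)
	}
	_, errSame := open(key, "provider_key", "work", sealed)
	_, errMoved := open(key, "provider_key", "shared", sealed)
	fmt.Println(errSame == nil, errMoved != nil) // prints true true
}
```

Opening the same bytes under a different name or scope fails authentication, which is the "copied onto another row" case described above.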

Master key setup (one-time):

# generate and append to .env:
echo "CTX_SECRETS_KEY=$(openssl rand -hex 32)" >> .env

Mandatory: copy CTX_SECRETS_KEY into your password manager when you set it. backup.sh archives only the pg_dumps — the ciphertexts are in every dump, the master key is in none (deliberate: the key stays spatially separated from the ciphertexts it opens, so disaster recovery needs both places). Key loss = total loss of all sealed secrets, by design. No recovery mechanism; re-enter the provider keys instead.

Master-key rotation: generate a new key, move the old value to CTX_SECRETS_KEY_PREV, put the new one in CTX_SECRETS_KEY, restart ctx. The boot sweep (settings bootstrap wave) re-seals every secret it can open with the previous key (key_version bump, log line per name); afterwards remove CTX_SECRETS_KEY_PREV from .env. Secrets that open with neither key are left untouched (WARN per name, no boot abort).

Break-glass extraction (host access; works even when the ctx container crash-loops — the decrypt mode reads ONLY env + stdin, no DB):

./break-glass.sh secret <name> [scope]     # prints the plaintext
./break-glass.sh reset-settings [key]      # factory-reset settings overrides (audited via DB trigger)

openssl enc cannot do AES-GCM, so extraction pipes the row through the ctxd binary itself: psql -At … | docker run --rm -i -e CTX_SECRETS_KEY -e CTX_SECRETS_KEY_PREV n8n-ctx -secret-decrypt. PostgreSQL's encode(bytea,'base64') is MIME (RFC 2045) and wraps every 76 chars — the script strips the wraps SQL-side, and the decrypt mode additionally reads stdin to EOF and strips CR/LF, so every realistic provider-key length survives the pipe (negatively probed: a line-based reader fails on exactly those records).

Using ctx effectively

Installing ctx gives an agent memory. Using it well takes discipline — because a memory shared across sessions has a failure mode a single chat doesn't: drift.

Why stored memory drifts

Each time an LLM reads a note and re-saves or summarizes it, it re-interprets it through its own training biases. That isn't random noise — it's a directional filter that pushes the same way every pass: more conservative, more absolute, less attributed. Observations harden into recommendations, recommendations into rules, rules into dogma — and the certainty becomes untraceable.

A stored block is also a point-in-time observation, not live state. A note that was true when written ("we migrated off X") can stay true and still drive a wrong action (deleting X's still-running sibling service) — because the scope shifted and the note never said so. The note tells you where to look, not what's true right now.

Discipline — put this in your agent's instructions

  • Load conventions into context before working; don't just file them away. Trained weights are the default, and file instructions that are never loaded do little against them: only an anchor in the current context reliably overrides a trained default. A discipline doc that's never loaded gets silently re-undermined by each new session. (ctx query your project conventions at session start.)
  • Trace every stored claim to a source. Save quote + date; keep verified user statements separate from your own interpretation. An interpretation re-saved as fact is how a "probably" disappears across three persistence layers.
  • Cross-check stored claims against live state before acting. Before a destructive or status-dependent step, verify against the authoritative source — live config, a test, the actual file — not the note.
  • Don't gate on self-reported confidence. Models are often just as sure when wrong. Gate on external truth: a test, the source, observed behavior.
  • Prefer external signals over self-reminders. Naming a failure mode as a rule ("don't forget the tests") tends to re-evoke it; build a check instead — a test script, a grep on the output, a verifier against the raw data.

Calibration

LLM defaults are tuned for a median user who must be protected from uninformed decisions. For an experienced operator with a defined target, the same training produces systematic distortion: judging against the current state instead of the target ("good enough for now"), preferring the familiar over the better option, asking permission on obvious next steps while making user-facing decisions unprompted, and presenting trained caution as judgement ("that's overkill") with no concrete risk named.

Compensating for it is a one-time setup the agent should drive:

  1. Store the calibration as a block. Have the agent write your conventions and observed failure modes into ctx — a dedicated "RLHF warnings" block is a good seed — so every future session can retrieve them instead of relearning them.
  2. Point your durable instructions at that block. Your platform's personal-preference / custom-instruction field, or a project-level instruction file, should reference it. This is the step the agent should prompt you to do — it's the one layer the agent can't write for itself, and without it the block just sits there unread.
  3. Each session loads the anchor. The durable instruction tells the agent to ctx query that block before working, so the calibration lands in context — the only layer that reliably overrides a trained default — instead of staying filed away.

State the desired behavior rather than the unwanted one (naming the bad behavior re-evokes it). This isn't about disabling safety — it's about re-aiming a calibration meant for someone else, and keeping that aim across sessions.

Quick Install

# Binary (Linux/macOS/Windows)
curl -fsSL https://github.com/GottZ/ctx/releases/latest/download/ctx-$(uname -s | tr A-Z a-z)-$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/') -o /usr/local/bin/ctx && chmod +x /usr/local/bin/ctx

# Or with Go
go install github.com/GottZ/ctx/cmd/ctx@latest

Setup

1. Configure endpoint

# Linux/macOS
mkdir -p ~/.config/ctx
cat > ~/.config/ctx/config << 'EOF'
CTX_BASE_URL=https://your-ctx-host.example
CTX_KEY=your-api-key-here
EOF

# Windows (PowerShell)
New-Item -ItemType Directory -Force "$env:APPDATA\ctx"
@"
CTX_BASE_URL=https://your-ctx-host.example
CTX_KEY=your-api-key-here
"@ | Set-Content "$env:APPDATA\ctx\config"

2. Verify

ctx health    # DB + Ollama connectivity
ctx stats     # Block count, categories, storage

3. Claude Code integration (optional)

Statusline — live block count, health, and rate limits:

{ "statusLine": { "type": "command", "command": "ctx statusline" } }

Slash commands — add to ~/.claude/settings.json:

{
  "customSlashCommands": [
    { "name": "ctx",        "command": "ctx query \"$PROMPT\"" },
    { "name": "ctx-save",   "command": "ctx save $PROMPT" },
    { "name": "ctx-browse", "command": "ctx search $PROMPT" },
    { "name": "ctx-stats",  "command": "ctx stats" }
  ]
}

Agent hooks — automatic project briefing for subagents:

{
  "hooks": {
    "SubagentStart": [{ "hooks": [{ "type": "command", "command": "ctx brief --hook" }] }],
    "SubagentStop":  [{ "hooks": [{ "type": "command", "command": "ctx persist --hook" }] }]
  }
}

CLI

Command | Description
ctx query question | Hybrid search + LLM synthesis (formatted, --json for raw)
ctx save <cat> <title> - <content> | Upsert knowledge block
ctx save --tag tag1,tag2 <cat> <title> | Upsert with tags
ctx search [category] [query:text] | Compact search (no LLM)
ctx get <id> | Fetch full block
ctx delete <id> | Soft-delete (archive)
ctx categories | List all categories
ctx stats | Database statistics + Dream backlog (dream_queue: pickable/cooldown/incoming-forecast)
ctx health | Healthcheck
ctx guard [list|stats|resolve] | Write Guard management
ctx dream [stats|review] | Dream Mode stats (mode; queue with backlog and incoming forecast; backoff as a per-eval-count maturity distribution showing how far each block has cooled off, plus the effective cooldown), human-readable on a TTY and JSON when piped; review opens link review
ctx dream enable|disable|throttle | Runtime dream mode control (on/off/throttled)
ctx brief | Project briefing from store
ctx persist | Persist [PERSIST:cat:title] markers
ctx ingest <path> | Ingest Obsidian vault
ctx digest | Rebuild topic map
ctx statusline | Claude Code status bar
ctx mcp [add|list|delete] | Manage MCP OAuth client registrations
ctx keys create <label> --home <scope> | Provision API key (v2.0.0: --home required, no default scope; admin key required since 052)
ctx keys [list|delete] | List / revoke provisioned API keys (admin key required since 052)
ctx version | Print version

Architecture

Query ──► Parse Temporal ──► Embed ──► 4-Way RRF ──► Gravity Boost ──► Graph Expand ──► filterSuperseded ──► LLM Synthesis
          │                            ├─ Semantic (0.45)    │
          │                            ├─ EN-FTS   (0.25)    ├─ Linear (Power-Law, content_times)
          │                            ├─ DE-FTS   (0.20)    └─ Cyclic (Gaussian, EAV dimensions)
          │                            └─ Trigram  (0.10)       ├─ weekday σ=0.07  ┌─────────────────────────────┐
          │                                                     ├─ month   σ=0.10  │  Dream Mode (continuous)     │
          └─► DimensionWeights                                  ├─ quarter σ=0.12  │  N workers (PARALLELISM=N)   │
              {weekday:1.0}  "immer dienstags"                  ├─ week    σ=0.08  │  atomic claim (SKIP LOCKED)  │
              {month:0.4, seasonal:0.6}  "Weihnachten"          ├─ monthday σ=0.10 │  Pick → Keywords → RRF       │
              {monthday:1.0}  "Monatsanfang"                    ├─ seasonal σ=0.08 │  → LLM Eval → Links          │
              {daily:1.0}    "morgens"                          └─ daily   σ=0.08  │  → ApplySupersedes           │
                                                                                   │  → PromoteToCanonical        │
                                                                                   └─────────────────────────────┘

Store ──► Extract Times ──► Hash NOOP ──────────────► Guard (async, 60s)
          (content + created_at)          │           ├─ ≥0.98: auto-archive
          │                               │           ├─ 0.92-0.98: flag needs_review
          │                               │           └─ <0.92: clean
          │                               └─► Embed (async, scheduler backfill, tx-wrapped)
          └─► Dimensions = Union(content anchors ∪ meta anchor)
              • Content: dates mentioned in text (semantic)
              • Meta: created_at timestamp (every block, always)
              • ON CONFLICT dedups overlapping timestamps
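The Guard thresholds in the Store diagram map onto a three-way classification. A minimal sketch, assuming the similarity is a value in [0,1] and that 0.92 itself already counts as "needs review"; the type and function names are illustrative.

```go
package main

import "fmt"

type verdict string

const (
	autoArchive verdict = "auto-archive"
	needsReview verdict = "needs_review"
	clean       verdict = "clean"
)

// classify applies the Guard thresholds from the diagram:
// >= 0.98 auto-archive, 0.92 up to 0.98 flag for review, below 0.92 clean.
func classify(similarity float64) verdict {
	switch {
	case similarity >= 0.98:
		return autoArchive
	case similarity >= 0.92:
		return needsReview
	default:
		return clean
	}
}

func main() {
	for _, s := range []float64{0.99, 0.95, 0.50} {
		fmt.Printf("%.2f -> %s\n", s, classify(s))
	}
}
```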

Stack: Go 1.26, PostgreSQL 18 + pgvector 0.8.2, 52 SQL migrations. Dual-protocol inference (Ollama native or OpenAI-compatible) via any provider — per-pipeline configurable via CTX_*_PROTOCOL, CTX_EMBED_*, CTX_CHAT_*, CTX_DREAM_* env vars.

Key environment variables

Runtime overrides on top of env are provisioned in the DB (context_settings + sealed context_secrets, migration 051, with a trigger-fed audit trail in context_settings_audit) — the settings API/CLI waves activate them; until then env remains the only live source.

Var Default Purpose
CTX_BASE_URL / CTX_KEY CLI client config (~/.config/ctx/config)
CONTEXT_DB / CONTEXT_DB_USER / CONTEXT_DB_PASSWORD Database (separate from inference)
CTX_SECRETS_KEY / CTX_SECRETS_KEY_PREV Master key for AES-256-GCM-sealed context_secrets (64 hex chars, openssl rand -hex 32); _PREV only while a rotation sweep is pending. Env-only by design — copy into your password manager, key loss = total loss (see Sealed secrets & break-glass)
CTX_EMBED_HOST / _PROTOCOL / _MODEL / _DIMS ollama / – / 1024 Embedding pipeline (e.g. qwen3-embedding:8b)
CTX_CHAT_HOST / _PROTOCOL / _MODEL / _THINK / _NUM_CTX ollama / – / false / 0 Generator pipeline (RRF synthesis); _NUM_CTX (0=model default) applies to all chat-model calls (translate / temporal-fallback / rerank / synthesis) — set equal to the dream _NUM_CTX to share a single Ollama runner
CTX_CHAT_FALLBACK_HOST / _PROTOCOL / _API_KEY / _TIMEOUT empty (off) / openai / – / 420 Emergency chat backend for query-path synthesis only, engaged when the primary is unreachable at transport level (host down, connection died) — never on HTTP errors or slow responses. _TIMEOUT in seconds, sized for CPU inference (27B ≈ 4.5–5.5 min/answer; the body heartbeat keeps proxies alive). See the llama-cpu compose service. Translate stays fail-open, dream waits for its scheduler retry
CTX_DREAM_ENABLED false Toggle continuous Dream loop
CTX_DREAM_PARALLELISM 1 Concurrent Dream workers — race-safe via atomic claim
CTX_DREAM_HOST / _PROTOCOL / _MODEL / _NUM_CTX inherits chat Separate Dream model (e.g. larger, slower)
CTX_DREAM_EMBED_* inherits embed Separate embedding endpoint for Dream (e.g. CPU sidecar)
CTX_DREAM_IDLE_WAIT 20 (s) Backoff when no pending blocks
CTX_DREAM_BACKOFF_MODE / _FACTOR / _MIN / _GRACE / _CAP / _INERT_OFFSET exp / 1.6 / 12h / 0 / 45d / 7 Re-dream back-off by eval count (exp/log/linear/off). Cooldown grows from MIN (n=0) to CAP: fresh blocks re-dream sub-day to catch new links, mature blocks back off to the cap. _MIN/_CAP take a duration with a unit suffix — h hours, d days, w weeks, m months (30d), y years (365d), e.g. 12h, 45d, 1w (bare number = hours). _INERT_OFFSET starts a no-links cycle further up the curve
CTX_PROMPT_VERSION v5.2 Generator-prompt version (v5.2 default, v6 opt-in graded confidence)
CTX_TIMEZONE Europe/Berlin Cyclic-temporal phase calculation
CTX_CONFIDENT_THRESHOLD 0.008 Generator-side refusal threshold (RRF score below → "I don't know")
CTX_READ_SCOPES scope-derived API key's effective read-scope set (v2.0.0+ scheduler config)
CTX_GRAPH_EXPAND_ENABLED / _* true Query-time Dream-graph traversal (Wave 1): 1-hop confidence/type-gated expansion of inferred links, fused post-gravity / pre-rerank. Default-on since Wave 3 (only arm that moves the recall ceiling, ~0s; magnitude partly circular vs the link-derived eval gold). Fail-open. Knobs: _DIRECTED / _HOP_DEPTH / _SEED_COUNT / _SEED_SCORE_FLOOR / _PER_SEED_CAP / _MAX_INJECTED / _MIN_CONFIDENCE(_RECURRENT) / _BOOST_WEIGHT / _HUB_DAMPING / _WEIGHT_{TOPICAL,FACTUAL,CAUSAL,RECURRENT} / _NEW_PLACEMENT_FRAC
CTX_RERANK_ENABLED / _HOST / _* true Post-RRF rerank (fail-open). Default-on since Wave 3.5: the surface-gold counter-probe (judge-annotated real-user queries) showed the cross-encoder is where it earns its keep (nDCG@10 +0.164, MRR +0.169) while blend 0.5 keeps it neutral on latent gold — graph+ce-bw0.5 is the best arm on both gold sets; the ~80-90s query path stays proxy-safe via the body heartbeat. _HOST empty → LLM-as-judge on the chat model; default http://ctx-rerank:8082 → local bge-reranker-v2-m3 cross-encoder sidecar (Wave 2, cohere-style /v1/rerank, all-local/$0). Knobs: _MODEL / _MAX_DOCS (default 50; CPU ≈1s/doc, latency not gated) / _BLEND_WEIGHT (default 0.5; 1.0 = pure cross-encoder, lower mixes RRF back in — Wave-3: pure hurts on latent-relevance gold and is destructive as final arbiter over graph neighbors) / _API_KEY. See docker-compose.yml for the sidecar service.

Boot-time validation & config dump

ctxd parses all CTX_*/CONTEXT_* env vars through a typed registry (internal/config) and logs one config: effective record at startup: every setting with its origin (env or default — a var you set in the shell but forgot to declare in compose shows up as default), secrets masked (api_keys render a short sha256 fingerprint so key rotation is provable from logs without leaking the value; the DB password renders presence-only).

Invalid configurations abort the boot after logging every finding with field + reason — fix the named fields in .env and restart. Beyond the long-standing fatal parses (malformed ints, unknown timezone, missing DB password), these previously-booting-but-broken-at-runtime states are now startup errors: unknown _PROTOCOL values (used to silently select the Ollama wire path → 404 on llama.cpp), malformed host URLs / trailing slashes / embedded user:pass@ credentials (use _API_KEY instead), CTX_SCORE_THRESHOLD above CTX_CONFIDENT_THRESHOLD, out-of-range knobs (_BLEND_WEIGHT outside [0,1], negative rate limits), and cross-host credential inheritance in the CTX_DREAM_EMBED_* fallback chain. Malformed values on tolerant knobs keep their defaults as before, but now log a WARN instead of failing silently.

Key features:

  • GottZ 4-Way RRF: reciprocal rank fusion across semantic, bilingual fulltext, and trigram channels; block_role-aware (4-class enum: system-meta is hard-excluded, including digest-generated topic maps via the Wave-44 hook; audit-trail/reference/knowledge pass in full. Uniform damping was shown to be ineffective in Wave 40; query-aware damping is pending a follow-up wave, 41+)
  • GottZ Scope Model — multi-tenant isolation (private/work/shared) via API key scoping
  • GottZ Guard — async deduplication via PG LISTEN/NOTIFY + HNSW similarity
  • GottZ Cyclic Phase Model — 7 cyclic temporal dimensions (weekday/month/quarter/week/monthday/seasonal/daily) with normalized phase [0,1) and per-dimension Gaussian decay. Queries route to dimensions via parser (18-matcher deterministic engine). Timezone-aware via CTX_TIMEZONE.
  • Forward Telescoping — older blocks get a wider linear gravity well (effective power scaled by 1 / (1 + 0.3·ln(1+age/30))) so a 6-month-old block isn't drowned out by a 1-week-old block when the user asks about a date in that window. Future dates keep their 1.2× sharper cutoff. Matches Rubin & Baddeley 1989's age-dependent recall imprecision.
  • GottZ Temporal Dimension Table — EAV storage with partial B-Tree indexes, O(log n) dimension lookups at 1M+ scale. Every block carries multiple anchors: content-mentioned times (semantic) + created_at (meta) as independent signals.
  • Dream Mode — continuous autonomous cross-referencing with dual-model support (v5 prompt for qwen3.6:27b non-thinking sampler, dream pipeline version 5 with recurrent relationship class detected via context_temporal+title-similarity Phase 1 + LLM Phase 2), adaptive cooldown, supersedes detection, temporal validation, hard-cap of 5 links per cycle with type-diversity tie-break, replace-semantics with snapshot revert, and runtime mode control (on/throttled/off via API). Throttled mode pauses between GPU-intensive steps for thermal management. Parallel workers (CTX_DREAM_PARALLELISM, default 1) using atomic FOR UPDATE SKIP LOCKED block-claim — race-condition-safe under contention. Robust LLM-output parsing: tolerates array-form, single-object, fenced-array, and compact-multi-key-object link formats from heterogeneous LLM outputs. Config: CTX_DREAM_IDLE_WAIT (seconds, default 20)
  • Supersedes Filtering — temporal-gated removal of outdated blocks from query results
  • Dream-Graph Traversal (Wave 1, default-on since Wave 3, CTX_GRAPH_EXPAND_ENABLED) — query-time 1-hop expansion of the Dream-inferred link graph (topical/factual/causal/recurrent), confidence/type-gated + hub-damped, fused as a scale-invariant post-gravity boost before rerank. Turns the inferred links into positive recall instead of write-only metadata; fully parameterized for A/B sweeps, fail-open
  • Transport Retry — all inference HTTP calls (chat ollama/openai, embed, rerank) retry exactly once on transient transport failures (connection reset / EOF before any response bytes) via internal/httpx. Covers the keep-alive race with llama.cpp's cpp-httplib servers (~5s idle close vs Go connection reuse); HTTP status errors and context deadlines are never retried. Inference POSTs are stateless, so a replay is safe
  • CPU Synthesis Fallback: when the primary chat backend is unreachable at transport level, query-path synthesis replays the identical prompt against CTX_CHAT_FALLBACK_HOST (the llama-cpu sidecar: same GGUF, CPU speed) with its own long timeout. Guiding principle: "There should always be a way to be found". Answers degrade to minutes, never to errors
  • Streaming Tool-Call Wire (llm.ChatStream) — streaming OpenAI-compatible chat with function calling, the wire layer for the upcoming web-chat harness (no consumer yet). Multi-turn message arrays, per-delta events, index-keyed tool-call assembly, arguments normalisation (llama.cpp JSON-string fragments and whole-object form yield identical calls), hardened against OpenRouter SSE comment frames and mid-stream error events inside HTTP-200 streams; usage falls back to llama.cpp timings incl. MTP draft-acceptance
  • Embed Cache — content-hash-keyed embedding cache (context_embed_cache) to avoid re-embedding identical text across pipelines
  • LLM Log — per-call request/response capture (context_llm_log) with input/output token counts (Ollama + OpenAI), dream-pipeline version tagging, and parse-format drift tagging (metadata.parse_format: array | object | fenced-array | fenced-object) for pipeline debugging + offline benchmark replay
  • MCP Remote — Streamable HTTP transport with OAuth 2.1 PKCE for claude.ai/Claude Code integration. Tools: query, store, search, get, recent. Client registration via ctx mcp add. Tool handlers return Content[].text (no structured output) — tested in test.sh T17/T18
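The Forward Telescoping factor from the list above is a one-line formula. The sketch below evaluates 1 / (1 + 0.3·ln(1 + age/30)); treating age as days is an assumption (the divisor 30 suggests it), and `telescopeScale` is a hypothetical name. How ctx applies the factor to the power-law exponent is not shown here.

```go
package main

import (
	"fmt"
	"math"
)

// telescopeScale returns the factor by which the effective power is scaled
// for a block of the given age. Age is assumed to be in days.
func telescopeScale(ageDays float64) float64 {
	if ageDays < 0 {
		ageDays = 0 // future dates are handled separately in the text
	}
	return 1 / (1 + 0.3*math.Log1p(ageDays/30))
}

func main() {
	for _, age := range []float64{0, 7, 30, 180} {
		fmt.Printf("age=%3.0f days scale=%.3f\n", age, telescopeScale(age))
	}
}
```

A brand-new block keeps a factor of 1, while a block around six months old drops to roughly 0.63, which corresponds to the wider gravity well described for older blocks.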

API

All endpoints under /api/*. Auth via X-Context-Key header or Authorization: Bearer token.

Endpoint Description
POST /api/query 4-Way RRF + LLM synthesis (auto-backfills pending embeddings; optional categories_exclude / block_roles_exclude arrays filter slot-stealers). With the cross-encoder reranker engaged (~80s/query) the response commits 200 up front and streams a whitespace keepalive every 25s so buffering reverse proxies don't hit their read timeout; the body stays valid JSON (leading whitespace, RFC 8259) and a late synthesis failure reports success:false inside the 200 body
POST /api/store Upsert (embedding async via scheduler)
POST /api/search Lightweight search (no LLM)
GET /api/graph/ego Scope-filtered k-hop ego subgraph over dream links (read-only, no LLM — see Graph API)
POST /api/manage CRUD, Guard API, stats, API-key management (api-key-create requires home_scope; key/MCP-client management and mutating dream-mode require an admin key since 052 — see Admin tier)
POST /api/digest Topic map generation
POST /api/ingest Obsidian vault ingestion
POST /api/blob/* Binary storage (store/fetch/search/manage)
GET /health DB + Ollama connectivity
POST|GET|DELETE /mcp MCP Streamable HTTP (remote tool server)
GET /authorize OAuth 2.1 authorization (PKCE)
POST /token OAuth 2.1 token exchange
GET / (unregistered paths) Embedded admin SPA (Svelte 5 + Vite, served from the binary). History-API fallback answers HTML navigations (Accept: text/html) only — mistyped API URLs stay 404 for JSON clients. Hashed /assets/* are immutable-cached and pre-compressed (.br/.gz); binaries built without the frontend (plain go install) serve a 503 placeholder while all APIs stay functional — the Docker image is the channel that ships the real UI

Graph API

GET /api/graph/ego?block=<uuid> returns the k-hop ego subgraph of a focus block over the dream-link graph — the server side of the graph viewer. Designed for 1M+ blocks: the server only ever ships budgeted subgraphs, never the full graph.

GET /api/graph/ego?block=<uuid>&hops=2&per_node_cap=25&limit=500
                  &min_confidence=0.5&link_class=topical,causal
                  &category=learnings&created_after=2026-01-01T00:00:00Z
                  &edge_limit=4000

Param | Default | Range | Meaning
block | (required) | full UUID | focus node (hop 0)
hops | 1 | 1–3 | BFS depth
per_node_cap | 25 | 1–100 | top-N edges per frontier node by raw_confidence; slots count only visible, filter-passing edges
limit | 500 | 1–5000 | total node budget (truncation: closer hop wins, then higher confidence, then id)
min_confidence | 0 | 0–1 | gate on weighted confidence (traversal + displayed edges)
link_class | all 5 | topical,factual,causal,recurrent,supersedes | supersedes is display-only, never traversed
category | all | CSV | filter on neighbor blocks (focus always included)
created_after / created_before | open | RFC3339 | window on neighbor created_at
edge_limit | 4000 | 1–20000 | budget for edges within the node set, strongest first

Out-of-range values are a 400, never silently clamped. Response: nodes (id, title capped at 120 chars, category, scope, visible degree — capped at 201, rendered "200+" — and hop), edges as compact index tuples [srcIdx, dstIdx, relIdx, confidence] into nodes/rels, and stats (nodes, edges, truncated, elapsed_ms). The payload never contains block content (load it lazily via manage get).

Security semantics: the visibility triple (not archived, not system-meta, scope readable by the key) is applied inside every hop and inside the per-node cap legs — a node reachable only through a foreign private bridge is never delivered, and invisible edges never consume cap slots. degree counts only visible neighbors (scan budget 1000 raw edges/direction). "Does not exist" and "not visible" answer with an identical 404 (no existence oracle), and only successful calls write an access-log row (action='graph', block_id=NULL — graph browsing never feeds access-count ranking).

Building

go build -o ctx ./cmd/ctx/           # CLI
go build -o ctxd ./cmd/ctxd/         # Daemon
go test ./... -short                  # Unit tests

Web UI (Svelte 5 + TypeScript + Vite, Bun)

The admin SPA lives in go/web/ and is embedded into the ctxd binary via go:embed. The Docker image builds it in its own stage (oven/bun:1.3-alpine, bun install --frozen-lockfile, svelte-check gate) — docker compose build ctx is the channel that ships the real UI. Plain go build / go install .../cmd/ctxd need no Bun and produce a binary that serves a 503 placeholder instead of the UI; the CLI (cmd/ctx) never depends on the frontend at all.

cd go/web
bun install                           # once; bun.lock is committed
bun run dev                           # Vite on :5173, proxies /api → ctxd
bun run check && bun run build        # typecheck + production build into dist/

The dev proxy targets http://localhost:8080; the compose ctx service publishes no ports by default — add a local port mapping (see docker-compose.override.yml.example) and override with CTX_DEV_PROXY=http://127.0.0.1:<port> if you map a different port.

License

MPL-2.0 — By GottZ
