Self-hosted memory for AI agents. Extracts facts, detects contradictions, builds user profiles. Bring your own LLM. Your data never leaves your server.
Why the name? Named after Grey Matter from Ben 10 — the tiniest alien in the universe, but the smartest being in existence. Also a nod to grey matter in the brain, where intelligence actually lives. Small footprint. Quietly powerful.
Every AI agent forgets everything when the conversation ends.
The obvious fix is memory. But every solution — Supermemory, Mem0 — stores your data on their cloud. You're trading one problem for another.
greymemory runs entirely on your server:
Your data → your machine → your LLM → stays with you. Always.
Hospitals, banks, factories, defence — entire industries are locked out of AI memory because every solution requires trusting a third party with their most sensitive data. greymemory is built for them.
Tested against LongMemEval — the standard benchmark for long-term memory systems. 90 questions across 6 categories, compared against Supermemory (funded startup, cloud infrastructure).
| Category | greymemory | Supermemory | Gap (percentage points) |
|---|---|---|---|
| single-session-user | 93.3% | 97.1% | -3.8% |
| single-session-assistant | 93.3% | 96.4% | -3.1% |
| knowledge-update | 80.0% | 88.5% | -8.5% |
| temporal-reasoning | 73.3% | 76.7% | -3.4% |
| single-session-preference | 66.7% | 70.0% | -3.3% |
| multi-session | 66.7% | 71.4% | -4.7% |
| Overall | 80.0% | 83.4% | -3.4% |
Within 3.4 percentage points of a funded startup's accuracy (80.0% vs 83.4%). Zero cloud dependency. ~$0.013 ingestion cost per session. SQLite on your own machine.
- Benchmarked — LongMemEval integration with a reproducible benchmark runner. 6 categories, 15 questions each, automated scoring.
- Temporal reasoning — pre-computed timeline injection extracts `event_date` values, sorts them chronologically, and injects them into the answering context. Improved temporal-reasoning accuracy by 13.3 percentage points (60% → 73.3%).
- State change detection — a new extraction rule catches casual mid-sentence updates to quantities, frequencies, locations, and durations that were previously missed.
- Chunk date fix — chunks now store the session's `document_date` instead of the ingestion timestamp. Critical for temporal queries and `asOf` time-travel.
- asOf time-travel — `memory.search()` accepts an `asOf` parameter to query memory state at any point in time. End-of-day rounding ensures same-day sessions are visible.
- Source provenance — every memory tracks `source_role` (user vs assistant) as a first-class field. Enables filtering by who said what.
- Batch embedder — batches multiple embedding calls within a time window for efficient API usage.
- Retry with backoff — exponential backoff for rate-limited API calls.
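The timeline-injection idea above can be sketched in a few lines. This is a minimal illustration under assumptions, not greymemory's actual code: the `MemoryRow` shape and the `buildTimeline` / `injectTimeline` helpers are hypothetical, and each memory is assumed to carry an ISO-8601 `event_date` string or `null`.

```typescript
// Hypothetical row shape for illustration; real greymemory rows carry more fields.
interface MemoryRow {
  value: string;
  event_date: string | null; // ISO-8601 date such as "2026-04-10", or null if undated
}

// Keep only dated memories, sort them oldest-first, and render one line per event.
function buildTimeline(memories: MemoryRow[]): string {
  const dated = memories.filter(
    (m): m is MemoryRow & { event_date: string } => m.event_date !== null,
  );
  // Same-format ISO-8601 strings sort chronologically with a plain string comparison.
  dated.sort((a, b) => (a.event_date < b.event_date ? -1 : a.event_date > b.event_date ? 1 : 0));
  const lines = dated.map((m) => `${m.event_date}: ${m.value}`);
  return lines.length > 0 ? `Timeline (oldest first):\n${lines.join("\n")}` : "";
}

// Prepend the timeline to the retrieved context that the answering prompt already uses.
function injectTimeline(context: string, memories: MemoryRow[]): string {
  const timeline = buildTimeline(memories);
  return timeline ? `${timeline}\n\n${context}` : context;
}
```

Doing the ordering in code means the answering model only has to read the chronology rather than reconstruct it from scattered facts.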
The bundled greymemory-console renders your memory graph live from the same SQLite database: nodes coloured by type (fact / preference / episode / raw chunk), edges by relation (EXTENDS, UPDATES, DERIVES, source-chunk), plus a time scrubber and live hybrid search. The capture above is a real 218-memory container from a LongMemEval run.
cd greymemory-console && npm run install:all && npm run dev
# → http://localhost:5173
npm install greymemory
npx greymemory init
The CLI asks a few questions and generates a ready-to-use config file:
✦ greymemory — private memory for AI agents
? Extraction provider: Anthropic
? Extraction model: claude-haiku-4-5-20251001 (fast, cheap — recommended)
? Anthropic API key: ****
? Embedding provider: Ollama (free, local)
? Embedding model: mxbai-embed-large (recommended)
? Storage directory: .greymemory
? Container name: default
✔ greymemory.config.js created
✔ .env updated
.env added to .gitignore
✔ @anthropic-ai/sdk, dotenv installed
✦ Ready. Add to your project:
import memory from './greymemory.config.js'
await memory.add(messages)
await memory.search('query')
import memory from './greymemory.config.js'
// add a conversation — facts extracted, chunks stored, relationships detected
await memory.add([
{ role: 'user', content: 'My name is Arun. I work at Barbell Cartel as a product designer in Bangalore.' },
{ role: 'assistant', content: 'Got it!' }
])
// search — returns memory + source chunk paired together
const results = await memory.search('where does Arun work')
// [
// {
// memory: 'Arun works at Barbell Cartel as a product designer',
// chunk: 'user: My name is Arun. I work at Barbell Cartel...',
// memory_type: 'fact',
// confidence: 1.0,
// document_date: '2026-04-08',
// event_date: null,
// relation_type: null,
// source_role: 'user'
// }
// ]
// time-travel — query memory state at a specific date
const pastResults = await memory.search('where does Arun work', {
asOf: '2026-01-15'
})
// inject into your agent via profile
const { profile } = await memory.getProfile()
const systemPrompt = `You are a helpful assistant.
About this user:
${profile.static.join('\n')}
Current context:
${profile.dynamic.join('\n')}`
npm install greymemory dotenv
import 'dotenv/config'
import GreyMemory from 'greymemory'
import Anthropic from '@anthropic-ai/sdk'
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })
const memory = new GreyMemory({
// extractor receives a built prompt string, returns raw string
extractor: async (prompt) => {
const res = await anthropic.messages.create({
model: 'claude-haiku-4-5-20251001',
max_tokens: 4096,
messages: [{ role: 'user', content: prompt }]
})
return res.content[0].text
},
// embedder converts text to a vector
embedder: async (text) => {
const res = await fetch('http://localhost:11434/api/embeddings', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ model: 'mxbai-embed-large', prompt: text })
})
return (await res.json()).embedding
},
// tell greymemory what to index and who this memory belongs to
filterPrompt: 'Index: decisions, preferences, projects. Skip: small talk.',
entityContext: 'Memory for Arun, a product designer based in Bangalore.',
})
new GreyMemory({
extractor: (prompt: string) => Promise<string>, // required
embedder: (text: string) => Promise<number[]>, // required
dir?: string, // storage directory, default: ".greymemory"
container?: string, // namespace isolation, default: "default"
filterPrompt?: string, // what to index and skip (org-level)
entityContext?: string, // who this memory belongs to (per-container)
db?: Database // existing better-sqlite3 connection
})
`memory.add()` extracts memories, detects relationships, and stores chunks with provenance.
// conversation
await memory.add([
{ role: 'user', content: 'I now work at Stripe as a PM' },
{ role: 'assistant', content: 'Congratulations!' }
])
// plain text
await memory.add('Arun is building greymemory, an open source memory library.')
// with date (for historical data ingestion)
await memory.add(messages, { date: '2026-01-15T10:30' })
`memory.search()` runs hybrid BM25 + vector search and returns atomic memories paired with their source chunks.
// basic
const results = await memory.search('where does Arun work')
// with options
const filtered = await memory.search('investor meeting', {
topN: 3,
memoryTypes: ['episode'],
afterDate: '2026-04-01',
beforeDate: '2026-04-30',
asOf: '2026-04-15', // time-travel to this date
})
Search options:
| Option | Type | Default | Description |
|---|---|---|---|
| `topN` | number | 5 | Number of results to return |
| `memoryTypes` | string[] | null | Filter by type: fact, preference, episode |
| `afterDate` | string | null | Filter by event_date >= date |
| `beforeDate` | string | null | Filter by event_date <= date |
| `asOf` | string | null | Time-travel: only return facts that existed at this date |
| `includeHistory` | boolean | false | Include superseded facts |
| `includeExpired` | boolean | false | Include expired episodes |
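One way to picture the `asOf` option is as a validity-window filter with end-of-day rounding. The sketch below is an assumption-laden illustration, not greymemory's query: the `Fact` shape, the `superseded_at` field, and the `endOfDay` / `visibleAsOf` helpers are hypothetical; only `asOf` and `document_date` come from this README.

```typescript
// Hypothetical fact shape: document_date is when the fact was recorded;
// superseded_at (assumed field) is when a later UPDATES fact replaced it, or null.
interface Fact {
  value: string;
  document_date: string;
  superseded_at: string | null;
}

// Round a bare date (YYYY-MM-DD) up to the last millisecond of that UTC day so
// facts recorded at any time on the asOf day stay visible. Note: Date.parse reads
// date-only strings as UTC but offset-less date-times as local time.
function endOfDay(asOf: string): number {
  return /^\d{4}-\d{2}-\d{2}$/.test(asOf)
    ? Date.parse(`${asOf}T23:59:59.999Z`)
    : Date.parse(asOf);
}

// A fact is visible at asOf if it was recorded by the cutoff and not yet superseded.
function visibleAsOf(facts: Fact[], asOf: string): Fact[] {
  const cutoff = endOfDay(asOf);
  return facts.filter((f) => {
    const recorded = Date.parse(f.document_date);
    const replaced = f.superseded_at === null ? Infinity : Date.parse(f.superseded_at);
    return recorded <= cutoff && replaced > cutoff;
  });
}
```

Without the rounding, `asOf: '2026-01-15'` would resolve to midnight and hide anything recorded later that same day.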
`memory.getProfile()` returns a static/dynamic user profile for system-prompt injection.
// profile only
const { profile } = await memory.getProfile()
// profile.static → ['Arun prefers TypeScript', 'Arun works at Stripe']
// profile.dynamic → ['Arun is building greymemory v0.4']
// profile + search in one call
const { profile, results } = await memory.getProfile({ q: 'current project' })
// inject into system prompt
const systemPrompt = `You are a helpful assistant.
About this user:
${profile.static.join('\n')}
Current context:
${profile.dynamic.join('\n')}`
Classification:
- `static` — preferences (always) + facts older than 7 days
- `dynamic` — facts from the last 7 days + current episodes
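The classification rule can be written as a small function. This is a sketch under assumptions: the `ProfileMemory` shape, the `is_current` flag standing in for "episode not yet expired", and the `classifyProfile` name are hypothetical; `now` is a parameter so the 7-day window is easy to test.

```typescript
type MemoryType = "fact" | "preference" | "episode";

// Hypothetical shape; is_current is an assumed stand-in for episode expiry.
interface ProfileMemory {
  value: string;
  memory_type: MemoryType;
  document_date: string; // ISO-8601
  is_current: boolean;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function classifyProfile(
  memories: ProfileMemory[],
  now: Date = new Date(),
): { static: string[]; dynamic: string[] } {
  const profile = { static: [] as string[], dynamic: [] as string[] };
  for (const m of memories) {
    const ageMs = now.getTime() - Date.parse(m.document_date);
    if (m.memory_type === "preference") {
      profile.static.push(m.value); // preferences are always static
    } else if (m.memory_type === "fact") {
      // facts older than 7 days are static; newer ones are dynamic
      (ageMs > SEVEN_DAYS_MS ? profile.static : profile.dynamic).push(m.value);
    } else if (m.is_current) {
      profile.dynamic.push(m.value); // only current episodes are included
    }
  }
  return profile;
}
```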
`memory.getCurrent()` returns the current version of a fact via semantic search.
const current = await memory.getCurrent('where does Arun work')
// { id: 3, value: 'Arun works at Stripe', memory_type: 'fact', ... }
`memory.getHistory()` returns the full version chain for a fact, newest first.
const history = await memory.getHistory('where has Arun worked')
// [
// { value: 'Arun works at Stripe', is_latest: true },
// { value: 'Arun worked at Google', is_latest: false }
// ]
`memory.forget()` soft-deletes the best-matching memory, found via semantic search. It disappears from all queries immediately but is preserved in the database.
const forgotten = await memory.forget('investor demo')
// → 'Arun has an investor demo on Friday April 10th at 3pm'
`memory.runDerivations()` infers second-order conclusions by combining existing memories. Call it after `add()`, on a schedule, or before important queries.
await memory.add(messages)
await memory.runDerivations() // last 7 days
await memory.runDerivations({ sinceDays: 1, topK: 5 }) // just today
`memory.getMemories()` returns all current memories as full row objects.
const memories = memory.getMemories()
// [{ id, key, value, memory_type, confidence, document_date, ... }]
Alias for `getMemories()`. Kept for v0.2.x backward compatibility.
Deletes all facts, chunks, and embeddings for this container. Other containers are untouched.
import Database from 'better-sqlite3'
import GreyMemory from 'greymemory'
const db = new Database('/home/user/.devlog/devlog.db')
const memory = new GreyMemory({ extractor, embedder, db, container: 'memory' })
greymemory creates its own tables inside your existing database. Your existing tables are untouched.
const userA = new GreyMemory({ ...options, container: 'user-123' })
const userB = new GreyMemory({ ...options, container: 'user-456' })
| Component | Cost per session | Monthly (1 session/day, 30 days) |
|---|---|---|
| Extraction (Haiku) | ~$0.008 | ~$0.24 |
| Embedding (Voyage) | ~$0.004 | ~$0.12 |
| Embedding (Ollama) | free | free |
| Total (cloud embeddings) | ~$0.013 | ~$0.39 |
| Total (local embeddings) | ~$0.008 | ~$0.24 |
Query cost: ~$0.001 per search (embedding only). Storage: SQLite, zero cost.
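The monthly column is simply the per-session cost times 30. A small estimator makes it easy to plug in your own volume. The per-session and per-search figures are the approximate ones above; the `estimateMonthlyCost` helper is hypothetical, and treating searches as free with local (Ollama) embeddings is an assumption based on the table listing Ollama as free.

```typescript
// Approximate per-session ingestion costs in USD, from the cost table above.
const INGEST_COST_USD = {
  cloudEmbeddings: 0.013, // Haiku extraction + Voyage embeddings
  localEmbeddings: 0.008, // Haiku extraction + Ollama embeddings (free)
} as const;

// Approximate cost of one search when the query is embedded by a cloud provider.
const CLOUD_QUERY_COST_USD = 0.001;

type EmbeddingMode = keyof typeof INGEST_COST_USD;

// Monthly estimate = days x (sessions x per-session cost + searches x per-search cost).
// Assumption: searches cost nothing when embeddings run locally.
function estimateMonthlyCost(
  sessionsPerDay: number,
  searchesPerDay: number,
  mode: EmbeddingMode,
  days = 30,
): number {
  const perSearch = mode === "localEmbeddings" ? 0 : CLOUD_QUERY_COST_USD;
  return days * (sessionsPerDay * INGEST_COST_USD[mode] + searchesPerDay * perSearch);
}
```

For example, one session and ten searches a day on cloud embeddings comes to 30 × (0.013 + 10 × 0.001) ≈ $0.69 a month.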
Conversation
↓
Save chunks — one per message, with embeddings + source_role
↓
extractor()
Resolves ambiguity → classifies memory type → extracts atomic memories
STATE CHANGE RULE: captures casual mid-sentence updates
↓
For each memory:
_detectRelationship() → UPDATES | EXTENDS | NEW
saveFact() → stored with chunk_id, relation_type, event_date
supersedeFact() → if UPDATES, marks old fact is_latest=0
saveEmbedding() → each fact version gets its own embedding
↓
Optional: runDerivations() → second-order inferences stored as DERIVES
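The per-memory steps above (relationship, save, supersede) can be modelled with a small in-memory store. This only illustrates the UPDATES / EXTENDS / NEW flow: the relationship is passed in precomputed (in greymemory it comes from `_detectRelationship()`), and the `FactStore` class, `related_id` field, and `history()` method are hypothetical names.

```typescript
type Relation = "UPDATES" | "EXTENDS" | "NEW";

interface StoredFact {
  id: number;
  value: string;
  relation_type: Relation | null;
  related_id: number | null; // hypothetical: the fact this one updates or extends
  is_latest: boolean;
}

class FactStore {
  private facts: StoredFact[] = [];
  private nextId = 1;

  // Like saveFact() + supersedeFact(): an UPDATES fact retires the version it replaces.
  save(value: string, relation: Relation, relatedId: number | null = null): StoredFact {
    if (relation === "UPDATES" && relatedId !== null) {
      const old = this.facts.find((f) => f.id === relatedId);
      if (old) old.is_latest = false; // is_latest=0 in SQLite terms
    }
    const fact: StoredFact = {
      id: this.nextId++,
      value,
      relation_type: relation === "NEW" ? null : relation,
      related_id: relation === "NEW" ? null : relatedId,
      is_latest: true,
    };
    this.facts.push(fact);
    return fact;
  }

  // Follow UPDATES links backwards to build a version chain, newest first.
  history(id: number): StoredFact[] {
    const chain: StoredFact[] = [];
    let current: StoredFact | undefined = this.facts.find((f) => f.id === id);
    while (current !== undefined) {
      chain.push(current);
      const prevId: number | null = current.relation_type === "UPDATES" ? current.related_id : null;
      current = prevId === null ? undefined : this.facts.find((f) => f.id === prevId);
    }
    return chain;
  }
}
```

EXTENDS adds detail without retiring anything, so only UPDATES edges form the version chain.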
Query
↓
BM25 search + vector search (facts + chunks)
RRF fusion with confidence weighting
asOf filtering for time-travel queries
For each result: fetch source chunk via chunk_id
↓
{ memory, chunk, memory_type, confidence, source_role, document_date, event_date, ... }
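Reciprocal Rank Fusion (RRF) merges ranked lists by giving each item 1 / (k + rank) from every list it appears in and summing. The sketch below uses that standard formula with k = 60, a common default. How greymemory applies confidence weighting is not specified here, so the multiplicative weight and the `fuseRRF` name are assumptions for illustration.

```typescript
interface Ranked {
  id: string;
  confidence: number; // 0..1, as stored on each memory
}

// Reciprocal Rank Fusion over any number of ranked lists (rank is 1-based).
// Items ranked well by several retrievers (e.g. BM25 and vector) rise to the top.
function fuseRRF(lists: Ranked[][], k = 60, topN = 5): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const confidence = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, index) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + index + 1));
      confidence.set(item.id, item.confidence);
    });
  }
  // Assumed weighting: scale each fused score by the memory's confidence.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score: score * (confidence.get(id) ?? 1) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```

Because RRF only uses ranks, BM25 scores and cosine similarities never need to be normalised onto a common scale.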
| Provider | Extractor | Embedder |
|---|---|---|
| Anthropic | ✅ Claude Haiku, Sonnet, Opus | ❌ |
| OpenAI | ✅ GPT-4o-mini, GPT-4o | ✅ text-embedding-3-small/large |
| Voyage | ❌ | ✅ voyage-3, voyage-3-lite |
| Ollama | ✅ llama3, mistral, any model | ✅ mxbai-embed-large, nomic-embed-text |
| Cohere | ❌ | ✅ embed-english-v3.0 |
| Custom | ✅ any function | ✅ any function |
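Because any function works as an embedder, batching can be layered on top of a provider that accepts several inputs per request. The sketch collects single-text calls that arrive within a short window and sends them as one batch. It is a hypothetical wrapper, not greymemory's built-in batch embedder; `embedBatch` stands in for whatever multi-input embedding call you use, and `windowMs` / `maxBatch` are illustrative defaults.

```typescript
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Turn a batch embedding call into a single-text embedder. Calls made within
// windowMs of the first queued call are sent together, up to maxBatch at a time.
function batchingEmbedder(embedBatch: EmbedBatch, windowMs = 25, maxBatch = 64) {
  type Pending = { text: string; resolve: (v: number[]) => void; reject: (e: unknown) => void };
  let queue: Pending[] = [];
  let timer: ReturnType<typeof setTimeout> | null = null;

  async function flush(): Promise<void> {
    timer = null;
    const batch = queue;
    queue = [];
    try {
      const vectors = await embedBatch(batch.map((p) => p.text));
      batch.forEach((p, i) => p.resolve(vectors[i])); // results come back in input order
    } catch (err) {
      batch.forEach((p) => p.reject(err)); // one failed request fails every caller in it
    }
  }

  return (text: string): Promise<number[]> =>
    new Promise<number[]>((resolve, reject) => {
      queue.push({ text, resolve, reject });
      if (queue.length >= maxBatch) {
        if (timer) clearTimeout(timer);
        void flush(); // batch is full: send now instead of waiting for the timer
      } else if (!timer) {
        timer = setTimeout(() => void flush(), windowMs);
      }
    });
}
```

The returned function has the same `(text) => Promise<number[]>` shape as the `embedder` option, so it could be passed to the constructor in place of a one-request-per-text embedder.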
greymemory v0.4 is backward compatible with v0.3. No breaking changes.
New features (asOf, source_role, batch embedder) work automatically on existing databases. The source_role column is added via automatic migration on first use.
- Node.js 18+
- Ollama (if using local models) → ollama.com
brew install ollama
ollama pull mxbai-embed-large
- SQLite storage
- Hybrid BM25 + vector search
- Raw chunk storage + dual retrieval
- Model-agnostic LLM interface
- Container isolation
- TypeScript types
- CLI setup wizard
- Existing SQLite database support
- Memory types — fact, preference, episode
- Relationship detection — UPDATES, EXTENDS, DERIVES
- Knowledge graph — getCurrent(), getHistory()
- User profiles — getProfile()
- Soft delete — forget()
- filterPrompt + entityContext
- Source provenance — source_role tracking
- Temporal reasoning — event_date extraction + timeline
- asOf time-travel queries
- Batch embedder
- LongMemEval benchmark (80.0% vs Supermemory 83.4%)
- Memory graph — fact_relations table with typed edges (UPDATES, EXTENDS, SIMILAR_TO, SIBLING)
- Graph traversal — cluster retrieval, supersession chain following, sibling expansion
- 3-layer context — static profile + dynamic profile + reranked search results
- Question-aware reranking — boost preferences/temporal/current-state by question type
- Query decomposition — split compound "A or B?" queries
- Memory expiration — configurable TTL per memory type
- MCP server — `npx greymemory-mcp` for Claude Code, Cursor, Windsurf
- Fine-tuned extraction model — local 7B model, zero API dependency
- Community detection — topic clustering in memory graph
- Python SDK
Arunkumar — building AI agents in public.
Follow the journey: github.com/arun-dev-des
Apache 2.0 — see LICENSE for details.
