Creativity primitives: distant pairs, novelty scoring, random walk (#152) #164
Merged
Adds three computational-creativity primitives over the knowledge wiki's embedding space + link graph, plus their MCP tools and docs.
API (agent+):
- GET /api/v1/knowledge/pairs: distant-but-bridgeable article pairs in the optimal-novelty embedding band (cosine distance, default 0.3–0.7); optional bridge_path requires a <=2-hop link path. Paginated; samples up to 1000 embedded published articles to bound the O(n^2) self-join.
- POST /api/v1/knowledge/novelty: score ideas by distance to the nearest prior proposal (0 = identical, 1 = novel); embeds each idea on the fly.
- GET /api/v1/knowledge/walk: random walk through the link graph from a start article (no cycles, dead-end safe).
Context module (Loopctl.Knowledge):
- distant_pairs/2, random_walk/3, novelty_scores/3 with hard caps (@max_pair_candidates 1000, pair limit max 100, walk length max 25).
- All queries tenant-scoped via AdminRepo; bridge filter uses literal single-line fragments (no string concatenation).
MCP server (2.20.0): knowledge_distant_pairs, knowledge_novelty, knowledge_random_walk handlers + tool definitions.
Docs/help reconciled in the same PR: mcp-server CHANGELOG/README rows, README + CLAUDE.md tool counts brought to 65 (the main README tool table had also drifted; added the 8 missing knowledge tools: count, facets, graph, suggest_links, bulk_unpublish, plus the 3 new creativity tools).
Tests: 18 controller tests covering band filtering, custom band, bridge_path, deterministic pagination, novelty scoring with proposal-tagged priors, prior_tag override, no-priors=1.0, walk traversal/no-cycles/dead-end, 400/404 paths, and tenant isolation on every primitive.
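To make the /pairs band filter concrete, here is a minimal Python sketch of the idea; it is not the Elixir implementation. It assumes articles arrive as (id, embedding) tuples. The function names, the choice of the lowest-id candidate slice, and the ordering of results by id are assumptions made for the example.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance: 0 for identical direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distant_pairs(articles, min_distance=0.3, max_distance=0.7, max_candidates=1000):
    """Return (id_a, id_b, distance) for pairs whose distance falls in the band.

    articles: iterable of (id, embedding) tuples (hypothetical input shape).
    The candidate cap bounds the quadratic pairwise comparison.
    """
    # Assumption: a deterministic lowest-id slice, so repeated calls see the same set.
    candidates = sorted(articles, key=lambda article: article[0])[:max_candidates]
    pairs = []
    for (id_a, emb_a), (id_b, emb_b) in combinations(candidates, 2):
        distance = cosine_distance(emb_a, emb_b)
        if min_distance <= distance <= max_distance:
            pairs.append((id_a, id_b, distance))
    # combinations() over the id-sorted list already yields a stable (id_a, id_b) order.
    return pairs
```

With four toy vectors, only the pair at distance of about 0.4 survives the default band; near-duplicates and orthogonal or opposite vectors are filtered out.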
Team review (BA + architect + engineer) findings, all fixed in-PR:
- [HIGH] Blank idea text no longer reaches the embedding service: score_idea short-circuits empty text to novelty_score: nil, so a batch of blank ideas can't waste upstream calls or trip the embedding circuit breaker tenant-wide.
- [HIGH] /pairs now returns a pagination termination signal: meta carries total_count + has_more (limit+1 look-ahead + a count query in one transaction).
- [HIGH→bounded] distant_pairs candidate cap is now operator-tunable via config :loopctl, :max_pair_candidates (bounds the O(n²) self-join).
- [MED] novelty_score = 1.0 no longer overloads "no priors": nearest_prior_distance returns nil when there are no priors (distinguishable from a genuine orthogonal distance of 1.0); response exposes meta.prior_count.
- [MED] Docs corrected: novelty_score is cosine distance in [0, 2] (0 = identical, higher = more novel), not a [0, 1] "1 = maximally novel" scale. Updated controller spec, knowledge.ex docstring, MCP tool description, mcp-server README/CHANGELOG, and root README.
- [MED] OpenApiSpex 200 responses for pairs/novelty/walk are now typed (data + meta shapes) instead of opaque objects.
- [LOW] novelty scoring embeds ideas concurrently (Task.async_stream, bounded at 5) instead of serializing up to 50 embedding round-trips.
Accepted by design (documented, not deferred):
- distant_pairs samples the lowest-id 1000-article slice for deterministic pagination; disclosed in the OpenAPI description + docstring and tunable via config. Random sampling would break stable offset pagination.
Tests: +2 controller tests (no-priors → nil, prior_count in meta, blank-text → zero embedding calls). Full suite green (2496 tests), mix precommit clean.
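The nil semantics described in these findings can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code: the embed callable, the dictionary response shape, and the function names are hypothetical stand-ins.

```python
import math


def cosine_distance(a, b):
    """Cosine distance in [0, 2]: 0 = identical direction, higher = more novel."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def score_idea(text, embed, prior_embeddings):
    """Distance to the nearest prior, or None when no comparison is possible."""
    if not text or not text.strip():
        return None  # blank text short-circuits before any embedding call
    if not prior_embeddings:
        return None  # no priors: distinct from a genuine distance of 1.0
    vector = embed(text)  # hypothetical embedding callable
    return min(cosine_distance(vector, prior) for prior in prior_embeddings)


def novelty_scores(ideas, embed, prior_embeddings):
    """Score a batch and expose prior_count so clients can interpret None."""
    data = [
        {"text": text, "novelty_score": score_idea(text, embed, prior_embeddings)}
        for text in ideas
    ]
    return {"data": data, "meta": {"prior_count": len(prior_embeddings)}}
```

Returning None rather than 1.0 for the "nothing to compare against" cases keeps the numeric range reserved for real distances.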
- Remove AdminRepo.transaction that held pool connections for full O(n²) self-join (prevents AdminRepo pool exhaustion under 3 concurrent agent pairs calls)
- Add error handling in novelty_scores async_stream (catch task exit/error, return sanitized idea with novelty_score: nil instead of FunctionClauseError 500)
- Add @max_idea_text_bytes cap (4MB) in novelty_idea_text to prevent unbounded embedding input and circuit-breaker pollution
- Add receive_timeout (30s) to OpenAI embedding Req.post call
- Use two separate queries (count + pairs) instead of transaction (acceptable count drift trade-off to avoid connection hold)
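The two novelty hardening steps above (an input size cap and per-task failure isolation under bounded concurrency) can be illustrated with a small Python analogue. This is a sketch, not a translation of the Elixir Task.async_stream code. Truncating oversized text (rather than rejecting it), reading 4MB as 4 * 1024 * 1024 bytes, and the embed callable are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_IDEA_TEXT_BYTES = 4 * 1024 * 1024  # assumption: "4MB" read as 4 MiB
MAX_CONCURRENCY = 5  # matches the "bounded at 5" figure in the PR notes


def cap_text(text, max_bytes=MAX_IDEA_TEXT_BYTES):
    """Truncate text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def embed_all(texts, embed, max_workers=MAX_CONCURRENCY):
    """Embed texts concurrently; a failing item yields None instead of failing the batch."""

    def safe_embed(text):
        try:
            return embed(cap_text(text))
        except Exception:
            return None  # caller scores this idea as None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with texts.
        return list(pool.map(safe_embed, texts))
```

The key property is that one bad item degrades to a None result for that item only, while the rest of the batch still returns.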
Round-2 (4-agent adversarial) findings, all fixed in-PR:
- [HIGH/DoS] distant_pairs no longer wraps its count + page queries in an
AdminRepo.transaction. The transaction pinned one connection (prod pool_size: 3)
through the full O(n²) self-join for up to 15s, so 3 concurrent /pairs calls
could starve the pool app-wide. Count and page now run as independent checkouts;
the candidate cap is operator-tunable via config :loopctl, :max_pair_candidates.
- [HIGH/DoS] Novelty idea text is capped before embedding, so an oversized payload
can't amplify upstream embedding cost.
- [MED] novelty_scores survives a task crash: the async_stream consumer handles
{:exit, _} (logs + scores that idea nil) instead of a {:ok, _}-only match that
would FunctionClauseError and 500 the whole batch. nearest_prior_distance and the
prior-count query now carry explicit 15s timeouts (the stream uses timeout: :infinity,
so an unguarded DB call could otherwise hang the request).
- [MED] meta.prior_count now counts only EMBEDDED priors — the set actually compared
against (matches nearest_prior_distance). prior_count is computed once in
novelty_scores (returns {:ok, scored, prior_count}); when it is 0 the batch skips
embedding entirely. Previously it counted tagged-but-unembedded priors, so a
client couldn't trust prior_count > 0 to mean a comparison was possible.
- [HIGH/docs] Fixed an inverted controller description ("nil ... or priors exist" →
"no priors exist") and disclosed the third null cause (embedding failure) on every
client-facing surface (controller spec, MCP tool description, mcp-server README +
CHANGELOG, root README). knowledge.ex docstring already listed all three.
- [MED/docs] Disclosed that /pairs total_count is over the sampled ≤1000-article
slice (can undercount on large tenants).
- [LOW] The bridge_path 2-hop filter now requires the shared neighbor to be a
distinct PUBLISHED article (consistent with random_walk's published-only
neighbors) — a pair no longer bridges through a draft/archived middle.
Tests (+4): 2-hop bridge match, draft-middle rejected, candidate-cap truncation
(test cap = 25 in config/test.exs), and embedding-failure → nil with prior_count > 0.
Full suite green (2500 tests), mix precommit clean, MCP 41/41.
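The bridge_path rule from the list above (a 2-hop bridge must pass through a distinct published article) can be sketched in Python like this. It is an illustration only: the edge-list input, the treatment of links as undirected, and the function names are assumptions, and the real filter is a SQL fragment.

```python
def neighbors(article_id, links):
    """Articles directly linked to article_id; links are (source, target) tuples."""
    linked = set()
    for source, target in links:
        # Assumption: a link counts in either direction for bridging purposes.
        if source == article_id:
            linked.add(target)
        elif target == article_id:
            linked.add(source)
    return linked


def is_bridgeable(a, b, links, published):
    """True when a and b are joined by a link path of at most 2 hops.

    A 2-hop path only counts when the middle article is published and is
    neither a nor b, so drafts and archived articles cannot act as bridges.
    """
    linked_to_a = neighbors(a, links)
    if b in linked_to_a:
        return True  # direct link: 1 hop
    shared = (linked_to_a & neighbors(b, links)) - {a, b}
    return any(middle in published for middle in shared)
```

For example, with links 1-2, 2-3, 1-4, 4-5 and article 4 left unpublished, articles 1 and 3 bridge through 2, while 1 and 5 do not bridge at all.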
Summary
Closes #152. Adds three computational-creativity primitives over the knowledge wiki's embedding space + link graph (ported from open_brain), their MCP tools, and reconciled docs.
New API (role: agent+)
- GET /api/v1/knowledge/pairs: distant-but-bridgeable article pairs in the optimal-novelty embedding band (cosine distance, default 0.3–0.7); bridge_path=true requires a ≤2-hop link path. Paginated.
- POST /api/v1/knowledge/novelty: scores ideas by distance to the nearest prior article tagged proposal.
- GET /api/v1/knowledge/walk: random walk through the link graph from start_id (no cycles, dead-end safe); surfaces unexpected connections.
Implementation notes
- Context functions: Loopctl.Knowledge.distant_pairs/2, random_walk/3, novelty_scores/3.
- All queries are tenant-scoped via AdminRepo with explicit tenant_id. Bridge filter uses literal single-line SQL fragments (no string concatenation, as a SQL-injection guard).
- ^current_id / ^visited_ids inside the CASE fragment now carry explicit type/2 casts (Postgrex was rejecting the un-dumped UUID strings).
MCP server (2.20.0)
- knowledge_distant_pairs, knowledge_novelty, knowledge_random_walk handlers + tool definitions.
Docs/help (same PR, per request)
- mcp-server: CHANGELOG.md 2.20.0 entry + README.md tool rows.
- Tool counts brought to 65 in mcp-server/README.md, README.md, CLAUDE.md.
- Reconciled the main README.md tool table, which had drifted across prior PRs: added the 8 missing knowledge tools (count, facets, graph, suggest_links, bulk_unpublish + the 3 new creativity tools).
Tests
18 controller tests: band filtering, custom band, bridge_path, deterministic pagination, novelty scoring (proposal-tagged priors, prior_tag override, no-priors → 1.0), walk traversal / no-cycles / dead-end, all 400/404 paths, and tenant isolation on every primitive. Full suite green (2494 tests, 0 failures); mix precommit clean.
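For readers unfamiliar with the walk behaviour covered by these tests (no cycles, dead-end safe), here is a minimal Python sketch. It is not the Elixir random_walk/3 implementation: the neighbors_of callable, the interpretation of length as a number of steps, and the injectable rng are assumptions made for illustration. The cap of 25 comes from the commit notes earlier in this PR.

```python
import random


def random_walk(start_id, neighbors_of, length=5, max_length=25, rng=None):
    """Walk the link graph from start_id without revisiting any article.

    neighbors_of: hypothetical callable returning the linked article ids.
    Stops early at a dead end, i.e. when every neighbor was already visited.
    """
    rng = rng or random.Random()
    steps = min(length, max_length)  # hard cap on walk length
    path = [start_id]
    visited = {start_id}
    current = start_id
    for _ in range(steps):
        # Sorting makes the candidate order deterministic for a seeded rng.
        options = sorted(n for n in neighbors_of(current) if n not in visited)
        if not options:
            break  # dead end: return the path walked so far
        current = rng.choice(options)
        visited.add(current)
        path.append(current)
    return path
```

Calling it on an article with no outgoing links simply returns a one-element path containing the start article, rather than raising.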