Creativity primitives: distant pairs, novelty scoring, random walk (#152) by mkreyman · Pull Request #164 · mkreyman/loopctl

mkreyman · 2026-06-24T02:08:16Z

Summary

Closes #152. Adds three computational-creativity primitives over the knowledge wiki's embedding space + link graph (ported from open_brain), their MCP tools, and reconciled docs.

New API (role: agent+)

Endpoint	Purpose
`GET /api/v1/knowledge/pairs`	Distant-but-bridgeable article pairs in the optimal-novelty embedding band (cosine distance, default 0.3–0.7) — the creative sweet spot. `bridge_path=true` requires a ≤2-hop link path. Paginated.
`POST /api/v1/knowledge/novelty`	Score ideas by novelty = cosine distance to the nearest prior proposal (0 = identical, 1 = novel). Each idea embedded on the fly; priors default to articles tagged `proposal`.
`GET /api/v1/knowledge/walk`	Random walk through the link graph from `start_id` (no cycles, dead-end safe) — surfaces unexpected connections.
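
As a rough illustration of the band filter behind `/pairs`, the sketch below selects pairs whose cosine distance falls inside a configurable band. It is not the PR's Elixir/SQL implementation. The names `cosine_distance` and `distant_pairs`, the in-memory dict of embeddings, and the inclusive band edges are all assumptions made for the example.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distant_pairs(embeddings, min_distance=0.3, max_distance=0.7, limit=100):
    """Return (id_a, id_b, distance) tuples whose distance lies in the band.

    `embeddings` maps a hypothetical article id to its vector. Ids are
    visited in sorted order so repeated calls return the same pairs.
    Whether the real band edges are inclusive is an assumption here.
    """
    pairs = []
    # Every unordered pair is compared once: the O(n^2) step the PR bounds
    # by sampling a capped number of articles.
    for id_a, id_b in combinations(sorted(embeddings), 2):
        d = cosine_distance(embeddings[id_a], embeddings[id_b])
        if min_distance <= d <= max_distance:
            pairs.append((id_a, id_b, d))
    return pairs[:limit]
```

With three toy vectors `a = [1, 0]`, `b = [0, 1]`, `c = [1, 2]`, only the pair (a, c) lands in the default 0.3–0.7 band: a and b are orthogonal (distance 1.0) and b and c are nearly parallel.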

Implementation notes

Loopctl.Knowledge.distant_pairs/2, random_walk/3, novelty_scores/3.
Bounded by design: samples ≤1000 embedded published articles to cap the O(n²) self-join; pair limit max 100; walk length max 25; novelty ≤50 ideas/request.
All queries tenant-scoped via AdminRepo with explicit tenant_id. Bridge filter uses literal single-line SQL fragments (no string concatenation — SQL-injection guard).
Fixed a UUID-encoding bug in the walk neighbor query: ^current_id / ^visited_ids inside the CASE fragment now carry explicit type/2 casts (Postgrex was rejecting the un-dumped UUID strings).
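
For readers unfamiliar with the walk semantics (no cycles, dead-end safe, length capped at 25), here is a minimal sketch over an in-memory adjacency map. It is not the Ecto query used in the PR. The `links` dict, the `rng` parameter, and the reading of "length" as a number of steps rather than nodes are assumptions.

```python
import random

MAX_WALK_LENGTH = 25  # cap quoted in the PR; treated here as a step count (assumption)


def random_walk(links, start_id, length, rng=None):
    """Walk up to `length` steps from `start_id` without revisiting a node.

    `links` is a hypothetical adjacency map: node id -> iterable of neighbor ids.
    The walk stops early when the current node has no unvisited neighbors.
    """
    if rng is None:
        rng = random.Random()
    steps = max(0, min(length, MAX_WALK_LENGTH))
    path = [start_id]
    visited = {start_id}
    current = start_id
    for _ in range(steps):
        # Excluding visited nodes is what rules out cycles.
        candidates = [n for n in links.get(current, ()) if n not in visited]
        if not candidates:
            break  # dead end: return the path walked so far
        current = rng.choice(candidates)
        visited.add(current)
        path.append(current)
    return path
```

Returning the partial path on a dead end, rather than raising, is one way to read "dead-end safe"; the actual response shape is defined by the controller spec, not by this sketch.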

MCP server (2.20.0)

knowledge_distant_pairs, knowledge_novelty, knowledge_random_walk handlers + tool definitions.

Docs/help (same PR, per request)

mcp-server CHANGELOG.md 2.20.0 entry + README.md tool rows.
Tool counts brought to 65 across mcp-server/README.md, README.md, CLAUDE.md.
Reconciled the main README.md tool table, which had drifted across prior PRs — added the 8 missing knowledge tools (count, facets, graph, suggest_links, bulk_unpublish + the 3 new creativity tools).

Tests

18 controller tests: band filtering, custom band, bridge_path, deterministic pagination, novelty scoring (proposal-tagged priors, prior_tag override, no-priors → 1.0), walk traversal / no-cycles / dead-end, all 400/404 paths, and tenant isolation on every primitive. Full suite green (2494 tests, 0 failures); mix precommit clean.

Adds three computational-creativity primitives over the knowledge wiki's embedding space + link graph, plus their MCP tools and docs.

API (agent+):
- GET /api/v1/knowledge/pairs: distant-but-bridgeable article pairs in the optimal-novelty embedding band (cosine distance, default 0.3–0.7); optional bridge_path requires a <=2-hop link path. Paginated; samples up to 1000 embedded published articles to bound the O(n^2) self-join.
- POST /api/v1/knowledge/novelty: score ideas by distance to the nearest prior proposal (0 = identical, 1 = novel); embeds each idea on the fly.
- GET /api/v1/knowledge/walk: random walk through the link graph from a start article (no cycles, dead-end safe).

Context module (Loopctl.Knowledge):
- distant_pairs/2, random_walk/3, novelty_scores/3 with hard caps (@max_pair_candidates 1000, pair limit max 100, walk length max 25).
- All queries tenant-scoped via AdminRepo; bridge filter uses literal single-line fragments (no string concatenation).

MCP server (2.20.0): knowledge_distant_pairs, knowledge_novelty, knowledge_random_walk handlers + tool definitions.

Docs/help reconciled in the same PR: mcp-server CHANGELOG/README rows, README + CLAUDE.md tool counts brought to 65 (the main README tool table had also drifted; added the 8 missing knowledge tools: count, facets, graph, suggest_links, bulk_unpublish, plus the 3 new creativity tools).

Tests: 18 controller tests covering band filtering, custom band, bridge_path, deterministic pagination, novelty scoring with proposal-tagged priors, prior_tag override, no-priors=1.0, walk traversal/no-cycles/dead-end, 400/404 paths, and tenant isolation on every primitive.

Team review (BA + architect + engineer) findings, all fixed in-PR:
- [HIGH] Blank idea text no longer reaches the embedding service: score_idea short-circuits empty text to novelty_score: nil, so a batch of blank ideas can't waste upstream calls or trip the embedding circuit breaker tenant-wide.
- [HIGH] /pairs now returns a pagination termination signal: meta carries total_count + has_more (limit+1 look-ahead + a count query in one transaction).
- [HIGH→bounded] distant_pairs candidate cap is now operator-tunable via config :loopctl, :max_pair_candidates (bounds the O(n²) self-join).
- [MED] novelty_score = 1.0 no longer overloads "no priors": nearest_prior_distance returns nil when there are no priors (distinguishable from a genuine orthogonal distance of 1.0); response exposes meta.prior_count.
- [MED] Docs corrected: novelty_score is cosine distance in [0, 2] (0 = identical, higher = more novel), not a [0, 1] "1 = maximally novel" scale. Updated controller spec, knowledge.ex docstring, MCP tool description, mcp-server README/CHANGELOG, and root README.
- [MED] OpenApiSpex 200 responses for pairs/novelty/walk are now typed (data + meta shapes) instead of opaque objects.
- [LOW] novelty scoring embeds ideas concurrently (Task.async_stream, bounded at 5) instead of serializing up to 50 embedding round-trips.

Accepted by design (documented, not deferred):
- distant_pairs samples the lowest-id 1000-article slice for deterministic pagination; disclosed in the OpenAPI description + docstring and tunable via config. Random sampling would break stable offset pagination.

Tests: +2 controller tests (no-priors → nil, prior_count in meta, blank-text → zero embedding calls). Full suite green (2496 tests), mix precommit clean.
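
The null semantics described in these findings can be sketched as follows. This is an illustrative Python model, not the Elixir code in Loopctl.Knowledge. The `embed` callable, the list of prior vectors, and the function names are hypothetical. The score is the cosine distance to the nearest prior, so it ranges over [0, 2], and `None` stands in for nil.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def novelty_score(idea_text, prior_embeddings, embed):
    """Distance to the nearest prior, or None when no comparison is possible.

    `embed` is a hypothetical callable (text -> vector) standing in for the
    embedding service; `prior_embeddings` is a list of prior vectors.
    """
    if not idea_text.strip():
        return None  # blank text never reaches the embedding service
    if not prior_embeddings:
        return None  # no priors: distinct from a genuine distance of 1.0
    try:
        vector = embed(idea_text)
    except Exception:
        return None  # embedding failure also yields a null score
    return min(cosine_distance(vector, prior) for prior in prior_embeddings)
```

Checking for blank text and empty priors before calling `embed` is what keeps a batch of unusable ideas from generating upstream calls.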

- Remove AdminRepo.transaction that held pool connections for full O(n²) self-join (prevents AdminRepo pool exhaustion under 3 concurrent agent pairs calls)
- Add error handling in novelty_scores async_stream (catch task exit/error, return sanitized idea with novelty_score: nil instead of FunctionClauseError 500)
- Add @max_idea_text_bytes cap (4MB) in novelty_idea_text to prevent unbounded embedding input and circuit-breaker pollution
- Add receive_timeout (30s) to OpenAI embedding Req.post call
- Use two separate queries (count + pairs) instead of transaction (acceptable count drift trade-off to avoid connection hold)
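
The pagination signal on /pairs (a limit+1 look-ahead for has_more plus an independent count query for total_count) can be modeled roughly as below. The `fetch_page` and `count_all` callables are hypothetical stand-ins for the two separate queries; nothing here mirrors the actual Ecto code.

```python
def paginate(fetch_page, count_all, limit, offset):
    """Build a page plus termination metadata from two independent queries.

    `fetch_page(limit, offset)` and `count_all()` are hypothetical callables.
    Because they do not share a transaction, total_count may drift slightly
    relative to the page contents, which is the trade-off the commit accepts.
    """
    rows = fetch_page(limit + 1, offset)  # ask for one extra row
    has_more = len(rows) > limit          # the extra row proves another page exists
    return {
        "data": rows[:limit],             # never return the look-ahead row itself
        "meta": {"total_count": count_all(), "has_more": has_more},
    }
```

A client can stop paging as soon as `has_more` is false, without comparing offsets against `total_count`.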

Round-2 (4-agent adversarial) findings, all fixed in-PR:
- [HIGH/DoS] distant_pairs no longer wraps its count + page queries in an AdminRepo.transaction. The transaction pinned one connection (prod pool_size: 3) through the full O(n²) self-join for up to 15s, so 3 concurrent /pairs calls could starve the pool app-wide. Count and page now run as independent checkouts; the candidate cap is operator-tunable via config :loopctl, :max_pair_candidates.
- [HIGH/DoS] Novelty idea text is capped before embedding, so an oversized payload can't amplify upstream embedding cost.
- [MED] novelty_scores survives a task crash: the async_stream consumer handles {:exit, _} (logs + scores that idea nil) instead of a {:ok, _}-only match that would FunctionClauseError and 500 the whole batch. nearest_prior_distance and the prior-count query now carry explicit 15s timeouts (the stream uses timeout: :infinity, so an unguarded DB call could otherwise hang the request).
- [MED] meta.prior_count now counts only EMBEDDED priors: the set actually compared against (matches nearest_prior_distance). prior_count is computed once in novelty_scores (returns {:ok, scored, prior_count}); when it is 0 the batch skips embedding entirely. Previously it counted tagged-but-unembedded priors, so a client couldn't trust prior_count > 0 to mean a comparison was possible.
- [HIGH/docs] Fixed an inverted controller description ("nil ... or priors exist" → "no priors exist") and disclosed the third null cause (embedding failure) on every client-facing surface (controller spec, MCP tool description, mcp-server README + CHANGELOG, root README). knowledge.ex docstring already listed all three.
- [MED/docs] Disclosed that /pairs total_count is over the sampled ≤1000-article slice (can undercount on large tenants).
- [LOW] The bridge_path 2-hop filter now requires the shared neighbor to be a distinct PUBLISHED article (consistent with random_walk's published-only neighbors), so a pair no longer bridges through a draft/archived middle.

Tests (+4): 2-hop bridge match, draft-middle rejected, candidate-cap truncation (test cap = 25 in config/test.exs), and embedding-failure → nil with prior_count > 0. Full suite green (2500 tests), mix precommit clean, MCP 41/41.
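
To make the bridge_path rule concrete, the sketch below checks whether two articles are connected within two hops, accepting a shared neighbor only if it is a distinct published article. It is a guess at the semantics, not the SQL fragment from the PR. The undirected `links` map, the `published` set, and counting a direct link as a valid bridge are all assumptions.

```python
def is_bridgeable(links, published, a, b):
    """True if `a` and `b` are linked directly or via one published middle article.

    `links` is a hypothetical undirected adjacency map (id -> set of ids);
    `published` is the set of ids whose status is published.
    """
    neighbors_a = links.get(a, set())
    neighbors_b = links.get(b, set())
    if b in neighbors_a or a in neighbors_b:
        return True  # 1 hop; treating this as satisfying "<=2-hop" is an assumption
    # 2 hops: a shared neighbor that is neither endpoint and is published.
    shared = (neighbors_a & neighbors_b) - {a, b}
    return any(middle in published for middle in shared)
```

For links `a - m - b`, the pair (a, b) bridges when `m` is published and is rejected when `m` is a draft, which corresponds to the "2-hop bridge match" and "draft-middle rejected" test cases listed above.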

mkreyman added 4 commits June 23, 2026 20:07

mkreyman merged commit 8864c36 into master Jun 24, 2026
6 checks passed

mkreyman deleted the feat/152-creativity-primitives branch June 24, 2026 02:35

mkreyman merged 4 commits into master from feat/152-creativity-primitives
