
Creativity primitives: distant pairs, novelty scoring, random walk (#152) #164

Merged
mkreyman merged 4 commits into master from feat/152-creativity-primitives
Jun 24, 2026


@mkreyman

Owner

Summary

Closes #152. Adds three computational-creativity primitives over the knowledge wiki's embedding space + link graph (ported from open_brain), their MCP tools, and reconciled docs.

New API (role: agent+)

| Endpoint | Purpose |
|---|---|
| GET /api/v1/knowledge/pairs | Distant-but-bridgeable article pairs in the optimal-novelty embedding band (cosine distance, default 0.3–0.7), the creative sweet spot. bridge_path=true requires a ≤2-hop link path. Paginated. |
| POST /api/v1/knowledge/novelty | Score ideas by novelty = cosine distance to the nearest prior proposal (0 = identical, 1 = novel). Each idea is embedded on the fly; priors default to articles tagged proposal. |
| GET /api/v1/knowledge/walk | Random walk through the link graph from start_id (no cycles, dead-end safe); surfaces unexpected connections. |
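To make the band filter behind /pairs concrete, here is a minimal Python sketch (the project itself is Elixir, so this is illustrative only, not the code in this PR). The function names, the in-memory dict of embeddings, and the inclusive band bounds are assumptions; the default band, the candidate cap, and the lowest-id sampling come from the description and commit notes on this page.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity; the result lies in [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distant_pairs(embeddings, min_distance=0.3, max_distance=0.7,
                  max_candidates=1000):
    """Return (id_a, id_b, distance) for pairs whose distance is in the band.

    `embeddings` maps article id -> vector (hypothetical in-memory stand-in
    for the embedded, published articles of one tenant).
    """
    # Take the lowest-id slice so the O(n^2) comparison stays bounded
    # and the candidate set is deterministic across requests.
    candidates = sorted(embeddings)[:max_candidates]
    pairs = []
    for a, b in combinations(candidates, 2):
        d = cosine_distance(embeddings[a], embeddings[b])
        # Inclusive bounds are an assumption of this sketch.
        if min_distance <= d <= max_distance:
            pairs.append((a, b, d))
    return pairs
```

With 1000 candidates the loop compares at most 1000 × 999 / 2 = 499,500 pairs, which is the cost the cap is meant to bound.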

Implementation notes

  • Loopctl.Knowledge.distant_pairs/2, random_walk/3, novelty_scores/3.
  • Bounded by design: samples ≤1000 embedded published articles to cap the O(n²) self-join; pair limit max 100; walk length max 25; novelty ≤50 ideas/request.
  • All queries tenant-scoped via AdminRepo with explicit tenant_id. Bridge filter uses literal single-line SQL fragments (no string concatenation — SQL-injection guard).
  • Fixed a UUID-encoding bug in the walk neighbor query: ^current_id / ^visited_ids inside the CASE fragment now carry explicit type/2 casts (Postgrex was rejecting the un-dumped UUID strings).
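The walk semantics above (no cycles, dead-end safe, length capped at 25) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Elixir implementation: the adjacency dict, the function signature, and the choice to count length in visited nodes rather than hops are all hypothetical.

```python
import random

MAX_WALK_LENGTH = 25  # cap taken from the implementation notes


def random_walk(links, start_id, length=5, rng=None):
    """Walk the link graph from start_id without revisiting a node.

    `links` maps an article id to a list of neighbor ids (assumed shape).
    The walk stops early when the current node has no unvisited neighbors.
    """
    rng = rng or random.Random()
    length = min(length, MAX_WALK_LENGTH)
    path = [start_id]
    visited = {start_id}
    current = start_id
    while len(path) < length:
        # Exclude already-visited nodes so the path cannot cycle.
        neighbors = [n for n in links.get(current, []) if n not in visited]
        if not neighbors:
            break  # dead end: return the partial path instead of failing
        current = rng.choice(neighbors)
        visited.add(current)
        path.append(current)
    return path
```

Passing a seeded `random.Random` makes a walk reproducible, which is convenient in tests.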

MCP server (2.20.0)

knowledge_distant_pairs, knowledge_novelty, knowledge_random_walk handlers + tool definitions.

Docs/help (same PR, per request)

  • mcp-server CHANGELOG.md 2.20.0 entry + README.md tool rows.
  • Tool counts brought to 65 across mcp-server/README.md, README.md, CLAUDE.md.
  • Reconciled the main README.md tool table, which had drifted across prior PRs — added the 8 missing knowledge tools (count, facets, graph, suggest_links, bulk_unpublish + the 3 new creativity tools).

Tests

18 controller tests: band filtering, custom band, bridge_path, deterministic pagination, novelty scoring (proposal-tagged priors, prior_tag override, no-priors → 1.0), walk traversal / no-cycles / dead-end, all 400/404 paths, and tenant isolation on every primitive. Full suite green (2494 tests, 0 failures); mix precommit clean.

mkreyman added 4 commits June 23, 2026 20:07

Adds three computational-creativity primitives over the knowledge wiki's
embedding space + link graph, plus their MCP tools and docs.

API (agent+):
- GET  /api/v1/knowledge/pairs    — distant-but-bridgeable article pairs in
  the optimal-novelty embedding band (cosine distance, default 0.3–0.7);
  optional bridge_path requires a <=2-hop link path. Paginated; samples up
  to 1000 embedded published articles to bound the O(n^2) self-join.
- POST /api/v1/knowledge/novelty  — score ideas by distance to the nearest
  prior proposal (0 = identical, 1 = novel); embeds each idea on the fly.
- GET  /api/v1/knowledge/walk     — random walk through the link graph from a
  start article (no cycles, dead-end safe).

Context module (Loopctl.Knowledge):
- distant_pairs/2, random_walk/3, novelty_scores/3 with hard caps
  (@max_pair_candidates 1000, pair limit max 100, walk length max 25).
- All queries tenant-scoped via AdminRepo; bridge filter uses literal
  single-line fragments (no string concatenation).

MCP server (2.20.0): knowledge_distant_pairs, knowledge_novelty,
knowledge_random_walk handlers + tool definitions.

Docs/help reconciled in the same PR: mcp-server CHANGELOG/README rows,
README + CLAUDE.md tool counts brought to 65 (the main README tool table
had also drifted — added the 8 missing knowledge tools: count, facets,
graph, suggest_links, bulk_unpublish, plus the 3 new creativity tools).

Tests: 18 controller tests covering band filtering, custom band,
bridge_path, deterministic pagination, novelty scoring with proposal-tagged
priors, prior_tag override, no-priors=1.0, walk traversal/no-cycles/
dead-end, 400/404 paths, and tenant isolation on every primitive.

Team review (BA + architect + engineer) findings, all fixed in-PR:

- [HIGH] Blank idea text no longer reaches the embedding service: score_idea
  short-circuits empty text to novelty_score: nil, so a batch of blank ideas
  can't waste upstream calls or trip the embedding circuit breaker tenant-wide.
- [HIGH] /pairs now returns a pagination termination signal: meta carries
  total_count + has_more (limit+1 look-ahead + a count query in one transaction).
- [HIGH→bounded] distant_pairs candidate cap is now operator-tunable via
  config :loopctl, :max_pair_candidates (bounds the O(n²) self-join).
- [MED] novelty_score = 1.0 no longer overloads "no priors": nearest_prior_distance
  returns nil when there are no priors (distinguishable from a genuine orthogonal
  distance of 1.0); response exposes meta.prior_count.
- [MED] Docs corrected — novelty_score is cosine distance in [0, 2] (0 = identical,
  higher = more novel), not a [0, 1] "1 = maximally novel" scale. Updated controller
  spec, knowledge.ex docstring, MCP tool description, mcp-server README/CHANGELOG,
  and root README.
- [MED] OpenApiSpex 200 responses for pairs/novelty/walk are now typed (data + meta
  shapes) instead of opaque objects.
- [LOW] novelty scoring embeds ideas concurrently (Task.async_stream, bounded at 5)
  instead of serializing up to 50 embedding round-trips.
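The scoring rules described in these findings can be summarized in a short Python sketch. It is a hedged illustration, not the Elixir code: `novelty_scores`, the `embed` callable, and the list of prior vectors are assumed stand-ins. It shows the three behaviors named above: the score is the cosine distance to the nearest prior (range [0, 2]), blank text is never embedded, and an empty prior set yields None rather than 1.0.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; the result lies in [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def novelty_scores(ideas, prior_embeddings, embed):
    """Score each idea by its distance to the nearest prior embedding.

    `ideas` is a list of dicts with a "text" key, `prior_embeddings` a list
    of vectors, and `embed` a callable text -> vector (all assumed shapes).
    Returns (scored_ideas, prior_count).
    """
    prior_count = len(prior_embeddings)
    scored = []
    for idea in ideas:
        text = idea.get("text", "").strip()
        # Blank text or no priors: skip the embedding call entirely and
        # report None, which stays distinguishable from a real distance.
        if not text or prior_count == 0:
            scored.append({**idea, "novelty_score": None})
            continue
        vector = embed(text)
        nearest = min(cosine_distance(vector, p) for p in prior_embeddings)
        scored.append({**idea, "novelty_score": nearest})
    return scored, prior_count
```

Returning `prior_count` alongside the scores lets a client tell "nothing to compare against" apart from "blank input" when it sees a None score.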

Accepted by design (documented, not deferred):
- distant_pairs samples the lowest-id 1000-article slice for deterministic
  pagination; disclosed in the OpenAPI description + docstring and tunable via config.
  Random sampling would break stable offset pagination.

Tests: +2 controller tests (no-priors → nil, prior_count in meta, blank-text → zero
embedding calls). Full suite green (2496 tests), mix precommit clean.

- Remove AdminRepo.transaction that held pool connections for full O(n²) self-join
  (prevents AdminRepo pool exhaustion under 3 concurrent agent pairs calls)
- Add error handling in novelty_scores async_stream (catch task exit/error, return
  sanitized idea with novelty_score: nil instead of FunctionClauseError 500)
- Add @max_idea_text_bytes cap (4MB) in novelty_idea_text to prevent unbounded
  embedding input and circuit-breaker pollution
- Add receive_timeout (30s) to OpenAI embedding Req.post call
- Use two separate queries (count + pairs) instead of transaction (acceptable
  count drift trade-off to avoid connection hold)
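The limit+1 look-ahead combined with a separate count query can be sketched like this in Python. The function name and the two callables standing in for the page and count queries are hypothetical; only the look-ahead idea, the meta keys (total_count, has_more), and the pair limit of 100 come from this page.

```python
MAX_PAIR_LIMIT = 100  # pair limit cap taken from the implementation notes


def paginate_pairs(fetch_page, count_pairs, offset, limit):
    """Build a page of pairs plus a termination signal.

    `fetch_page(offset, n)` returns up to n rows starting at offset and
    `count_pairs()` returns the total; both are assumed stand-ins for two
    independent database queries, each with its own connection checkout.
    """
    limit = max(1, min(limit, MAX_PAIR_LIMIT))
    # Ask for one extra row: if it comes back, another page exists.
    rows = fetch_page(offset, limit + 1)
    has_more = len(rows) > limit
    # The count runs separately, so it can drift slightly from the page
    # if data changes between the two queries.
    total_count = count_pairs()
    return {
        "data": rows[:limit],
        "meta": {"total_count": total_count, "has_more": has_more},
    }
```

A client can stop paging as soon as `has_more` is false, without relying on `total_count` being exact.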

Round-2 (4-agent adversarial) findings, all fixed in-PR:

- [HIGH/DoS] distant_pairs no longer wraps its count + page queries in an
  AdminRepo.transaction. The transaction pinned one connection (prod pool_size: 3)
  through the full O(n²) self-join for up to 15s, so 3 concurrent /pairs calls
  could starve the pool app-wide. Count and page now run as independent checkouts;
  the candidate cap is operator-tunable via config :loopctl, :max_pair_candidates.
- [HIGH/DoS] Novelty idea text is capped before embedding, so an oversized payload
  can't amplify upstream embedding cost.
- [MED] novelty_scores survives a task crash: the async_stream consumer handles
  {:exit, _} (logs + scores that idea nil) instead of a {:ok, _}-only match that
  would FunctionClauseError and 500 the whole batch. nearest_prior_distance and the
  prior-count query now carry explicit 15s timeouts (the stream uses timeout: :infinity,
  so an unguarded DB call could otherwise hang the request).
- [MED] meta.prior_count now counts only EMBEDDED priors — the set actually compared
  against (matches nearest_prior_distance). prior_count is computed once in
  novelty_scores (returns {:ok, scored, prior_count}); when it is 0 the batch skips
  embedding entirely. Previously it counted tagged-but-unembedded priors, so a
  client couldn't trust prior_count > 0 to mean a comparison was possible.
- [HIGH/docs] Fixed an inverted controller description ("nil ... or priors exist" →
  "no priors exist") and disclosed the third null cause (embedding failure) on every
  client-facing surface (controller spec, MCP tool description, mcp-server README +
  CHANGELOG, root README). knowledge.ex docstring already listed all three.
- [MED/docs] Disclosed that /pairs total_count is over the sampled ≤1000-article
  slice (can undercount on large tenants).
- [LOW] The bridge_path 2-hop filter now requires the shared neighbor to be a
  distinct PUBLISHED article (consistent with random_walk's published-only
  neighbors) — a pair no longer bridges through a draft/archived middle.
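As an illustration of the bridge rule in the last finding, the following Python sketch checks whether two articles are connected by a path of at most two hops whose middle node is a distinct published article. It assumes an undirected link graph held as a dict of sets and a set of published ids; the function name and these data shapes are hypothetical, and the real filter runs as SQL.

```python
def has_bridge(links, published, a, b):
    """Return True if a and b are linked directly or via one published node.

    `links` maps an article id to the set of ids it is linked with
    (assumed symmetric); `published` is the set of published article ids.
    """
    neighbors_a = links.get(a, set())
    neighbors_b = links.get(b, set())
    # One hop: a direct link between the two articles.
    if b in neighbors_a or a in neighbors_b:
        return True
    # Two hops: a shared neighbor that is neither endpoint and is published.
    shared = (neighbors_a & neighbors_b) - {a, b}
    return any(mid in published for mid in shared)
```

Filtering on `published` is what keeps a draft or archived article from acting as the middle of a bridge.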

Tests (+4): 2-hop bridge match, draft-middle rejected, candidate-cap truncation
(test cap = 25 in config/test.exs), and embedding-failure → nil with prior_count > 0.
Full suite green (2500 tests), mix precommit clean, MCP 41/41.
@mkreyman mkreyman merged commit 8864c36 into master Jun 24, 2026
6 checks passed
@mkreyman mkreyman deleted the feat/152-creativity-primitives branch June 24, 2026 02:35
Development

Successfully merging this pull request may close these issues.

Creativity support: distant pairs, novelty scoring, random walks (for idea-synthesizer)
