chore(deps): bump agent-eval 0.95.1 + agent-runtime 0.70.0 by drewstone · Pull Request #27 · tangle-network/agent-knowledge

drewstone · 2026-06-21T19:58:26Z

Propagates the new substrate floor: @tangle-network/agent-eval ^0.95.1 and @tangle-network/agent-runtime ^0.70.0. Raises the @tangle-network/sandbox peer floor to ^0.8.0. Resolved at install: agent-eval 0.95.1, agent-runtime 0.70.0, sandbox 0.8.2.

Clean

pnpm run typecheck — 0 errors
pnpm run build (tsup) — success (ESM + DTS)
pnpm test — 114 passed, 5 skipped (live-network sources-live only), 0 failures
pnpm run lint — 0 errors (3 pre-existing warnings in files this PR does not touch)
Branch merges cleanly into main (git merge-tree)

Drift fixed

agent-runtime 0.70.0 removed createDriver and the DriverDecision type from the /loops subpath (the new export surface is . /agent /intelligence /loops /profiles /mcp; the Driver interface itself is still exported).

src/profiles/researcher.ts — multiHarnessResearcherFanout was built on createDriver({ planner }). Rebuilt its single-fanout-then-stop topology as a direct Driver<ResearchTask, ResearchOutput, FanoutDecision> literal over the still-exported Driver interface:

name: 'dynamic', plan issues N task copies on round 0 then [], decide returns the kernel-terminal 'done' after the fanout round.
Added a local FanoutDecision = 'continue' | 'done' type to replace the removed DriverDecision.
This preserves the exact behavior the loop tests pin (driver.name === 'dynamic', N iterations, result.decision === 'done', winner selected by the kernel's defaultSelectWinner).

Remaining

None. Typecheck + build + tests + lint all green; no other call sites referenced the removed API.

Refresh the substrate floor to agent-eval ^0.95.1 and agent-runtime ^0.70.0, raising the sandbox peer floor to ^0.8.0. agent-runtime 0.70.0 drops the createDriver factory and DriverDecision type from the /loops subpath. Rebuild multiHarnessResearcherFanout's single-fanout-then-stop topology as a direct Driver literal over the still-exported Driver interface, preserving the name:'dynamic' / fanout-N / 'done'-terminal behavior the loop tests pin.

tangletools

✅ Auto-approved PR — `f0de4285`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-21T19:58:33Z}

tangletools

🟢 Value Audit — sound


Verdict	sound
Concerns	0 (none)
Heuristic	0.0s
Duplication	0.0s
Interrogation	418.9s (2 bridge agents)
Total	418.9s

💰 Value — sound

Bumps substrate deps to current floors and rewrites the single-fanout researcher driver as a literal after createDriver was removed from agent-runtime 0.70.0; behavior is preserved and tests are green.

What it does: Raises @tangle-network/agent-eval to ^0.95.1, @tangle-network/agent-runtime to ^0.70.0, and @tangle-network/sandbox to ^0.8.0. In src/profiles/researcher.ts it replaces the removed createDriver({ planner }) call with an explicit Driver<ResearchTask, ResearchOutput, FanoutDecision> literal that fans out N task copies on round 0 and returns 'done' afterwards, keeping the same `na
Goals it achieves: Keeps agent-knowledge on the supported substrate floor instead of drifting behind agent-runtime/agent-eval; preserves the existing multi-harness research fanout capability without changing consumer contracts.
Assessment: Good. The driver literal is a direct, minimal mapping of the old planner semantics onto the new runtime export surface. Typecheck, build, lint, and tests are reported green; the change is isolated to the one call site that needed it.
Better / existing approach: none — this is the right approach. I searched src/**/*.ts and tests/**/*.ts for createDriver, Driver<, and @tangle-network/agent-runtime/loops; src/profiles/researcher.ts is the only driver in the repo, and src/research-loop.ts is a separate knowledge-growth control-loop abstraction that does not do harness fanout. Because agent-runtime 0.70.0 removed createDriver from /loops, bu
Model: opencode/kimi-for-coding/k2p7
Bridge attempts: 1

🎯 Usefulness — sound

Clean drift fix: rebuilds the researcher's fanout driver as a direct Driver literal over the still-exported interface after agent-runtime 0.70.0 removed createDriver/DriverDecision; behavior is pinned by loop integration tests.

Integration: multiHarnessResearcherFanout is re-exported from src/profiles/index.ts:22 and exercises the real runLoop from agent-runtime 0.70.0 in tests/loops/researcher-integration.test.ts:92-117. The rebuilt driver (src/profiles/researcher.ts:214) implements the new Driver<Task,Output,Decision> interface verbatim — name:'dynamic', plan returns N task copies on empty history then [], decide returns 'done' (a
Fit with existing patterns: Direct Driver literal IS the idiomatic 0.70.0 pattern — the createDriver helper was removed, so the kernel-authoritative interface is the only seam. Higher-level helpers (fanout, loopUntil, defineStrategy) would add ceremony without fitting the N-harness one-shot-fanout topology. No competing pattern in this repo.
Real-world viability: Solid on the realistic paths. The kernel appends an Iteration even when a worker errors (Iteration.error set), so decide still sees length=N and returns 'done' — no stall. Harnesses array can't be empty (constructor defaults to 3 built-ins at src/profiles/researcher.ts:201). Winner falls through to defaultSelectWinner (best-valid, earliest-index) per the LoopResult contract. The 'continue' branch
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

No concerns — sound change, no better or existing approach found. ✅

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass	What it asks
Heuristic	Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication	Do added function/class names already exist elsewhere in the repo?
Value Audit	What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit	Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260621T201337Z}

tangletools · 2026-06-21T20:15:25Z

✅ No Blockers — `f0de4285`

Readiness 76/100 · Confidence 75/100 · 10 findings (1 medium, 9 low)

	opencode-kimi	glm	deepseek	aggregate
Readiness	89	83	76	76
Confidence	75	75	75	75
Correctness	89	83	76	76
Security	89	83	76	76
Testing	89	83	76	76
Architecture	89	83	76	76

Full multi-shot audit completed 3/3 planned shots over 3 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 3/3 planned shots over 3 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 3/3 planned shots over 3 changed files. Global verifier still owns final merge decision.

🟠 MEDIUM Duplicate agent-interface versions in resolution (0.8.0 vs 0.10.1) — pnpm-lock.yaml

sandbox@0.8.2 depends on @tangle-network/agent-interface@0.8.0 (L1283-1284) while agent-eval@0.95.1 and agent-runtime@0.70.0 depend on @tangle-network/agent-interface@0.10.1 (L1279-1280). pnpm resolves both into separate symlink targets, meaning the same logical interface type exists in two copies. If any code path passes an object instantiated through sandbox (using agent-interface@0.8.0 types) to agent-eval or agent-runtime (expecting agent-interface@0.10.1 types), instanceof checks, zod schemas, or type-only discriminators may fail silently at runtime. Verify sandbox-returned objects are not consumed directly by agent-eval/agent-runtime call sites in this repo.

🟡 LOW sandbox caret floor ^0.8.0 targets an unpublished version — package.json

package.json:78 declares "@tangle-network/sandbox": "^0.8.0". npm registry publishes 0.6.2 then 0.8.1 then 0.8.2 — there is no 0.8.0 release. The caret range still resolves correctly to 0.8.2 (verified in pnpm-lock.yaml) and satisfies agent-runtime@0.70.0's peer floor of >=0.8.0 <1.0.0, so this is purely cosmetic. Suggest aligning the floor to ^0.8.1 or ^0.8.2 to match an actual published tag and avoid implying a release that doesn't exist. No runtime impact.

🟡 LOW sandbox@0.8.0 exact version not published on npm — package.json

Line 78: @tangle-network/sandbox@^0.8.0 — version 0.8.0 does not exist on npm (published versions: 0.8.1, 0.8.2). The lockfile resolves to 0.8.2 so installs work, but the absent 0.8.0 base suggests a publish-revert or pre-release sequence. Non-blocking.

🟡 LOW Dual @tangle-network/agent-interface versions in lockfile — pnpm-lock.yaml

The lockfile resolves both @tangle-network/agent-interface@0.10.1 (agent-runtime 0.70.0 optional peer, agent-eval 0.95.1 dep) and @tangle-network/agent-interface@0.8.0 (sandbox 0.8.2 dep). Lines 466-469 and 1279-1283. This creates two isolated copies of the interface package. If agent-interface is a shared contract, passing typed objects between runtime/sandbox code paths could mismatch at runtime. Tests currently pass and this repo does not import agent-interface directly, so this is a nit, not a blocker. Fix: align sandbox's agent-interface dependency to ^0.10.0 or confirm the version split is safe.

🟡 LOW Dual zod instances (4.4.2 and 4.4.3) in the resolved tree — pnpm-lock.yaml

agent-interface@0.10.1 (new transitive dep of agent-eval@0.95.1) and agent-interface@0.8.0 (transitive dep of sandbox@0.8.2) both pin zod@4.4.3, while the root and agent-eval continue on zod@4.4.2. Lockfile lines: '+ zod@4.4.3:' (packages section) and 'zod: 4.4.3' under both agent-interface snapshots. Result: two copies of zod in node_modules. Impact is normally nil for a patch bump, but zod v4 has a history of schema instanceof checks failing across instances; any code that passes a schema from root into agent-interface (or vice versa) could trip a 'schema is not a ZodSchema' style error. No evidence this happens here — flagging as a watch-item, not a blocker. Fix only if a downstream instanceof failure surfaces: pin zod with pnpm.overrides to a single version.

🟡 LOW Two versions of @tangle-network/agent-interface coexist (0.8.0 + 0.10.1) — pnpm-lock.yaml

sandbox@0.8.2 depends on agent-interface@0.8.0 while agent-eval@0.95.1 depends on agent-interface@0.10.1. Both resolve, no peer violation, install succeeds. Minor concern only: if any shared type flows through both versions (e.g., a Message/Tool definition authored against 0.8.0 reaching code expecting 0.10.1), structural equality could silently drift. Not actionable in the lockfile itself; justifying a watch-item rather than a fix. The PR's stated scope is just the agent-eval + agent-runtime bumps, so the new agent-interface transitive is expected.

🟡 LOW Import-type lint warning in changed file — src/profiles/researcher.ts

Running pnpm lint reports: src/profiles/researcher.ts uses import { type AgentProfile, ... } instead of import type { AgentProfile, ... }. The original file already used this style, so the PR did not introduce it, but since the imports were edited (createDriver/DriverDecision removed, Iteration added) the warning is still present in the diff. Impact: style-only; no runtime effect. Fix: change import { type ... } to import type { ... }.

🟡 LOW Unreachable 'continue' branch in decide function — src/profiles/researcher.ts

src/profiles/researcher.ts:218-219 — decide returns 'continue' when history.length === 0. In the kernel's calling order, plan() runs before decide(), so history is always non-empty when decide is called. The 'continue' branch is dead code. Not a correctness issue (the driver terminates correctly on 'done'), but dead code obscures intent. Consider removing the branch or adding a comment acknowledging it's a safety net.

🟡 LOW decide's history.length===0 branch is unreachable defensive code — src/profiles/researcher.ts

Lines 218-219: decide: (history) => history.length === 0 ? 'continue' : 'done'. The kernel (agent-runtime 0.70 chunk-QXWGSDAQ.js:1240-1340) always calls plan first; if plan returns tasks the kernel runs them and grows history before invoking decide, and if plan returns [] the kernel breaks out of the loop and reaches decideAndFinalize only with whatever history already accumulated (still length>0 after round 0). So the 'continue' arm can never fire in this topology. Not a bug — the value is harmless and the branch reads defensively — but it is misleading: a reader may believe 'continue' is exercisable. Either drop the ternary (decide: () => 'done') or

🟡 LOW maxFanout cap (was 4) removed in migration — src/profiles/researcher.ts

src/profiles/researcher.ts:216-217 — The old createDriver internally clamped fanout width at 4 via validateMove (would throw PlannerError for >4 tasks). The new direct Driver has no fanout limit. Default harness count is 3, so default path is safe. A caller passing >4 harnesses would previously get an error; now it silently fans out all harnesses, which could cause resource/API pressure. Consider adding an explicit guard if >N harnesses is known to be problematic.

_{tangletools · 2026-06-21T20:15:22Z · trace}

tangletools approved these changes Jun 21, 2026

View reviewed changes

tangletools reviewed Jun 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(deps): bump agent-eval 0.95.1 + agent-runtime 0.70.0#27

chore(deps): bump agent-eval 0.95.1 + agent-runtime 0.70.0#27
drewstone wants to merge 1 commit into
mainfrom
chore/bump-substrate-0.95.1

drewstone commented Jun 21, 2026

Uh oh!

tangletools left a comment

Uh oh!

tangletools left a comment

Uh oh!

tangletools commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

drewstone commented Jun 21, 2026

Clean

Drift fixed

Remaining

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

✅ Auto-approved PR — f0de4285

Uh oh!

tangletools left a comment

Choose a reason for hiding this comment

🟢 Value Audit — sound

💰 Value — sound

🎯 Usefulness — sound

Uh oh!

tangletools commented Jun 21, 2026

✅ No Blockers — f0de4285

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

✅ Auto-approved PR — `f0de4285`

✅ No Blockers — `f0de4285`