chore(deps): bump agent-eval 0.95.1 + agent-runtime 0.70.0 #27
Refresh the substrate floor to agent-eval ^0.95.1 and agent-runtime ^0.70.0, raising the sandbox peer floor to ^0.8.0. agent-runtime 0.70.0 drops the createDriver factory and DriverDecision type from the /loops subpath. Rebuild multiHarnessResearcherFanout's single-fanout-then-stop topology as a direct Driver literal over the still-exported Driver interface, preserving the name:'dynamic' / fanout-N / 'done'-terminal behavior the loop tests pin.
tangletools
left a comment
✅ Auto-approved PR — f0de4285
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-21T19:58:33Z
tangletools
left a comment
🟢 Value Audit — sound
| Verdict | sound |
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 418.9s (2 bridge agents) |
| Total | 418.9s |
💰 Value — sound
Bumps substrate deps to current floors and rewrites the single-fanout researcher driver as a literal after createDriver was removed from agent-runtime 0.70.0; behavior is preserved and tests are green.
- What it does: Raises `@tangle-network/agent-eval` to `^0.95.1`, `@tangle-network/agent-runtime` to `^0.70.0`, and `@tangle-network/sandbox` to `^0.8.0`. In `src/profiles/researcher.ts` it replaces the removed `createDriver({ planner })` call with an explicit `Driver<ResearchTask, ResearchOutput, FanoutDecision>` literal that fans out N task copies on round 0 and returns `'done'` afterwards, keeping the same `na
- Goals it achieves: Keeps agent-knowledge on the supported substrate floor instead of drifting behind agent-runtime/agent-eval; preserves the existing multi-harness research fanout capability without changing consumer contracts.
- Assessment: Good. The driver literal is a direct, minimal mapping of the old planner semantics onto the new runtime export surface. Typecheck, build, lint, and tests are reported green; the change is isolated to the one call site that needed it.
- Better / existing approach: none; this is the right approach. I searched `src/**/*.ts` and `tests/**/*.ts` for `createDriver`, `Driver<`, and `@tangle-network/agent-runtime/loops`; `src/profiles/researcher.ts` is the only driver in the repo, and `src/research-loop.ts` is a separate knowledge-growth control-loop abstraction that does not do harness fanout. Because agent-runtime 0.70.0 removed `createDriver` from `/loops`, bu
- Model: opencode/kimi-for-coding/k2p7
- Bridge attempts: 1
🎯 Usefulness — sound
Clean drift fix: rebuilds the researcher's fanout driver as a direct Driver literal over the still-exported interface after agent-runtime 0.70.0 removed createDriver/DriverDecision; behavior is pinned by loop integration tests.
- Integration: multiHarnessResearcherFanout is re-exported from src/profiles/index.ts:22 and exercises the real runLoop from agent-runtime 0.70.0 in tests/loops/researcher-integration.test.ts:92-117. The rebuilt driver (src/profiles/researcher.ts:214) implements the new Driver<Task,Output,Decision> interface verbatim — name:'dynamic', plan returns N task copies on empty history then [], decide returns 'done' (a
- Fit with existing patterns: Direct Driver literal IS the idiomatic 0.70.0 pattern — the createDriver helper was removed, so the kernel-authoritative interface is the only seam. Higher-level helpers (fanout, loopUntil, defineStrategy) would add ceremony without fitting the N-harness one-shot-fanout topology. No competing pattern in this repo.
- Real-world viability: Solid on the realistic paths. The kernel appends an Iteration even when a worker errors (Iteration.error set), so decide still sees length=N and returns 'done' — no stall. Harnesses array can't be empty (constructor defaults to 3 built-ins at src/profiles/researcher.ts:201). Winner falls through to defaultSelectWinner (best-valid, earliest-index) per the LoopResult contract. The 'continue' branch
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers
| | opencode-kimi | glm | deepseek | aggregate |
|---|---|---|---|---|
| Readiness | 89 | 83 | 76 | 76 |
| Confidence | 75 | 75 | 75 | 75 |
| Correctness | 89 | 83 | 76 | 76 |
| Security | 89 | 83 | 76 | 76 |
| Testing | 89 | 83 | 76 | 76 |
| Architecture | 89 | 83 | 76 | 76 |
Full multi-shot audit completed 3/3 planned shots over 3 changed files. Global verifier still owns final merge decision.
🟠 MEDIUM Duplicate agent-interface versions in resolution (0.8.0 vs 0.10.1) — pnpm-lock.yaml
sandbox@0.8.2 depends on @tangle-network/agent-interface@0.8.0 (L1283-1284) while agent-eval@0.95.1 and agent-runtime@0.70.0 depend on @tangle-network/agent-interface@0.10.1 (L1279-1280). pnpm resolves both into separate symlink targets, meaning the same logical interface type exists in two copies. If any code path passes an object instantiated through sandbox (using agent-interface@0.8.0 types) to agent-eval or agent-runtime (expecting agent-interface@0.10.1 types), instanceof checks, zod schemas, or type-only discriminators may fail silently at runtime. Verify sandbox-returned objects are not consumed directly by agent-eval/agent-runtime call sites in this repo.
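The hazard described above can be illustrated with a minimal sketch. It assumes two structurally identical classes standing in for the same class loaded from two installed copies of a package; `MessageV08`, `MessageV010`, and both check functions are hypothetical names, not part of `@tangle-network/agent-interface`. The type checker treats the two classes as interchangeable, so only the runtime identity check exposes the split.

```typescript
// Hypothetical stand-ins for "the same class" as loaded from two separate
// copies of a package. Neither name comes from a real library.
class MessageV08 {
  readonly role: string;
  readonly text: string;
  constructor(role: string, text: string) {
    this.role = role;
    this.text = text;
  }
}

class MessageV010 {
  readonly role: string;
  readonly text: string;
  constructor(role: string, text: string) {
    this.role = role;
    this.text = text;
  }
}

// Identity-based check: accepts only instances built by this exact class.
function acceptsByIdentity(value: unknown): boolean {
  return value instanceof MessageV010;
}

// Structural check: accepts any object carrying the expected fields.
function acceptsByShape(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.role === "string" && typeof v.text === "string";
}

// An object built by the "older copy" and handed to code using the newer one.
const fromOlderCopy = new MessageV08("user", "hello");
const identityOk = acceptsByIdentity(fromOlderCopy);
const shapeOk = acceptsByShape(fromOlderCopy);
```

If objects do cross the boundary between the two copies, structural checks like `acceptsByShape` keep working, while anything relying on `instanceof` does not.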
🟡 LOW sandbox caret floor ^0.8.0 targets an unpublished version — package.json
package.json:78 declares "@tangle-network/sandbox": "^0.8.0". npm registry publishes 0.6.2 then 0.8.1 then 0.8.2 — there is no 0.8.0 release. The caret range still resolves correctly to 0.8.2 (verified in pnpm-lock.yaml) and satisfies agent-runtime@0.70.0's peer floor of >=0.8.0 <1.0.0, so this is purely cosmetic. Suggest aligning the floor to ^0.8.1 or ^0.8.2 to match an actual published tag and avoid implying a release that doesn't exist. No runtime impact.
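To make the resolution concrete, here is a rough sketch of caret-range matching for plain `major.minor.patch` versions. It is a simplified illustration and not the resolver npm or pnpm actually uses: it ignores prerelease tags and build metadata, and every function name in it is hypothetical. For a `0.x` floor, a caret range allows patch updates only, so `^0.8.0` means `>=0.8.0 <0.9.0`.

```typescript
type Version = [major: number, minor: number, patch: number];

// Parse "1.2.3" into numeric parts (no prerelease support in this sketch).
function parse(v: string): Version {
  const [major, minor, patch] = v.split(".").map(Number);
  return [major, minor, patch];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a caret range: the leftmost non-zero part is fixed.
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, floor: string): boolean {
  const v = parse(version);
  const lo = parse(floor);
  return compare(v, lo) >= 0 && compare(v, caretUpperBound(lo)) < 0;
}

// Highest published version inside the caret range, if any.
function maxSatisfying(published: string[], floor: string): string | undefined {
  return published
    .filter((v) => satisfiesCaret(v, floor))
    .sort((a, b) => compare(parse(a), parse(b)))
    .pop();
}

// Published versions as listed in the finding above.
const published = ["0.6.2", "0.8.1", "0.8.2"];
const resolved = maxSatisfying(published, "0.8.0"); // "0.8.2"
```

The floor itself does not need to be a published version for the range to resolve, which is why the finding is cosmetic.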
🟡 LOW sandbox@0.8.0 exact version not published on npm — package.json
Line 78 declares `@tangle-network/sandbox@^0.8.0`, but version 0.8.0 does not exist on npm (published versions: 0.8.1, 0.8.2). The lockfile resolves to 0.8.2 so installs work, but the absent 0.8.0 base suggests a publish-revert or pre-release sequence. Non-blocking.
🟡 LOW Dual @tangle-network/agent-interface versions in lockfile — pnpm-lock.yaml
The lockfile resolves both @tangle-network/agent-interface@0.10.1 (agent-runtime 0.70.0 optional peer, agent-eval 0.95.1 dep) and @tangle-network/agent-interface@0.8.0 (sandbox 0.8.2 dep). Lines 466-469 and 1279-1283. This creates two isolated copies of the interface package. If agent-interface is a shared contract, passing typed objects between runtime/sandbox code paths could mismatch at runtime. Tests currently pass and this repo does not import agent-interface directly, so this is a nit, not a blocker. Fix: align sandbox's agent-interface dependency to ^0.10.0 or confirm the version split is safe.
🟡 LOW Dual zod instances (4.4.2 and 4.4.3) in the resolved tree — pnpm-lock.yaml
agent-interface@0.10.1 (new transitive dep of agent-eval@0.95.1) and agent-interface@0.8.0 (transitive dep of sandbox@0.8.2) both pin zod@4.4.3, while the root and agent-eval continue on zod@4.4.2. Lockfile lines: '+ zod@4.4.3:' (packages section) and 'zod: 4.4.3' under both agent-interface snapshots. Result: two copies of zod in node_modules. Impact is normally nil for a patch bump, but zod v4 has a history of schema instanceof checks failing across instances; any code that passes a schema from root into agent-interface (or vice versa) could trip a 'schema is not a ZodSchema' style error. No evidence this happens here — flagging as a watch-item, not a blocker. Fix only if a downstream instanceof failure surfaces: pin zod with pnpm.overrides to a single version.
🟡 LOW Two versions of @tangle-network/agent-interface coexist (0.8.0 + 0.10.1) — pnpm-lock.yaml
sandbox@0.8.2 depends on agent-interface@0.8.0 while agent-eval@0.95.1 depends on agent-interface@0.10.1. Both resolve, no peer violation, install succeeds. Minor concern only: if any shared type flows through both versions (e.g., a Message/Tool definition authored against 0.8.0 reaching code expecting 0.10.1), structural equality could silently drift. Not actionable in the lockfile itself; justifying a watch-item rather than a fix. The PR's stated scope is just the agent-eval + agent-runtime bumps, so the new agent-interface transitive is expected.
🟡 LOW Import-type lint warning in changed file — src/profiles/researcher.ts
Running `pnpm lint` reports: `src/profiles/researcher.ts` uses `import { type AgentProfile, ... }` instead of `import type { AgentProfile, ... }`. The original file already used this style, so the PR did not introduce it, but since the imports were edited (`createDriver`/`DriverDecision` removed, `Iteration` added) the warning is still present in the diff. Impact: style-only; no runtime effect. Fix: change `import { type ... }` to `import type { ... }`.
🟡 LOW Unreachable 'continue' branch in decide function — src/profiles/researcher.ts
src/profiles/researcher.ts:218-219 — decide returns 'continue' when history.length === 0. In the kernel's calling order, plan() runs before decide(), so history is always non-empty when decide is called. The 'continue' branch is dead code. Not a correctness issue (the driver terminates correctly on 'done'), but dead code obscures intent. Consider removing the branch or adding a comment acknowledging it's a safety net.
🟡 LOW decide's history.length===0 branch is unreachable defensive code — src/profiles/researcher.ts
Lines 218-219: `decide: (history) => history.length === 0 ? 'continue' : 'done'`. The kernel (agent-runtime 0.70 chunk-QXWGSDAQ.js:1240-1340) always calls plan first; if plan returns tasks the kernel runs them and grows history before invoking decide, and if plan returns [] the kernel breaks out of the loop and reaches decideAndFinalize only with whatever history already accumulated (still length>0 after round 0). So the 'continue' arm can never fire in this topology. This is not a bug (the value is harmless and the branch reads defensively), but it is misleading: a reader may believe 'continue' is exercisable. Either drop the ternary (`decide: () => 'done'`) or
🟡 LOW maxFanout cap (was 4) removed in migration — src/profiles/researcher.ts
src/profiles/researcher.ts:216-217 — The old createDriver internally clamped fanout width at 4 via validateMove (would throw PlannerError for >4 tasks). The new direct Driver has no fanout limit. Default harness count is 3, so default path is safe. A caller passing >4 harnesses would previously get an error; now it silently fans out all harnesses, which could cause resource/API pressure. Consider adding an explicit guard if >N harnesses is known to be problematic.
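One possible shape for such a guard is sketched below. It is an assumption-laden illustration, not code from this PR or from agent-runtime: `MAX_FANOUT`, `FanoutWidthError`, and `assertFanoutWidth` are hypothetical names, and the cap of 4 simply mirrors the limit this finding attributes to the old helper.

```typescript
// Hypothetical cap, chosen to mirror the limit described in the finding.
const MAX_FANOUT = 4;

// Hypothetical error type so callers can tell a width violation apart.
class FanoutWidthError extends Error {
  constructor(requested: number, max: number) {
    super(`fanout width ${requested} exceeds the cap of ${max}`);
    this.name = "FanoutWidthError";
  }
}

// Returns the harness list unchanged when it fits under the cap and throws
// otherwise, restoring a fail-loud behavior for oversized fanouts.
function assertFanoutWidth<T>(
  harnesses: readonly T[],
  max: number = MAX_FANOUT,
): readonly T[] {
  if (harnesses.length > max) {
    throw new FanoutWidthError(harnesses.length, max);
  }
  return harnesses;
}

// Three harnesses (the default count mentioned above) pass through untouched.
const accepted = assertFanoutWidth(["a", "b", "c"]);
```

Calling the guard once where the harness list is finalized would keep the driver's `plan` free of validation logic.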
tangletools · 2026-06-21T20:15:22Z · trace
Propagates the new substrate floor: `@tangle-network/agent-eval` `^0.95.1` and `@tangle-network/agent-runtime` `^0.70.0`. Raises the `@tangle-network/sandbox` peer floor to `^0.8.0`. Resolved at install: agent-eval 0.95.1, agent-runtime 0.70.0, sandbox 0.8.2.

Clean
- `pnpm run typecheck`: 0 errors
- `pnpm run build` (tsup): success (ESM + DTS)
- `pnpm test`: 114 passed, 5 skipped (live-network `sources-live` only), 0 failures
- `pnpm run lint`: 0 errors (3 pre-existing warnings in files this PR does not touch)
- `main` (`git merge-tree`)

Drift fixed
agent-runtime 0.70.0 removed `createDriver` and the `DriverDecision` type from the `/loops` subpath (the new export surface is `.` `/agent` `/intelligence` `/loops` `/profiles` `/mcp`; the `Driver` interface itself is still exported).
- `src/profiles/researcher.ts`: `multiHarnessResearcherFanout` was built on `createDriver({ planner })`. Rebuilt its single-fanout-then-stop topology as a direct `Driver<ResearchTask, ResearchOutput, FanoutDecision>` literal over the still-exported `Driver` interface: `name: 'dynamic'`, `plan` issues N task copies on round 0 then `[]`, `decide` returns the kernel-terminal `'done'` after the fanout round.
- `FanoutDecision = 'continue' | 'done'` type to replace the removed `DriverDecision`.
- Behavior pinned by the loop tests is preserved (`driver.name === 'dynamic'`, N iterations, `result.decision === 'done'`, winner selected by the kernel's `defaultSelectWinner`).

Remaining
None. Typecheck + build + tests + lint all green; no other call sites referenced the removed API.
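
The single-fanout-then-stop topology described under "Drift fixed" can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code in `src/profiles/researcher.ts`: the `Driver` and `Iteration` types here are local stand-ins whose shapes are guessed for the example rather than taken from agent-runtime, and `runToyLoop` is a hypothetical synchronous loop, not the real `runLoop` kernel.

```typescript
// Local stand-in types (assumed shapes, not the agent-runtime exports).
type FanoutDecision = "continue" | "done";

interface Iteration<Task, Output> {
  task: Task;
  output?: Output;
  error?: Error;
}

interface Driver<Task, Output, Decision> {
  name: string;
  plan(task: Task, history: readonly Iteration<Task, Output>[]): Task[];
  decide(history: readonly Iteration<Task, Output>[]): Decision;
}

// Driver literal: N task copies on round 0, nothing afterwards, then 'done'.
function singleFanoutDriver<Task, Output>(
  width: number,
): Driver<Task, Output, FanoutDecision> {
  return {
    name: "dynamic",
    plan: (task, history) =>
      history.length === 0 ? Array.from({ length: width }, () => task) : [],
    decide: (history) => (history.length === 0 ? "continue" : "done"),
  };
}

// Toy loop (hypothetical): plan, run every planned task, record, then decide.
function runToyLoop<Task, Output>(
  driver: Driver<Task, Output, FanoutDecision>,
  task: Task,
  worker: (task: Task, index: number) => Output,
  maxRounds = 5,
): { history: Iteration<Task, Output>[]; decision: FanoutDecision } {
  const history: Iteration<Task, Output>[] = [];
  let decision: FanoutDecision = "continue";
  for (let round = 0; round < maxRounds && decision === "continue"; round++) {
    const planned = driver.plan(task, history);
    if (planned.length === 0) {
      decision = driver.decide(history);
      break;
    }
    planned.forEach((t, i) => {
      try {
        history.push({ task: t, output: worker(t, i) });
      } catch (err) {
        // A failed worker still yields an iteration, so decide sees N entries.
        const error = err instanceof Error ? err : new Error(String(err));
        history.push({ task: t, error });
      }
    });
    decision = driver.decide(history);
  }
  return { history, decision };
}

// Three-way fanout where the second worker fails.
const result = runToyLoop(
  singleFanoutDriver<string, string>(3),
  "survey prior work",
  (task, i) => {
    if (i === 1) throw new Error("harness failed");
    return `${task} via harness ${i}`;
  },
);
```

In this toy loop `decide` is only reached with an empty history when `plan` returns nothing on round 0, which is consistent with the reviewers' observation that the `'continue'` arm does not fire in the default topology.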