feat(supervise): max-live-workers concurrency cap + explicit worker metering opt-in #360
Merged
Two equal-k fences on the supervisor spawn path.
(a) maxLiveWorkers cap: spawn_agent counted only the conserved budget pool
(total work), so a driver could flood real infra with N simultaneous
boxes. Add a configurable cap on not-yet-settled workers, enforced at
spawn BEFORE the pool reservation (fail closed: error 'max-live-workers').
Threaded supervise() -> supervisorAgent (both arms) -> driverAgent /
serveCoordinationMcp -> createCoordinationTools. Omit/<=0 = no cap, so
existing callers are unchanged.
(b) Worker metering: createWorktreeCliExecutor hardcoded budgetExempt:true,
a silent equal-k hole. Promote it to an explicit, documented
WorktreeCliExecutorOptions.budgetExempt (defaults true — a harness CLI
surfaces no usage, so metering it would record a fabricated zero the
no-silent-zeros rule forbids). Set false to meter a real-usage harness.
Tests: cap fails closed at the limit without touching the pool and frees a
slot on settle; uncapped admits past it; budgetExempt:false flips metering.
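
The ordering in (a) (cap check first, pool reservation second) can be illustrated with a minimal sketch. This is not the project's code: SpawnGate, WorkerNode, the settle() helper and the 'budget-exhausted' error name are hypothetical stand-ins; only the maxLiveWorkers option, the 'max-live-workers' error, and the pending/acquiring/running statuses come from the PR text.

```typescript
// Hypothetical, simplified model of the max-live-workers fence.
type NodeStatus = "pending" | "acquiring" | "running" | "done" | "failed";

interface WorkerNode {
  id: string;
  status: NodeStatus;
}

type SpawnResult =
  | { ok: true; node: WorkerNode }
  | { ok: false; error: "max-live-workers" | "budget-exhausted" };

// Non-terminal statuses count against the cap.
const LIVE: ReadonlySet<NodeStatus> = new Set<NodeStatus>([
  "pending",
  "acquiring",
  "running",
]);

class SpawnGate {
  private readonly nodes: WorkerNode[] = [];
  private poolRemaining: number;
  private readonly maxLiveWorkers: number | undefined;

  constructor(poolRemaining: number, maxLiveWorkers?: number) {
    this.poolRemaining = poolRemaining;
    this.maxLiveWorkers = maxLiveWorkers;
  }

  get pool(): number {
    return this.poolRemaining;
  }

  liveCount(): number {
    return this.nodes.filter((n) => LIVE.has(n.status)).length;
  }

  spawn(id: string, cost: number): SpawnResult {
    const cap = this.maxLiveWorkers;
    // Omitted or <= 0 means "no cap"; the pool is then the only fence.
    if (cap !== undefined && cap > 0 && this.liveCount() >= cap) {
      // Fail closed BEFORE reserving anything from the pool.
      return { ok: false, error: "max-live-workers" };
    }
    if (cost > this.poolRemaining) {
      return { ok: false, error: "budget-exhausted" }; // hypothetical name
    }
    this.poolRemaining -= cost;
    const node: WorkerNode = { id, status: "pending" };
    this.nodes.push(node);
    return { ok: true, node };
  }

  // Marking a worker terminal frees a slot under the cap.
  settle(id: string): void {
    const node = this.nodes.find((n) => n.id === id);
    if (node) node.status = "done";
  }
}
```

With a pool of 100 and a cap of 2, two spawns of cost 10 are admitted, a third is rejected while the pool stays at 80, and the third is admitted once one worker settles. Checking the cap first is what keeps a rejected spawn from leaving a dangling reservation.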
tangletools approved these changes on Jun 22, 2026
✅ Auto-approved PR — d30039ae
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:35:16Z
drewstone added a commit that referenced this pull request on Jun 22, 2026
…provenance + artifact lifecycle (#363)

Bundles the build-phase PRs landed since 0.72.0:
- #360 max-live-workers concurrency cap + explicit worker metering opt-in
- #362 mounted-resource manifest + caller selection receipts on LoopResult
- #361 artifact registry + marginal-lift ablation (rt#267 phase 1, ./lifecycle)
- #359 preserve partial events on abort via typed SandboxRunAbortError

Bumps package.json to 0.73.0, pins docs/canonical-api.md to 0.73.0, and adds decision-table rows for run provenance (result.provenance.mounts/selectionReceipts) and the artifact-lifecycle ablation (measureMarginalLift / ArtifactRegistry, /lifecycle).
Closes rt#329. Two equal-k fences on the supervisor spawn path; both default to the prior behavior, so every existing supervise() caller is unchanged.

(a) max-live-workers concurrency cap

- spawn_agent (src/mcp/tools/coordination.ts) gated only on the conserved budget pool. That bounds total work, not simultaneous work, so a driver could flood real infra with N concurrent boxes/sandboxes.
- New CoordinationToolsOptions.maxLiveWorkers: counts the scope's non-terminal nodes (pending/acquiring/running) and, when the cap is met, fails closed with error: 'max-live-workers' before reserving from the pool.
- Threaded supervise() → supervisorAgent (both the router and sandbox arms) → driverAgent / serveCoordinationMcp → createCoordinationTools.
- Omitted or <= 0 = no cap (prior behavior; the pool stays the only fence).

(b) Explicit worker metering opt-in

- createWorktreeCliExecutor hardcoded budgetExempt: true, a silent equal-k hole.
- Promoted to an explicit, documented WorktreeCliExecutorOptions.budgetExempt. It defaults to true because a coding-harness CLI surfaces no token/usd usage, so metering it would record a fabricated zero (the no-silent-zeros rule forbids that). Set false only for a harness that surfaces real usage worth metering.

Verification (all green, in order)

- pnpm run build (examples need dist): ✓
- pnpm run typecheck (project + examples): ✓
- pnpm test: 1047 passed, 1 skipped, 0 failed
- pnpm run lint: ✓
- pnpm run docs:check (regenerated docs/api, freshness OK): ✓

New tests:

- spawn_agent fails closed at the maxLiveWorkers cap WITHOUT touching the pool: admits to the cap, rejects the next without calling scope.spawn, frees a slot on settle, and the uncapped tools admit past the prior cap.
- budgetExempt: false opts the leaf into metering + renamed is budgetExempt by default.
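
The metering opt-in in (b) can be sketched the same way. Only the option name budgetExempt and its default of true are taken from the description; ExecutorOptionsSketch, Usage, LeafResult and createExecutorSketch are hypothetical, and throwing when a metered run reports no usage is an assumption about how a no-silent-zeros rule might be enforced, not the project's behavior.

```typescript
// Hypothetical shapes illustrating an explicit, defaulted budgetExempt flag.
interface ExecutorOptionsSketch {
  /** Defaults to true: a harness CLI surfaces no usage, so nothing is metered. */
  budgetExempt?: boolean;
}

interface Usage {
  tokens: number;
  usd: number;
}

interface LeafResult {
  budgetExempt: boolean;
  // Absent when exempt, so no fabricated zero is ever recorded.
  usage?: Usage;
}

function createExecutorSketch(
  run: () => Usage | undefined,
  options: ExecutorOptionsSketch = {},
): () => LeafResult {
  // The default is explicit and visible at the call site instead of hardcoded.
  const budgetExempt = options.budgetExempt ?? true;
  return () => {
    const reported = run();
    if (budgetExempt) {
      return { budgetExempt: true };
    }
    if (reported === undefined) {
      // Assumption: refuse to invent a zero when the harness reported nothing.
      throw new Error("metered executor reported no usage");
    }
    return { budgetExempt: false, usage: reported };
  };
}
```

Called with no options, the executor returns an exempt result with no usage field; passing { budgetExempt: false } makes it return whatever usage the run reported.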