fix(sandbox-run): preserve partial events on abort via typed SandboxRunAbortError#359
Conversation
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — ba9845fd
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:33:02Z
tangletools
left a comment
There was a problem hiding this comment.
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
| Concerns | 2 (2 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 197.4s (2 bridge agents) |
| Total | 197.4s |
💰 Value — sound-with-nits
Preserves partial event traces on sandbox-run abort by throwing a typed SandboxRunAbortError carrying drained events and the last readError, while keeping err.name === 'AbortError' backward-compatible for existing callers; a shared isAbortError helper is added but not used to consolidate two existin
- What it does: In src/runtime/sandbox-run.ts:205-247, the settle() function previously collected stream events into a local array and lost them on abort. Now it throws a new SandboxRunAbortError (src/runtime/sandbox-run.ts:76-87) that carries the partial events array and optional readError at every abort point: mid-stream drain catch, pre-read signal guard, and per-attempt artifact read-retry guard. The error's
- Goals it achieves: Makes aborted/timed-out sandbox runs diagnosable: a caller can now distinguish never-started (events: []), looped-but-no-terminal-result (many events), and produced-then-cancelled cases. It also prevents silently reading a stale workspace artifact after cancel and preserves the last readError when abort interrupts the retry loop.
- Assessment: Good change. The goal is coherent, the implementation is minimal and stays within the existing sandbox-run facade, and the backward-compatibility story (name === 'AbortError') is sound. Tests in tests/runtime/sandbox-run.test.ts:293-389 were updated to assert the new partial-trace behavior across the three abort paths.
- Better / existing approach: none for the core feature — there is no existing typed abort error that carries partial sandbox events. However, the new shared isAbortError in src/runtime/util.ts duplicates local helpers that already exist at src/runtime/run-loop.ts:755 and src/runtime/supervise/scope.ts:776. run-loop.ts already imports from './util' and could trivially use the shared one; scope.ts has a slightly looser object-s
- Model: opencode/kimi-for-coding/k2p7
- Bridge attempts: 1
🎯 Usefulness — sound-with-nits
A backward-compatible typed abort error that carries partial events/readError out of settle() — fixes a real diagnosability gap at the producer seam; natural consumers exist and can adopt incrementally.
- Integration: Strong. Exported from the runtime barrel next to openSandboxRun/TurnResult (src/runtime/index.ts:233), thrown at every abort site inside settle() (sandbox-run.ts:215 pre-read stream catch, :221 pre-read guard, :231 retry-loop guard), and preserves name==='AbortError' so the existing cross-kernel matchers in run-loop.ts:744,755-756 and supervise/scope.ts:631,776-783 keep matching unchanged. The new
- Fit with existing patterns: Fits the codebase grain. The pattern — a typed Error subclass carrying diagnostic payload, with name kept as 'AbortError' for cross-kernel duck-typing — mirrors how ValidationError is used as a typed sentinel in run-loop.ts:752. The claim in the PR body about backward compatibility is verifiable: all three existing matchers check err.name==='AbortError' (or duck-type on .name only), and the new cl
- Real-world viability: Solid on the abort paths it covers. All three conversion sites sit at well-defined await boundaries (for-await pull, pre-read guard, retry-loop guard) — no races. The try/catch at sandbox-run.ts:212-217 correctly re-throws non-abort stream errors unchanged and only converts AbortError-shaped throws. events-deliverable mid-stream abort, artifact-deliverable pre-read abort, and artifact-deliverable
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
💰 Value Audit
🟡 New shared isAbortError duplicates existing local helpers [duplication] ``
The PR adds isAbortError to src/runtime/util.ts:55-58, but identical/semantically-equivalent helpers already exist at src/runtime/run-loop.ts:755 (exact same implementation) and src/runtime/supervise/scope.ts:776 (object-shape variant). run-loop.ts already imports from './util' so the switch is one line; scope.ts would need a new import. Consolidating would prevent the helpers from drifting and justify adding the shared primitive.
🎯 Usefulness Audit
🟡 isAbortError now exists in three places — consolidate the two stale local copies [ergonomics] ``
This PR adds the canonical shared isAbortError at src/runtime/util.ts:55-58 (correct home, next to abortError/throwAbort/throwIfAborted). But two pre-existing local copies remain: run-loop.ts:755-756 and supervise/scope.ts:776-783 (the scope.ts copy is duck-typed without instanceof Error, slightly looser). The codebase now has three implementations of the same abort predicate. Not a blocker for this PR — but a follow-up should delete the two locals and import from util.ts so the contract stays s
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers —
|
| opencode-kimi | glm | deepseek | aggregate | |
|---|---|---|---|---|
| Readiness | 79 | 77 | 86 | 77 |
| Confidence | 70 | 70 | 70 | 70 |
| Correctness | 79 | 77 | 86 | 77 |
| Security | 79 | 77 | 86 | 77 |
| Testing | 79 | 77 | 86 | 77 |
| Architecture | 79 | 77 | 86 | 77 |
Full multi-shot audit completed 2/2 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 4 changed files. Global verifier still owns final merge decision. | Full multi-shot audit completed 2/2 planned shots over 4 changed files. Global verifier still owns final merge decision.
🟠 MEDIUM isAbortError ignores DOMException aborts already handled elsewhere — src/runtime/util.ts
Line 56-58:
isAbortErrorreturns true only forerr instanceof Error && err.name === 'AbortError'.src/runtime/supervise/runtime.ts:382-383and:956-957explicitly detecte instanceof DOMException && e.name === 'AbortError', andtests/loops/inbox.test.ts:114/tests/mcp/task-queue.test.ts:130model DOMException aborts.DOMExceptionis not aninstanceof Error, so a DOMException-shaped abort thrown by a lower layer would fail the newisAbortErrorcheck insandbox-run.ts:215and the partial event trace would be lost, contradicting the PR's stated purpose.src/runtime/supervise/scope.ts:776-782already has a more permissive implementation (`typeof
🟡 LOW Final-read-abort returns TurnResult instead of throwing SandboxRunAbortError — src/runtime/sandbox-run.ts
In the retry loop (lines 230-241), the abort guard runs ONLY at the start of each iteration. If the 4th
box.fs.read()itself aborts synchronously (signal trips mid-call), the catch at line 236 records readError='aborted',attempt < readAttempts-1is false so no sleep, and the loop falls through toreturn { out, events, readError }at line 242. The caller sees a successful TurnResult carrying a readE
🟡 LOW Partial events on abort are not surfaced into the runtime hook payload — src/runtime/sandbox-run.ts
The motivation for SandboxRunAbortError is diagnosability of aborted turns (per the class JSDoc lines 67-75). But when a turn throws it, the error-path hook emission at lines 299-315 / 342-358 builds its payload via turnPayload(...) which only captures
error: errorMessage(error)= 'aborted' (line 200). The carriedeventsandreadErroron the typed error never reach the runtime hook event, so a ho
🟡 LOW SandboxRunAbortError drops the original abort cause/stack — src/runtime/sandbox-run.ts
Lines 82-86 and 215: when an abort error is caught and wrapped, the original error is discarded instead of passed as
{ cause: err }. The partial events are preserved (the PR's goal), but the original stack and any SDK-specific abort metadata are lost, making post-mortem debugging harder when an abort originates from a lower layer. Fix:super('aborted', { cause: err })in the constructor.
🟡 LOW SandboxRunAbortError loses DOMException identity vs supervise/runtime's instanceof check — src/runtime/sandbox-run.ts
The class JSDoc (lines 72-74) claims existing callers in 'the loop kernel, scope, supervise runtime' keep matching unchanged. True for the name-based matchers (run-loop.ts:744, scope.ts:631). But supervise/runtime.ts:381-385 ALSO gates on
e instanceof DOMException && e.name === 'AbortError'. The SDK stream surfaces aborts as DOMException; wrapping it in a plain Error subclass (super('aborted') at sandbox-run.ts:83) drops DOMException identity, so that branch would miss the wrapped error if openSandboxRun is ever routed through the supervise runtime. No current consumer breaks (openSandboxRun today feeds only bench/src/{worker,fleet,cloud-loop,search-bench,co
🟡 LOW Duplicate isAbortError implementations — src/runtime/util.ts
A new
isAbortErroris added in util.ts while near-identical private helpers already exist insrc/runtime/run-loop.ts:755-757and a broader one insrc/runtime/supervise/scope.ts:776-782. This is a missed opportunity to canonicalize the helper and reduce drift. Not blocking for this shot but worth a follow-up cleanup.
🟡 LOW isAbortError now duplicated in 3 files — src/runtime/util.ts
This PR exports a new isAbortError at util.ts:56-58, but two local copies remain: run-loop.ts:755-757 and supervise/scope.ts:776 (same body,
err instanceof Error && err.name === 'AbortError'). The PR adds a third source of truth rather than consolidating. No behavioral divergence today (identical predicates), but it's exactly the kind of seam the PR is supposedly centralizing. Fix: delete the two local copies and import from ./util in both files. Trivial, low-risk follow-up.
🟡 LOW isAbortError pattern inconsistency with scope.ts — src/runtime/util.ts
util.ts:57 uses
err instanceof Error && err.name === 'AbortError'. scope.ts:776-781 uses a more permissive check that doesn't requireinstanceof Error(handles plain objects with.name). This is pre-existing (run-loop.ts:755 mirrors util.ts's stricter pattern) and Node >=20 guarantees DOMException instanceof Error, so not a practical bug. But if a sandbox SDK or fetch polyfill throws a non-Error object with name='AbortError', the stream catch at sandbox-run.ts:214 would re-throw it raw instead of wrapping it in SandboxRunAbortError, losing partial events. Fix: align isAbortError implementations across the codebase in a follow-up.
🟡 LOW No explicit abort test coverage for resume() — tests/runtime/sandbox-run.test.ts
Lines 293-388: All three new abort tests only exercise
run.start(). Theresume()method calls the samesettle()closure and is subject to identical abort behavior. Whilesettleis not parameterized differently forresume, a future refactor could accidentally diverge the call paths without a test catching it. Add one test withstart()→resume()→ abort-on-resume to close the gap.
🟡 LOW No test for abort before any events are yielded — tests/runtime/sandbox-run.test.ts
Lines 327-355: The mid-stream abort test covers 1-event-then-abort, but the scenario where the signal is already aborted before the stream generator's first yield is untested. In that case
collectedwould be[]and aSandboxRunAbortErrorwith empty events should still be thrown (diagnosable as 'never started' per the class JSDoc at sandbox-run.ts:70). The fake client's per-iteration signal check supports this but it's not exercised.
🟡 LOW Test 2 name overpromises 'never-started stays empty' but tests mid-stream abort — tests/runtime/sandbox-run.test.ts
Lines 327-355: the test title says 'carries ONLY the events drained before a mid-stream abort (never-started stays empty)' but the body aborts after the FIRST event drains (onEvent i===1), so it verifies the 1-event case, not the 0-event/never-started case. A pre-aborted signal (controller.abort() before run.start) with an empty carried-events array is never asserted. Impact: the 'never-started' branch of the diagnostic claim (events: [] distinguishes never-started from looped) is untested. Fix: either rename to drop the 'never-started' clause, or add a 4th test that pre-aborts and asserts
err.events).toEqual([]).
🟡 LOW Tests 2 and 3 skip the err.name === 'AbortError' cross-kernel contract assertion — tests/runtime/sandbox-run.test.ts
Lines 353-354 and 383-385: these tests assert instanceof SandboxRunAbortError and .events/.readError but do NOT assert
(err as Error).name === 'AbortError'. Test 1 (line 321) does. The class hardcodesoverride readonly name = 'AbortError'so it's correct today, but the contract that lets existing run-loop/scope matchers keep working unmodified is the PR's central compatibility claim — worth asserting in every abort test, not just one. Fix: addexpect((err as Error).name).toBe('AbortError')to tests 2 and 3.
tangletools · 2026-06-22T13:42:53Z · trace
…unAbortError
settle() collected stream events into a local array that was lost when an
abort fired mid-turn — the bare AbortError rethrown by start()/resume()
carried no partial state, so an aborted/timed-out run was undiagnosable
(never-started vs looped vs produced-nothing).
Throw a typed SandboxRunAbortError carrying { events, readError? } at every
abort point in settle(): the stream-drain catch, the pre-read guard, and the
read-retry guard. name === 'AbortError' so the existing cross-kernel
err.name === 'AbortError' matchers (run-loop, scope, supervise runtime) keep
matching it unchanged. Add a shared isAbortError() to util alongside the
other abort primitives.
Tests now assert the partial trace is preserved: events drained before a
pre-read abort, only-events-before a mid-stream abort, and events + last
readError when the read-retry loop is aborted.
ba9845f to
1f2e149
Compare
tangletools
left a comment
There was a problem hiding this comment.
✅ Auto-approved PR — 1f2e1490
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-22T13:47:56Z
…provenance + artifact lifecycle (#363) Bundles the build-phase PRs landed since 0.72.0: - #360 max-live-workers concurrency cap + explicit worker metering opt-in - #362 mounted-resource manifest + caller selection receipts on LoopResult - #361 artifact registry + marginal-lift ablation (rt#267 phase 1, ./lifecycle) - #359 preserve partial events on abort via typed SandboxRunAbortError Bumps package.json to 0.73.0, pins docs/canonical-api.md to 0.73.0, and adds decision-table rows for run provenance (result.provenance.mounts/selectionReceipts) and the artifact-lifecycle ablation (measureMarginalLift / ArtifactRegistry, /lifecycle).
Problem (rt#325)
settle()insrc/runtime/sandbox-run.tscollected stream events into a local array. When the for-await drain threw on abort — or the pre-read / read-retry abort guards fired — that array was lost:start()/resume()rethrew the bareAbortError, which carried no partial state. An aborted or timed-out run was therefore undiagnosable — you couldn't tell never-started (no events) from looped (many events, no terminalresult) from produced-nothing-then-cancelled.Fix
Define and throw a typed
SandboxRunAbortErrorthat carries{ events, readError? }at every abort point insidesettle():try/catch(the stream itself throwsAbortErroron cancel),readError).name === 'AbortError', so the existing cross-kernelerr.name === 'AbortError'matchers (run-loop,scope,superviseruntime) keep matching it unchanged — no behavior change for callers that only branch on abort-ness; the partial trace is now available to any caller that wants to diagnose. Exported from the runtime barrel alongsideopenSandboxRun/TurnResult.Also adds a shared
isAbortError()tosrc/runtime/util.tsnext to the other abort primitives (abortError/throwAbort/throwIfAborted).Tests
tests/runtime/sandbox-run.test.tspreviously asserted the lossy behavior (only that an abort threw and no read happened). Replaced with assertions that the partial trace is preserved:readErrorare carried when the read-retry loop is aborted mid-retry.The fake
streamPromptnow honors the abort signal (models the real backend throwingAbortErroron the next pull) and can emit a scripted event sequence.Verification
pnpm run build— green (examples need dist)pnpm run typecheck(incl.typecheck:examples) — cleanpnpm test— 1047 passed, 1 skipped (sandbox-run file: 16 tests)pnpm run lint— clean (302 files)origin/main