diag(sync-client,hub-e2e): expose stranded-doc DocHandle state in smoke-diag #373
Merged
Implements lead 2 of the e2e reliability research log
(claude-notes/research/2026-07-03-e2e-reliability-experiment-log.md):
the nightly suite's dominant failure is a render target that never
syncs into the VFS, and the remaining candidate mechanisms — request
frame lost in flight ('requesting'), automerge-repo's terminal cached
verdict ('unavailable'), a storage load that never finished ('loading'),
or no handle ever created — are indistinguishable in the timeout
symptom. This makes them observable per failure.
- quarto-sync-client: new getSyncDiagnostics() — for every
index-referenced file with no loaded handle, report the raw
automerge-repo DocHandle state + the client's own unavailable marker,
plus connected-peer count and the unavailable-retry poll's cumulative
tick count / timer state. In-memory reads only. Unit-tested against
the in-process test hub (dangling entry -> 'unavailable'+marker;
healthy project -> empty).
- preview-runtime: re-export via automergeSync, so the existing
wasmRenderer test-hook namespace picks it up (no hub-client src
change; still VITE_E2E-gated, tree-shaken from production bundles).
- e2e previewExtraction: capturePreviewDiagnostics appends
'syncPeers=N syncRetryTicks=N syncRetryTimer=0|1
stranded="<path>=<state>[+marker],..."' to the [smoke-diag] line,
after renderError so existing log parsers are unaffected; degrades to
nothing on bundles without the hook.
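The collection step described in the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the quarto-sync-client implementation: `ClientSnapshot`, `collectSyncDiagnostics`, the field names, the use of "ready" as the loaded state, and the "none" label for a missing handle are all hypothetical.

```typescript
// Hypothetical shapes for illustration; the real quarto-sync-client types are not shown here.
interface ClientSnapshot {
  indexPaths: Map<string, string>;   // file path -> document id, as listed in the project index
  handleStates: Map<string, string>; // document id -> raw handle state (absent if no handle exists)
  unavailableMarkers: Set<string>;   // document ids the client itself has marked unavailable
  connectedPeers: number;
  retryTicks: number;                // cumulative ticks of the unavailable-retry poll
  retryTimerActive: boolean;
}

interface StrandedDoc {
  path: string;
  state: string;
  marker: boolean;
}

interface SyncDiagnostics {
  connectedPeers: number;
  retryTicks: number;
  retryTimerActive: boolean;
  stranded: StrandedDoc[];
}

// Pure in-memory read: nothing is fetched, retried, or mutated.
function collectSyncDiagnostics(snap: ClientSnapshot): SyncDiagnostics {
  const stranded: StrandedDoc[] = [];
  snap.indexPaths.forEach((docId, path) => {
    const state = snap.handleStates.get(docId);
    // Assumption: "ready" is the state that counts as a loaded handle.
    if (state === "ready") return;
    stranded.push({
      path,
      // Assumption: "none" labels an index entry for which no handle was ever created.
      state: state === undefined ? "none" : state,
      marker: snap.unavailableMarkers.has(docId),
    });
  });
  return {
    connectedPeers: snap.connectedPeers,
    retryTicks: snap.retryTicks,
    retryTimerActive: snap.retryTimerActive,
    stranded,
  };
}
```

Under these assumptions a healthy project yields an empty `stranded` list, and a dangling index entry shows up with its raw state and marker flag, which mirrors the two unit-test cases named above.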
Verified: quarto-sync-client vitest 187/187; sync-client +
preview-runtime tsc green; hub-client VITE_E2E=1 production build
green; end-to-end via a scratch Playwright spec against the real hub —
observed 'syncPeers=1 syncRetryTicks=0 syncRetryTimer=0' in the emitted
line for a healthy project (scratch spec deleted). Rust untouched.
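To make the suffix format concrete, here is a small sketch of a formatter that would produce a line like the one observed above. It is an assumption-laden illustration rather than the code in capturePreviewDiagnostics: `formatSyncSuffix`, the `Diag` shape, the literal `+marker` token, and the choice to omit the `stranded` field when nothing is stranded are all hypothetical.

```typescript
// Hypothetical diagnostics shape; field names are illustrative only.
interface StrandedEntry {
  path: string;
  state: string;
  marker: boolean;
}

interface Diag {
  peers: number;
  retryTicks: number;
  retryTimerActive: boolean;
  stranded: StrandedEntry[];
}

// Builds the text appended after renderError on the [smoke-diag] line.
// `getDiag` stands in for the test hook; it is undefined on bundles built without it.
function formatSyncSuffix(getDiag?: () => Diag): string {
  // Degrade to nothing when the hook is missing.
  if (!getDiag) return "";
  const d = getDiag();
  const parts = [
    `syncPeers=${d.peers}`,
    `syncRetryTicks=${d.retryTicks}`,
    `syncRetryTimer=${d.retryTimerActive ? 1 : 0}`,
  ];
  // Assumption: the stranded field is left out entirely when the list is empty.
  if (d.stranded.length > 0) {
    const list = d.stranded
      // Assumption: the marker is rendered as the literal suffix "+marker".
      .map((s) => `${s.path}=${s.state}${s.marker ? "+marker" : ""}`)
      .join(",");
    parts.push(`stranded="${list}"`);
  }
  return parts.join(" ");
}
```

Appending the fields at the end of the line, rather than interleaving them with existing ones, is what keeps prefix-oriented log parsers working unchanged.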
✅ Snyk checks have passed. No issues have been found so far.
What
Implements lead 2 of the e2e reliability experiment log (claude-notes/research/2026-07-03-e2e-reliability-experiment-log.md, on branch research/e2e-reliability-log). The nightly smoke-all suite's dominant failure has been a render target that never syncs into the VFS, and the remaining candidate mechanisms are indistinguishable in the timeout symptom: a request frame lost in flight (requesting), automerge-repo's terminal cached verdict (unavailable), a storage load that never finished (loading), or no handle ever created. This makes the mechanism observable in every future failure log:
- quarto-sync-client: new getSyncDiagnostics(). For every index-referenced file with no loaded handle, it reports the raw DocHandle state and the client's own unavailable marker, plus connected-peer count and the unavailable-retry poll's cumulative tick count / timer state. In-memory reads only; no behavior change.
- preview-runtime: re-export via automergeSync, so the existing wasmRenderer test-hook namespace picks it up. No hub-client production-source change; hooks remain VITE_E2E-gated and tree-shaken from production bundles.
- e2e previewExtraction: capturePreviewDiagnostics appends the syncPeers / syncRetryTicks / syncRetryTimer / stranded suffix after renderError, so existing log parsers are unaffected; degrades to nothing on bundles without the hook.

Verification
- quarto-sync-client vitest 187/187 (includes new sync-diagnostics.test.ts: dangling entry → unavailable + marker with retry timer active; healthy project → empty).
- tsc green for both packages; hub-client VITE_E2E=1 production build green.
- End-to-end via a scratch Playwright spec against the real hub: observed syncPeers=1 syncRetryTicks=0 syncRetryTimer=0 in the emitted line (scratch spec deleted, not committed).

Why merge
The nightly runs against main; the field only starts appearing in nightly failure logs once merged. If the samod-0.12 era keeps the suite clean, this costs nothing; if flakes return, the first failing night identifies the mechanism instead of starting another guessing round.
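As an illustration of how a failing night could be triaged from the new field, the sketch below parses the suffix out of a log line and maps each stranded state to the candidate mechanism named in this PR. It is not part of the change: `parseSyncSuffix`, `describeMechanism`, and the literal `+marker` token are assumptions, and the parser assumes paths contain no commas or double quotes.

```typescript
interface ParsedStranded {
  path: string;
  state: string;
  marker: boolean;
}

interface ParsedSuffix {
  peers: number;
  retryTicks: number;
  retryTimerActive: boolean;
  stranded: ParsedStranded[];
}

// Assumption: the marker is rendered as this literal token after the state.
const MARKER = "+marker";

// Returns null when the line carries no sync fields (for example, a bundle without the hook).
function parseSyncSuffix(line: string): ParsedSuffix | null {
  const re = /syncPeers=(\d+) syncRetryTicks=(\d+) syncRetryTimer=([01])(?: stranded="([^"]*)")?/;
  const m = re.exec(line);
  if (!m) return null;
  const raw = m[4] === undefined ? "" : m[4];
  const stranded = raw
    .split(",")
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      // Split on the last "=" so that "=" inside a path does not confuse the parse.
      const eq = entry.lastIndexOf("=");
      let state = entry.slice(eq + 1);
      const marker = state.endsWith(MARKER);
      if (marker) state = state.slice(0, state.length - MARKER.length);
      return { path: entry.slice(0, eq), state, marker };
    });
  return {
    peers: Number(m[1]),
    retryTicks: Number(m[2]),
    retryTimerActive: m[3] === "1",
    stranded,
  };
}

// Maps a raw state to the candidate mechanism listed in the PR description.
function describeMechanism(state: string): string {
  switch (state) {
    case "requesting":
      return "request frame lost in flight";
    case "unavailable":
      return "terminal cached unavailable verdict";
    case "loading":
      return "storage load that never finished";
    default:
      return "no handle ever created, or a state not listed in the PR";
  }
}
```

A nightly triage script could call `parseSyncSuffix` on each [smoke-diag] line and report `describeMechanism(entry.state)` for every stranded entry.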