
diag(sync-client,hub-e2e): expose stranded-doc DocHandle state in smoke-diag #373

Merged: cscheid merged 1 commit into main from diag/smoke-diag-handle-state on Jul 3, 2026

@gordonwoodhull

What

Implements lead 2 of the e2e reliability experiment log (claude-notes/research/2026-07-03-e2e-reliability-experiment-log.md, on branch research/e2e-reliability-log). The nightly smoke-all suite's dominant failure has been a render target that never syncs into the VFS. The remaining candidate mechanisms cannot be told apart from the timeout symptom alone: a request frame lost in flight (requesting), automerge-repo's terminal cached verdict (unavailable), a storage load that never finished (loading), or no handle ever created. This change makes the mechanism observable in every future failure log:

[smoke-diag] stage=EDITOR_NO_PREVIEW ... renderError="..." syncPeers=1 syncRetryTicks=12 syncRetryTimer=1 stranded="test.qmd=requesting+marker"
  • quarto-sync-client: new getSyncDiagnostics() — for every index-referenced file with no loaded handle, the raw DocHandle state + the client's own unavailable marker, plus connected-peer count and the unavailable-retry poll's cumulative tick count / timer state. In-memory reads only; no behavior change.
  • preview-runtime: re-exported via automergeSync, so the existing wasmRenderer test-hook namespace picks it up. No hub-client production-source change; hooks remain VITE_E2E-gated and tree-shaken from production bundles.
  • e2e previewExtraction: capturePreviewDiagnostics appends the syncPeers/syncRetryTicks/syncRetryTimer/stranded suffix after renderError, so existing log parsers are unaffected; degrades to nothing on bundles without the hook.
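As a rough illustration of the suffix described in the bullets above, the sketch below formats a diagnostics object into the `syncPeers=... stranded="..."` form. The `SyncDiagnostics` and `StrandedDoc` shapes, their field names, and `formatSyncSuffix` are hypothetical; the actual return type of `getSyncDiagnostics()` is not shown in this PR. Omitting `stranded` when nothing is stranded is also an assumption.

```typescript
// Hypothetical shapes for illustration only; the real getSyncDiagnostics()
// return type is not shown in this PR.
interface StrandedDoc {
  path: string; // index-referenced file with no loaded handle
  state: string; // raw DocHandle state, e.g. "requesting", "unavailable", "loading"
  marker: boolean; // whether the client's own unavailable marker is set
}

interface SyncDiagnostics {
  peers: number; // connected-peer count
  retryTicks: number; // cumulative ticks of the unavailable-retry poll
  retryTimerActive: boolean; // whether the retry timer is currently armed
  stranded: StrandedDoc[];
}

// Builds the text appended after renderError. Returns "" when no diagnostics
// are available, mirroring "degrades to nothing on bundles without the hook".
function formatSyncSuffix(d: SyncDiagnostics | undefined): string {
  if (!d) return "";
  const fields = [
    `syncPeers=${d.peers}`,
    `syncRetryTicks=${d.retryTicks}`,
    `syncRetryTimer=${d.retryTimerActive ? 1 : 0}`,
  ];
  // Assumption: the stranded field is left out when nothing is stranded.
  if (d.stranded.length > 0) {
    const entries = d.stranded
      .map((s) => `${s.path}=${s.state}${s.marker ? "+marker" : ""}`)
      .join(",");
    fields.push(`stranded="${entries}"`);
  }
  return " " + fields.join(" ");
}
```

Because the suffix is a plain string appended at the end of the line, a consumer that only reads the fields up to `renderError` sees the same prefix as before.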

Verification

  • quarto-sync-client vitest 187/187 (includes new sync-diagnostics.test.ts: dangling entry → unavailable+marker with retry timer active; healthy project → empty).
  • tsc green for both packages; hub-client VITE_E2E=1 production build green.
  • End-to-end via a scratch Playwright spec against the real hub: observed syncPeers=1 syncRetryTicks=0 syncRetryTimer=0 in the emitted line (scratch spec deleted, not committed).
  • Dispatch smoke-all run on this branch (run 28654650334): 78/78 passed, 0 flaky, 5.0m, so no failure line was emitted in CI; emission is validated by the unit tests + local e2e run above. (The clean run itself is further evidence for the samod-0.12 lead tracked in the research log.)

Why merge

The nightly runs against main; the field only starts appearing in nightly failure logs once merged. If the samod-0.12 era keeps the suite clean, this costs nothing; if flakes return, the first failing night identifies the mechanism instead of starting another guessing round.
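For triage, a failing night's `[smoke-diag]` line could be read back along these lines. This is a minimal sketch, not part of the PR: `parseSmokeDiag`, `mechanismFor`, and the result types are hypothetical names, and it assumes paths contain no commas or double quotes. The state-to-mechanism mapping is taken from the description above.

```typescript
// Hypothetical triage helper; not part of this PR.
interface StrandedEntry {
  path: string;
  state: string;
  marker: boolean;
}

interface ParsedSyncFields {
  syncPeers?: number;
  syncRetryTicks?: number;
  syncRetryTimer?: number;
  stranded: StrandedEntry[];
}

// Candidate mechanisms as listed in the PR description.
const MECHANISM: Record<string, string> = {
  requesting: "request frame lost in flight",
  unavailable: "automerge-repo's terminal cached verdict",
  loading: "storage load that never finished",
};

function mechanismFor(state: string): string {
  return MECHANISM[state] ?? "unrecognized state";
}

function parseSmokeDiag(line: string): ParsedSyncFields {
  // Reads the first `key=<digits>` occurrence; undefined when absent.
  const num = (key: string): number | undefined => {
    const m = new RegExp(`\\b${key}=(\\d+)`).exec(line);
    return m ? Number(m[1]) : undefined;
  };
  const strandedMatch = /\bstranded="([^"]*)"/.exec(line);
  const stranded = (strandedMatch?.[1] ?? "")
    .split(",")
    .filter((entry) => entry.includes("="))
    .map((entry) => {
      // Split on the last "=" so a path containing "=" stays intact.
      const eq = entry.lastIndexOf("=");
      let state = entry.slice(eq + 1);
      const marker = state.endsWith("+marker");
      if (marker) state = state.slice(0, -"+marker".length);
      return { path: entry.slice(0, eq), state, marker };
    });
  return {
    syncPeers: num("syncPeers"),
    syncRetryTicks: num("syncRetryTicks"),
    syncRetryTimer: num("syncRetryTimer"),
    stranded,
  };
}
```

Applied to a line ending in `stranded="test.qmd=requesting+marker"`, the sketch yields one entry with state `requesting`, which `mechanismFor` maps to a request frame lost in flight.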

…ke-diag

Implements lead 2 of the e2e reliability research log
(claude-notes/research/2026-07-03-e2e-reliability-experiment-log.md):
the nightly suite's dominant failure is a render target that never
syncs into the VFS, and the remaining candidate mechanisms — request
frame lost in flight ('requesting'), automerge-repo's terminal cached
verdict ('unavailable'), a storage load that never finished ('loading'),
or no handle ever created — are indistinguishable in the timeout
symptom. This makes them observable per failure.

- quarto-sync-client: new getSyncDiagnostics() — for every
  index-referenced file with no loaded handle, report the raw
  automerge-repo DocHandle state + the client's own unavailable marker,
  plus connected-peer count and the unavailable-retry poll's cumulative
  tick count / timer state. In-memory reads only. Unit-tested against
  the in-process test hub (dangling entry -> 'unavailable'+marker;
  healthy project -> empty).
- preview-runtime: re-export via automergeSync, so the existing
  wasmRenderer test-hook namespace picks it up (no hub-client src
  change; still VITE_E2E-gated, tree-shaken from production bundles).
- e2e previewExtraction: capturePreviewDiagnostics appends
  'syncPeers=N syncRetryTicks=N syncRetryTimer=0|1
  stranded="<path>=<state>[+marker],..."' to the [smoke-diag] line,
  after renderError so existing log parsers are unaffected; degrades to
  nothing on bundles without the hook.

Verified: quarto-sync-client vitest 187/187; sync-client +
preview-runtime tsc green; hub-client VITE_E2E=1 production build
green; end-to-end via a scratch Playwright spec against the real hub —
observed 'syncPeers=1 syncRetryTicks=0 syncRetryTimer=0' in the emitted
line for a healthy project (scratch spec deleted). Rust untouched.

posit-snyk-bot commented Jul 3, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|---|---|---|---|---|---|---|
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| | Licenses | 0 | 0 | 0 | 0 | 0 issues |

@cscheid cscheid merged commit a94f1b3 into main Jul 3, 2026
9 checks passed
@cscheid cscheid deleted the diag/smoke-diag-handle-state branch July 3, 2026 16:08