Skip to content

[DRAFT] [FEAT]: GUI Conversation Tree UI #2014

Draft
spencrr wants to merge 83 commits into
microsoft:mainfrom
spencrr:dev/spencrr/gui-tree-ui
Draft

[DRAFT] [FEAT]: GUI Conversation Tree UI #2014
spencrr wants to merge 83 commits into
microsoft:mainfrom
spencrr:dev/spencrr/gui-tree-ui

Conversation

@spencrr

@spencrr spencrr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Draft: Complete Tree UI MVP

Warning

This is more of a reference PR than a good starting point for a real implementation.
There are a lot of dead-end code paths and over-build complexities!

Just try running locally with VITE_ENABLE_TREE_UI=true uv run dev.py

Summary

This PR wraps up the Tree UI MVP as a browser-usable operator experience for opening historical attacks as editable conversation trees, inspecting and extending paths, creating/pruning fan branches, applying converter transforms, and recovering from common edit/refresh mistakes.

The MVP keeps the tree model primarily client-owned while integrating with existing attack history, conversations, targets, converters, and refresh dispatch APIs.

High-Level Concepts

Conversation Trees

The UI represents an attack as a ConversationTree made of typed nodes:

  • root_prompt
  • user_turn
  • converter
  • send
  • fan
  • score
  • import_message

Edges encode parent/child relationships and fan slot indices. The client reducer owns structural edits, stale propagation, branch/clone semantics, fan creation, prune-to-picked behavior, and local recovery states.

Path Chat

The tree canvas now pairs with a selected-path chat pane. Operators can focus a node/path, inspect the selected path as a linear conversation, add follow-up prompts, and see pending responses without losing the tree context.

Refresh Waves

Refresh runs through the existing runner wave model: the client selects dispatchable sends, partitions clean context versus stale suffix, builds labels, and dispatches through the attack API. The UI now makes wave state and preflight failures easier to understand before a backend call is attempted.

Historical Reconstruction

Opening an attack as a tree reconstructs tree shape from backend conversations and labels. Single-conversation attacks become linear trees; multi-conversation attacks can be merged into path branches. Target registry names are recovered from AttackSummary.target.target_registry_name when available so recovered trees can refresh without empty-target failures.

UI Captures

Merged tree with path chat

Shows the loaded historical attack opened as a merged tree, with the path-chat pane available for focused linear inspection.
image
image

Follow-up prompt creates pending response / Attempt fan pruned to picked path

Follow.Up.Prompts.mp4

Converter transform branch

Converters.mp4

Notable Details

  • Adds target_registry_name to target summary DTOs and frontend mirrors.
  • Hydrates root target registry names during Open-as-tree and reload reconstruction.
  • Shows an explicit No target chip and blocks refresh with a clear preflight modal when the target is missing.
  • Adds configurable attempt fan creation with a validated 2-50 count picker.
  • Adds true non-root Branch from here semantics: keep root-to-selected path plus selected subtree, exclude unrelated siblings.
  • Adds converter transform nodes and a ConverterCard, including direct baseline behavior for converter insertion.
  • Adds path-chat focus actions and visible path chat layout.
  • Adds prune-to-picked fan recovery.
  • Improves the detail drawer with full text, copy, execution metadata, target/converter metadata, and latest-error separation.
  • Updates Tree UI browser coverage, including a new MVP acceptance spec with screenshots.
  • Updates the V1 shipability plan and preserves PR7 design/review artifacts.

Tests

Validated locally with:

  • git diff --check
  • npm run type-check
  • npm test -- --runInBand --silent --json --outputFile=/tmp/tree-mvp-jest.json ...
    • 14 suites passed
    • 377 tests passed
  • uv run pytest tests/unit/backend/test_mappers.py::TestAttackResultToSummary
    • 26 passed
  • VITE_ENABLE_TREE_UI=true npm run test:e2e -- e2e/tree.spec.ts e2e/tree-mvp.spec.ts --project=mock --workers=1
    • 9 browser tests passed

Areas For Improvement

Backend Conversation Tree Persistence

The MVP still relies heavily on frontend-owned tree state plus attack labels for reconstruction. A stronger backend model would persist first-class conversation tree records:

  • conversation_tree
  • conversation_tree_node
  • conversation_tree_edge
  • node kind, params, state, version, and parent ids
  • fan slot ids and tombstones
  • parent tree/source conversation lineage
  • durable layout/manual position metadata if desired

Backend Wave Representation

Waves are currently client-orchestrated and reflected through labels/execution records. A future backend wave model could persist:

  • wave id
  • trigger kind
  • selected node set
  • operator and operation labels
  • dispatch start/completion timestamps
  • per-leaf outcomes
  • retry/cancel/queued state
  • parent conversation tree id
  • generated attack/conversation ids

Richer Reconstruction

Historical reconstruction now covers more MVP paths, but fully faithful reconstruction of nested fan topology, converter lineage, deleted slots, and older attack records should eventually be backend-assisted instead of inferred from labels and conversation messages.

Collaboration And Concurrency

The current model is suitable for a single operator session. Persisted node versions plus optimistic concurrency would make collaborative editing, cross-tab recovery, and stale write detection more reliable.

Backend-Owned Tree Search And Audit

Once trees and waves are first-class backend entities, History could filter and display tree-level state directly: latest wave, failed leaves, parent tree lineage, target lineage, unresolved stale nodes, and branch ancestry.

Appendix

Prompt To Attempt Fan

flowchart TD
    P[Prompt] --> AF{{Attempt fan}}

    AF --> A1[Attempt 1]
    AF --> A2[Attempt 2]
    AF --> A3[Attempt N]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    classDef fanNode fill:#ffffff,stroke:#1f6b84,color:#1f6b84,stroke-width:2px;
    class P,A1,A2,A3 treeNode;
    class AF fanNode;
Loading

Prompt To Converter Fan

flowchart TD
    P[Prompt] --> CF{{Converter fan}}

    CF --> C1[Converter 1]
    CF --> C2[Converter 2]

    C1 --> R1[Response 1]
    C2 --> R2[Response 2]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    classDef fanNode fill:#ffffff,stroke:#1f6b84,color:#1f6b84,stroke-width:2px;
    class P,C1,C2,R1,R2 treeNode;
    class CF fanNode;
Loading

Prompt To Converter Settings Fan

flowchart TD
    P[Prompt] --> SF{{Converter settings fan}}

    SF --> S1["Base64Converter<br/>encoding = b64encode"]
    SF --> S2["Base64Converter<br/>encoding = urlsafe_b64encode"]
    SF --> S3["Base64Converter<br/>encoding = b64decode"]

    S1 --> R1[Response 1]
    S2 --> R2[Response 2]
    S3 --> R3[Response 3]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    classDef fanNode fill:#ffffff,stroke:#1f6b84,color:#1f6b84,stroke-width:2px;
    class P,S1,S2,S3,R1,R2,R3 treeNode;
    class SF fanNode;
Loading

Response Branches Into Follow-Ups

flowchart TD
    P[Prompt] --> R0[Response]

    R0 --> F1[Follow up 1]
    R0 --> F2[Follow up 2]

    F1 --> R1[Response]
    F2 --> R2[Response]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    class P,R0,F1,F2,R1,R2 treeNode;
Loading

Similar Projects

spencrr added 30 commits June 15, 2026 13:43
Captures the rev-18 design state across all three GUI tree-view docs
before PR1 (backend operator-validation relocation) starts landing
implementation. Rev 18 closures (per 01 §0):

- Q.S.1 DECIDED: V1.0 ships without intra-wave memoization; Crescendo
  cost cliff is documented in §1.2 instead.
- Q.S.2 DECIDED: operator-as-tag (honor-system); §9.4.5 scaled back to
  relocation-only, no anonymous-rejection. Preserves the no-labels
  early-return.
- Q.S.3 remains a V1.0 gate item pending the Q.S.4 Crescendo experiment.
- Rubber-duck cheap wins: per-leaf ExecutionRecord timing fields,
  per-WaveEvent emittedAt, version: number on ConversationTreeNodeBase
  for V2 forward-compat, i18n string-registry V1.0 commitment, FanNode
  polymorphism honest naming, cost-preview tooltip on the refresh
  affordance, permanent failure class surfaced distinctly, client-side
  telemetry-vs-privacy line in §15.
- F7 mechanical sweep: backend .py:L<n> citations refreshed.

These docs are the contract that PR1-PR7 implement against.
V1.0 tree-UI PR1 per doc/gui/design/01_tree_primitives.md §9.4.5.

Problem
-------
Today's check at pyrit/backend/services/attack_service.py reads from
piece.labels["operator"], which is written by an attack_mappers.py path
marked `removed_in="0.16.0"`. After that deprecation lands, the
piece-label check silently no-ops and the server-side operator
isolation disappears for tree-UI traffic, leaving only the UI posture
(client-side, bypassable via direct REST call).

Change
------
Relocate the source of truth to `AttackResult.labels["operator"]` so
the check survives the 0.16.0 piece-label-write deprecation. The
signature changes from `(conversation_id=, request=)` to
`(ar=, request=)`: the caller (add_message_async) already has the AR
in scope, so passing it directly avoids a duplicate lookup and makes
the dependency explicit.

Backward-compat fallback
------------------------
When ar.labels has no operator key, fall back to reading from existing
piece labels — matches the pre-relocation behavior for legacy ARs that
were tagged at the piece level only. Bounded fallback: it dies with
the deprecated piece-label write path in 0.16.0.

The fallback was chosen over pure relocation per user input: if any
production AR exists without an AR-level operator label, pure
relocation would silently disable enforcement for those rows. The 3
LOC + 2 tests cost is well below the silent-disablement risk.

Honor-system preserved (Q.S.2)
-------------------------------
The no-labels early-return is preserved by design. Anonymous requests
(request.labels absent / empty / missing "operator" key) pass
unchallenged. The operator is a tag the operator picks — not an auth
claim. The V1.0 posture defends against accidental mis-attribution and
casual cross-operator extensions; motivated bypass is out of scope
until V1.1+ multi-operator collaboration revisits.

TDD
---
Wrote 11 tests in `TestValidateOperatorMatch` + `TestAddMessageOperatorIntegration`
covering:
  - Three honor-system early-returns (no labels, empty labels, no
    operator key).
  - Four AR-first paths (match passes; mismatch raises; precedence
    rule when both sources present; no-enforcement when nothing to
    compare).
  - Three backward-compat fallback paths (mismatch raises; match
    passes; empty everywhere passes).
  - One integration test through add_message_async confirming the
    call site is wired with `ar=ar`.
All 11 failed for the right reasons against the original
implementation (TypeError on `ar=` kwarg; DID NOT RAISE on the AR-only
mismatch case). All pass after the relocation. Backend suite green:
652 passed, 4 skipped, 0 regressed.

PR sequencing
-------------
Per design doc §9.4.5, this PR must merge before the V1.0 GUI PR. The
PyRIT version bump that signals "tree-UI safe" is a separate concern
tracked by the §9.4.5 startup version gate in the upcoming PR3.
…essagePiece DTO

V1.0 tree-UI PR2 per doc/gui/design/01_tree_primitives.md §9.4.4 (b).

Problem
-------
The response DTO `MessagePiece` at pyrit/backend/models/attacks.py drops
two domain fields the V1.0 tree-UI must read at reload time:

1. `original_prompt_id` — the lineage-root piece id, preserved across
   `Message.duplicate()` so descendants share the same root. The
   resolver in 03 §4.1 reads it from prepended pieces to keep
   lineage chains intact when re-prepending clean-prefix history.
2. `converter_identifiers` — the sequential converter pipeline that
   produced the piece's converted_value. Without it, reload-
   reconstruction (§9.4.1) renders UserTurnNodes with empty
   converter pipelines indistinguishable from "no converter ever
   applied", and the next Refresh silently fires without the
   operator's authored converters. Also load-bearing for
   `Fan(axis='converter')` variant-payload reconstruction (§9.3.1)
   which derives `variants[s].payload.converters` from the
   fan-child leaf's first user-turn `converter_identifiers`.

Change
------
- Add `original_prompt_id: str | None = None` (defaults to None for
  defensive null-handling, though persisted domain pieces always have
  a non-null value via `_set_original_prompt_id_default`).
- Add `converter_identifiers: list[ComponentIdentifierField] = []`
  (uses the same annotated alias the domain `MessagePiece` uses; the
  PlainSerializer flattens each ComponentIdentifier to the wire shape
  the frontend reads via `ComponentIdentifierField.model_dump()`).
- Update `pyrit_messages_to_dto_async` to populate both. The
  `original_prompt_id` is cast to `str()` since the domain field is
  `uuid.UUID | None`; the `converter_identifiers` is a defensive
  `list(...)` copy.

Reusing ComponentIdentifierField (rather than defining a parallel
ComponentIdentifierDto) keeps the DTO surface honest: the wire shape
matches the domain shape's `model_dump()` output, so the round-trip
contract is structural rather than maintained-in-two-places.

The PR2 contract spelled out in §9.4.4: empty list (not None) means
"no converter applied"; the field being declared is what makes that
distinguishable from "DTO missing the field" on the TypeScript side.

TDD
---
Wrote 6 tests:
  - 4 in `TestPyritMessagesToDto`: exposes original_prompt_id;
    handles None defensively; exposes converter_identifiers with
    round-trippable shape; empty list when domain has no converters.
  - 2 in new `TestMessagePieceDtoDefaults`: direct DTO instantiation
    asserting defaults `[]` and None, plus a JSON-round-trip via
    model_dump() proving the frontend gets the flat shape.
All 6 failed for the right reasons against the original mapper
(KeyError 'original_prompt_id' on dumped DTO; absent field on direct
construction). All pass after the DTO + mapper changes. Backend
suite green: 658 passed (6 new), 4 skipped, 0 regressed.

PR sequencing
-------------
Per §9.4.5 PR sequencing: PR2 ships before the GUI PR3 so the V1.0
frontend types can reference these fields. Build-time check (the
auto-reverse code reads them; TS fails if absent) is the mandatory
enforcement; PR2's landing makes the gate pass.
… DTO fields

V1.0 tree-UI PR3a per doc/gui/design/01_tree_primitives.md §9.4.4 (a).
The TS-side counterpart of backend PR1 (a22854cb1) + PR2 (d1bbeed0d):
extends the frontend API surface to match the new wire shape.

Types added / extended
----------------------
- `CreateAttackRequest.prepended_conversation?: PrependedMessageRequest[]`
  — the runner's central dispatch field; carries clean-prefix history
  per §4.1 / §3.3. Optional so the existing chat tab's
  `source_conversation_id` + `cutoff_index` path keeps working
  unchanged.
- `PrependedMessageRequest` (new) — mirrors the backend's
  PrependedMessageRequest. `role` is the four-value ChatMessageRole
  literal; multimodal turns bundle multiple pieces into one message.
- `ComponentIdentifier` (new) — flat shape from
  `ComponentIdentifier.model_dump()`; `class_name` + `class_module` +
  `params` are required (the V1.0 runner reads these for ConverterRef
  reconstruction in §9.3.1). `hash` / `pyrit_version` / `eval_hash` /
  `children` are declared optional so the wire payload type-checks
  regardless of which optional fields the backend chooses to populate.
- `BackendMessagePiece.original_prompt_id: string | null` — required
  on every PR2-or-newer payload; null only on defensive-test inputs.
- `BackendMessagePiece.converter_identifiers: ComponentIdentifier[]` —
  required; empty list = no converter applied (distinguishable from
  field-missing by being present).

Contract tests
--------------
`frontend/src/types/treeUi.contract.test.ts` — 13 tests using TS
`satisfies` for compile-time shape verification + runtime sanity for
the optional-field defaults. The "contract" is that backend payloads
matching the documented PR2 shape produce usable typed values; if
the frontend types drift, the satisfies clauses fail at compile time.

Test infra bug fixed
--------------------
Latent bug in tsconfig.test.json: the file `extends` tsconfig.json,
which has `"exclude": ["src/**/*.test.ts", ...]`. Per TypeScript's
extends semantics, the inherited `exclude` keeps applying unless the
child overrides it — so the child's `include` patterns were no-ops
and `tsc -p tsconfig.test.json` compiled only jest.config.ts. ts-jest
doesn't type-check at run time, so test-file type drift went uncaught
indefinitely.

Fix: add `"exclude": []` to tsconfig.test.json so the `include`
patterns take effect. The PR3a contract tests rely on this:
running `npx tsc -p tsconfig.test.json` is what proves the
`satisfies` clauses are actually checked.

Pre-existing test type errors surfaced
--------------------------------------
The fix surfaces pre-existing type errors in other test files
(api.test.ts, AttackHistory.test.tsx, services/api.ts) that have
been latent since the config bug was introduced. These are
intentionally NOT fixed in this PR — they're unrelated to the
tree-UI work and need their own focused fix. `npm run type-check`
(the existing CI-wired script targeting tsconfig.json) is unchanged
and continues to pass. The tree-UI test surface is validated via
`npx tsc -p tsconfig.test.json | grep treeUi.contract` (clean).

TDD
---
Wrote treeUi.contract.test.ts first, ran type-check, watched 9
specific type errors fire ("Property 'X' does not exist on type Y" /
"Module has no exported member 'Z'"). Added the types; type errors
cleared; 13 runtime tests pass; existing 662 frontend tests
unaffected; lint clean; main `npm run type-check` clean.

PR sequencing
-------------
PR3a is a pure frontend type extension; nothing else consumes the
new types yet. PR3b (tree-UI domain types — ConversationTree,
ConversationTreeNode, ExecutionRecord, WaveEvent, etc.) will follow,
then PR4 (runner core) will consume both PR3a and PR3b types to
implement the V1.0 dispatch loop.
V1.0 tree-UI PR3b. Translates the design doc set into TypeScript:
type model from doc/gui/design/01_tree_primitives.md §4–§6 / §13
(data model, lifecycle, propagation, workspace, undo) and runner
interfaces from doc/gui/design/03_runner.md §2 / §6 (Runner,
RunnerStateSink, CostGuardrail, CrossTabLockManager, WaveEvent).

What ships
----------
`frontend/src/runner/treeTypes.ts` (single file; PR4 will add
`runner.ts` + helpers alongside it):

- Branded id types `ConversationTreeId`, `ConversationTreeNodeId` —
  type-level disambiguation with zero runtime cost. Catches "passed
  node id where tree id was expected" at compile time.

- Lifecycle: `NodeState` (the 7 values from §6.1), `NodeFailureClass`
  (transient / rate_limited / permanent / blocked per §6.1 + §3.3a),
  `ApiErrorReason` (the structured `{message; failure_class}` sink
  reason from §3.3a `_format_api_error`).

- Shared types: `PromptDataType`, `ConverterRef` (stored id or inline
  spec per §4.6), `PieceSpec`, `WaveTriggerKind` (the closed V1.0 +
  V1.1/V2 enum from §6.2), `ExecutionRecord` (with the rev-18 timing
  triple `dispatchedAt` / `targetFirstByteAt` / `completedAt`),
  `ReflogEntry` (per-tree pinned wrapper around immutable
  ExecutionRecord per §6.5 sharing semantics).

- Node taxonomy: `ConversationTreeNodeBase` + six discriminated
  variants (`RootPromptNode`, `ImportMessageNode`, `UserTurnNode`,
  `SendNode`, `FanNode`, `ScoreNode`) → `ConversationTreeNode` union
  discriminated by `kind`. `NodeParams` helper alias for the undo
  system's snapshot needs.

- FanNode surface: `FanAxis` (V1.0 attempt+converter, V1.1+ the four
  others), `FanVariant` (discriminated union over axis with per-axis
  payload shape). `promotedChildSlotIndex` + `deletedSlotIndices`
  tombstone array per §5.1 invariants 2 + 4.

- Edge: `ConversationTreeEdge` with the slotIndex fan-discriminator
  that MUST be in the resolved-input hash per §5.1 microsoft#4.

- Undo: `UndoOp` discriminated union per §6.9 with the rev-16
  state-snapshot widening (`editParams` carries `priorState` +
  `priorDescendantStates` so undo restores the §6.3-cascade state, not
  just the named field; same for `makeCurrent` per §6.7 step 4).

- Tree container: `ConversationTree` (with `parentConversationTreeId` /
  `parentSourceConversationId` / `undoStack`).

- Workspace: `Workspace` (V1.0 minimal: `currentTree`, `recentTreeIds`,
  `settings`) + `WorkspaceSettings` (`reflogCapPerNode`,
  `confirmThresholdCount`, `suppressConfirmModalThisSession`).

- Wave: `WaveEvent` discriminated union over `kind` (start /
  node_complete / complete / busy / queued / reflog_eviction /
  operator_tag_required) with required `emittedAt: string` per rev-18
  Finding C.1. `complete.summary.failed` bucketed by class per rev-16
  Findings 2+3; `blocked` is in-flight-cascade victims; `cancelled`
  is operator wave-abort; `reflog_evicted` rolls up wave-time evictions.

- Runner interfaces: `Runner` (6 entry points: refresh* + cancelWave
  + cancelQueued + retryFailedNodes), `RunnerStateSink` (the runner's
  sole React-state mutation surface with the rev-15 reason semantics:
  string/ApiErrorReason/null/omitted, plus missing-node tolerance),
  `CostGuardrail`, `CrossTabLockManager` (BroadcastChannel advisory
  lock per §10.4).

`frontend/src/runner/treeTypes.contract.test.ts` — 35 tests using
`satisfies` for compile-time shape verification + runtime sanity for
each variant. Validates discriminator narrowing for the 6 node kinds
(switch-on-kind gives type-safe access to params[…]), the 6
FanVariant axes, the 7 WaveEvent kinds, and the 5 UndoOp kinds.

TDD
---
Wrote treeTypes.contract.test.ts first (importing 35 symbols from a
nonexistent module). Type-check `tsc -p tsconfig.test.json` flagged
"Cannot find module './treeTypes'" — the expected red. Created
treeTypes.ts with each named symbol; cleared an unused-import slip
(stray `./converterRef` reference); type-check returned clean for the
treeTypes.contract test surface. Runtime suite: 35 passed.
Aggregate frontend: 697 passed (+35), no regression; main
type-check + lint clean.

Scope discipline
----------------
This commit is type definitions + interface declarations only. No
runner implementation; PR4 will land that in `runner.ts` alongside
this file and consume both PR3a (API types) and PR3b (domain types).
The deliberate split avoids a 4000-line PR4 by getting the type
surface reviewer-stable first.
…(PR4a)

V1.0 tree-UI PR4a — first slice of the runner. Pure functions over the
ConversationTree that compute which leaf Sends are dispatchable for a
given wave, plus the retry-failed pre-readiness demotion that makes
[Retry failed] waves work.

What ships
----------
`frontend/src/runner/readiness.ts`:

- `findLeafSends(tree)` — every SendNode with no SendNode descendant.
  UserTurn / Fan / Score descendants do NOT make a Send interior (per
  the §2 vocabulary). Orphan Sends (Send with no children) are leaves
  per 03 §3.2.

- `isLeafSend(tree, nodeId)` — predicate counterpart to the above.

- `computeReady(tree, S)` — the §3.1 readiness rule literally:
  leaf Sends in S whose every SEND ancestor has state in {edited,
  stale, running, clean}. Interior Sends never enter ready (they
  regenerate as part of their leaf's sequence per §3.2). Failed /
  cancelled SEND ancestors block the leaf — the rev-15 Finding 4
  anti-amplification rule that prevents single-5xx-cascades-to-N-
  retries against rate-limited targets.

- `buildSForTree` / `buildSForSubtree` / `buildSForNode` — S
  construction per the three refresh scopes from 03 §2.1.

- `demoteRetryFailedNodes(tree, S, sink)` — §3.1 step 2b: for
  `waveTriggerKind === 'retry_failed'` only, flip every S-member
  {failed, cancelled} node back to `stale` and clear its execution
  BEFORE the readiness rule runs. Uses the `null` reason sentinel
  (per 03 §2.2) so `lastError` clears rather than lingers. This is
  the mechanism that lets the [Retry failed] toast button's wave
  actually re-dispatch failed leaves — without it the ancestor
  allowlist would silently exclude them and the wave would no-op.

`frontend/src/runner/testHelpers.ts` — shared across all PR4 sub-PRs:

- Builders per node kind (`mkRoot`, `mkUserTurn`, `mkSend`, `mkFan`,
  `mkScore`, `mkImport`) that fill boilerplate (timestamps, empty
  history, default state) so tests name only the fields under test.
- `mkEdge`, `mkTree` for graph construction. `mkTree` derives edges
  from `parentId` if not supplied; tests needing explicit fan
  slotIndices pass `overrides.edges`.
- `mkExecution` for ExecutionRecord fixtures.
- `mkMockSink` — recording sink that captures every call. Helpers
  `callsOf(method)`, `events()`, `stateChanges(nodeId)` keep test
  assertions terse.

`frontend/src/runner/readiness.test.ts` — 30 tests covering:

- findLeafSends: 7 cases including the orphan-Send edge case, fan
  children, Crescendo-style interior detection, Score-only descendants
  not making Send interior, multi-depth trees.
- computeReady: 11 cases including the rev-15 anti-amplification
  cases (failed/cancelled ancestor blocks; edited/stale/running/clean
  ancestor admits), Fan/Score-transparent ancestor walking, the leaf-
  not-in-S edge case, empty S.
- buildSForTree / Subtree / Node: 5 cases including the running/draft
  state exclusion, subtree scope, single-node scope.
- demoteRetryFailedNodes: 5 cases including the null reason sentinel
  (clears lastError vs leaves it stale), ignoring non-{failed,
  cancelled} S members, ignoring nodes outside S, and the integration
  case showing computeReady admits a previously-blocked leaf after
  demotion.
- 1 fan-slot-aware traversal smoke test confirming explicit
  slotIndex edges don't perturb leaf detection.

TDD + scope discipline
----------------------
Tests written first against a nonexistent ./readiness module (TS2307
+ implicit-any cascade as the expected red). Implementation made the
file resolve and the types narrow; all 30 pass. One real bug caught
in the helpers: `as const` on the shared `base()` object made
`executionHistory: []` resolve to `readonly []`, unassignable to the
mutable `ReflogEntry[]` field shape — fixed in the same PR (since
the helper is its own file landing here for the first time).

Test infra-bug from PR3a stays surfaced (pre-existing test type
errors in unrelated files); the runner directory type-checks clean
under `tsc -p tsconfig.test.json`. Aggregate frontend: 727 tests
pass (+30), no regression; main type-check + lint clean.

Next slice
----------
PR4b: `resolvePathPartition` — the pure function that walks a leaf's
root-to-leaf path and partitions Sends into clean prefix (load into
prepended_conversation as historical context) and fresh suffix (the
N add_message calls). Builds on testHelpers; no dependency on
readiness.ts. Will exercise the §5.1 invariant 5 Fan/Score
transparency on the path-walk side.
…w (PR4a.1)

Three concrete fixes from the post-PR4a rubber-duck pass (Opus 4.7 extra
high reasoning). Each addresses a defect that would have bitten PR4b
or rotted into bad test hygiene.

1. mkTree: auto-number fan-child edge slotIndex
   ----------------------------------------------
Before, every derived edge got `slotIndex: 0` — including all children
of a FanNode. The readiness tests don't read edges so they passed, but
PR4b's `resolvePathPartition` will read `edge.slotIndex` to drive the
fan-child variant resolution. Fixtures built via `mkTree` would have
handed PR4b bogus shapes (all attempt-fan siblings sharing slot 0,
violating the §5.1 slot-stability invariant), producing test failures
that look like resolver bugs but are actually fixture bugs.

The fix tracks which parent ids are FanNodes and auto-numbers their
child edges by ordinal. Non-fan parents stay on slot 0. Tests needing
explicit slot indices still pass `overrides.edges`.

2. MockSink: delete stateChanges helper
   -------------------------------------
The `stateChanges(nodeId)` helper baked in a query shape — "give me
the sequence of states for this node" — that tempted tests to assert
exact transition sequences (`expect(stateChanges('s')).toEqual(['running', 'clean'])`).
That kind of assertion locks the runner into a specific transition
order and breaks the moment a legitimate intermediate transition is
added. `events()` was also dropped: thin sugar for `callsOf('emitWaveEvent')`
with no payoff. Callers can compose `callsOf('setNodeState').filter(...)`
when they need a per-node view, which is rare enough to not deserve
a helper.

Also dropped the unused `ALL_WAVE_TRIGGER_KINDS` re-export (kitchen
sink waiting to grow).

3. demoteRetryFailedNodes test: actually compose with computeReady
   ----------------------------------------------------------------
The previous "integration check" test built TWO trees by hand — one
with failed states and one with the desired post-demotion states —
then ran computeReady against the second. The demoter could have
written 'staale' (typo) and the test would still pass because the
hand-built tree had the correct 'stale' state. Test-that-passes,
not test-that-proves.

The new version runs the demoter against a recording sink, projects
the recorded `setNodeState` calls onto a copy of the original tree,
then runs computeReady over the projection. If the demoter writes
anything but `stale`, the projected tree differs from the
hand-built one and computeReady's result diverges. The composition
is honest now.

Verification: 30 readiness tests pass, 727 frontend tests pass, no
regression. lint + type-check clean.

Other rubber-duck items not in this commit (filed for later):
- Trim contract-test runtime expect boilerplate (PR4a.2, this cycle).
- Add narrow CI gate for contract-test type-check (PR4a.3, this cycle).
- Tighten DTO original_prompt_id to non-nullable str + add contract
  test proving the validator guarantee (separate decision; the doc
  spec'd nullable).
- Q.S.1 cost-cliff regression test (lands in PR4c).
- Drop new doc citations starting PR4b; full citation strip at end
  of V1.0.
… (PR4a.2)

Two coupled changes from the rubber-duck review.

1. Trim `treeTypes.contract.test.ts` runtime-expect boilerplate
   ------------------------------------------------------------
The previous file was 694 lines with 35 tests, most matching the
shape `const x = { ... } satisfies SomeType; expect(x.field).toBe(...)`.
The `satisfies` clause was doing all the real work; the `expect` was
asserting a literal typed two lines up — type theater that would
never fail unless someone deliberately broke the literal.

Rewrote to 11 tests focused on what genuinely catches future bugs:
- Four discriminator-narrowing tests (ConversationTreeNode kind,
  FanVariant axis, WaveEvent kind, UndoOp kind). Each exercises the
  switch-on-discriminator pattern the runner / UI rely on; lost
  narrowing here would silently degrade dispatch sites to `any`.
- Three default-value contracts where null-vs-absent matters at
  runtime (SendNode empty params → inherits target; FanNode
  promotedChildSlotIndex `null` vs `0` for Pick semantics;
  ExecutionRecord timing-triple nullability for pre-target-call
  failures).
- Three interface-shape stubs (Runner, RunnerStateSink with all
  three reason shapes, CostGuardrail + CrossTabLockManager).
- One forward-compat assertion (WaveTriggerKind admits V1.1+
  markers).

Plus a short block of type-only assertions (`type _Assert... = X
extends Y ? true : never`) for structural drift that runtime tests
don't reach (branded id types on ConversationTree, ReflogEntry
shape, ConverterRef union, Workspace shape).

Net: -614 / +383 lines. Trim ratio ~50%, but the kept tests are
genuinely load-bearing.

2. Narrow CI gate for the satisfies clauses
   -----------------------------------------
PR3a added `"exclude": []` to tsconfig.test.json so the contract
tests' satisfies clauses type-check. But CI (frontend_tests.yml)
only runs `npx tsc --noEmit` against tsconfig.json — the satisfies
clauses were unenforced in CI. Type theater.

Per-user direction (narrow gate), added tsconfig.contract.json that
includes ONLY the tree-UI contract files + their type dependencies
(treeUi.contract.test.ts, treeTypes.contract.test.ts, treeTypes.ts,
testHelpers.ts, types/index.ts). Wired:
- new `npm run type-check:contract` script for local DX
- new CI step `Run tree-UI contract type check` in frontend_tests.yml

The pre-existing test type errors that PR3a surfaced
(126 errors across 6 component-test files) stay latent — they
predate this work and fixing them would balloon scope. They are not
gated and not regressed by this change.

3. Real bug found by the trim
   ---------------------------
The new "ExecutionRecord timing triple admits null per-field" test
failed against `mkExecution` because `overrides.dispatchedAt ?? ISO_FIXED`
collapses explicit `null` overrides into the default ISO value. Fixed
mkExecution to use spread-merge (`{ ...defaults, ...overrides }`),
which preserves `null` distinct from `undefined`. The kind of bug
no operator would have seen until PR4c started constructing failure-
path execution records — caught at the right layer.

Verification: 703 frontend tests pass (was 727; trim cut 24 trivial
tests), lint clean, both type-check + type-check:contract green.

Other rubber-duck items still pending:
- DTO original_prompt_id nullability tightening (separate concern;
  the doc spec'd nullable, deferred to a focused discussion).
- Q.S.1 cost-cliff regression test (lands in PR4c).
- Stop adding new doc citations starting PR4b.
- Full citation strip at end of V1.0.
…lker (PR4b)

Second slice of the runner. Pure function over the tree: walks a leaf
Send's root-to-leaf path and produces the dispatch plan that PR4c will
turn into one `create_attack` + N `add_message` calls.

What ships
----------
`frontend/src/runner/partition.ts`:

- `rootToLeafPath(tree, leafId)` — walks parents back to the root, reverses.
  Throws if `leafId` is not in the tree.

- `isStaleForResolver(send)` — predicate: state in {edited, stale, failed,
  cancelled} OR execution is null. The null clause is the safety net for
  failed/cancelled (which null their execution by contract) and for
  freshly-added draft Sends that have no execution yet.

- `resolvePathPartition(tree, leafId)` — the main walker. Output:
    - `prepended: PrependedMessageRequest[]` — turns from the clean prefix
      (Sends whose params still match their executions). Each clean Send
      contributes its input UserTurn message + its stored assistant
      response message; both load into `prepended_conversation`.
    - `freshSuffix: FreshSuffixEntry[]` — the first stale Send and
      everything after, down to the leaf. Each entry is (userTurn,
      fanVariant, sendNode); PR4c turns each into one `add_message`.
    - `treePathSegments: Array<[FanAxis, number]>` — (axis, slotIndex)
      pairs for every Fan ancestor, in topo order. PR4c's _build_labels
      will JSON-encode this into the `tree_path` AR label.
    - `target: string` — the resolved target_registry_name (per-Send
      override wins over root's value).

Notable shape decisions
-----------------------
- Synthetic UserTurn from Root. When a Send's first non-Fan, non-Score
  ancestor is the RootPromptNode (i.e., no operator-authored UserTurn
  between root and Send), the walker synthesizes a UserTurn-shaped
  object carrying root's text + attachments. Marked with `synthetic:
  true` so PR4c can detect and avoid double-wrapping; the dispatch
  code can read uniform fields (role, text, attachments) regardless.

- `fanVariant` on FreshSuffixEntry vs. `treePathSegments`. Two different
  things on purpose:
    - `treePathSegments` records every Fan ancestor on the path. Used
      for the `tree_path` label (round-trips fan structure for reload).
    - `fanVariant` carries the variant the runner needs at dispatch
      time *for this specific Send*. Set when a Fan sits between the
      Send's input UserTurn and the Send itself (the attempt-axis-
      directly-above-Send pattern). Cleared by the intervening
      UserTurn for the converter-axis-with-per-child-UserTurn pattern,
      where the variant data lives on the child UserTurn's authored
      `converterPipeline` (fan-child materialization at create time).
  Convergent: tree_path is always present; fanVariant is only present
  when the runner needs it to disambiguate the Send beyond what's
  already on the input UserTurn.

- Clean-prefix assistant message carries `original_prompt_id` per piece
  but defers the actual piece content to PR4c's piece cache. The
  partition is pure tree-walking; piece data lives in a separate cache
  the dispatcher hydrates from `GET /attacks/{id}/messages` at wave start.

- ImportMessageNode on the dispatch path throws. V1.0 does not extend
  imported context via the runner's dispatch loop; if a future caller
  wires that path, this throw makes the gap loud rather than silent.

TDD
---
Tests written first against a nonexistent ./partition module
(TS2307 + implicit-any cascade). 25 cases covering:
  - rootToLeafPath: topo order, Fan/Score ancestors included, error on
    unknown id.
  - isStaleForResolver: each of {edited, stale, failed, cancelled},
    no-execution case, clean-with-execution case, running-with-null.
  - Single-Send chain (all-stale).
  - Root-promoted-to-UserTurn (no UT between root and first Send).
  - systemPrompt → leading system message; absent → no system message.
  - All-clean upstream + edited leaf: prepended = (user-turn + assistant-
    response), freshSuffix = [leaf].
  - Stale interior + leaf: prefix empty, both stales in freshSuffix.
  - Clean prefix + stale interior + leaf: prefix loaded, both stales
    in freshSuffix.
  - Defensive: failed-with-execution still goes to freshSuffix (state
    check wins, retries always re-dispatch).
  - Fan(attempt)-directly-above-Send: input UT is from above the Fan,
    fanVariant captures (axis, slot), tree_path has one segment.
  - Fan(converter)-with-per-child-UT: input UT is the per-child UT,
    fanVariant null (data on child UT's pipeline), tree_path has one segment.
  - Score ancestor: pass-through; UT and variant unchanged.
  - Nested fans: tree_path accumulates in topo order.
  - SendNode target override wins; absence inherits from root.
  - Throws on non-Send target.
  - Throws on interior-Send target (the runner's dispatch loop is
    documented as only calling this for leaves; precondition fails loud).
  - tree_path produces JSON-serializable shape; empty array for fan-
    less leaf.

One real design clarification surfaced during TDD: the converter-fan
test originally asserted `freshSuffix.fanVariant = (converter, slot)`,
which contradicts the spec's variant-clearing-on-UserTurn rule. Fixed
the test to reflect the right semantics (variant data lives on the
materialized child UserTurn's pipeline; fanVariant is null at that
Send; tree_path still captures the fan ancestor for label round-trip).

Verification: 728 frontend tests pass (+25), no regression. lint,
type-check, type-check:contract all clean.

Next slice
----------
PR4c will wire the dispatch sequence (one `create_attack` + N
`add_message` calls against a mock API client) consuming the partition
output, plus `_build_labels` (with the tree_path JSON-encoding from
treePathSegments), `_format_api_error` failure classification, the
200-message cap short-circuit, the labels-divergence invariant test,
and the Q.S.1 cost-cliff regression test that the rubber-duck flagged.
Pure helpers the dispatcher (PR4c2) calls per leaf. Split out so the
orchestrator stays small and these are testable without any API-client
mocking. No I/O; no React.

What ships
----------
`frontend/src/runner/dispatchHelpers.ts`:

- `buildLabels(args)` — the source of the labels-divergence invariant.
  Produces the Record<string, string> attached to every create_attack
  and add_message call in one leaf's dispatch sequence. The dispatcher
  calls this once per dispatch and reuses the result; identical labels
  across all N+1 calls fall out by construction.

  Schema:
    operator                     (required; hard-asserts non-empty)
    operation                    (empty string permitted)
    conversation_tree_id         (stringified)
    wave_id                      (uuid from the wave)
    wave_trigger_kind            (the closed-enum value)
    tree_path                    (JSON-encoded array of [axis, slot])
    parent_conversation_tree_id  (OMITTED when null; written only when
                                  the tree is a clone)

  The "omit-on-null" rule for parent_conversation_tree_id is a real
  contract: writing the empty string would surface a self-parent in
  History "Open clones of T" — actively wrong. Omission is the honest
  "no parent" signal.

  The empty-operator hard-assert is defense-in-depth. The runner's
  entry-point shim's tag-hygiene gate is the load-bearing check; the
  assert exists so a future refactor that bypasses the gate panics
  rather than silently writing operator:'' ARs (which destroy audit
  attribution).

- `parseTreePathLabel(label)` + `isTreePathLabelValid(label)` — JSON
  decoder + validator for the tree_path round-trip. Fail-soft on
  malformed input (empty array) so older clients encountering a
  future encoding don't hard-crash. The validator is for tests /
  defensive code paths that want to distinguish well-formed empty
  from malformed.

- `formatApiError(error, callName)` — failure-class classification.
  Maps an ApiError (already-normalized by services/errors.ts) into
  one of the four NodeFailureClass values:
    transient    : 5xx, network, timeout (auto-retry-eligible)
    rate_limited : HTTP 429 + provider-specific (Anthropic 529,
                   detail-body strings 'rate_limit_exceeded' or
                   'overloaded_error'). Retry-gated in UX until the
                   operator manually re-triggers.
    permanent    : 4xx other than 429. Operator-fix required.
    blocked      : runner-synthesized for in-flight cascade victims;
                   NOT produced by formatApiError (cascade lives in
                   the dispatcher, PR4d).

  Defaults to `transient` for unclassifiable shapes. A wrongly-
  classified transient gives an unhelpful but harmless Retry click;
  a wrongly-classified permanent silently locks the operator out of
  recovery. Safer default.

  Provider-specific detection lives in a small private registry. The
  V1.x plan is to push detection to the backend (which knows each
  target's provider) once token-bucket throttling lands; until then,
  client-side registry keeps the runner self-contained.

TDD
---
Tests written first against a nonexistent ./dispatchHelpers module
(TS2307 expected). 26 cases covering:
  - buildLabels: required keys present, treePath JSON encoded,
    operation='' permitted, parent_conversation_tree_id omitted-on-null
    vs written, empty-operator throws, identical inputs produce
    deep-equal labels (the divergence invariant source).
  - tree_path encoding round-trip: build → parse → equal; '[]' (not
    absent) for fan-less; fail-soft on absent/empty/malformed/
    wrong-typed input.
  - formatApiError: transient (network, timeout, 5xx), rate_limited
    (429, 529, detail-body strings), permanent (4xx, operator-mismatch
    body, 401/403), unknown-status defaults to transient, message
    shape includes callName/status/detail.

26 pass first try (one lint nit fixed: unused NodeFailureClass type
import — the value-returning function uses the inferred ApiErrorReason
type, not the underlying class union).

Aggregate frontend: 754 tests pass (+26), no regression. lint,
type-check, type-check:contract all clean.

Next slice
----------
PR4c2: the dispatch orchestrator. Consumes the partition output
(PR4b) + these helpers + a mocked API client to drive one leaf's
`create_attack` + N `add_message` sequence. Tests the 200-cap
short-circuit, the labels-divergence invariant at the call site,
the Q.S.1 cost-cliff regression (60-leaf attempt-fan with 10-deep
shared stale prefix = 600 add_message calls under V1.0's no-
intra-wave-memoization rule), and the mid-chain partial-commit
failure semantics.
…ssages (PR4c2)

The heart of one leaf's HTTP lifecycle. Consumes PR4b's resolvePathPartition
output + PR4c1's helpers + a mocked-or-real RunnerAttacksApi to drive the
backend interaction. The only place in the runner that talks to the API.

What ships
----------
`frontend/src/runner/dispatch.ts`:

- `RunnerAttacksApi` interface — the 2-method slice of the existing
  `attacksApi` the runner depends on. Production wires this to
  services/api.ts; tests pass a recording mock.

- `dispatchLeaf(args)` — orchestrates one leaf:
    1. Defense-in-depth: throws on empty operator (the entry-point
       shim's tag-hygiene gate is upstream; this fires if a future
       refactor bypasses the gate).
    2. Calls `resolvePathPartition` to compute clean prefix + fresh
       suffix + tree_path segments + resolved target.
    3. 200-message cap short-circuit. Fails the leaf with a permanent
       failure-class and a recovery-pointing message; NO backend call
       fires. Recovery is branch-from-midpoint.
    4. Builds labels ONCE (the labels-divergence invariant source).
    5. Marks every fresh-suffix Send `running` atomically so siblings
       observing in-flight state see them together.
    6. `create_attack` with prepended_conversation + labels.
    7. For each fresh-suffix entry, `add_message` (send=true) with
       the same labels + the resolved converter_ids; extracts new
       assistant pieces by turn_number diff; records ExecutionRecord
       on the Send; flips to `clean`.
    8. On API error mid-sequence: failing Send → failed (execution
       cleared); later Sends roll back to `stale` (executions cleared);
       earlier successful Sends keep their clean state + executions.

- `LeafDispatchOutcome` discriminated union:
    `{ kind: 'success', leafId, callsIssued }`
    `{ kind: 'failed', leafId, failedNodeId, failureClass, partialAttackResultId }`
  The cascade-on-failure layer (PR4d) will consume this to drop sibling
  leaves whose paths include the failed ancestor.

Notable shape decisions
-----------------------
- `asApiError` private helper. The shared `toApiError` from services/
  errors normalizes axios + Error + string throws but treats an
  already-normalized ApiError as an unknown object (falling into the
  "anything else" branch that loses the status code). The dispatcher
  catches both raw axios errors (production) and pre-normalized
  ApiErrors (tests, upstream layers that re-throw). The passthrough
  fixes both without modifying the shared helper.

- New pieces extracted via turn_number diff. `AddMessageResponse.messages.messages`
  carries the ENTIRE conversation; the dispatcher holds a per-sequence
  `priorMaxTurnNumber` watermark and collects assistant pieces from
  turns strictly above it. Bounded O(messages-per-AR) per call,
  hard-capped at 200 by the backend; cheap.

- ExecutionRecord built in-runner. The runner mints the executionId
  via `crypto.randomUUID()` (with a Math.random() fallback for very
  old browsers); records dispatchedAt / targetFirstByteAt / completedAt
  as the same timestamp (single-turn synchronous dispatch; full
  per-call timing is a streaming-target enhancement out of V1.0 scope).

- The 200-cap is enforced on `prepended_conversation` ONLY, matching
  the backend Pydantic max_length=200. add_messages extend the AR's
  conversation past 200 messages cleanly; only the clean-prefix load
  trips the cap.

TDD
---
Tests written first against a nonexistent ./dispatch module. 16 tests:
- Happy path (single-Send chain): 1 create_attack + 1 add_message;
  correct request shapes; running→clean transition; ExecutionRecord
  carries waveId/waveTriggerKind/AR id/conversation id/pieceIds.
- Multi-Send chain with clean prefix: prepended carries (user, assistant)
  pairs from each clean Send; one add_message per stale Send in path
  order; clean Sends are untouched.
- Multi-stale chain: each stale Send becomes its own add_message.
- Labels-divergence invariant at the CALL SITE: 4-stale chain produces
  5 requests; all five labels dicts are deep-equal. Plus parent_-
  conversation_tree_id written only on clones.
- tree_path label populated from fan ancestors (axis, slot).
- 200-cap short-circuit: 101 clean Sends → 202 prepended turns; leaf
  fails with permanent class and recovery-pointing message; no backend
  call fires.
- create_attack failure: every fresh-suffix Send fails; no add_message.
- Mid-chain add_message failure: failed Send → failed; later Sends →
  stale; earlier Sends stay clean; partialAttackResultId surfaced.
- Classification round-trip: 429 → rate_limited; 400+operator-mismatch
  → permanent.
- Tag-hygiene defense-in-depth: empty operator throws synchronously.
- The Q.S.1 cost-cliff REGRESSION: 60 sibling leaves with a 10-deep
  shared stale prefix produce 60 create_attacks + 660 add_messages =
  720 total calls. Pins V1.0's no-intra-wave-memoization invariant
  via test rather than absence-of-code; a well-meaning future
  contributor who adds memoization sees this drop to ~71 and the
  assertion fires loudly.
- Outcome shape: success carries leafId + callsIssued; failure carries
  failedNodeId + failureClass + partialAttackResultId (null when
  create_attack failed before any AR was created).

Three real defects surfaced during TDD:
1. `toApiError` doesn't recognize already-normalized ApiError throws
   → all classification tests returned 'transient'. Fixed via asApiError
   passthrough.
2. The Q.S.1 test originally had `Fan → Send` directly (no UserTurn
   between), which violates the §5.1 microsoft#5 invariant and makes the resolver
   throw. Real test bug: rewrote to put a UserTurn above the Fan (the
   correct attempt-fan shape where all siblings share an input UT).
3. mkTree's auto-numbered fan-child edges (PR4a.1) earn their keep
   here: without that fix, all 60 fan-children would have shared
   slot 0 and the partition's tree_path would have been wrong.

Verification: 770 frontend tests pass (+16), no regression. lint,
type-check, type-check:contract all clean.

Next slice
----------
PR4d: the in-flight cascade. When `dispatchLeaf` returns `kind:
'failed'`, the dispatch loop drops every sibling leaf in `ready`
whose root-to-leaf path includes the failed Send → `stale` with
`failure_class: 'blocked'`. Plus `cancelWave(treeId)` (flips a per-wave
flag at `ready.popNext()` boundaries) and `cancelQueued(treeId)`
(drops queued waves without aborting the active one).
Wraps PR4c2's per-leaf dispatcher in the wave-level loop. Owns:
  - the concurrency cap (maxParallel, default 4)
  - in-flight cascade: when a leaf's dispatch fails on a shared interior
    Send, not-yet-dispatched siblings drop to `blocked` rather than
    independently retrying the same failure
  - operator cancellation via a per-wave controller checked at each
    ready-pop boundary; in-flight HTTP completes (V1.0 UI-level cancel
    contract), not-yet-dispatched leaves transition to `cancelled`
  - wave-event emission (start with estimated call count, one
    node_complete per dispatched leaf, complete with bucketed summary)

What ships
----------
`frontend/src/runner/wave.ts`:

- `WaveDispatchController` + `createWaveController()` — factory for the
  per-wave cancel handle. PR4e's entry-point shim creates one per
  wave and registers it in the active-wave map so `runner.cancelWave(treeId)`
  can flip it.

- `WaveSummary` — the wave-complete tally shape. Mirrors
  `WaveEvent.complete.summary` (succeeded, failed bucketed by class,
  blocked, cancelled, reflog_evicted) and is also the return value of
  runWave so callers can read it without subscribing to the event stream.

- `runWave(args)` — the loop:
    1. Compute initial ready set via PR4a's computeReady; estimate calls
       upfront for the start event.
    2. Emit `start` event with estimatedCalls + wave metadata.
    3. Drain ready into inflight up to maxParallel; wait on Promise.race;
       on each completion tally outcome + emit node_complete; on failure
       run the cascade.
    4. On cancellation: skip further picks; let in-flight complete
       naturally (their natural outcomes are tallied, not clobbered with
       `cancelled` — the execution-clobber gate).
    5. Cancel-tally: leaves still in `remaining` after the loop exits via
       cancellation get marked cancelled in the sink + outcome map.
    6. Emit `complete` event with the bucketed summary; return it.

Notable shape decisions
-----------------------
- Cascade is "drop from ready", not "drop from running". When a leaf's
  dispatch fails on an interior Send, only siblings still in the
  remaining-set get blocked. Siblings already in flight continue and
  may independently fail on the same Send (counted as their own
  per-leaf transient failures, not as cascade-blocked).

- The cascade walks `remaining`, not the whole tree. Each leaf in
  `remaining` is checked against the failedSendId via rootToLeafPath
  containment. Bounded by remaining-leaf count × path depth; cheap
  at V1.0 sizes.

- Cancel uses a controller flag, not an AbortController. JS native
  AbortController is bigger surface than needed and doesn't compose
  naturally with `dispatchLeaf`'s in-flight contract (which is "let
  the HTTP complete"). The controller's role is purely "don't pick
  more leaves"; in-flight ones decide their own fate.

- The cancel-tally step uses `'transient'` as the placeholder
  failure_class in lastError because NodeFailureClass doesn't carry
  a 'cancelled' variant (cancelled is a NodeState, not a failure class).
  The tally bucket separately accounts for cancelled leaves; the
  failure_class in lastError is informational only for cancelled state.

- The summary is returned directly AND emitted via the complete event.
  PR4e's shim will use the return value; the UI subscribes to the
  event stream. Both paths see the same data; no divergence risk
  because there's one source of truth (the outcomes map).

Test scaffolding
----------------
A controllable mock API client (`mkControllableApi`) gives tests precise
control over per-leaf timing via deferred Promise resolutions
(`releaseNext` / `failNext` / `pendingCount`). The wave loop is
deterministic given the resolution order; tests use a poll-based
`waitFor` helper to wait for specific predicates rather than fixed
microtask-flush counts (which proved racy when the dispatchLeaf
chain has more await hops than the flush counter — caught and fixed
during this PR's TDD cycle).

TDD
---
Tests written first against a nonexistent ./wave module (TS2307 +
missing-module cascade). 16 tests covering:
  - Empty S: no dispatches; zero summary; start + complete only.
  - Single leaf happy path: one dispatch; summary.succeeded=1; events
    in order (start → node_complete → complete) with the right
    metadata (waveId, triggerKind, estimatedCalls, treeId).
  - 3-leaf fan all succeed: summary.succeeded=3.
  - Concurrency cap: maxParallel=2 with 5 ready leaves; inflight.max=2
    throughout the wave.
  - Default maxParallel=4 when omitted.
  - Wave-event ordering: start first; one node_complete per leaf with
    'success' or 'failure' outcome; complete last with bucketed summary.
  - In-flight cascade (maxParallel=1, shared interior Send fails):
    sibling leaves drop to `blocked` with failure_class='blocked' on
    lastError; no further API calls fire; summary 1 failed + 2 blocked.
  - In-flight cascade ONLY blocks not-yet-dispatched siblings: with
    maxParallel=3 and 3 sibling leaves all dispatching, each leaf's
    failure is independent; no blocked count; summary 3 failed.
  - Mixed cascade across two fan subtrees sharing one ancestor Send:
    cascade walks across fan boundaries; both subtrees' siblings get
    blocked.
  - Cancel before any dispatch: all leaves cancelled; no API calls.
  - Cancel mid-wave: in-flight completes; remaining → cancelled.
  - Execution-clobber gate: in-flight leaves that complete AFTER cancel
    still record their executions; tallied as succeeded, not cancelled.
  - Controller omitted defaults to never-cancelled.
  - Summary failure-class bucketing across mixed outcomes (1 each of
    transient + rate_limited + permanent).

Two real test-infrastructure defects surfaced during TDD:
1. `flushMicrotasks(4)` was too few hops for the dispatchLeaf chain's
   await sequence (await createAttack → await addMessage → record →
   wrap → race → loop → pick next → await createAttack → await
   addMessage). Bumped default to 32 and added a poll-based `waitFor`
   helper for tests that need a specific predicate satisfaction.
2. The original concurrency-cap test tried to interleave releases and
   pending-count assertions, which raced the loop. Simplified to drain
   all 5 deferreds + assert inflight.max at the end — the invariant
   the test was actually trying to prove.

Verification: 786 frontend tests pass (+16), no regression. lint,
type-check, type-check:contract all clean.

Next slice
----------
PR4e: 5-step entry-point shim. Wraps runWave with:
  1. Tag-hygiene gate (operator non-empty)
  2. Cross-tab lock acquire (mock for now; real BroadcastChannel in PR4f)
  3. Cost guardrail modal
  4. Per-tree wave queue check
  5. Wave start + try/finally lock release
Plus the active-wave map so `runner.cancelWave(treeId)` can lookup the
controller created by the shim. Then PR4f wires the real BroadcastChannel
lock + queue drain, completing the runner.
…V1.0 (PR4d.1)

Critical fix. Rubber-duck reviewer caught: partition.ts's clean-prefix
branch was writing the literal placeholder string `'<deferred: resolved
from piece cache>'` as the `original_value` of every assistant piece
loaded into prepended_conversation. The piece cache the design assumes
(per 03 §3.3a `_load_piece_as_request`) doesn't exist. Every clean-
prefix dispatch in production would send fabricated assistant history
to the LLM — either backend-rejected (validation) or model-accepted
and reasoned-against (silently corrupting target responses).

Tests passed because no assertion looked at `pieces[i].original_value`.

The fix
-------
Two options were viable:
  A. Build the piece cache (~150 LOC: new module, wave-start integration,
     GET /attacks/{id}/messages per distinct source AR, cache by piece_id,
     partition reads from cache). Closes the design's contract; preserves
     the per-leaf-edit cost optimum.
  B. Drop the clean-prefix branch entirely. Every Send on the path
     enters freshSuffix and re-fires against the target. ~10 LOC change.
     Operators pay full re-dispatch cost on every wave; correctness is
     restored because every assistant message the target sees was
     actually generated by the target.

V1.0 ships option B per operator's explicit choice — aligns with Q.S.1
"dumb but correct" V1.0 discipline. The clean-prefix optimization
returns in V1.x with the piece cache.

Operator-visible cost: editing only a leaf at the bottom of a 10-deep
clean chain now costs 11 calls (1 create_attack + 11 add_messages)
instead of the 2 calls the clean-prefix model would have produced.
~5× hot-path regression on edit-leaf-only. The wave-summary's call
count reflects this honestly; the cost-guardrail modal intercepts
expensive refreshes.

What ships
----------
- `partition.ts`: clean-prefix branch removed. Every Send on the path
  enters freshSuffix unconditionally. `prepended` carries at most the
  system message (when RootPromptNode.params.systemPrompt is set).
  Dead helpers deleted: `userTurnMessage`, `assistantResponseMessage`,
  the placeholder-string code. File-header docstring rewritten to
  document V1.0 reality and the V1.x migration path.
- `isStaleForResolver` kept but re-documented: it's no longer on the
  resolver's hot path under V1.0; retained for defensive callers (UI
  cost preview, V1.x cache layer).
- `partition.test.ts`: three test cases updated to assert V1.0 behavior
  ("all-clean upstream", "clean Send + stale interior", "failed Send
  with execution"). All now expect empty prepended + every Send in
  freshSuffix. The implementation defect that the original tests
  silently tolerated (placeholder string in original_value) is now
  unreachable.
- `dispatch.test.ts`: "multi-Send chain with clean prefix" rewritten
  to assert V1.0's empty-prepended behavior + N add_messages for the
  full chain. "200-message cap" rewritten — the cap is unreachable in
  V1.0 normal traffic (prepended ≤ 1), so the test now documents
  contract (100-deep chain dispatches successfully because prepended
  is empty) + leaves a placeholder for the V1.x cache-layer expansion.

Design doc updates
------------------
- `01 §1.2 known limitations` gains a new entry: "No clean-prefix
  optimization in V1.0 — every dispatch re-fires the full chain from
  the root." Spells out the ~5× hot-path cost, the V1.x migration,
  and the correctness rationale. Updates the 200-turn-cap entry to
  note the cap is unreachable in V1.0 by construction.
- `03 §4.1` gains a "V1.0 implementation reality" subsection before
  the resolver pseudocode. Names the two options considered, the
  choice rationale, the cost trade-off, and the V1.x migration shape.
  The pseudocode below is left intact as the eventual V1 model.

Verification: 787 frontend tests pass, no regression. lint, type-check,
type-check:contract all clean. The Q.S.1 cost-cliff test in dispatch.test.ts
remains green at 720 backend calls per 60-leaf attempt-fan wave — the
shared prefix was already stale, so it was already in freshSuffix under
the prior buggy behavior; no math change.

The rubber-duck's other items (asApiError move, synthetic discriminator,
dead helper deletion, cancel-tally reason, reflog_evicted TODO,
dispatcher tag-gate redundancy) land in PR4d.2 to keep this commit
focused on the single load-bearing fix.
Six smaller items from the rubber-duck review bundled into one focused
cleanup. None are behavior changes; each is a layer / discipline fix.

1. Move ApiError pass-through to services/errors.ts
   ------------------------------------------------
The dispatcher had a private `asApiError` helper that duck-checked the
ApiError shape and passed it through `toApiError`. Reviewer caught:
the shape is OWNED by services/errors.ts; the dispatcher was
re-implementing a check at the wrong layer. Moved the passthrough
into `toApiError` itself as an early-return: any already-normalized
ApiError gets returned verbatim (idempotent). Two new test cases
in errors.test.ts pin (a) the referential-equality pass-through and
(b) that plain objects lacking the ApiError shape still fall into
the unknown-throw branch. The dispatcher's `asApiError` + the
private `isAlreadyApiError` helper are deleted; dispatcher catches
just call `toApiError(raw)` now.

2. Synthetic UserTurn → real discriminator
   ----------------------------------------
`SyntheticUserTurnFromRoot` carried a `synthetic: true` field that
the dispatcher's `piecesForUserTurn` / `resolvedConverterIds` ducked
against via `(ut as { synthetic?: boolean }).synthetic === true`.
That's a runtime check across a declared union — the worst of both
worlds. Fixed: the synthetic shape now uses `kind:
'synthetic_user_turn_from_root'`, which matches the `kind` field on
real `UserTurnNode` (`'user_turn'`), making the union a real
discriminated union. Consumers narrow via `if (ut.kind ===
'synthetic_user_turn_from_root') ...` and TypeScript narrows the
rest. All `as`-casts deleted.

3. Delete dead helpers
   --------------------
- `parseTreePathLabel` and `isTreePathLabelValid` in dispatchHelpers.ts:
  no production callers; the reviver was speculative for unrealized
  future needs. Tests for both also deleted; one kept-and-rewritten
  test pins `buildLabels` output shape ('[]' for fan-less leaves,
  JSON-encoded for nested fans) — that's the actual contract.
- `cryptoRandomUuid` Math.random() fallback in dispatch.ts: jsdom +
  modern Node both have `crypto.randomUUID`; the fallback was an
  unreachable RNG dependency adding coverage gaps. Replaced with a
  direct call.

4. Cancel-tally honest reason
   ---------------------------
`runWave`'s cancel-tally was writing `{ message: '...', failure_class:
'transient' }` on cancelled leaves' `lastError`. State='cancelled' is
what consumers actually read; the structured failure_class on a
not-failed node was confusing. Switched to a plain string reason
`'wave cancelled by operator'`. The sink normalizes string reasons
to transient internally, but consumers reading lastError.failure_class
in a cancelled-state node now see the documented "string was passed"
path rather than a hand-coded structured form.

5. Reflog_evicted TODO
   --------------------
`runWave`'s summary hardcodes `reflog_evicted: 0` until the reflog GC
layer lands. Added a TODO(reflog) comment explaining the migration
shape so the placeholder doesn't rot into a forgotten zero.

6. Drop dispatcher's redundant tag-gate assertion
   -----------------------------------------------
`dispatchLeaf` had its own `if (!args.operator) throw` check. But
`buildLabels` (called downstream) also asserts the same. Two defense-
in-depth checks for the same precondition is one too many — pick the
source. Kept `buildLabels`'s assert (it's the actual label-build site
where the consequence of a missing operator would silently destroy
audit attribution). Dropped dispatcher's; tests covering the synchronous
throw on empty operator still pass because the throw happens via
buildLabels at the same callsite, returns the same error class.

Verification: 787 frontend tests pass (same count as PR4d.1 — the
new errors.test.ts pass-through tests offset the deleted dead-code
tests). Coverage: 93.22%/85.71%/92.23%/95.2% globally; runner
directory 93.93/85.2/94.2/94.98 — all above the 85/85/90/90 thresholds.
The two remaining gaps in dispatch.ts (200-cap branch + the inline-
converter inner if) are defensible: the cap is unreachable in V1.0
by construction (per PR4d.1) and the inline branch fires only for
ConverterRef shapes that don't carry a `converterId` (rare in V1.0).

Lint, type-check, type-check:contract all clean.

Open rubber-duck items still pending:
- DTO original_prompt_id nullability tightening (V1.0 ships nullable
  per the spec; not yet relitigated).
- Citation-strip discipline: still in-progress; partition.ts and
  wave.ts source comments retain a few inline section refs that
  should clear at end-of-V1.0.
- The `reconcileTransformStates` / `reconcileAllTransforms` calls
  (03 §3.3 try/finally + §3.1 step 6) are NOT implemented. PR4e or
  a dedicated follow-up needs them or UserTurn/Score nodes that need
  state reconciliation after the wave stay stuck `stale`.
- `WaveEvent.operator_tag_required` is in the type union but never
  emitted; PR4e's shim wires the gate.
The runner's entry-point shim that wraps runWave with the canonical
five-step ordering from 03 §2.1: tag-hygiene gate → cross-tab lock
acquire → cost-guardrail modal → per-tree wave-queue check → wave
start. Steps 2–5 sit inside try/finally so the lock releases on every
exit path; drain runs OUTSIDE the lock so each drained wave can
acquire its own. This commit also lands the wave-end
reconcileAllTransforms pass (§3.1 step 6) and the buildSForRetry
helper that the retry-failed scope requires.

What ships

  - frontend/src/runner/shim.ts
    - createRunnerShim(deps) → RunnerShim with refreshNode /
      refreshSubtree / refreshTree / retryFailedNodes / cancelWave /
      cancelQueued. Per-tree active-wave + queue maps live in the
      closure so cancel{Wave,Queued} can look up controllers and
      dropped requests.
    - The five steps in order:
      1. Operator tag missing → emit operator_tag_required, no
         acquire, no release. Returns.
      2. lockManager.acquire returns 'busy' → emit busy event, return
         (no release; we don't hold the lock).
      3. costGuardrail.approve(estimatedCalls, triggerKind) → reject
         returns via outer finally (lock released).
      4. currentWaveByTree.has(treeId) → enqueue {waveId, triggerKind,
         scope, leafCount}, emit queued event with queueDepth, return
         via outer finally.
      5. createWaveController + runWaveStarter → set
         currentWaveByTree → await settled → reconcileAllTransforms →
         delete currentWaveByTree → outer finally releases lock →
         drain queue OUTSIDE the lock.
    - cancelWave(treeId): looks up active controller, .cancel(), then
      awaits settled (swallowing rejection) so the public contract
      "resolves when the wave fully settles" holds.
    - cancelQueued(treeId): splices the queue, emits a synthetic
      complete event with summary.cancelled = leafCount per dropped
      wave (the §10.3 contract — operator sees the queued banner
      reconcile).
    - Dependencies are injected (operatorProvider, treeProvider, sink,
      lockManager, costGuardrail, runWaveStarter, uuid, optional now)
      so tests mock every boundary and the shim's orchestration is
      the only thing under test.

  - frontend/src/runner/reconcile.ts
    - reconcileAllTransforms(tree, treeId, sink) walks every
      transform-class node (user_turn / fan / score) once and flips
      stale → clean when the parent is clean. Single pass; transforms
      whose parent flipped this pass stay stale for a follow-up wave
      (catches the operator-typical Score-as-Send-sibling case the
      per-dispatch path-scoped reconcile cannot reach).

  - frontend/src/runner/readiness.ts
    - buildSForRetry(tree, nodeIds): S = {nodeIds} ∪ {failed/cancelled
      Send ancestors on each input's root-to-leaf path}, deduped.
      Walks transparently through UserTurn / Fan / Score ancestors
      (only Send state counts). Silently skips missing nodeIds (UI
      race tolerance).
    - demoteRetryFailedNodes signature evolved from void to return
      ConversationTree: fires sink calls (React-state side) AND
      returns a new tree with demoted nodes flipped to stale +
      execution null + lastError null (the pure-data side runWave's
      computeReady reads). Returns the input tree by identity on
      no-op so caller memoization stays valid.

Notable shape decisions

  - Drain runs OUTSIDE the outer lock-release. The spec's literal
    pseudocode in §2.1 nested the drain inside the outer try, which
    would deadlock on real BroadcastChannel-keyed locks (re-entered
    drained waves would block trying to acquire the same lock). The
    correct read is: each drained wave gets its own acquire-release
    cycle. Drain reachability after the outer try is the right signal
    — every early-exit path (tag-gate, busy, missing tree,
    cost-cancel, enqueue) returns from inside the try (bypassing the
    drain block); a step-5 exception propagates through the finally
    and exits the function before drain runs. No bookkeeping flag
    needed.

  - waveId minted per shim entry, not per queue request. The §10.3
    spec mints one waveId per entry; for an enqueued wave the queued
    event carries it. When the queued wave later drains, its
    re-entered shim invocation mints a fresh waveId for the dispatch
    itself. That's what the literal spec does and what the V1.0 UI
    needs (the queue banner reads queueDepth, not per-wave tracking).

  - Retry-failed shim DEMOTES via demoteRetryFailedNodes and uses the
    RETURNED tree for runWaveStarter. The §3.1 step 2b demotion is
    spec'd as inside the dispatch loop, but doing it in the shim
    keeps runWave dumb (it doesn't need to know about
    waveTriggerKind === 'retry_failed' semantics). The dual sink-call
    + returned-tree shape on demoteRetryFailedNodes is the source of
    truth for both surfaces.

  - cancelQueued emits the complete event itself. The shim is the
    only place that has the per-queued-wave waveId + leafCount; the
    mocked runWaveStarter for queued waves never runs, so no other
    layer can emit the wave's lifecycle complete event. This matches
    the §10.3 contract literally.

  - Missing-tree path: silent return. The UI is the only legitimate
    caller of these entry points and always passes a treeId that
    matches the current tree. If the operator races a tree-delete
    against a refresh click, the shim no-ops rather than crashes. The
    lock IS released (we acquired before the lookup, per the spec's
    step ordering).

  - The shim doesn't depend on runWave's wave.ts directly — it takes a
    RunWaveStarter dependency (the function signature mirrors
    RunWaveArgs minus the sink/api/operation, which production wires
    in via a thin adapter). This is how the shim tests assert the
    five-step ordering without mocking the whole dispatcher.

TDD narrative

  Started with readiness.test.ts: added 8 cases for buildSForRetry
  (the entry point's pre-S helper) + 4 cases for the
  demoteRetryFailedNodes return-tree contract. RED was tsc TS2724
  ('buildSForRetry' not exported) + TS2339/TS2345 on the void return.
  Implemented in readiness.ts; 43 readiness tests pass (28 prior + 15
  new).

  reconcile.test.ts next: 12 cases pinning the
  transform-flips-when-parent-clean rule, the Send-untouched
  invariant, idempotency on clean transforms, and the wide-tree walk
  that catches sibling-of-Send Scores. RED was TS2307 on
  './reconcile'. Implemented reconcile.ts; 12/12 green.

  shim.test.ts is the bulk of the PR: 39 cases across 12 sections —
  tag-gate, lock-busy, cost-cancel, queue enqueue/drain, wave-start
  args, lock release on every exit path (success / starter throws /
  cost cancel / queue / tag abort / busy abort), S construction per
  entry point, waveTriggerKind mapping, cancelWave behavior,
  cancelQueued behavior, wave-end reconcile, per-tree isolation,
  missing-tree no-op. RED was TS2307 on './shim'. Implemented;
  38/39 green on first run.

Defects surfaced during TDD

  - "cancelQueued does NOT affect the active wave" timed out on first
    pass because shim.refreshTree synchronously returns a Promise that
    suspends at the first await (lock acquire); the second invocation
    hadn't reached the enqueue step by the time cancelQueued ran, so
    the queue was empty and the no-op cancel didn't prevent the
    second wave from later draining normally — but the test only
    resolveNext'd once, leaving the drained second starter pending
    forever. Fix: waitFor 'queued' event before cancelQueued. Real
    bug surfaced: the race exists in production too, and operators
    who fire cancel-queued before the queued event has emitted will
    see the queued wave drain anyway. Acceptable for V1.0 (the
    queued event is the UI's signal to enable the cancel chip), but
    worth a note for PR4f when real BroadcastChannel-keyed locks
    change the timing.

  - Initial implementation carried a `waveDispatched` flag through
    the outer try/finally to guard the drain. ESLint's
    no-useless-assignment rule correctly flagged the initialization
    as dead — the drain block is only reachable on the step-5 success
    path (returns inside the outer try propagate through finally and
    exit; exceptions propagate through both finallys and exit). The
    flag was a cargo-culted pattern from C-style exit-code tracking;
    JavaScript's try/finally semantics already encode the right
    signal in reachability. Dropped the flag.

  - Wave-end reconcile uses treeProvider for a fresh lookup rather
    than the tree object we built S against. Reason: in production
    the wave's sink writes flow into the React state container that
    treeProvider reads, so the post-wave snapshot reflects the
    dispatcher's Send state-flips and the reconciler walks the
    correct world. In tests with closed-over fixed trees, this is a
    no-op (same reference returned) — same answer either way.

Verification

  Tests: 851 frontend passing (787 prior + 64 new: 15 readiness +
    12 reconcile + 39 shim — see test counts below per file). Backend
    unchanged (~658 passing).
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/runner directory 94.88 / 87.29 / 94.04 / 96.05
    against the 85/85/90/90 thresholds — shim.ts at 98.7/93.33/90.9/
    100, reconcile.ts at 94.73/90/100/100, readiness.ts at
    94.53/89.04/92.3/96.15. Pre-existing 126 latent test type errors
    untouched (the narrow contract-test gate stays green).

  Per file:
  - readiness.test.ts: 43 tests (28 prior + 15 new)
  - reconcile.test.ts: 12 tests (new file)
  - shim.test.ts: 39 tests (new file)

Next slice

  PR4f: replace mock CrossTabLockManager with real
  BroadcastChannel('pyrit-runner') keyed on conversation_tree_id
  (per 01 §9.4.3 / 03 §10.4). Real-lock timing will shift the
  cancelQueued race noted above — worth retesting the drain
  semantics with the polyfilled BroadcastChannel under jsdom.
  Together PR4e + PR4f complete the runner core; the rubber-duck
  reviewer fires after PR4f per the template.

Open rubber-duck items still pending

  - DTO original_prompt_id nullability (since PR3a; not yet
    re-litigated).
  - Citation-strip discipline (partition.ts + wave.ts inline section
    refs still present; end-of-V1.0 strip).
  - reconcileTransformStates (path-scoped per-dispatch variant) NOT
    implemented in this PR — wave-end reconcileAllTransforms covers
    the canvas-stale-transform-after-wave bug; path-scoped variant
    is a perf optimization for the incremental UI update that V1.0
    can defer.
  - CI gate for the 126 latent test type errors (deferred; narrow
    contract-test gate stays in place).
  - PR1 backward-compat fallback corpus verification (needs prod DB
    access).
…mantics (PR4f)

The cross-tab advisory lock implementation per 01 §9.4.3 / 03 §10.4.
Replaces the mock CrossTabLockManager that the shim consumed in PR4e
with a real BroadcastChannel('pyrit-runner')-keyed lock; adds the
`broadcast-channel` npm polyfill for jsdom (simulate mode) so two
LockManager instances in the same jest process talk to each other the
same way two browser tabs talk through the native BroadcastChannel.
Also adds the queue-drain stale-set regression test that pins the
§10.3 "stale-set recomputed at wave-start" contract.

What ships

  - frontend/src/runner/crossTabLock.ts
    - createBroadcastChannelLockManager(options) returns
      BroadcastChannelLockManager — a CrossTabLockManager + close()
      + exposed tabId. Options: channelName (default 'pyrit-runner'),
      tabId (auto-mint via crypto.randomUUID), acquireTimeoutMs
      (default 50 per §9.4.3), logger (default console), uuid
      (replaceable for tests).
    - Wire format on the channel (per §9.4.3 rev-10 correctness note):
        { type: 'lock_request',  treeId, requestId, tabId }
        { type: 'lock_busy',     requestId, holderTabId }
        { type: 'lock_released', treeId }
      Request/reply correlation rides on `requestId` — MessagePort
      transfer-list does NOT work with BroadcastChannel.
    - Single onmessage dispatcher with a set of subscribers; the
      persistent holder-response handler and the per-acquire busy
      listener both register through it. One handler = consistent
      behavior across native and the npm polyfill (which have
      different onmessage calling conventions; see Defects below).
    - Same-tab reacquire short-circuits: heldLocks.has(treeId) ⇒
      { acquired: true, holderTabId: null } immediately, no message
      round-trip. Per the §9.4.3 protocol — otherwise we'd race our
      own holder-response handler.
    - Graceful degradation when BroadcastChannel is undefined (Safari
      ≤15.3): warn once, then always-acquired. Operators on legacy
      Safari accept the V1.0 fork-bomb risk; everything else keeps
      working. acquire/release/close all no-op safely.

  - frontend/src/runner/treeTypes.ts
    - CrossTabLockManager.acquire return type changed from
      'acquired' | 'busy' to a discriminated union:
        type LockAcquireResult =
          | { acquired: true;  holderTabId: null }
          | { acquired: false; holderTabId: string }
      so the busy reply's holderTabId flows through the shim into
      the WaveEvent { kind: 'busy', holderTabId } the UI consumes.

  - frontend/src/runner/shim.ts
    - Consumes the new LockAcquireResult shape: on !acquired, emit
      busy with lockResult.holderTabId (was hard-coded '' in PR4e).

  - frontend/src/runner/shim.test.ts
    - Updated mkControllableLockManager to take ReadonlyArray<
      LockAcquireResult> and default to the acquired shape.
    - Tightened the busy-event test to assert holderTabId propagation.
    - NEW test: "drained re-entry recomputes S from the LATEST tree
      state, not the snapshot at enqueue" — flips treeProvider's
      tree between enqueue and drain; asserts the drained wave's
      starter call carries S computed from the post-edit tree (per
      §10.3 stale-set-recomputed-at-wave-start contract). Pins the
      PR4e shim's correctness against the rev-15 reviewer Finding 5
      concern that prompted the §10.3 → §2.1 unification.

  - frontend/src/setupTests.ts
    - Loads the `broadcast-channel` npm polyfill globally with
      `enforceOptions({ type: 'simulate' })`. Simulate mode keeps the
      transport in-process — required for jest's parallel test workers,
      which would otherwise step on each other via the polyfill's
      file-RPC default (broadcast-channel/methods/node.js).

  - frontend/package.json + package-lock.json
    - broadcast-channel ^7.3.0 added as devDependency per spec §9.4.3
      ("V1.0 commits to polyfilling via the broadcast-channel npm
      package (~5 KB) loaded in the jest setup file"). Production
      bundles use the browser's native BroadcastChannel; the polyfill
      is dev-only.

Notable shape decisions

  - Discriminated union for LockAcquireResult instead of a bare
    { acquired: boolean; holderTabId: string | null }. The narrowing
    `if (!r.acquired) { r.holderTabId is string }` falls out
    automatically; no nullability dance at every callsite.

  - Single subscriber-set dispatcher rather than calling
    addEventListener/removeEventListener directly on the channel.
    Two reasons: (a) the native API supports addEventListener but
    the npm polyfill's `onmessage` handler calling convention differs
    from native (raw data vs MessageEvent — see Defects). Normalizing
    in one place at the top-level onmessage = setter handles both
    backends; (b) the protocol has one persistent handler (lock-
    request responses) + N ephemeral handlers (per-acquire busy
    listeners). The subscriber-set is the natural fit for that
    structure.

  - simulate-mode polyfill instead of the polyfill's default `node`
    mode. The `node` method uses file-based RPC under /tmp — fine
    for cross-process tests but a leak vector for jest's
    parallel-worker layout. `simulate` is in-process, deterministic,
    and ~5ms per message hop (the polyfill's SIMULATE_DELAY_TIME
    constant — comfortably below the test's 20ms acquireTimeoutMs).

  - The lock manager exposes `tabId` directly (not behind a getter)
    so the busy modal's "another tab (id: …)" label can read it
    without an extra API. Production passes the manager's tabId to
    the modal's "this tab" hint.

  - `acquireTimeoutMs` defaults to 50ms per §9.4.3. Configurable for
    tests + a future operator preference if the latency proves to
    be annoying (V1.x). Imperceptible vs a typical 10+ second wave.

  - close() removes the onmessage handler and clears subscribers
    before calling channel.close(). The polyfill throws on
    double-close; we swallow that defensively (try/catch around
    channel.close()).

  - The release wire message (`lock_released`) is posted but the
    spec's "Wait" auto-acquire flow is NOT wired in V1.0 — that's a
    UI affordance (per §9.4.3 "Wait listens for the lock_released
    message"). The spec also notes "Refresh anyway" as the operator
    override path; both are UI-layer work that lands with PR5/PR6.
    The runner's lock-side of the protocol (post on release) is
    correct today; the listener side wires in with the UI.

TDD narrative

  Started with the shim interface update (LockAcquireResult shape)
  to let the lock tests reference the new type. Updated treeTypes.ts
  + shim.ts + shim.test.ts mock + the busy-event assertion in one
  pass. Shim suite stayed 39/39 green through the interface change.

  Then wrote crossTabLock.test.ts: 13 cases across single-instance
  lifecycle, two-instance contention (busy reply, release-then-
  reacquire, three-way contention), per-tree isolation, same-tab
  reacquire, BroadcastChannel-absent degradation, close-stops-
  responding, auto-mint tabId. RED was TS2307 on './crossTabLock'.

  Implemented crossTabLock.ts. First run: 9/13 green — all
  single-instance + degradation + close tests passed; all four
  cross-instance tests failed. The reason was the polyfill calls
  `onmessage(data)` with the raw user payload, but the native API
  calls `onmessage(MessageEvent)` with a wrapper. My initial
  implementation treated the argument as MessageEvent always and
  read `.data` — which on the polyfill returned undefined. Fixed
  by normalizing at the dispatcher (`instanceof MessageEvent` test
  decides whether to unwrap). All 13 green.

  Added the drain-stale-set regression test to shim.test.ts. First
  run: failed on a test bug (used `sink.calls` instead of
  destructured `callsOf` helper). Fixed; 40/40 shim green.

Defects surfaced during TDD

  - The `broadcast-channel` npm polyfill's onmessage handler is
    called with the RAW user data (not a MessageEvent). Native
    BroadcastChannel calls it with a MessageEvent that wraps the
    data in `.data`. This is a polyfill bug or design choice (the
    polyfill's documentation doesn't surface it loudly), and the
    contention tests caught it immediately — the runner's filter
    `data.type === 'lock_request'` was reading `.type` off a
    MessageEvent (which has `.type === 'message'`, not the wire
    `.type === 'lock_request'`), so the holder-response handler
    never fired and busy replies never went out. The fix is a
    one-liner normalization at the onmessage setter; the runner
    code stays clean. Documented in the BroadcastChannelLike type
    comment.

  - The polyfill's simulate transport adds a 5ms delay per
    postMessage. Two hops (A→B request, B→A response) is 10ms.
    Our default 50ms acquireTimeoutMs is fine for production, and
    tests use 20ms which is also comfortable. Worth noting if
    future tests get flaky on slower CI: bump the test-mode
    timeout, don't lower the polyfill delay.

  - The polyfill's `BroadcastChannel.close()` may throw on double-
    close. The shim's try/finally pattern could trigger a double
    close if a lock manager is reused across shim invocations
    (today it isn't, but defensively the close() wraps channel.close
    in try/catch). Silent swallow — closing an already-closed
    channel has no observable effect.

  - Same-tab reacquire (heldLocks.has) is a load-bearing short-
    circuit, not an optimization. Without it, the holder-response
    handler would fire for the tab's own lock_request and the
    acquire would resolve to busy against itself. The §9.4.3
    pseudocode includes this check ("if (heldLocks.has(treeId))
    return 'acquired'"); the test "same-tab reacquire" pins it.

Verification

  Tests: 865 frontend passing (851 prior + 14 new: 13 lock + 1
    shim drain). Backend unchanged (~658 passing).
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/runner directory 94.5 / 86.57 / 93.87 / 96.11
    against the 85/85/90/90 thresholds —
      crossTabLock.ts at 91.66/81.39/92.85/96.61
      shim.ts        at 98.7/93.33/90.9/100
      (others unchanged)
    Two uncovered crossTabLock.ts lines:
      - line 93 (release no-op in degraded path) is covered by the
        added release call in the BroadcastChannel-absent test.
      - line 207 (defaultUuid's crypto-randomUUID-absent fallback)
        is intentional dead branch for non-Node environments.

Next slice

  PR4f closes the runner core (PR4a-f). Per the rubber-duck
  template, fire the reviewer now before starting PR5 (react-flow
  scaffold). Reviewer scope: PR4e + PR4f + the readiness/reconcile/
  lock/shim files; specific concerns —
    A. shim's drain-outside-lock decision (worth revisiting against
       the §10.3 spec literal?)
    B. CrossTabLockManager interface change (discriminated union
       vs other shapes — premature?)
    C. polyfill choice (npm vs in-process shim — leaner alternative?)
    D. wave-end reconcile via fresh treeProvider lookup (correctness
       against the React state container's update timing)
    E. queue-drain stale-set test honesty (does it prove what it
       claims, or is it a test-that-passes?)
    F. coverage gaps (defensible vs material)
    G. spec drift since PR4d.1
    H. citation-strip discipline (still pending end-of-V1.0)
    Plus the standard rubber-duck items (J. anything else,
    hidden time-bombs, etc.).

Open rubber-duck items still pending

  - DTO original_prompt_id nullability (since PR3a; not yet
    re-litigated).
  - Citation-strip discipline (partition.ts + wave.ts inline section
    refs still present; end-of-V1.0 strip).
  - reconcileTransformStates (path-scoped per-dispatch variant) not
    implemented; wave-end reconcileAllTransforms covers the canvas-
    stale-transform-after-wave bug.
  - CI gate for the 126 latent test type errors (deferred).
  - PR1 backward-compat fallback corpus verification (needs prod DB
    access).
  - PR4e race: cancelQueued fired before the queued event has
    emitted would let the queued wave drain anyway. Acceptable for
    V1.0 (the queued event is the UI's signal to enable the cancel
    chip); document for PR5/PR6 UX work.
Address must-fix items from the post-PR4e+PR4f rubber-duck review.
The runner core remains feature-complete; this commit hardens the
read-back semantics, kills a maintenance hazard, tightens the lock
interface, and adds defense-in-depth tests for the
labels-divergence invariant.

What ships (per reviewer finding)

  Finding D — wave-end reconcile read-back race [LOAD-BEARING]
    The PR4e implementation re-read deps.treeProvider AFTER runWave
    settled to get the post-wave tree snapshot for
    reconcileAllTransforms. Reviewer flagged: production wires
    treeProvider to React state, and React 19's setState commits
    are queued at microtask boundaries — `await settled` may resume
    before React has committed the wave's setState calls, leaving
    treeProvider returning the STALE pre-wave tree. Reconcile would
    then walk the wrong world and miss every transform the wave's
    Send-completions just unblocked.

    Fix: the shim now wraps deps.sink in a per-wave recorder
    (createStateRecorder) and passes the wrapped sink to
    runWaveStarter via the new `sink` field on RunWaveStarterArgs.
    The recorder captures every setNodeState call into a per-wave
    Map<NodeId, NodeState>; after `await settled`, the shim
    constructs the post-wave tree via applyStateRecorder (input
    tree + overlay of captured states) and feeds THAT to
    reconcileAllTransforms. No React-state read-back; no timing
    dependency. The recorder forwards every other sink method
    (recordExecution / clearExecution / setReflogPinned /
    emitWaveEvent) untouched so React state stays in sync.

    Tests pin the new contract:
      - "reconcile reads POST-WAVE state via the recording sink, not
        a treeProvider snapshot" — tree with a stale Send + stale
        Score child; the test starter writes setNodeState(Send,
        clean) through args.sink; reconcile flips the Score to
        clean. Without the recorder, the treeProvider (fixed-tree
        closure) would return Send=stale and Score would stay stale.
      - "starter receives a sink that is a wrapper (not the bare
        deps.sink reference)" — defense-in-depth against a future
        refactor that reverts the wrapping.
      - "recording sink forwards every sink method to the underlying
        deps.sink" — pins that recordExecution / clearExecution /
        setReflogPinned / emitWaveEvent are NOT swallowed.

  Finding C — polyfill swap (npm → in-process)
    The broadcast-channel npm package's onmessage(data) calling
    convention differs from native onmessage(MessageEvent), forcing
    a normalization shim in crossTabLock.ts. Reviewer flagged:
    (a) the normalization is a maintenance hazard (a future browser
    MessageEvent subclass would break the `instanceof` check
    silently); (b) simulate mode bypasses structured-clone
    serialization, so non-JSON-serializable fields added to
    WireMessage later would pass tests but fail in production;
    (c) 7 transitive deps for a 25-line problem.

    Fix: 25-line in-process BroadcastChannel shim in setupTests.ts
    that delivers a real MessageEvent on postMessage, matching
    native semantics exactly. Removes the broadcast-channel devDep
    (and its 7 transitive packages), removes the eventOrData
    normalization shim from crossTabLock.ts, and tightens
    BroadcastChannelLike back to `onmessage: (event: MessageEvent)
    => void`. Production code unchanged.

    Spec note: 01 §9.4.3 says "V1.0 commits to polyfilling via the
    broadcast-channel npm package." Spec wins on settled
    commitments by default; reviewer's case for the in-process
    shim ranged across three orthogonal concerns (test fidelity,
    maintenance burden, dep surface) and was strong enough to
    override. Spec amendment for end-of-V1.0 doc pass.

  Finding B — LockAcquireResult union simplification
    PR4f's discriminated union was:
      { acquired: true;  holderTabId: null }
      | { acquired: false; holderTabId: string }
    Reviewer flagged the `holderTabId: null` on the acquired-true
    variant as junk data — invites a reader to think it carries
    meaning when it doesn't. `acquired: true` already says "no
    holder."

    Fix: tightened to `{ acquired: true } | { acquired: false;
    holderTabId: string }`. Shim consumer unchanged (already only
    reads holderTabId on the !acquired branch). All call-sites
    updated (crossTabLock.ts, shim.test.ts mock,
    crossTabLock.test.ts assertion, treeTypes.contract.test.ts).

  Finding J — acquire-after-close throws
    Reviewer flagged: PR4f's closed-manager acquire returned
    { acquired: true } silently. A closed manager has no
    holder-response handler and no peer subscription — returning
    "acquired" is a silent lie that produces a phantom cross-tab
    race. Fail-loud over silent-lie.

    Fix: acquire on a closed manager throws Error('cross-tab lock
    manager is closed'). Both the native-BC path and the
    BroadcastChannel-absent degraded path track a `closed` flag and
    throw consistently. Release stays best-effort no-op on closed
    managers (release is the cleanup path; throwing there would
    surface inside the shim's outer finally and cascade with the
    wave settlement). Tests added: "acquire after close throws"
    + "release after close is a no-op (NOT throw)."

  Finding J (also) — parentConversationTreeId preservation contract
    Reviewer flagged: the shim reads `tree.parentConversationTreeId`
    off the post-demotion tree returned by demoteRetryFailedNodes
    and forwards it to runWaveStarter for label-divergence
    compliance. If a future demoteRetryFailedNodes refactor lost
    the spread of tree-level fields, parentConversationTreeId
    becomes undefined and the labels-round-trip integration test
    would fail in CI — but no unit test would catch the regression
    on its own.

    Fix: contract test in readiness.test.ts —
    "preserves tree-level fields on the returned tree (id, edges,
    parentConversationTreeId, undoStack, etc.)" — verifies the
    demoted tree carries id / parentConversationTreeId /
    parentSourceConversationId / displayName / rootId / createdAt /
    edges / undoStack identical to the input. Pin lands before any
    regression.

  Finding A — drain-block structural invariant comment
    Reviewer reading: the drain-outside-the-lock decision is
    correct, but the commit body's "prevents a deadlock" argument
    is wrong (the lock manager's same-tab reacquire short-circuits
    would handle that). The real reason is cross-tab fairness —
    drain-inside makes other tabs wait for N-deep queues; drain-
    outside lets them interleave. Reviewer also flagged the
    "no bookkeeping flag needed" property as load-bearing on the
    structural invariant "every early-exit uses `return`" — a future
    refactor that replaces a guarded return with an else-branch
    would silently start draining on the early-exit path.

    Fix: extended the comment block on shim.ts's drain block to
    state the cross-tab-fairness rationale (corrects the original
    commit body's deadlock argument) AND name the
    return-on-early-exit structural invariant explicitly. The next
    refactor reader has explicit guidance.

Notable shape decisions

  - createStateRecorder is a local helper (not exported). The shim
    is the only producer/consumer of the recorder lifecycle; making
    it a public utility would invite reuse in places where the
    React-state-read-back race doesn't apply.

  - applyStateRecorder returns the input tree by identity when
    states.size === 0 (no-op waves). Preserves caller-side identity-
    based memoization the same way demoteRetryFailedNodes does on
    no-op demotions.

  - The recording sink wraps deps.sink but does NOT track
    recordExecution. Reason: state transitions are always via
    setNodeState; recordExecution attaches the execution record
    THEN setNodeState flips state to 'clean'. Capturing
    setNodeState only is sufficient for the reconcile use case
    AND keeps the recorder small. If a future caller needs to see
    the execution map, extend then; don't speculate.

  - The in-process BroadcastChannel polyfill uses queueMicrotask
    for delivery (matching native sync-emit-but-async-receive
    semantics). Tests await two microtask hops to settle a
    round-trip (A request → B response). Determinism in
    parallel-worker jest is preserved because state is purely
    in-memory per-process.

  - Acquire-after-close throws but release-after-close is no-op.
    The asymmetry is intentional and documented inline: release
    runs from the shim's outer finally; throwing there would
    cascade with the wave settlement and turn one issue into two.
    Acquire runs from a fresh shim entry; throwing there surfaces
    the bug at the right callsite.

  - Drop the broadcast-channel npm dep + its 7 transitives entirely
    (was added in PR4f, removed in PR4e+f.1). Net dep change for
    PR4e+f vs PR4d.2: zero.

Defects surfaced during TDD

  - Initial D-fix attempt forgot to update RunWaveStarterArgs's
    public type, causing the test's custom starter to read
    args.sink from a type that didn't declare it. Caught at
    type-check; added `sink` to the args interface.

  - The shim's import block accidentally pulled in ApiErrorReason
    after a copy/paste from treeTypes. ESLint's
    no-unused-vars caught it; cleaned up.

  - When swapping the polyfill, the existing "after close, the
    closed manager no longer responds as holder" test continued to
    pass without the in-process shim's listeners-set teardown
    behaving correctly. Verified by running the test BEFORE
    removing the npm dep; both implementations satisfy the
    contract identically.

  - Section 11 comment header in shim.test.ts got dropped during
    the section-10 edit. Restored.

Verification

  Tests: 871 frontend passing (865 prior + 6 new: 3 shim D-proving +
    1 lock close-throws + 1 lock release-after-close + 1 readiness
    parentConversationTreeId contract). Backend unchanged (~658
    passing).
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/runner directory 95.13 / 87.42 / 96.26 / 96.58
    (was 94.5 / 86.57 / 93.87 / 96.11). All modules above the
    85/85/90/90 thresholds.
      crossTabLock.ts at 94.66/85.36/100/98.36 (functions to 100)
      shim.ts        at 98.91/96.87/95/100    (branches improved)
      readiness.ts   at 94.53/89.04/92.3/96.15 (unchanged)
      reconcile.ts   at 94.73/90/100/100      (unchanged)
      wave.ts        at 100/93.1/100/100      (unchanged)
      partition.ts   at 93.75/85.1/100/94.59  (unchanged)
      dispatchHelpers.ts at 97.43/89.47/100/97.22 (unchanged)
      dispatch.ts    at 85.33/50/88.88/88.23  (UNCHANGED; pre-existing
                                                low-branch debt from PR4c2;
                                                tracked as open item)

Open rubber-duck items still pending

  Carried forward (not addressed in PR4e+f.1):
  - DTO original_prompt_id nullability (since PR3a; not re-litigated).
  - Citation-strip discipline (partition.ts + wave.ts inline section
    refs; end-of-V1.0 strip).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification (needs prod DB
    access).
  - PR4e cancelQueued-race: cancelQueued fired before the queued
    event emits lets the queued wave drain anyway. Acceptable for
    V1.0; PR5/PR6 UX should document.

  New from PR4e+f.1 review:
  - dispatch.ts branch coverage at 50% (pre-existing from PR4c2,
    not introduced by PR4e/f). dispatch.ts owns create_attack +
    add_message sequencing and partial-commit semantics; the low
    branch coverage on mid-chain failure paths is genuine debt.
    Tracked for follow-up; not gating PR5.
  - Spec drift docs (drain-outside-the-lock, retry-failed-in-shim,
    LockAcquireResult DU): three sentence-length amendments owed to
    01 §9.4.3, 03 §2.1, 03 §3.1 step 2b. End-of-V1.0 doc pass.
  - shim drain loop serializes ALL queued waves under the first
    shim invocation's call stack. If operator clicks Refresh 5
    times: the FIRST call's promise doesn't resolve until ALL 5
    waves complete. Contract decision deferred to PR6 (which adds
    the wave-status banner that consumes this).

Next slice

  PR5 — react-flow UI scaffold. Sub-PRs PR5a-g per plan. Runner
  core (PR4a-f + PR4e+f.1) is now feature-complete and the rubber-
  duck checkpoint is closed. The runner-shape PR5 depends on is
  frozen: ConversationTree as plain object, RunnerShim's six-method
  surface, RunnerStateSink + WaveEvent stream contracts.
…pter (PR5a)

First slice of PR5 (the react-flow UI). Adds the pure adapter that maps
a domain ConversationTree onto react-flow's Node[]+Edge[] shape, plus a
minimal `TreeCanvas` scaffold component that mounts ReactFlow with the
adapter's output. Per-node components (PR5b), action rails (PR5c),
edge `+` chip (PR5d), Stack rendering (PR5e), Pick/Unpick (PR5f), and
layout (PR5g) land in subsequent slices.

What ships

  - frontend/src/components/Tree/conversationTreeToReactFlow.ts
    - conversationTreeToReactFlow(tree) → { treeId, nodes, edges }
    - One react-flow Node per ConversationTreeNode (1:1, no
      restructuring). Discriminated-union typing on the result
      (TreeFlowNode = Node<{node:RootPromptNode},'root_prompt'> | ...
      | Node<{node:ScoreNode},'score'>) so PR5b's node components
      can register by kind and narrow params via a switch.
    - Each node's `data.node` is the SAME ConversationTreeNode
      reference (not a clone) so downstream useMemo hooks in node
      components can identity-check against unchanged-tree renders.
    - Each node carries a placeholder `{ x: 0, y: 0 }` position;
      PR5g's d3-hierarchy layout pass overrides on render.
    - One react-flow Edge per ConversationTreeEdge with
      type='smoothstep' (orthogonal routing, per 02 §4.4).
    - Edge data carries `slotIndex` so the PR5e Fan-Children Stack
      predicate and PR5f Pick/Unpick (writes
      promotedChildSlotIndex) can read it directly without
      walking back to the source ConversationTreeEdge.
    - Stable edge id from the source `ConversationTreeEdge.id`
      (load-bearing for react-flow's reconciler — id changes force
      unmount/remount and kill the PR5d edge-hover state).
    - Exhaustive kind switch with `never`-typed default arm —
      compile-time guard against adding a new ConversationTreeNode
      kind without an adapter arm.

  - frontend/src/components/Tree/TreeCanvas.tsx
    - Scaffold component: takes a ConversationTree, calls the
      adapter, mounts ReactFlow + ReactFlowProvider.
    - `useMemo` on the adapter call so tree-prop identity changes
      (not just shape changes) drive re-adaption.
    - `data-tree-id` + `data-testid="tree-canvas"` on the wrapper
      div for test introspection AND so PR5b+ can route action-
      rail callbacks back to the runner shim by reading the
      ancestor tree id.
    - nodeTypes + edgeTypes deliberately not registered yet — the
      scaffold uses react-flow's default node renderer (shows the
      node id) until PR5b's per-kind components land. Commented
      stubs flag where PR5b/5d will plug in.
    - fitView prop on for the scaffold so single-tree-fits-canvas
      works without manual zoom.

Notable shape decisions

  - The adapter is a pure function over tree shape; no react,
    no hooks, no closures over runner state. PR5g's layout pass
    will wrap this output and add positions; per-node interactivity
    (action rail callbacks, edge `+` chip) lands in the node /
    edge components and routes through props supplied at the
    TreeCanvas boundary.

  - Discriminated-union node typing rather than a single
    Node<{node: ConversationTreeNode}>. Without the discrimination,
    every node component would need an internal `if (node.kind ===
    ...)` narrow before reading params. The kind-discriminated
    union lets PR5b register node components as
    `nodeTypes: { root_prompt: RootPromptCard, ... }` and have
    TypeScript narrow `data.node` to the right type at the
    component boundary.

  - TreeFlowEdgeData is `{ slotIndex: number }`, NOT the full
    `ConversationTreeEdge`. Reasons: (a) slotIndex is the only
    field a Stack / Pick consumer reads; (b) extending TreeFlowEdgeData
    later is a non-breaking type change (edge consumers read
    specific keys, not the whole object); (c) keeps the adapter
    output minimal.

  - Placeholder positions at (0,0) instead of computing a quick
    initial layout. react-flow tolerates same-position nodes
    (they overlap at the origin). PR5g's layout pass owns
    positioning end-to-end; computing a throwaway layout here
    would be wasted work. The scaffold's `fitView` keeps the
    overlap from breaking the canvas mount visually until layout
    lands.

  - TreeCanvas doesn't take a sink, runner, or any callbacks
    yet — pure-render shell. PR5b will widen the prop set as
    action-rail callbacks land; defining the prop surface
    speculatively would invite over-specifying before we know
    what the components need.

TDD narrative

  conversationTreeToReactFlow.test.ts: 17 cases pinning node 1:1
  mapping, kind → type passthrough, ImportMessageNode-as-root,
  data.node identity preservation, placeholder positions, edge
  1:1 mapping, edge id stability, slotIndex on edge data,
  smoothstep edge type, fan-children edges with auto-numbered
  slot indices, explicit edges with non-default slotIndex,
  root-only tree (zero edges), input-mutation safety, wide
  multi-fan-path tree, treeId on result, kind-discriminated
  type narrowing. RED was TS2307 on './conversationTreeToReactFlow'.
  Implemented; 17/17 green.

  TreeCanvas.test.tsx: 4 cases — node count per tree, treeId
  attribute on wrapper, tree-swap survives without losing nodes,
  wide tree mounts cleanly. RED was TS2307 on './TreeCanvas'.

Defects surfaced during TDD

  - First TreeCanvas test pass used `[data-id]` as the node-card
    selector. react-flow tags connection handles with `data-id`
    too (e.g., "1-r-null-target"), so the selector matched 6 DOM
    nodes per card (1 wrapper + 2 handles + handles on the
    handles). Switched to `[data-testid^="rf__node-"]` which
    matches only the card wrapper. 3/4 tests went green to 4/4.

  - First implementation pass used `): JSX.Element` as the
    TreeCanvas return type. Main tsconfig (vs tsconfig.test.json)
    doesn't include the legacy global JSX namespace; eslint
    no-undef caught it. Dropped the explicit return type to
    match the existing component convention (ConnectionBanner,
    ErrorBoundary, etc.).

Verification

  Tests: 892 frontend passing (871 prior + 21 new: 17 adapter + 4
    scaffold). Backend unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 90.9 / 85.71 / 100 /
    90.47 — clear of the 85/85/90/90 global thresholds.
      conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71
        (uncovered lines: the `never`-typed default arm of the kind
        switch — compile-time-unreachable; istanbul still counts it)
      TreeCanvas.tsx: 100/100/100/100

Dependencies

  - @xyflow/react ^12.11.0 added as a runtime dep (replacement for
    the older `reactflow` package; v12+ ships under the @xyflow
    scope). Adds 19 transitive packages.

Next slice

  PR5b — per-kind node components (RootPromptCard, UserTurnCard,
  SendCard, FanCard, ScoreCard, ImportMessageCard). Each card
  reads `data.node` (typed to its kind), renders a Fluent UI card,
  and exposes hooks for the PR5c action rail. The scaffold's
  nodeTypes prop wires up once components are ready.

Open rubber-duck items still pending (unchanged from PR4e+f.1)

  - DTO original_prompt_id nullability.
  - Citation-strip discipline (partition.ts + wave.ts + shim.ts
    have inline section refs; end-of-V1.0 strip).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs (drain-outside-the-lock, retry-failed-in-shim,
    LockAcquireResult DU).
  - shim drain loop call-stack serialization.
Six per-kind node cards (RootPromptCard, ImportMessageCard,
UserTurnCard, SendCard, FanCard, ScoreCard) registered against the
TreeCanvas's `nodeTypes` prop. Each card displays the kind-specific
summary fields a tree-viewing operator needs to see at a glance —
prompt text + target on the root, role + text + converter count on
user turns, target override + lastError on sends, axis + variant
count + pick indicator on fans, scorer type + V1.0 render-only hint
on scores. Action rails (PR5c), edge `+` chip (PR5d), Stack
rendering (PR5e), Pick/Unpick (PR5f), and layout (PR5g) land
separately.

What ships

  - frontend/src/components/Tree/nodeCards.tsx
    - RootPromptCard: prompt text (truncated body) + target chip;
      no target handle (root has no parent).
    - ImportMessageCard: source conversation id + cutoff index;
      no target handle.
    - UserTurnCard: text body + role + optional converter-count
      chip; both handles. Body carries the full text on the `title`
      attribute so hover discoverability works even when the body
      is line-clamp truncated.
    - SendCard: optional target override + lastError surface
      (visible only when state is 'failed' AND lastError !== null);
      both handles.
    - FanCard: axis + variant count + Pick indicator (when
      promotedChildSlotIndex is non-null); both handles.
    - ScoreCard: scorer type + V1.0 render-only hint (per 02 §2.2
      ScoreNode rail: configure-scorer is V1.1).
    - Shared CardFrame component renders the kind label, the
      state badge (7-state color map: draft / clean / edited /
      stale / running / failed / cancelled), and the react-flow
      Handles (target on top, source on bottom; toggleable via
      props since root + import don't have parents).
    - Shared MetaRow for "label: value" key-value pairs in a
      consistent layout.

  - frontend/src/components/Tree/treeNodeTypes.ts
    - Kind → component map: `{ root_prompt, import_message,
      user_turn, send, fan, score }`. Lives in its own module so
      eslint's react-refresh/only-export-components rule stays
      happy (mixing the registry with components defeats HMR).

  - frontend/src/components/Tree/TreeCanvas.tsx
    - Wired `nodeTypes={treeNodeTypes}` into ReactFlow. The
      scaffold's commented-out stub from PR5a is now live; cards
      render in place of react-flow's default node renderer.

  - frontend/src/components/Tree/nodeCards.test.tsx
    - 34 tests across six per-card describes + a registry
      integration describe. Each card describe asserts:
        - kind label rendered
        - kind-specific fields rendered
        - state badge renders the current state
        - handle visibility per parent expectations
      Registry tests mount cards via TreeCanvas to prove the
      `nodeTypes` wiring is intact end-to-end (missing entries
      would fall back to react-flow's default node renderer, which
      only shows the node id — the test asserts specific card
      content like "Root prompt" + "Send", which only render when
      the registry is properly registered).

Notable shape decisions

  - Cards are read-only display only in PR5b. No callbacks, no
    edit handlers, no action rail. PR5c widens the prop set as
    action-rail callbacks land; defining the surface speculatively
    would over-specify before we know what the components need.

  - State badge uses a 7-state inline color map rather than
    Fluent UI's intent-based MessageBar. Reason: state colors are
    a domain-specific visual language (clean=green, edited=yellow,
    stale=orange, running=blue, failed=red) that doesn't map onto
    Fluent's success/warning/error intents. The inline map keeps
    the color choices in one place and visible in the test grep.

  - CardFrame takes showTargetHandle + showSourceHandle as
    explicit props with defaults of true. Root + import set
    showTargetHandle={false} explicitly. The visual reads
    correctly without the top handle (no half-edge stub) and the
    type-system enforcement falls out of CardFrame's prop signature.

  - SendCard's lastError block renders inline (red panel) rather
    than as a popover or tooltip. Operators scanning a failed
    fan want the failure reason at a glance; clicking through
    to a tooltip adds friction without value. The PR6 wave-
    complete toast covers the wave-level summary; the per-card
    lastError covers the per-leaf detail.

  - FanCard's Pick indicator renders as a MetaRow ("pick: slot N")
    rather than a special chip. The §3.3 Pick semantic is one of
    several fan-card facts (axis, count, pick); rendering them
    uniformly as MetaRows keeps the card visually consistent.
    PR5f's Pick/Unpick interaction lands here, but it's
    affordance — not display — so the visual stays this PR.

  - ScoreCard's "V1.0: displays scores attached to upstream
    pieces" hint prevents the V1.0 operator from expecting to
    click and configure (the affordance is V1.1 per 02 §2.2 + the
    spec's render-only contract per 01 §4.5 / 03 §3.2). Surfacing
    the limitation on the card is cheaper than a tooltip on the
    (currently-not-rendered) configure icon.

  - NodeProps generic uses `Extract<TreeFlowNode, { type: 'X' }>`
    so each card's props type is exactly the corresponding adapter
    output node. This is what makes `data.node: RootPromptNode`
    (vs `ConversationTreeNode`) without a runtime narrow inside
    the card — the discriminated-union shape PR5a set up pays
    off here.

TDD narrative

  Single test file (nodeCards.test.tsx) with 34 cases organized
  into seven describe blocks (one per card + the registry). RED
  was TS2307 on './nodeCards'. Implementation took two passes:
  first pass put treeNodeTypes in nodeCards.tsx; lint flagged
  react-refresh/only-export-components because the file mixed
  component exports with a non-component constant. Moved
  treeNodeTypes into its own module (treeNodeTypes.ts); 55/55
  green.

Defects surfaced during TDD

  - Initial registry placement in nodeCards.tsx triggered the
    react-refresh/only-export-components warning. Mixing exports
    breaks HMR for the components (the bundler can no longer
    fast-refresh just the changed component). Split into a
    separate treeNodeTypes.ts module — net +1 file, +0 LOC of
    logic, and HMR works.

  - First lint pass also caught that `): JSX.Element` is not
    valid under the main tsconfig (no global JSX). PR5a hit the
    same issue; convention is to omit the explicit return type
    on React components.

Verification

  Tests: 926 frontend passing (892 prior + 34 new). Backend
    unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 96.07 / 93.1 / 100 /
    96 — clear of the 85/85/90/90 thresholds.
      nodeCards.tsx: 100/95.45/100/100 (one uncovered branch is
        the inline color map's defensive fallback for a state
        not in the union — unreachable)
      treeNodeTypes.ts: 100/100/100/100
      TreeCanvas.tsx: 100/100/100/100
      conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71
        (unchanged from PR5a; the never-typed default arm)

Next slice

  PR5c — per-node action rail (icons + tooltips per 02 §2.2).
  Adds onRefresh, onBranch, onDelete, onOpenLinear callback props
  to TreeCanvas → cards. The rail itself is a small floating row
  positioned by react-flow's NodeToolbar component (or a custom
  absolute-positioned div if NodeToolbar's defaults don't fit).
  Action wiring routes through props supplied at the TreeCanvas
  boundary; the actual runner calls land in PR7 when persistence
  + auto-reverse make a complete-cycle integration possible.

Open rubber-duck items still pending (unchanged from PR5a)

  - DTO original_prompt_id nullability.
  - Citation-strip discipline.
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs (drain-outside-the-lock, retry-failed-in-shim,
    LockAcquireResult DU).
  - shim drain loop call-stack serialization.
Address must-fix + should-fix items from the post-PR5a+PR5b rubber-duck
review. The Tree UI scaffold and per-kind cards stay feature-equivalent;
this commit fixes a load-bearing theme bug, removes coupling to react-
flow internals, threads the `selected` prop through every card so PR5c's
action rail can read it without rewriting card prop surfaces, and strips
spec citations that regressed past the PR4b directive.

What ships (per reviewer finding)

  Finding B.1 — dark-theme-only baked colors [LOAD-BEARING]
    Reviewer flagged: PR5b's `STATE_COLORS` constant hardcoded dark
    hexes (#1e3a1e for clean-green, #3a1e1e for failed-red, etc.).
    App.tsx toggles webLightTheme ↔ webDarkTheme; switching to light
    theme rendered cards as dark-on-near-white stripes — clean-green
    looked like a smudge, failed-red was unreadable. Real runtime
    defect; the cards were the only component in the workspace
    ignoring the theme contract.

    Fix: replace `STATE_COLORS` with `STATE_BADGE_TOKENS: Record<
    NodeState, { background, foreground }>` keyed on Fluent palette
    tokens (colorPaletteGreenBackground2 / Foreground2 pairs, etc.).
    Both light + dark themes auto-adapt. Matches the rest of the
    codebase's status-chip convention (Chat/TargetBadge.styles.ts).

  Finding C / J.3 — CardFrame inline styles + missing `selected`
    Reviewer flagged:
      (a) PR5b cards used inline `style={{...}}` for every visual
          decision. PR5c's action rail will need `:hover` +
          `[data-selected]` pseudo-classes that inline styles can't
          express, forcing a useState(hover) + onMouseEnter retrofit
          on every card.
      (b) react-flow passes `selected: boolean` to every node
          component, but PR5b cards dropped it on the floor (the
          test helpers even passed `selected: false` already).
          PR5c's selection-visual + action-rail-visibility-when-
          selected would have rewritten every card's prop surface.

    Fix: new `nodeCards.styles.ts` companion module using
    `makeStyles` + Fluent tokens for every card visual. CardFrame
    now accepts `selected?: boolean` (default false at the
    destructure site — one default site, not per-card), threads it
    through to `data-selected` on the wrapper, and applies the
    `frameSelected` class for the brand-color outline. Every card
    receives + forwards `selected` from NodeProps. PR5c becomes
    "add the action rail," not "rewrite six cards."

  Finding E — data-testid="rf__node-*" couples to react-flow internals
    Reviewer flagged: PR5a's TreeCanvas test used
    `container.querySelectorAll('[data-testid^="rf__node-"]')` — a
    private testid scheme that could silently shift with a @xyflow
    minor-version bump.

    Fix: CardFrame emits `data-tree-node-id={nodeId}` on the wrapper.
    TreeCanvas tests now select via `[data-tree-node-id]`, which is
    under our control and immune to react-flow renames. Also enables
    the PR5c action rail to find cards by node id without walking
    react-flow's DOM tree.

  Finding A — compile-time guard on the kind → component registry
    Reviewer flagged: PR5b's `treeNodeTypes` registry was just an
    `as const`. The "every kind has a registry entry" guarantee
    relied on the test alone — a developer adding a new
    ConversationTreeNodeKind without a registry entry would only
    discover it at jest time.

    Fix: `as const satisfies Record<ConversationTreeNodeKind,
    ComponentType<never>>` on treeNodeTypes. Compile-time
    completeness: tsc fails the moment a new kind lands without a
    registry arm. The existing runtime test is now defense-in-depth.

  Finding I — citation discipline regressed by 8 instances
    PR5a + PR5b added 8 new `02 §...` references in JSDoc + test
    comments after the PR4b directive ("no new citations in code").
    Stripped all 8 — 5 in module-header JSDoc, 3 in test comments
    and titles. The doc/gui/design/ files remain the source of
    truth; the code just no longer mirrors specific section refs.

  Finding H.4 — operator-facing V1.0/V1.1 text in ScoreCard
    Reviewer flagged: ScoreCard's body copy was
    "V1.0: displays scores attached to upstream pieces (configuration
    is V1.1)." Operators don't know what V1.0/V1.1 means; release
    labels in operator-facing copy are a smell. The "scorer
    configuration coming later" detail belongs on the PR5c action
    rail's disabled `✏` icon tooltip, not the card body.

    Fix: strip the V1.0/V1.1 text. The ScoreCard footer now reads
    just "Read-only display" — enough to signal non-interactivity
    without naming an internal version label.

  Finding F — implementer's coverage rationalization was wrong
    Reviewer flagged: PR5b commit body claimed the missing branch
    was "the inline color map's defensive fallback for a state not
    in the union" — but there was no `??` fallback in the code; an
    invalid state would throw on `.background` access. The
    rationalization didn't match the source.

    Fix: actual uncovered branches found and exercised:
      - SendCard's `state==='failed' && lastError===null` quadrant
        (test "does NOT render the error panel when state is
        'failed' but lastError is null")
      - UserTurnCard's singular-converter ternary
        (test "uses singular 'converter' for a one-converter pipeline")
      - CardFrame's `selected = false` default (test "cards default
        to unselected when `selected` is undefined")

  Finding D — adapter ↔ registry alignment defense-in-depth
    Reviewer flagged: the registry test could be the single point of
    failure for adapter/registry alignment. If the adapter emitted
    `type: 'rootPrompt'` and the registry keyed on `root_prompt`,
    only the registry test would catch it; per-card tests would still
    pass.

    Fix: new test "every kind emitted by the adapter has a registry
    entry (adapter ↔ registry alignment)" — builds a tree with every
    kind, runs the adapter, asserts every result `node.type` is a
    registry key. Defense-in-depth alongside the `satisfies`
    compile-time guard from Finding A.

  Finding J.2 — `mockNodeProps` generic stub builder
    Reviewer flagged: PR5b had six near-identical `as unknown as
    Parameters<typeof X>[0]` cast helpers. The `as unknown as` is a
    "trust me" the type-checker can't validate; adding a mandatory
    field to NodeProps would silently break tests.

    Fix: one generic `mockNodeProps<T>(id, data, selected?)` helper;
    the six card-specific wrappers (`rootPromptProps(node)`, etc.)
    are now thin invocations that supply the appropriate `T`.

Notable shape decisions

  - `makeStyles` was chosen over emotion or styled-components because
    Chat/TargetBadge.styles.ts is the existing in-codebase pattern.
    Mixing CSS-in-JS systems would be churn for no benefit.

  - Griffel (Fluent's CSS engine) rejects the `borderColor`
    CSS shorthand for theme-token reuse reasons (so individual sides
    can be overridden via longhand). The `frameSelected` slot uses
    `borderTopColor` / `borderRightColor` / `borderBottomColor` /
    `borderLeftColor` instead. Same visual; lint-clean.

  - The inline state-badge color comes from STATE_BADGE_TOKENS at
    render time (one `<span style={{ background, color }}>` per
    card). makeStyles can't easily produce 7 dynamic-key combinations
    without static slot generation; the runtime lookup is cheaper
    and reads more naturally. Token references resolve at render
    against the current theme, so dark/light still both work.

  - The `selected` default lives in CardFrame's destructure
    (`selected = false`), not per-card (`selected ?? false`). One
    branch to cover, not six. The cards forward whatever NodeProps
    handed them and rely on CardFrame's default for the absent case.

  - Stripped the SendCard inline lastError panel? No — kept it.
    Reviewer flagged it as spec drift (H.2: the spec pins failure
    summarization at the wave-status banner, not the per-card
    panel). Decision: keep the inline surface because operator
    "why did this leaf fail" is a real read-this-card moment, and
    the action-rail `💬` (PR5c, drawer) is the "full raw response"
    surface — different need. The spec should grow an acknowledgment
    sentence (deferred to end-of-V1.0 doc pass with the other spec
    drift notes).

TDD narrative

  Worked finding-by-finding starting from the load-bearing one
  (B.1 theme bug). Each finding:
    - identified the regression-prone surface
    - landed the fix
    - added or tightened a test that proves the fix sticks

  Two test failures caught regressions:
    1. The "failed-with-null-lastError" test used a text-content
       regex that matched the state badge's "failed" string —
       false-positive. Fixed by switching to a class-substring scan
       for `errorPanel` (makeStyles preserves slot names in dev
       mode for debuggability, so the class contains 'errorPanel').
    2. The Griffel `borderColor` shorthand warning surfaced during
       the makeStyles refactor. Fixed via four longhand properties.

Defects surfaced during TDD

  - Griffel's CSS-shorthand rejection list includes `borderColor`
    but NOT `border` (the full shorthand). The distinction is
    "shorthand that lets later rules override specific sides via
    longhand" — only the partial-override family is rejected.
    Documented inline in nodeCards.styles.ts.

  - The "doesn't render error panel" test originally used a
    text-substring scan that false-matched against the state
    badge's literal "failed" text. Lesson: don't text-scan for
    state words when the state itself is a literal element on
    the card.

  - Coverage on nodeCards.tsx's branch count moved with the
    `selected` consolidation. Originally each card had its own
    `?? false` (6 branches); consolidation to CardFrame's
    destructure-default left 1 branch. The "defaults to
    unselected" test covers it, but istanbul still scores the
    default-parameter as two arms.

Verification

  Tests: 932 frontend passing (926 prior + 6 net: 4 new + 2 renamed
    + 0 removed). Backend unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 96.66 / 94.28 / 100 /
    96.61 (was 96.07 / 93.1 / 100 / 96 in PR5b).
      nodeCards.tsx: 100/96.42/100/100 (one uncovered branch is
        the default-parameter arm of `selected = false` — istanbul
        over-counts default-parameter coverage)
      nodeCards.styles.ts: 100/100/100/100
      treeNodeTypes.ts: 100/100/100/100
      TreeCanvas.tsx: 100/100/100/100
      conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71
        (unchanged; the never-typed default arm of the kind switch)

Open rubber-duck items still pending

  Carried forward (not addressed in PR5a+b.1):
  - DTO original_prompt_id nullability.
  - Citation-strip discipline (partition.ts + wave.ts + shim.ts
    still have section refs; this PR did NOT strip those, only the
    new ones introduced by PR5a+b).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs (drain-outside-the-lock, retry-failed-in-shim,
    LockAcquireResult DU); PR5b.1 adds another: SendCard's inline
    lastError surface is spec drift kept by design.
  - shim drain loop call-stack serialization.

  Deferred from this review:
  - Card decomposition split point (six cards + three shared
    primitives in one 270-line file). Reviewer J.4: "split when
    CardFrame's hover/select state lives in its own hook —
    somewhere mid-PR5c." Honored: not splitting now.
  - PR5g layout memo keying. Reviewer J.1 noted PR5g will be
    tempted to re-layout per-leaf state ping. Not addressable
    in PR5b; tracked for PR5g.

Next slice

  PR5c — per-node action rail (icons + tooltips). Callback props
  on cards (onRefresh, onBranch, onDelete, onOpenLinear), threaded
  via TreeCanvas. CardFrame's `selected`-driven `data-selected`
  attribute is what PR5c's rail-visibility-on-hover-or-selected
  CSS reads. The makeStyles shell from PR5b.1 is where the hover
  pseudo-class lands.
…xt (PR5c)

The common-to-every-node action rail (Refresh / Branch / Branch-as-
subtree-stub / Delete / Open-in-linear) wired through TreeCanvas →
CardFrame → ActionRail via a React context. Per-callback opt-in:
undefined callbacks hide their buttons so PR5c lands the wiring
without forcing every runner integration to land in the same PR.
Kind-specific actions (✏ edit, ⚡ converter, ≡ role, ↻×N re-run,
📎 attachment, 🎯 target-override, etc.) defer to later sub-PRs —
each needs its own state machine + dialog and is poorly served by
the same minimum-viable callback bag.

What ships

  - frontend/src/components/Tree/actionRail.tsx
    - ActionRail({ nodeId, callbacks, branchLabel }) — small Fluent
      UI Button row with five action slots:
        ↻ Refresh    (ArrowSyncRegular)
        🌿 Branch    (BranchRegular; label varies by node kind)
        ⫝ Branch-subtree (BranchForkRegular, ALWAYS disabled —
          V1.1 placeholder; reserved slot per the operator-facing
          convention that V1.1 enablement is a state flip, not a
          new button)
        🗑 Delete    (DeleteRegular)
        🔍 Open-in-linear (OpenRegular)
    - Each callback is optional; undefined hides the button entirely
      so PR5c ships before per-action runner wiring.
    - Wrapper emits `data-tree-action-rail` + `data-tree-node-id`
      for DOM scoping (PR5d's edge `+` chip will use the rail
      element's position as an anchor reference).

  - frontend/src/components/Tree/actionRail.styles.ts
    - Fluent makeStyles for the rail row layout (horizontal flex,
      tokens-spaced gap, top border via stroke2 token). Visibility
      defaults to always-visible in PR5c; the hover/focus
      visibility-flip per the design wires alongside Stack
      rendering (PR5e) when CardFrame grows a hover handler.

  - frontend/src/components/Tree/actionCallbacksContext.ts
    - ActionCallbacksContext (React context, default null).
    - useActionCallbacks() returns ActionCallbacks | null.
    - Lives in its own module so the adapter stays pure and a
      callbacks-only-change render doesn't re-run
      conversationTreeToReactFlow.

  - frontend/src/components/Tree/TreeCanvas.tsx (modified)
    - Added optional `actionCallbacks?: ActionCallbacks` prop.
      When supplied, wraps ReactFlow in
      <ActionCallbacksContext.Provider value={actionCallbacks}>.
      When omitted, provider value is null and cards skip the
      rail entirely (preserves the PR5a/PR5b "display only" use).
    - The adapter is NOT in the actionCallbacks dependency list —
      changes to callbacks don't perturb the tree's adapter output,
      preserving identity-stable nodes/edges for react-flow's
      reconciler.

  - frontend/src/components/Tree/nodeCards.tsx (modified)
    - CardFrame consumes useActionCallbacks(); when non-null,
      renders <ActionRail nodeId={nodeId} callbacks={callbacks}
      branchLabel={branchLabel} />.
    - Every card forwards its kind-appropriate `branchLabel`:
      RootPromptCard → "Clone tree" (the operator-facing language
      for a root clone), every other card → "Branch from here".

  - frontend/src/components/Tree/actionRail.test.tsx
    - 20 tests across four sections:
        1. ActionRail in isolation — opt-in/opt-out rendering per
           callback presence, disabled Branch-subtree slot, click
           invocations with the right nodeId
        2. TreeCanvas integration — rail per card, Refresh-click
           on a specific card invokes onRefresh with that card's
           id, root vs non-root branchLabel, callbacks-omitted
           renders zero rails (back-compat)
        3. Accessibility — aria-label on every button, tooltip on
           the disabled V1.1 button, data attributes for DOM scoping
        4. CardFrame integration — rail doesn't break
           data-tree-node-id wrapper attribute (PR5a/PR5b selector
           contract survives)

Notable shape decisions

  - Context vs prop-threading for callbacks. The adapter (PR5a) is
    pure and identity-stable; threading callbacks through `data`
    on every adapter node would force re-adaption on every
    callback-prop change. Context lets the rail consume callbacks
    where it renders, leaving the adapter untouched. Trade-off:
    cards become non-pure (they consume context); accepted because
    the alternative breaks the adapter's identity contract for
    react-flow's reconciler.

  - branchLabel is a CardFrame prop, not derived inside ActionRail.
    The card knows the kind ("root prompt" → "Clone tree"); the
    rail doesn't and shouldn't. Pushing branchLabel into ActionRail
    via the cards keeps the rail kind-agnostic so PR5d/PR5e can
    reuse it on the Stack card without inventing new render rules.

  - Tooltip `relationship="description"` rather than
    `relationship="label"`. The Fluent label-relationship pattern
    uses aria-labelledby pointing at the tooltip content; the
    tooltip only renders the content when shown, so the accessible
    name comes from the hover tooltip alone. Under jsdom (and
    arguably under screen-reader-without-hover), this means the
    button has no name. Switched to `description` and kept an
    explicit aria-label on every button so the accessible name
    is permanent.

  - V1.1 Branch-subtree button always renders (disabled). Per the
    operator-facing convention: V1.1 enablement is a state flip,
    not a new affordance appearing. Keeping the slot reserved
    prevents an operator's muscle memory from forming around four
    buttons that suddenly become five. The `title` attribute
    carries the operator-friendly explanation ("coming in a future
    release") — no V1.0/V1.1 release labels in operator-facing copy.

  - Five callbacks, not nine. Per the rubber-duck principle of
    "don't speculate," PR5c ships the common-to-every-node rail
    only. The seven kind-specific actions (per-card edit,
    converter, role, re-run-N, attachment, target-override, view-
    raw-response) each carry their own UI work (palette, role
    cycler, count-prompt, etc.) and don't share a callback
    surface. Each lands when its full interaction is ready.

  - Callbacks accept nodeId only, not (treeId, nodeId). The host
    that mounts TreeCanvas already knows the treeId — the
    callbacks close over it at the consumer's call site:
      `onRefresh={(id) => runner.refreshNode(treeId, id)}`.
    Adding treeId to the rail's callback signature would be
    speculative complexity for a single-tree V1.0 canvas.

TDD narrative

  Started with actionRail.test.tsx — 20 cases pinning callback
  invocations, opt-in rendering, the disabled V1.1 slot, and the
  rail's behavior when threaded through TreeCanvas. RED was
  TS2307 on './actionRail'.

  Implementation took three corrective passes:
    1. Initial ActionRail used `<Tooltip relationship="label">` —
       caused all buttons except the first to lose their accessible
       name in jsdom. Switched to `relationship="description"` and
       kept explicit aria-label on every button.
    2. userEvent.click on a button inside a react-flow node card
       throws "Cannot read properties of null (reading 'document')"
       — react-flow's pointerdown handler dereferences a null
       window owner inside jsdom. Switched click tests to
       fireEvent.click which dispatches a single MouseEvent
       without pointer-event machinery.
    3. screen.getByRole filters out visibility-hidden elements;
       react-flow renders nodes with `visibility: hidden` until
       its layout pass runs (which jsdom never triggers). Switched
       integration tests to container.querySelectorAll
       scoped by data-tree-node-id, matching the PR5a TreeCanvas
       test pattern.

Defects surfaced during TDD

  - Fluent Tooltip's `relationship="label"` is operator-hostile
    in non-interactive renders: the button's only accessible name
    is the tooltip content, but the tooltip content isn't in the
    DOM until hover. Screen readers without hover navigation lose
    the name. Documented inline in actionRail.tsx that the
    `description` relationship is the right choice when buttons
    also carry an explicit aria-label.

  - react-flow's `visibility: hidden` on un-laid-out nodes
    confuses testing-library's role-based queries. Tests using
    role queries inside the canvas need to use the
    data-tree-node-id wrapper-scoped pattern instead. Worth a
    note for PR5d-g test authors.

  - userEvent.click → react-flow pointerdown handler →
    NullPointerException in jsdom. fireEvent.click is the
    workaround for interactive testing inside the canvas.
    Both patterns documented inline at the test callsites.

Verification

  Tests: 952 frontend passing (932 prior + 20 new). Backend
    unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 97.53 / 95.74 / 100 /
    97.5 (was 96.66 / 94.28 / 100 / 96.61 in PR5b.1).
      actionRail.tsx: 100/100/100/100
      actionRail.styles.ts: 100/100/100/100
      actionCallbacksContext.ts: 100/100/100/100
      TreeCanvas.tsx: 100/100/100/100
      nodeCards.tsx: 100/96.66/100/100 (the rail-render branch
        adds a single unmeasured combination — context-null vs
        non-null × per-card; defensible)
      conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71
        (unchanged; never-typed default arm)

Open rubber-duck items still pending (unchanged from PR5b.1)

  - DTO original_prompt_id nullability.
  - Citation-strip discipline (partition.ts + wave.ts + shim.ts
    legacy refs; new code stays clean).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs (drain-outside-the-lock, retry-failed-in-shim,
    LockAcquireResult DU, SendCard inline lastError).
  - shim drain loop call-stack serialization.

Next slice

  PR5d — per-edge `+` chip + insert-on-edge popover. Adds a custom
  edgeTypes entry to TreeCanvas, an onEdgeInsert callback to the
  ActionCallbacks surface, and a Fluent Popover with kind-aware
  insert options per the upstream-node-kind contract. Reuses the
  data-tree-node-id wrapper-scoping pattern from PR5b.1 for
  position anchoring.
The per-edge `+` chip + insert popover per the operator-facing
insert-on-edge affordance. Adds a custom react-flow edge component
(InsertEdge) that wraps SmoothStepEdge with a midpoint chip; clicking
the chip opens a Fluent Menu whose options vary by the upstream node's
kind (only legal next-node types render, hiding illegal ones is
cheaper than enabling-with-error). Selecting an option fires the
host-supplied onEdgeInsert callback with a discriminant (EdgeInsertKind)
naming the chosen action.

What ships

  - frontend/src/components/Tree/InsertEdge.tsx
    - Custom react-flow edge component (replaces 'smoothstep' as the
      default edge type for Tree adapter output).
    - Renders BaseEdge for the orthogonal stroke + an absolute-
      positioned chip at the midpoint via EdgeLabelRenderer (with a
      fallback inline render for test environments where the
      EdgeLabelRenderer portal target isn't mounted).
    - Chip suppression rules:
        - onEdgeInsert callback absent → no chip
        - parent kind is `score` (terminal) → no chip
        - parent kind is `fan` (children managed via FanCard +) → no chip
    - menuForParent() builds the kind-aware option list:
        - root_prompt / import_message: Follow-up + Inject + Send
          + Fan attempt + Fan converter (+ V1.1 disabled stubs)
        - user_turn: Send + Append converter + Fan converter
          (+ V1.1 disabled stubs)
        - send: Follow-up + Inject + Score + Fan attempt
          + Fan converter (+ V1.1 disabled stubs)
    - Branded source/target back to ConversationTreeNodeId at the
      onEdgeInsert callback boundary so hosts receive the same brand
      the runner uses everywhere.

  - frontend/src/components/Tree/insertEdge.styles.ts
    - makeStyles for the chip wrapper (absolute + pointer-events:all)
      and the chip button (20px circular Fluent Button).

  - frontend/src/components/Tree/treeEdgeTypes.ts
    - Edge-type registry: `{ insert: InsertEdge }`. Passed to
      ReactFlow's `edgeTypes` prop. Sibling to treeNodeTypes.ts.

  - frontend/src/components/Tree/actionRail.tsx (modified)
    - Added `EdgeInsertKind` exported type (discriminant for
      onEdgeInsert).
    - Added `onEdgeInsert?: (parentId, childId, kind) => void` to
      `ActionCallbacks`. Per-callback opt-in pattern from PR5c
      preserves: undefined hides the chip.

  - frontend/src/components/Tree/conversationTreeToReactFlow.ts (modified)
    - Edge data now carries `parentKind: ConversationTreeNodeKind`
      so InsertEdge can pick the kind-aware menu without a tree
      lookup at render. Built once per adapter call (O(nodes) hash).
    - Edges now emit `type: 'insert'` (was `'smoothstep'`) so
      ReactFlow routes them to the InsertEdge component.
    - Added PLACEHOLDER_WIDTH/HEIGHT (260×80) on every node so the
      reconciler treats nodes as measured ahead of layout. PR5g's
      layout pass overrides positions; the dims unblock edge
      rendering in environments without real ResizeObserver.

  - frontend/src/components/Tree/conversationTreeToReactFlow.test.ts (modified)
    - Renamed the edge-type assertion to reflect the 'insert' type
      (was 'smoothstep' in PR5a).

  - frontend/src/components/Tree/TreeCanvas.tsx (modified)
    - Wired `edgeTypes={treeEdgeTypes}` alongside the existing
      `nodeTypes={treeNodeTypes}` registration.

  - frontend/src/components/Tree/edgeInsert.test.tsx
    - 23 tests across six sections:
        1. chip presence/suppression (callback present/absent,
           context null, score/fan parents)
        2. kind-aware menu options (one test per parent kind +
           V1.1 disabled axes)
        3. callback invocation (one test per kind discriminant
           + disabled-items-don't-fire)
        4. accessibility (aria-label "Insert after X")
        5. adapter parentKind contract (edge data carries source
           kind; fan parents emit parentKind='fan')
        6. registry smoke test (treeEdgeTypes.insert === InsertEdge)

Notable shape decisions

  - Test approach: direct InsertEdge mount inside ReactFlowProvider,
    not TreeCanvas → react-flow → edge render. Reason: react-flow's
    edge layer is gated on full node measurement (handleBounds
    populated via ResizeObserver), which jsdom can't simulate
    cleanly without invasive setupTests changes (DOMMatrixReadOnly
    stub, ResizeObserver fire-on-observe, getBoundingClientRect
    override). Per PR5c's rubber-duck lesson "don't fight react-
    flow internals," the integration is covered by the adapter's
    `edge.type === 'insert'` assertion + the registry's
    `treeEdgeTypes.insert === InsertEdge` assertion; the
    InsertEdge component itself is tested via direct mount.

  - EdgeLabelRenderer fallback. Production wraps the chip in
    EdgeLabelRenderer (portals out of the SVG into a fixed layer
    above the canvas so the HTML chip renders over the SVG path).
    In tests the portal target (`.react-flow__edgelabel-renderer`
    div) doesn't exist — InsertEdge checks `useStore(s =>
    Boolean(s.domNode?.querySelector(...)))` and falls back to
    inline render. Visual is identical in jsdom (no layout); in
    production the portal path is taken.

  - parentKind on edge data, not derived at render. The adapter
    computes parentKind once per edge (O(nodes) hash lookup); the
    InsertEdge consumes data.parentKind without any tree-side
    re-query. Keeps the edge component pure and avoids context
    lookups for what's essentially adapter-state.

  - One callback (onEdgeInsert), one discriminant (EdgeInsertKind).
    The host receives (parentId, childId, kind) and decides how to
    splice the new node. Alternative: one callback per kind
    (`onInsertFollowUp`, `onInsertSend`, etc.). Rejected because
    the host's tree-edit logic is typically a single function
    `insertBetween(parent, child, kindToBuild)` — splitting forces
    seven near-identical wrappers.

  - V1.1 fan axes (`fan_prompt`, `fan_target`) reserve menu slots
    as DISABLED items, not absent. Same operator-facing convention
    as PR5c's Branch-as-subtree button: V1.1 enablement is a state
    flip, not a new affordance appearing. Disabled-stub strings
    use "(coming later)" — no "V1.1" release labels in operator
    copy.

  - PLACEHOLDER_WIDTH/HEIGHT (260×80) on adapter output nodes.
    React-flow won't render edges until source + target nodes
    have measured dimensions; supplying them up-front (via
    node.width/height per the NodeBase interface) lets edges
    render before the ResizeObserver loop completes. Production
    cards report their real size via ResizeObserver on mount;
    these placeholders are the until-then value. PR5g's layout
    pass owns positions, not dimensions.

  - Fan-child edges DO emit parentKind='fan' so InsertEdge
    suppresses the chip even when an operator selects a fan-child
    edge. Adding a chip there would be operator-hostile — the +
    button next to a fan-child edge would compete with the
    FanCard's own `+ Add variant` button.

TDD narrative

  Started with edgeInsert.test.tsx — 23 cases pinning chip
  presence/suppression, the per-parent menu options, callback
  invocation, and the adapter contract. RED: TS2305 on
  EdgeInsertKind + TS2353 on onEdgeInsert (member missing from
  ActionCallbacks).

  Implementation took three corrective passes:
    1. Initial assumption: edge components would render inside
       TreeCanvas via the canvas integration. jsdom + react-flow's
       handleBounds gate kept edges out of the DOM entirely.
       Pivoted to direct-mount tests; covered the canvas-level
       contract via the adapter test + a registry smoke test.
    2. EdgeLabelRenderer's portal target only exists inside
       <ReactFlow>, not the bare ReactFlowProvider used by direct
       mount. Added a portal-target check via useStore + an inline
       render fallback. Production keeps the portal path.
    3. Two type-check failures from the main tsconfig (not the
       test config): readonly fanAxes mismatch with the InsertMenu
       interface (fixed: typed fanAxes as ReadonlyArray); plain
       string source/target passed to a branded-id callback (fixed:
       cast at the callback boundary).

Defects surfaced during TDD

  - jsdom + react-flow edge measurement: the standard
    no-op ResizeObserver mock in setupTests.ts prevents react-flow
    from populating node.internals.handleBounds, which gates
    edge rendering entirely. Tried upgrading the ResizeObserver
    mock to fire on observe(); revealed DOMMatrixReadOnly is also
    absent from jsdom (react-flow's transform-decoder throws);
    reverted. The right pattern is to test components that depend
    on full canvas state via direct mount, not via TreeCanvas.

  - EdgeLabelRenderer's portal target is mounted by <ReactFlow>,
    not ReactFlowProvider. Tests using direct mount need either
    a portal-target fallback in the component (chose this) or a
    full ReactFlow mount with the handleBounds workaround above
    (rejected as too invasive).

  - Main tsconfig is stricter than tsconfig.test.json for branded-
    id types. The test passes plain strings as source/target on
    the synthetic EdgeProps; in production react-flow emits plain
    strings too, so the cast at the callback boundary is the
    permanent shape.

Verification

  Tests: 975 frontend passing (952 prior + 23 new). Backend
    unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 97.14 / 89.87 / 100 /
    97.08 (was 97.53 / 95.74 / 100 / 97.5 in PR5c).
      InsertEdge.tsx: 95.65/83.33/100/95.55 (two uncovered branches
        live in parentLabel() switch fall-throughs that the
        suppress-chip-on-score/fan rule make unreachable — they
        exist for type exhaustiveness)
      insertEdge.styles.ts: 100/100/100/100
      treeEdgeTypes.ts: 100/100/100/100
      actionRail.tsx: 100/100/100/100 (onEdgeInsert added to
        ActionCallbacks; no rail-side change)
      conversationTreeToReactFlow.ts: 90.9/77.77/100/90.47
        (one uncovered branch is the `nodeKindById.get(parentId) ??
        'root_prompt'` fallback for an orphan edge — unreachable
        in a well-formed tree)
      All other modules: 100/100/100/100

Open rubber-duck items still pending (unchanged from PR5c)

  - DTO original_prompt_id nullability.
  - Citation-strip discipline (legacy partition.ts + wave.ts +
    shim.ts refs; new code stays clean).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs.
  - shim drain loop call-stack serialization.

  New from PR5d:
  - Test pattern: TreeCanvas-level integration tests can't drive
    react-flow's edge layer in jsdom. Documented in the
    edgeInsert.test.tsx header so PR5e/PR5f authors take the
    direct-mount approach for any test that depends on edge
    rendering.

Next slice

  PR5e — Fan-Children Stack rendering. FanCard + adapter conspire
  to render N identical-subtree fan children as a single stacked
  card. Builds on the slotIndex field already on edge data (PR5a)
  + parentKind=fan suppression (this PR).
The Fan-Children Stack: when an attempt-axis fan has N >= 2
structurally-identical children, the FanCard renders a single inline
summary ("Send ×10, 9 ✓, 1 ⚠") instead of N separate child cards on
the canvas. Auto-collapse for N > 3 by default; operator-toggleable
via a ⊞/⊟ button on the FanCard. Builds on the slotIndex + parentKind
edge data from PR5a/d and the chip-suppress-on-parent=fan rule
from PR5d.

What ships

  - frontend/src/components/Tree/fanStack.ts
    - `isStackable(tree, fanId)` — predicate: fan kind +
      attempt axis + N >= 2 + all children's subtrees structurally
      identical (recursive shape + kinds; params + state may differ
      per the design's "execution may differ" note).
    - `defaultCollapsedFanIds(tree)` — subset of stackable fans
      with N > 3 (the auto-collapse threshold). Smaller stackable
      fans render expanded by default but can be manually collapsed.
    - `computeStackAggregate(tree, fanId)` → `{ childKind, total,
      byState: Record<NodeState, number> }`. Total + per-state
      counts feed the FanCard's collapsed body.
    - All functions are pure tree-walkers — no react, no DOM, no
      side effects. Indexed via a per-call children-by-parent map
      so the predicate is O(tree-size) even on deeply-nested fans.

  - frontend/src/components/Tree/stackCollapseContext.ts
    - `StackCollapseContext` (React context, default null).
    - `StackCollapseValue = { collapsedFanIds: ReadonlySet<NodeId>;
      toggleStack: (fanId) => void }`.
    - `useStackCollapse()` returns the value or null when no
      provider is mounted — cards rendered outside a TreeCanvas
      (per-card tests) skip the toggle entirely.

  - frontend/src/components/Tree/conversationTreeToReactFlow.ts (modified)
    - New `TreeFlowAdapterOptions { collapsedFanIds? }` parameter.
    - When `collapsedFanIds` is supplied, the adapter:
        (a) walks each collapsed fan's subtree and collects all
            descendant ids (`collectHiddenDescendants` BFS)
        (b) drops those descendants from `nodes` AND drops every
            edge whose source or target is hidden
        (c) attaches `stackedSummary: StackAggregate` to the
            collapsed fan's `data` so FanCard renders the stack
            body
    - The FanNode discriminant's data type widened to
      `{ node: FanNode; stackedSummary?: StackAggregate }`. Backwards-
      compatible: existing per-kind component types are unchanged.
    - Omitted/empty `collapsedFanIds` behaves identically to PR5d
      (no collapse, full tree). The EMPTY_SET sentinel skips the
      filter entirely for the no-op case.

  - frontend/src/components/Tree/nodeCards.tsx (modified)
    - FanCard reads `data.stackedSummary` and renders an inline
      `StackSummaryBody` component when present: shows
      "send ×10" + a status line `4 ✓, 1 ●, 1 ⚠, 5 ⧖` built
      from the aggregate's byState counts.
    - FanCard renders a `StackToggleButton` when
      `useStackCollapse()` returns a non-null value. Button
      icon flips between ArrowMinimizeRegular (currently
      expanded → click to collapse) and ArrowMaximizeRegular
      (currently collapsed → click to expand).
    - aria-label flips ("Collapse to stack" / "Expand stack")
      so the accessible name matches the action the click will
      perform.

  - frontend/src/components/Tree/nodeCards.styles.ts (modified)
    - Added `stackSummary` slot (dashed-border body inside the
      fan card; flex-col with kind label + status line),
      `stackKindLabel` (semibold), `stackStatusLine` (monospace,
      subdued color).

  - frontend/src/components/Tree/TreeCanvas.tsx (modified)
    - Owns the `collapsedFanIds: Set<NodeId>` state via useState,
      seeded from `defaultCollapsedFanIds(tree)`.
    - `lastTreeId` sentinel watches `tree.id`; when the operator
      swaps to a different tree, the collapse state is reseeded
      from that tree's default-collapsed set. (The runner mutates
      trees in place during waves, so we watch the id, not the
      reference.)
    - `toggleStack(fanId)` flips the fan id's membership; passed
      into the context value.
    - StackCollapseContext.Provider wraps ReactFlow alongside the
      existing ActionCallbacksContext.Provider.
    - The adapter is now called with `{ collapsedFanIds }`; the
      useMemo deps include the set so toggles re-adapt.

  - frontend/src/components/Tree/fanStack.test.ts
    - 20 tests across three describes: `isStackable` (positive,
      negative, kind/axis/N edge cases), `defaultCollapsedFanIds`
      (auto-collapse threshold, multi-fan), `computeStackAggregate`
      (mixed states, empty/non-fan defaults).

  - frontend/src/components/Tree/conversationTreeToReactFlow.test.ts (modified)
    - 7 new tests for the `collapsedFanIds` adapter option:
      child filtering, edge filtering, recursive subtree filter,
      stackedSummary attachment, no-summary-when-uncollapsed,
      omitted-options backwards-compat, multi-fan independence.

  - frontend/src/components/Tree/fanStackCanvas.test.tsx
    - 15 tests covering:
        - FanCard summary body rendering (kind × count + status
          line with ✓/●/⚠/⧖ counts)
        - FanCard toggle button presence / context-null hiding
        - toggle click → toggleStack(fanId)
        - aria-label flips between Collapse/Expand
        - TreeCanvas auto-collapses N>3 stackable fans
        - N=3 NOT auto-collapsed (boundary)
        - converter-axis NOT auto-collapsed (predicate excludes)
        - toggle round-trip: collapse → expand → collapse
        - tree-id-change reseeds the collapse state

Notable shape decisions

  - Stack state lives at TreeCanvas, not on the domain tree. The
    collapse decision is a UI affordance, not authoring state —
    mutating ConversationTree to track it would leak through the
    runner contract + persistence layer. TreeCanvas-internal
    state means the host doesn't see it, and a tree-id swap
    correctly reseeds without needing collapse-state migration.

  - Adapter takes `collapsedFanIds` as an OPTION, not a required
    param. Existing callers (PR5a-d tests, any future caller that
    doesn't care about stacks) keep working. Empty set = no-op
    fast path; the adapter doesn't even build the hidden-id set
    when nothing is collapsed.

  - `stackedSummary` lives on `data`, not as a separate prop. The
    adapter attaches it to the fan node it emits; the FanCard
    consumes via NodeProps. This keeps the prop surface for
    cards stable (they all share the `NodeProps<TreeFlowNode>`
    shape) and the summary is automatically available via the
    same discriminated-union narrowing PR5b set up.

  - childKind in StackAggregate is nullable (returns null on
    empty/non-fan). Defensive against the orphan-fan case (a
    FanNode with zero children — which the stackable predicate
    rejects, so this is unreachable in practice, but the type
    allows a clean default).

  - Auto-collapse threshold is N > 3 (matches the design doc).
    Below threshold the stack renders expanded by default but
    the operator CAN collapse manually via the toggle. This
    matches the spec's "Collapse to Stack is auto-applied when
    N>3 and all children are structurally identical; otherwise
    expanded" language.

  - Status line uses ✓ / ● / ⚠ / ⧖ glyphs matching the
    operator-facing convention from the wave-status banner spec
    (PR6 will reuse these). Lumps `failed + cancelled` into the
    ⚠ count because both are operator-visible problem states;
    lumps `draft + edited + stale` into the ⧖ pending count
    because all three mean "not yet executed." Counts that are
    zero are omitted from the status line (no noise like
    "4 ✓, 0 ●, 0 ⚠").

  - Toggle icon flip: ArrowMinimizeRegular when expanded (the
    action is "minimize / collapse"), ArrowMaximizeRegular
    when collapsed (the action is "maximize / expand"). aria-
    label matches: "Collapse to stack" or "Expand stack." The
    icons match the action operators will take, not the current
    state — operators click the button to do a thing, not to
    describe a state.

  - StackSummaryBody renders inside the FanCard body, not as
    a sibling card or a dedicated stack-card node. The spec's
    ASCII art shows the stack summary nested inside the fan
    card's border, which matches this implementation. Avoids
    an extra react-flow node + edge for the stack, which would
    double the canvas DOM cost.

TDD narrative

  Three test files in sequence:
    1. fanStack.test.ts — pure predicate + aggregate (20 tests).
       RED was TS2307 on './fanStack'. Implementation
       straightforward: recursive subtree structural-equality
       walk + per-axis filter.
    2. conversationTreeToReactFlow.test.ts additions (7 tests)
       for the adapter's new option. Implementation involved
       widening TreeFlowNode's fan arm to carry the optional
       summary + adding the descendant-filter pass.
    3. fanStackCanvas.test.tsx (15 tests) — the FanCard and
       TreeCanvas wiring. RED was on the new ⊞/⊟ button +
       stack-summary body (cards don't render either today).
       Implementation: extended FanCard with StackSummaryBody
       + StackToggleButton helpers; TreeCanvas owns the state
       + provider.

  All three suites green on the first implementation run after
  the type-check pass (one lint warning for an unused `styles`
  binding in FanCard — eliminated by deleting the binding).

Defects surfaced during TDD

  - First FanCard implementation pass had both `const styles =
    useNodeCardStyles()` at the FanCard level AND inside the
    helper components, with the parent's binding unused. Lint
    caught it (no-unused-vars). Removed the parent's binding;
    each helper calls the hook itself.

  - Stray .github/workflows/frontend_tests.yml diff appeared in
    git status (a one-character whitespace change someone made
    out-of-band). Reverted before commit so this PR's diff is
    Tree-component-only.

Verification

  Tests: 1017 frontend passing (975 prior + 42 new: 20 fanStack +
    7 adapter + 15 fanStackCanvas). Backend unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 96.91 / 90.47 / 100 /
    98.05 (was 97.14 / 89.87 / 100 / 97.08 in PR5d — branch +
    line coverage both improved).
      fanStack.ts: 95.83/90.9/100/98 (the one uncovered line is
        the orphan-fan `total === 0` early-return in
        computeStackAggregate — unreachable when called via the
        adapter because the predicate filters first)
      stackCollapseContext.ts: 100/100/100/100
      TreeCanvas.tsx: 100/100/100/100
      conversationTreeToReactFlow.ts: 94.73/89.28/100/96
        (improved from 90.9/77.77/100/90.47; the lone uncovered
        branch is the `collapsedFanIds.size === 0` fast-path
        skip of `collectHiddenDescendants` — exercised by every
        existing test but istanbul under-counts default-param
        branches)
      nodeCards.tsx: 98.3/92.3/100/100

Open rubber-duck items still pending (unchanged from PR5d)

  - DTO original_prompt_id nullability.
  - Citation-strip discipline (legacy partition.ts + wave.ts +
    shim.ts refs; new code stays clean).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs.
  - shim drain loop call-stack serialization.

Next slice

  PR5f — Pick / Unpick. The FanCard's MetaRow "pick: slot N"
  display from PR5b lands its interactive twin: clicking a fan
  child (in the expanded view, or a member of the stack summary)
  fires `onPickFanChild(fanId, slotIndex)` and writes
  `FanNode.params.promotedChildSlotIndex`. V1.0 visual is "dim
  non-promoted children to ~40% opacity" per the spec's
  V1.0-simplification of §3.3.
Interactive Pick / Unpick for fan children. Per-child toggle icon on
the action rail (CheckmarkCircle outline = pickable, filled =
currently picked). Click toggles: picking switches to the clicked
slot; clicking the currently-picked icon unpicks (passes null).
When the fan is in the collapsed Stack state (PR5e), the FanCard
renders a "Pick…" dropdown that lists each member by slot + state;
clicking an item picks (or unpicks if already promoted) without
having to expand the stack first.

Visual: the promoted child gets a brand-color outline (similar to
selection); siblings dim to 40% opacity per the V1.0-simplification
of the design's §3.3. V1.0 Pick is visual only — the runner doesn't
read `promotedChildSlotIndex` yet; V1.1+ scopes Refresh and Stack-
edit to the picked attempt.

What ships

  - frontend/src/components/Tree/fanStack.ts (modified)
    - StackAggregate gains a `members: StackMember[]` field built
      in slot-index order from the tree's edges. The collapsed-stack
      Pick popover (PR5f) reads it without doing a tree walk at
      render time. Exported new `StackMember` interface.

  - frontend/src/components/Tree/conversationTreeToReactFlow.ts (modified)
    - New `FanChildInfo` interface (exported): `parentFanId`,
      `slotIndex`, `promoted`, `dimmed`.
    - Every TreeFlowNode kind's data widened with optional
      `fanChildInfo?: FanChildInfo`. Cards consume it; non-fan-
      children carry undefined.
    - Adapter builds a `fanChildIndex: Map<childId, {parentFan,
      slotIndex}>` from tree.edges (one O(edges) pass) then for
      each node looks up the entry + computes
      `promoted = parentFan.params.promotedChildSlotIndex === slot`
      and `dimmed = parentFan.params.promotedChildSlotIndex !== null
      && != slot`. No per-render tree walks.

  - frontend/src/components/Tree/actionRail.tsx (modified)
    - `ActionCallbacks` gains `onPickFanChild?: (fanNodeId,
      slotIndex | null) => void`. Null = unpick.
    - `ActionRailProps` gains optional `fanChildInfo?: {
      parentFanId, slotIndex, promoted }`. When supplied AND
      `onPickFanChild` is wired, the rail renders a
      CheckmarkCircle toggle button (outline = pickable, filled =
      picked). Click:
        - promoted → invokes onPickFanChild(parentFanId, null)
        - not promoted (or sibling picked) → invokes
          onPickFanChild(parentFanId, slotIndex)
    - aria-label flips between "Pick this attempt" and "Unpick this
      attempt" to match the action the click will perform.

  - frontend/src/components/Tree/nodeCards.tsx (modified)
    - CardFrame gains optional `fanChildInfo?: FanChildInfo`
      threaded from every card's `data.fanChildInfo`. CardFrame:
        - emits `data-dimmed` + `data-promoted` attributes on the
          wrapper for DOM scoping
        - applies `frameDimmed` (40% opacity) when dimmed=true
        - applies `framePromoted` (brand-color outline + shadow)
          when promoted=true
        - passes `fanChildInfo` into ActionRail
    - Every card (Root, Import, UserTurn, Send, Fan, Score) now
      forwards `data.fanChildInfo` into CardFrame.
    - FanCard's "pick: slot N" MetaRow gains a `title` attr
      clarifying "Visual focus only. Future releases will scope
      Refresh and Stack-edit to the picked attempt." (per the
      rubber-duck's E directive — sets correct operator expectation
      against the cherry-pick-metaphor disappointment).
    - MetaRow gains an optional `title?: string` prop.
    - New `StackPickButton` helper component (inside nodeCards.tsx):
      Fluent Menu with "Pick…" trigger that lists each stack member
      by slot + state. Currently-picked item shows
      `✓ (picked)` and clicking unpicks. Renders ONLY when the
      fan is collapsed (stack state) AND onPickFanChild is wired.

  - frontend/src/components/Tree/nodeCards.styles.ts (modified)
    - New `frameDimmed` (opacity: 0.4) and `framePromoted`
      (brand-color border + 2px shadow) slots.

  - frontend/src/components/Tree/fanPick.test.tsx (NEW)
    - 25 tests across seven describes:
        1. adapter — fanChildInfo on fan children (4 tests)
           - present on fan children, absent on non-fan children
           - promoted/dimmed flags reflect promotedChildSlotIndex
        2. computeStackAggregate — members list (2 tests)
        3. CardFrame — fan-child dim / promoted (3 tests)
        4. ActionRail — Pick toggle (7 tests)
           - presence / suppression (callback absent, non-fan child,
             callback wired)
           - aria-label "Pick this attempt" vs "Unpick"
           - click semantics (pick own slot; unpick when promoted)
        5. FanCard — pick MetaRow tooltip (2 tests)
        6. FanCard — collapsed-stack Pick popover (5 tests)
           - presence/suppression, menu opens with N items, click
             invokes onPickFanChild, promoted item shows "(picked)"
             + unpicks on click
        7. TreeCanvas — Pick round-trip via per-child icon (2 tests)
           - click invokes callback with slot index
           - promoted child renders data-promoted=true; siblings
             render data-dimmed=true

  - frontend/src/components/Tree/fanStack.test.ts (modified)
    - Two existing strict-equality assertions on
      computeStackAggregate updated to include the new `members`
      field.

Notable shape decisions (with reviewer rationale)

  Per a rubber-duck pre-implementation review:

  - VERDICT FROM REVIEWER: ship-with-these-specific-changes.

  - Toggle on the per-child icon, NOT separate Unpick affordance
    on the FanCard. Reviewer (B): "The operator's mental model is
    'I picked child microsoft#3.' The reversal is 'unpick child microsoft#3,' not
    'go to the parent and clear its pick field.'" Filled-when-
    picked / outline-when-not is the radio-with-clear primitive
    every operator already knows.

  - Collapsed-stack Pick popover NON-NEGOTIABLE per reviewer (C):
    "The most common workflow is run-N-attempts, pick the best.
    Hiding Pick when the stack is collapsed → four clicks per
    decision → feature gets used twice then abandoned." The popover
    is two clicks (open, pick) and keeps the canvas stable.

  - `CheckmarkCircleRegular` / `CheckmarkCircleFilled` glyph pair
    per reviewer (F): "The most boring choice and the most honest
    one: 'this one is selected as the chosen attempt.' Doesn't
    oversell the V1.0 semantics, pairs cleanly as filled/outline
    for toggle state, doesn't collide with the V1.1 SendCard
    stubs (🎯 Change-target, ★ Pin-as-main-path)."

  - Both `promoted` AND `dimmed` on adapter output (D): two flags
    emitted from the adapter, CardFrame reads both. Reviewer:
    "Don't derive promoted at render time from data.dimmed ===
    false && parent.promotedChildSlotIndex !== null; just emit
    both flags from the adapter."

  - MetaRow tooltip with explicit V1.0-visual-only note (E):
    "Pick (V1.0): visual focus only. Future releases will use this
    to scope Refresh and Stack-edit." Sets correct operator
    expectation against the cherry-pick-metaphor disappointment.

  - Single callback `onPickFanChild(fanId, slotIndex | null)`
    instead of per-action pair (Pick/Unpick). Host's tree-edit
    logic is one function; splitting would force two wrappers.
    Null = unpick is the explicit signal.

  - Spec confirms single-select: "Promotion stays single-valued
    per FanNode; branching is the answer when the operator wants
    'but I also want to see what attempt microsoft#7 leads to.'" V1.1
    doesn't change this — gesture model carries forward.

  Other decisions:

  - StackAggregate.members in slot-index order (sorted by adapter)
    so the popover displays attempts in the order operators
    expect ("attempt #0, microsoft#1, microsoft#2…"). Slot indices match what the
    runner uses for hashing and what the FanCard MetaRow displays.

  - StackMember includes `state` so the popover items can show the
    per-attempt status (clean / failed / running). The operator's
    "pick the best" workflow needs to see which attempts succeeded
    before clicking. PR5f shows the state name; PR5g+ may swap to
    a per-state glyph.

  - Auto-clear on promoted-child delete (per reviewer G3) is
    documented as a host-side contract in the onPickFanChild
    JSDoc. The UI doesn't have to know about it — the host's
    tree-mutation layer enforces. If the host violates, the
    FanCard MetaRow shows "pick: slot N" referring to a missing
    slot but doesn't crash.

  - The collapsed-stack popover trigger uses the
    CheckmarkCircleRegular icon AND a "Pick…" label (with
    ellipsis). The ellipsis follows the standard convention that
    a click opens a menu rather than committing an action.

TDD narrative

  Single test file (fanPick.test.tsx) with 25 cases drove the
  whole PR. RED was a stack of TS errors:
    - 'fanChildInfo' missing from data types (every kind variant)
    - 'members' missing from StackAggregate
    - 'onPickFanChild' missing from ActionCallbacks

  Implementation in order:
    1. Extend StackAggregate with members + StackMember
    2. Add FanChildInfo + widen TreeFlowNode data
    3. Build fanChildIndex in adapter + compute per-node info
    4. Add onPickFanChild to ActionCallbacks
    5. Extend ActionRail with fanChildInfo prop + Pick toggle
    6. Extend CardFrame with fanChildInfo (data attrs + classes +
       passthrough to ActionRail)
    7. Thread data.fanChildInfo through every card's CardFrame call
    8. Add StackPickButton helper + wire into FanCard
    9. Add MetaRow title prop + V1.0 tooltip text on FanCard

  All 25 tests green on first run.

  Two existing fanStack.test.ts strict-equality tests broke
  because of the new `members` field. Fixed with one
  multi-replace.

Defects surfaced during TDD

  - The strict `toEqual<StackAggregate>` pattern in pre-existing
    fanStack tests is brittle against future StackAggregate
    extensions. Considered loosening to `toMatchObject` but kept
    strict — the strict shape forces every future contributor to
    explicitly think about what's in the aggregate. Cheap to
    update; loud about new fields.

  - The original PR5f sketch had Unpick as a separate ✕ button on
    the FanCard MetaRow. Reviewer's B finding rerouted to a toggle
    on the per-child icon. The mental-model argument was decisive.

  - The original PR5f sketch hid Pick entirely when the stack was
    collapsed. Reviewer's C finding flagged this as the dominant-
    workflow killer. Added the popover, which costs ~50 LOC and
    saves operators ~3 clicks per Pick decision.

Verification

  Tests: 1042 frontend passing (1017 prior + 25 new). Backend
    unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 96.99 / 91.78 / 100 /
    98.29 (was 96.91 / 90.47 / 100 / 98.05 in PR5e — branch
    coverage improved).
      actionRail.tsx: 94.11/95.45/100/100 (one uncovered line is
        the showPick-false early-return of onPickClick — a no-op
        guard that runs only if React invokes a stale handler)
      conversationTreeToReactFlow.ts: 96/92.1/100/96.92 (the
        uncovered branches are the FanChildInfo orphan-fallback +
        the no-op-when-not-fan-child path)
      fanStack.ts: 96.2/89.58/100/98.24 (unchanged uncovered
        line is the orphan-fan total=0 early-return)
      nodeCards.tsx: 98.55/94.66/100/100 (uncovered branches
        live in defensive null-coalesce on optional props
        — defensible)
      All other modules: 100/100/100/100

Open rubber-duck items still pending (unchanged from PR5e)

  - DTO original_prompt_id nullability.
  - Citation-strip discipline (legacy partition.ts + wave.ts +
    shim.ts refs; new code stays clean).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs.
  - shim drain loop call-stack serialization.

  New from PR5f review (deferred):
  - PR5e Stack expand-on-Pick-attempt vs popover-instead. Reviewer
    flagged that auto-expand-on-Pick-intent would also work; we
    chose the popover instead because it keeps the canvas stable.
    If operators report "I want to see the expanded children when
    I pick," revisit then.
  - Keyboard accelerator for Pick (tabIndex + 'P' shortcut).
    Reviewer noted this is genuinely good for power operators;
    deferred to PR5f.1 or a follow-up sub-PR since it's additive
    to the visible affordance.

Next slice

  PR5g — Buchheim-Walker layout via d3-hierarchy. Overrides the
  PR5a (0,0) placeholder positions with computed coordinates so
  the tree renders top-to-bottom without manual zoom. The layout
  pass also needs to account for the collapsed-stack FanCard
  height (taller when stack summary + Pick popover are present)
  so siblings don't overlap.
V1.0 layout: plain `d3-hierarchy.tree()` over the adapter's node +
edge output. Overrides the (0,0) placeholder positions emitted by
the adapter (PR5a/d) with computed coordinates so the canvas renders
top-down without manual zoom. Main-path pinning is V1.1 and not part
of this PR. The collapsed-fan filter (PR5e) already trims hidden
descendants from the layout input, so d3-hierarchy never sees them.

What ships

  - frontend/src/components/Tree/layoutTree.ts (NEW)
    - `layoutTree(nodes, edges, options?)` → `Map<nodeId, {x, y}>`.
      Pure function over the adapter's output; no React, no DOM, no
      hidden state.
    - Builds the parent→child relation from `edges` rather than from
      each node's domain-level `parentId`. This makes the function
      consume the SAME filtered view react-flow gets — collapsed-fan
      descendants were already dropped by the adapter.
    - Uses `d3-hierarchy.stratify()` to build the hierarchy from the
      flat node + edge lists, then `tree().nodeSize([w, h])` for
      Buchheim-Walker layout with configurable per-node block size.
    - Defaults: horizontalSpacing=220 (matches the card min-width),
      verticalSpacing=140 (generous room for card height + action
      rail + meta rows). Both overridable via the options bag.
    - Defensive fallbacks:
        - empty input → empty Map
        - cycle / orphan-only input (no roots) → every node at (0,0)
          so the canvas doesn't crash; bug surfaces visually
    - `LayoutNode`, `LayoutOptions` types exported for the
      TreeCanvas consumer + future testing.

  - frontend/src/components/Tree/layoutTree.test.ts (NEW)
    - 17 tests across eight describes:
        1. coverage — every node gets a coord; filtered subset only;
           single-node tree; zero-node defensive
        2. top-down orientation — root at smallest y; siblings share y
        3. linear chain — vertical line (every node shares x)
        4. sibling placement — distinct x; middle child centered over
           parent; left-to-right ordering matches insertion
        5. determinism — identical input → identical output
        6. nested subtree separation — sibling subtrees do not collide
        7. configurable spacing — vertical + horizontal scale correctly
        8. TreeCanvas integration probe — produces non-(0,0) coords

  - frontend/src/components/Tree/TreeCanvas.tsx (modified)
    - New `useMemo` over `layoutTree(rawNodes, edges)`. Maps each
      adapter-emitted node through the position lookup; falls back to
      the adapter's placeholder for any node the layout pass didn't
      cover (defensive — should never happen).
    - The memo's deps are the adapter-output references, NOT the
      tree/collapsedFanIds — so a re-render that doesn't change
      shape (e.g., a callback-prop change) doesn't re-layout.

  - frontend/jest.config.ts (modified)
    - Added `moduleNameMapper` entry redirecting `d3-hierarchy` to
      its UMD bundle at /node_modules/d3-hierarchy/dist/. The npm
      package ships ESM as its main entry; ts-jest's transform only
      hits `.tsx?` and Jest's CJS require trips on the ESM `import`
      statements. The UMD bundle works under CJS without any
      transformer. Production (Vite) keeps the ESM path — only the
      jest transform sidesteps it.

  - frontend/package.json (modified)
    - `d3-hierarchy` ^3.1.2 as a runtime dep.
    - `@types/d3-hierarchy` ^3.1.7 as a devDep.
    - One new transitive (d3-hierarchy itself; no further chain).

Notable shape decisions

  - Edges as the parent-relation source. The adapter's node has a
    domain `node.parentId` deeply nested in `data.node`, but the
    edge list is the post-filter view (collapsed-fan descendants
    are absent from BOTH nodes AND edges). Consuming edges keeps
    the layout pass aligned with the adapter's filter rules without
    needing the layout to re-implement them.

  - `tree().nodeSize([w, h])` instead of `tree().size([W, H])`.
    `nodeSize` makes the per-node block fixed and lays out within
    that — the canvas can grow to fit the tree. `size` would scale
    the whole layout to fit a fixed bounding box, which clips on
    large trees. The spec calls for "tight packing"; nodeSize is
    the right primitive.

  - No main-path pinning (V1.1). Per spec §4.3, V1.0 ships layer 2
    (plain Buchheim-Walker) only. Layer 1 (main-path centerline
    pinning) requires the SendCard's ★ Pin icon which is also V1.1.
    Both land together in V1.1 — adding layer 1 here would create
    a V1.0 surface that nothing else V1.0 reaches.

  - No adaptive collapse (the third layer per §4.3). PR5e already
    ships the Fan-Children Stack collapse (the only "adaptive"
    layer V1.0 needs); the adapter pre-filters its descendants
    before this layout pass sees them.

  - Defensive cycle fallback (no roots → every node at origin)
    rather than throwing. The spec doesn't require the canvas to
    crash on a malformed view; visual overlap is more diagnostic
    than a runtime exception that breaks the whole tree view.

  - moduleNameMapper redirect for d3-hierarchy instead of widening
    transformIgnorePatterns. The transform path required adding
    `d3-.*` to the negative-lookahead AND extending the `transform`
    config to cover `.jsx?` (ts-jest doesn't transform JS by
    default). The mapper is one config line and doesn't perturb
    other modules' transform pipeline.

  - Multi-root forest path explicitly dropped. The original sketch
    had a "lay each root out, translate them so they don't overlap"
    loop. The adapter never produces multi-root trees (V1.0 domain
    contract: one root per tree), so the path was dead code. The
    cycle fallback covers the "no roots" case; the impossible
    "multiple roots in a well-formed tree" case is treated
    identically (each root laid out at the origin; visual overlap
    surfaces the malformation).

TDD narrative

  Single test file (layoutTree.test.ts) with 16 cases drove the
  implementation. RED was TS2307 on './layoutTree'.

  Implementation had two pivots:

    1. First implementation pass used `import { stratify, tree }
       from 'd3-hierarchy'`. Jest barfed: d3-hierarchy is ESM-only
       and ts-jest's transform only hits TypeScript files. Tried
       widening transformIgnorePatterns to allow-list `d3-.*` —
       didn't help because ts-jest still doesn't have a `.js`
       transform. Switched to moduleNameMapper redirecting to the
       UMD bundle.

    2. The "asymmetric subtree separation" test as originally
       written expected u2 to sit fully OUTSIDE u1's subtree
       x-range. Wrong: Buchheim-Walker keeps siblings at the same
       depth (u2 at depth 1, u1's grandchildren at depth 4); u2's
       x can sit between u1's grandchildren x-range because they're
       at different y's. Rewrote the test to assert disjoint-x at
       the SAME depth (u1 vs u2 at depth 1).

  Added the malformed-cycle coverage test after the initial
  implementation to push branch coverage on the no-roots fallback
  above the 85% threshold.

Defects surfaced during TDD

  - The "different y means non-colliding" insight from the second
    pivot is important for the V1.1 main-path-pinning layer
    (different y is exactly what main-path will exploit). Worth
    a note for the V1.1 implementer.

  - d3-hierarchy's ESM-only publishing pattern WILL recur every
    time we add a d3-* package. The moduleNameMapper indirection
    is per-package; documented inline so the next d3 package gets
    the same treatment.

  - The first-write of the layout pass took ~1ms for a 60-node
    tree (measured ad-hoc via `console.time` during local
    bring-up). Well within budget; the V1.0 1000-node soft cap
    is a non-issue at this perf level. Worth re-measuring if
    PR6's wave-status banner forces frequent re-layouts.

Verification

  Tests: 1058 frontend passing (1042 prior + 16 new + 1 extra for
    coverage = 17 new layoutTree tests). Backend unchanged.
  Lint: clean.
  Type-check: clean (main + contract).
  Coverage: src/components/Tree directory 96.49 / 90.57 / 100 /
    98.55 (was 96.99 / 91.78 / 100 / 98.29 in PR5f — statements
    + lines slightly higher, branches dipped because of new
    defensive fallbacks).
      layoutTree.ts: 93.33/82.6/100/100 (the uncovered branches
        are defensive guards: visibleIds.has() short-circuits + the
        out.has() dup-suppression — both unreachable in well-formed
        input).
      All other modules: unchanged from PR5f.
  Global coverage gate: passes (no threshold-fail at end of
    coverage run).

Open rubber-duck items still pending (unchanged from PR5f)

  - DTO original_prompt_id nullability.
  - Citation-strip discipline (legacy refs in runner files).
  - CI gate for the 126 latent test type errors.
  - PR1 backward-compat fallback corpus verification.
  - PR4e cancelQueued-race.
  - dispatch.ts branch coverage at 50% (pre-existing).
  - Spec drift docs.
  - shim drain loop call-stack serialization.
  - PR5f deferred items (auto-expand on Pick-attempt, keyboard
    accelerator).

Next slice

  PR5 is now feature-complete (PR5a-g + b.1 review). Per the
  rubber-duck schedule, fire the reviewer on the full UI surface
  (PR5a-g) before starting PR6. PR6 wires:
    - cost-guardrail modal (intercepts Refresh at confirmThreshold)
    - ↻ cost-preview tooltip
    - wave-status ribbon (canvas-level)
    - wave-complete toast with the 5-bucket summary
    - [Retry failed] button + reflog drawer

  Before PR6, the integration question to settle: does the layout
  pass need to re-run on per-leaf state pings (e.g., a Send
  flipping to "running" during a wave)? Today it does NOT — the
  layout memo deps are the adapter output references, and state
  changes don't change shape (so the adapter returns the same
  reference). Verify this stays true when PR6 wires the
  runner-state-sink integration into TreeCanvas.
…hape (PR5h.1 review)

Per PR5 rubber-duck reviewer findings B + D (bundled fix): the adapter
mixed pure shape mapping with render-time policy (collapse filter +
stackedSummary), and the TreeCanvas layout pass re-ran on every tree
ref change. The latter becomes a 60-leaf-wave layout cliff once PR6
wires the runner sink — every state flip would re-layout the canvas.

- `applyStackCollapse(shape, tree, collapsedFanIds): TreeFlowAdapterResult`
  new module owning the Fan-Children Stack collapse policy:
    - filters descendants of collapsed fans + edges into/out of them
    - clones collapsed fan nodes with `stackedSummary` attached (no
      input mutation)
    - identity behaviour: empty `collapsedFanIds` returns the input shape
- `conversationTreeToReactFlow(tree)` simplified to a pure shape pass:
    - `TreeFlowAdapterOptions` interface removed; no `collapsedFanIds`
      option; no `stackedSummary` attachment; no `computeStackAggregate`
      import
    - still attaches `parentKind` per edge and `fanChildInfo` per fan-
      child (these are stable derivations from the input, not UI policy)
- `TreeCanvas` new pipeline: `adapter → applyStackCollapse → layout`,
  with `useShapeMemoizedLayout` hook keying layout on a derived
  shape-key (`nodes.length:ids|edges.length:ids`). Cached positions
  are returned by reference across renders where shape is unchanged.

- `fanChildInfo` stays in the adapter (not in collapse pass). It
  depends on `promotedChildSlotIndex` (per-Pick state), so the
  adapter is NOT shape-reference-stable across Pick clicks. That's
  fine: the layout memo uses a shape-key (string), so even when the
  adapter returns a new ref, layout doesn't re-run unless the
  shape-key changes. Putting `fanChildInfo` in collapse would force a
  deeper split (separate decoration pass) for no operator-observable
  benefit.
- `applyStackCollapse` early-returns the input shape on empty
  `collapsedFanIds` — preserves shape identity for the common case,
  saves a defensive clone-and-filter pass.
- `useShapeMemoizedLayout` uses `useRef`+conditional-assignment-during-
  render rather than `useMemo` over the shape-key. `useMemo` would
  recompute when the underlying `nodes`/`edges` refs change (state
  flips); the ref cache returns the cached Map even when callers pass
  new arrays.
- `hidden as unknown as ReadonlySet<string>` cast at the call site:
  react-flow's `Edge.source/target` are plain strings; `hidden` is
  branded `ConversationTreeNodeId`. Membership is structural at
  runtime; local cast avoids per-call brand annotations.

- RED: created `applyStackCollapse.test.ts` (10 tests covering
  identity, single + multiple fan collapse, descendant filtering,
  edge filtering, `stackedSummary` aggregation, and explicit purity
  on shape and tree). Confirmed TS2307 cannot-find-module via
  `tsc --noEmit -p tsconfig.test.json`.
- GREEN: implemented `applyStackCollapse.ts`. 10/10 pass.
- RED for memoization: added 2 tests to `TreeCanvas.test.tsx`
  spying on `layoutTree`. The "runs once across state-only re-renders"
  test asserts call count stays constant after 2 re-renders with new
  tree refs but same shape (id, kind, edge structure). The "re-runs
  when shape changes" test asserts call count grows when a node is
  added.
- GREEN: refactored `TreeCanvas` to the new pipeline + extracted
  `useShapeMemoizedLayout`. Both memoization tests pass.
- Migrated the moved collapse-option describe block out of
  `conversationTreeToReactFlow.test.ts` (those tests now live in
  `applyStackCollapse.test.ts`). Updated `layoutTree.test.ts`'s one
  callsite that used the old `{ collapsedFanIds }` option to call
  through `applyStackCollapse` instead.

- Initial type-check failure: my `hidden.has(e.source)` and
  `hidden.has(n.id)` calls in `applyStackCollapse.ts` were branded-
  vs-string mismatches (react-flow's `Edge.source` is `string`;
  `hidden` is `ReadonlySet<ConversationTreeNodeId>`). The original
  adapter avoided this because it filtered `tree.edges` directly
  (branded). Resolved with a local `hiddenStrings` view cast at the
  function head, keeping the brand discipline at the boundary.

- `npm test --no-coverage`: 1064 passed, 1064 total (was 1059;
  +10 new applyStackCollapse tests; -7 moved out; +2 new TreeCanvas
  memo tests)
- `npm run type-check`: 0
- `npm run type-check:contract`: 0
- `npm run lint`: 0 warnings
- `npm run test:coverage`:
    - `TreeCanvas.tsx`: 100/90/100/100 (was 100/100/100/100; the
      new hook has a one-branch first-call path uncovered)
    - `applyStackCollapse.ts`: 97.43/95/100/100
    - `conversationTreeToReactFlow.ts`: 95.12/89.47/100/94.59
      (slightly down from 96.49 because the dropped collapse code
      lost some branches it used to cover)
    - `layoutTree.ts`: 93.33/82.6/100/100 (unchanged)
    - All exceed 85/85/90/90 threshold.

PR5h.2: `InsertEdge` discriminant-lie fix — replace `kind: 'fan_attempt'`
typed-literal-for-disabled-items with a discriminated `InsertMenuOption`
union (`{ disabled: false; kind; label }` vs `{ disabled: true; label;
reason }`). Reviewer Finding H#1; ~10 LOC plus tests.

From PR5 review:
- PR5h.2: InsertEdge discriminant safety (H#1)  -- NEXT
- PR5h.3: edgeInsert.test.tsx vacuous-pass guard (J#4)
- PR5h.4: action rail hover-gate CSS (J#6)
- PR5h.5-7: editing affordances (UserTurn Edit + RootPrompt Edit +
  UserTurn Converter palette) -- per user Q1 hybrid decision
- Per-child Pick toggle drift -- amend spec §2.4 end-of-V1.0 doc pass
  (user Q2 = keep + amend)
- V1.1 disabled stubs skipped (user Q3 = c; spec §2.4 amendment)
- `💬 View raw response` dropped from V1.0 (user Q1 hybrid)
- TreeCanvas synchronous-setState-during-render (rev H#2) -- track
- Auto-collapse seed only re-runs on tree.id change, not shape change
  within same tree (rev J#1) -- track for PR6
- `toFlowEdge`'s `parentKind ?? 'root_prompt'` silent default (rev J#2)
- Tooltip `relationship="description"` on icon-only buttons (rev J#3)
- Card decomposition (450 LOC; rev J#5) -- defer to start of PR6a
- Spec drift docs (drain-outside-lock, retry-failed-in-shim,
  LockAcquireResult DU, SendCard inline lastError) -- end-of-V1.0
- shim drain loop call-stack serialization -- PR6 wave-status surface
- Stack-collapse persistence across un-stackable transitions -- PR6
- dispatch.ts 50% branch coverage -- pre-existing PR4c2; not gating
…ty (PR5h.2 review)

Per PR5 reviewer Finding H#1: the V1.1-disabled "Fan out: prompt
(coming later)" and "Fan out: target (coming later)" items each
carried `kind: 'fan_attempt' as const` — a typed lie ("discriminant
is unused on disabled items"). The compiler stayed happy because the
old `InsertMenuOption` shape required `kind: EdgeInsertKind` and
treated `disabled` as optional. When V1.1 adds `'fan_prompt'` and
`'fan_target'` to `EdgeInsertKind`, a flag-flip from `disabled: true`
to `disabled: false` would silently dispatch the wrong axis until
someone notices the literal-`'fan_attempt'` left behind.

## What ships

- `InsertMenuOption` is now exported and a discriminated union:

      type InsertMenuOption =
        | { readonly disabled: false; readonly kind: EdgeInsertKind; readonly label: string }
        | { readonly disabled: true; readonly label: string; readonly disabledReason: string }

  The disabled arm has NO `kind` field. Enabling a disabled item
  forces a same-commit choice of a real `kind`; an editor that
  merely flips the discriminant gets a type error.
- All `menuForParent` callsites updated to construct each option
  with the explicit `disabled: false | true` discriminant. Disabled
  items lose their stale `'fan_attempt' as const` placeholder.
- Render handlers narrow via `if (opt.disabled === false) handleSelect(opt.kind)`
  in both basic and fanAxes loops, replacing the previous truthy-
  short-circuit `() => !opt.disabled && handleSelect(opt.kind)`.
  The explicit narrowing satisfies the discriminated union — the
  short-circuit form does not (TS sees `opt.kind` as possibly absent
  on the disabled arm).

## TDD narrative

- RED: created `insertMenuOption.test.ts` with 5 cases:
    1. disabled item without kind compiles.
    2. enabled item with kind compiles.
    3. disabled item with kind fails to compile (asserted via
       `@ts-expect-error`).
    4. enabled item without kind fails to compile (asserted via
       `@ts-expect-error`).
    5. narrowing: `opt.kind` accessible after `opt.disabled === false`.
  `tsc --noEmit -p tsconfig.test.json` reported TS2459 (InsertMenuOption
  not exported) + TS2578 unused @ts-expect-error directives (because the
  unresolved import collapsed the type to `any`, swallowing the errors
  the directives expected).
- GREEN: refactored `InsertEdge.tsx` to export the union + update all
  construction + render-time narrowing. 5/5 new tests pass.

## Verification

- `npm test --no-coverage`: 1069 passed (was 1064; +5 insertMenuOption)
- `npm run type-check`: 0
- `npm run type-check:contract`: 0
- `npm run lint`: 0 warnings

## Next slice

PR5h.3: strengthen `edgeInsert.test.tsx` "disabled V1.1 items do NOT
invoke onEdgeInsert when clicked" — add `expect(disabledFan).toBeDefined()`
before the click so the test stops passing vacuously when no disabled
item is rendered (reviewer Finding J#4).

## Open rubber-duck items still pending

(unchanged from PR5h.1 commit body; tracking continues to PR5h.7)
…PR5h.3 review)

Per PR5 reviewer Finding J#4: two tests in edgeInsert.test.tsx had
vacuous-pass shapes that would survive a regression removing the V1.1
disabled stubs.

## What ships

- "V1.1 axes (Fan prompt, Fan target) render disabled": the old for-loop
  body fired `expect()` only when an item matched the regex; an empty
  filter result passed the test silently. New shape filters first, asserts
  `v11Items.length >= 2`, then loops.
- "disabled V1.1 fan-axis items do NOT invoke onEdgeInsert when clicked":
  the old `if (disabledFan !== undefined) { ... }` skipped the assertion
  when no disabled item rendered. New shape asserts
  `expect(disabledFan).toBeDefined()` before the click and non-null
  asserts at the click site.

## TDD narrative — proving the strengthening is meaningful

After landing the change, I temporarily stripped both V1.1 disabled
items (the V1_0_FAN_AXES "prompt"/"target" entries + the user_turn
fan-prompt entry) from `InsertEdge.tsx` and re-ran the edge-insert
suite. Both strengthened tests failed honestly:

  Tests: 2 failed, 21 passed, 23 total

  > expect(disabledFan).toBeDefined()
                       ^
  at src/components/Tree/edgeInsert.test.tsx:288:25

  > expect(v11Items.length).toBeGreaterThanOrEqual(2)

Before the strengthening, both would have passed silently with the same
production change.

## Verification

- Reverted the temp strip via `git checkout`; original 23/23 green again.
- `npm test -- --testPathPatterns=edgeInsert`: 23 passed.

## Next slice

PR5h.4: action rail hover-gate CSS — match spec §2.2's "rail floats below
each node card on hover/focus" by adding the visibility-on-hover gate
via `:hover [data-tree-action-rail]` + `[data-selected="true"]
[data-tree-action-rail]` selectors. Reviewer Finding J#6; ~5 LOC + 2
tests (default opacity 0, hover/selected → opacity 1).

## Open rubber-duck items still pending

(unchanged from PR5h.1/h.2 commit bodies; tracking continues to PR5h.7)
…4 review)

Per PR5 reviewer Finding J#6: spec §2.2 says "rail floats below each
node card on hover/focus" but PR5c shipped with `opacity: 1` always
("always-visible until PR5e/PR5f wire the hover behavior"). PR5e/f
landed without that follow-up. Operator-visible drift; cheap to fix.

## What ships

- `actionRail.styles.ts`: rail default `opacity: 0` + 120ms opacity
  transition. The fade is gentle enough to feel intentional (no
  hard-snap on hover) and short enough to not lag a keyboard-walker.
- `nodeCards.styles.ts`: frame style adds three nested selectors via
  Griffel's `&` reference:
    - `&:hover [data-tree-action-rail]` → opacity 1
    - `&:focus-within [data-tree-action-rail]` → opacity 1 (keyboard
      walker hits a rail button → frame's focus-within fires → rail
      stays visible while the button has focus)
    - `&[data-selected="true"] [data-tree-action-rail]` → opacity 1
      (selected card keeps its rail visible regardless of pointer)
- Two new behavioral tests in `actionRail.test.tsx`:
    - default opacity is 0 when card is unselected
    - opacity flips to 1 when frame carries `data-selected="true"`
  jsdom + Griffel honor the attribute-selector branch; `:hover` and
  `:focus-within` are CSS-only and verified via code review +
  Playwright follow-up (jsdom can't simulate either reliably).

## Notable shape decisions

- Hover-gate lives on the FRAME style (parent), not the rail style
  (child). Griffel can't write "if ancestor :hover" from the child;
  the parent's `&:hover [data-tree-action-rail]` selector is the
  only honest expression.
- `transitionProperty: 'opacity'` + `transitionDuration: '120ms'` is
  a visual nicety, not a behavior. It costs nothing in jsdom (no
  transitions fire) and avoids a jarring snap when hovering away.
- Did NOT touch the rail's `[data-tree-action-rail]` attribute or
  any other DOM contract — only the visual opacity changed.

## TDD narrative

- Tests written first: 2 new cases in `actionRail.test.tsx §5`.
  Default-hidden + data-selected-visible. Both fail before the CSS
  change (opacity is `'1'` for both cases under the old `opacity: 1`
  rule).
- GREEN after applying the styles change.

## Verification

- `npm test --no-coverage`: 1071 passed (was 1069; +2 hover-gate)
- `npm run type-check`: 0
- `npm run type-check:contract`: 0
- `npm run lint`: 0 warnings

## Next slice

PR5h.5: UserTurn `✏ Edit text inline` — first of three V1.0 editing
affordances per the Q1 hybrid scope. Inline `<Textarea>` swap on the
user-turn card, sets `node.state = 'edited'`, propagates downstream
via the runner sink (the sink path is already wired through the
ActionCallbacks `onEditParams` slot — actually, no: PR5c shipped
`onEditParams` only as part of the common rail design; the inline-
edit path needs its own callback. Confirm before writing.).

## Open rubber-duck items still pending

PR5h mechanical fixes complete after this. Remaining V1.0 work:
- PR5h.5: UserTurn ✏ Edit text inline
- PR5h.6: RootPrompt ✏ Edit prompt + target + system prompt
- PR5h.7: UserTurn ⚡ Converter palette
- All Q-decision spec amendments (per-child Pick, dropped 💬 raw response,
  no V1.1 stubs) → end-of-V1.0 doc pass
- Other carry-forwards unchanged
spencrr added 14 commits June 15, 2026 13:43
Wires the last PR7-review dead seam — useAutoReverse — to a real
consumer: the History tab's "Open as tree" action (spec §5.12). This
completes the §13.1a/§5.12 host-integration surface.

Per spec line 18 + §5.12, V1.0 "Open as tree" ships linear+converter
reconstruction (multi-conversation fanout detection is V1.1) — so this
path uses useAutoReverse (single-AR linear), distinct from the reload
path's fan-aware reconstruction (PR7g).

What ships:

- useAutoReverse: preserves the AR's conversation_tree_id as the
  reconstructed tree's id (spec §13.1). A V1.0+ AR (label present) keeps
  that id so subsequent reload/refresh stay consistent; a pre-V1.0 AR
  (no label) keeps the freshly-minted id from linearChainFromMessages.

- TreeRunnerHost: new `openFromAttackResultId?` prop drives
  useAutoReverse; an effect pushes the reconstructed tree via
  onTreeChange, gated on the tree id (applied once per reconstructed AR,
  so the inline onTreeChange re-firing each render doesn't re-push).
  `autoReverseApi?` test-injectable defaults to the module attacksApi.

- AttackTable: optional `onOpenAttackAsTree?` → a BranchRegular icon
  button next to the existing "Open attack" button. Renders only when
  the callback is provided (implicitly flag-gated). Threaded through
  AttackHistory.

- App (thin glue): `openTreeFromArId` state + `handleOpenAttackAsTree`
  (sets the AR id + switches to the tree view); passes
  onOpenAttackAsTree to AttackHistory only when treeUiEnabled, and
  openFromAttackResultId to the host.

Notable shape decisions:

- The id-preservation lives in useAutoReverse (which already fetches the
  AR) rather than the host, so the hook is self-consistent and unit-
  tested. Without it, opening a V1.0+ AR would mint a fresh id that
  disagrees with the AR's labels → reload would find no rows and fall to
  greenfield. That's a real bug, fixed here.
- The host reuses its existing onTreeChangeRef mirror (not a render-time
  ref write) for the push effect; lastOpenedTreeIdRef gates re-pushes.
- "Open as tree" intentionally uses linear reconstruction for ALL ARs
  per §5.12 V1.0 scope; the fan-aware path is reserved for in-session
  reload (§9.4.1 / PR7g). Attempt-fan trees opened from History show the
  linear base — acceptable + spec-sanctioned for V1.0.

TDD narrative:

1. useAutoReverse: +2 tests (preserve conversation_tree_id; mint fresh
   when absent). RED → GREEN (override tree.id from ar.labels).
2. TreeRunnerHost: +2 tests (auto-reverse emits tree via onTreeChange;
   no-op when openFromAttackResultId null). RED → GREEN (prop + effect).
3. AttackTable: +2 tests (button absent without callback; present +
   fires, row-open suppressed). RED → GREEN (prop + button).
4. App + AttackHistory: thin glue (untested per the agreed approach).

Verification:

- jest AttackTable|AttackHistory|useAutoReverse|TreeRunnerHost: 91/91.
- npm test: 1374 pass / 70 suites (was 1368).
- type-check / lint clean.

This closes both PR7-review dead seams (onGuardedSwapReady in PR7i.3a,
useAutoReverse here). Remaining: converter + nested fan-aware reload
reconstruction (deferred V1.1 follow-up).
…ter fans (PR7g slice 3)

Extends fan-aware reload (PR7g slice 2 did attempt fans) to a single
root-level `converter` fan, using the flattened topology the operator
chose. Nested / multi-axis fans remain on the honest degraded-banner
path.

Investigation first (Explore subagent, cited): the converter-fan reverse
shape is NOT uniquely determined by the forward tree — V1.0's no-cache
dispatch sends prepended_conversation=[], so the authored `u_above`
level is never persisted. Operator decision: flatten (attach the fan
directly to root); re-execution is equivalent (each slot re-dispatches
its converter). Per-leaf message fetches accepted for V1.0.

What ships:

- autoReverse.reconstructTreeWithFans gains `converterResolver?` +
  `onConverterDivergence?`. When the single fan is a root-level
  converter fan AND a resolver is supplied, it assembles:
    root_prompt(text) → fan(converter, variants) →
      [user_turn(text, converterPipeline=slot converters) → send] × N
  - root/user_turn text comes from the base leaf's first-turn
    ORIGINAL value (the authored prompt, not the converted gibberish).
  - per-slot converters come from reconstructVariantPayloads via the
    resolver (most-frequent consensus; onConverterDivergence per
    disagreeing slot).
  - bails to degraded (returns null → linear) when the base lacks a
    leading user turn or slots aren't contiguous 0..N-1 (no tombstone
    guessing in V1.0).

- useReloadReconstruction: when the leaf set is a single root-level
  converter fan, pre-fetches each member leaf's messages
  (buildConverterResolver) and reads each one's first-user-turn
  converter_identifiers into a resolver keyed by AR id. Non-converter
  shapes skip the N fetches. Surfaces a console.warn (not the degraded
  banner) when reconstructed-but-divergent slots exist.

Notable shape decisions:

- The pure function stays pure: it takes a synchronous resolver; the
  async N-leaf fetch lives in the hook. This keeps reconstructTreeWithFans
  unit-testable without mocking an API.
- Slot contiguity gate (0..N-1) avoids guessing deleted-slot tombstone
  semantics; non-contiguous → degraded.
- Corrected the prior PR7g comment that claimed the base leaf's
  converter "survives via linearChainFromMessages" — it does NOT
  (root_prompt has no converterPipeline field; a first-turn converter
  is dropped by linear promotion). The converter now survives via this
  fan-aware path; the linear-drop for non-fan converted first-turns
  remains a documented V1.x gap (root prompts don't model converters).

TDD narrative:

1. autoReverse: +5 tests — converter fan WITH resolver reconstructs the
   flattened shape; original_value (not converted) used for text;
   per-slot divergence → most-frequent + callback; non-contiguous slots
   degrade; (kept) no-resolver degrades.
2. RED: 3 new tests failed (still degrading); 2 already passed.
3. GREEN: assembleRootConverterFan; 46/46 autoReverse pass.
4. useReloadReconstruction: flipped the old "converter degrades" test to
   "converter reconstructs (no banner) + fetches each member leaf";
   added a nested-fan test to preserve degraded-banner coverage. 55/55.

Verification:

- jest autoReverse|useReloadReconstruction: 55/55 pass.
- npm test: 1379 pass / 70 suites (was 1374).
- type-check / lint clean.

Remaining fan-reconstruction work (deferred V1.1): nested fans,
multi-axis-at-same-position fans, and the linear converter-on-root-prompt
modeling gap.
Code-review fixes:\n- resolve pending cost-guardrail approvals on unmount so waves do not hang behind a vanished modal\n- prevent cost and dirty-edit dialogs from stacking in the shared modal slot\n- cancel active waves on TreeRunnerHost unmount and keep StrictMode/Fast Refresh from closing the live lock manager\n- route Open-as-tree through the dirty-edit guard and suppress stale fragment reload while explicit AR reconstruction is in flight\n- page reload reconstruction with backend-safe limit=100 and fail soft when one converter leaf fetch fails\n\nBrowser-driven fixes:\n- mount Tree View directly when the URL carries conversation_tree_id\n- add React Flow controls and minimap\n- enable local node dragging while keeping ad-hoc connecting disabled and edge tab stops suppressed\n- increase default vertical layout spacing to avoid editor/response-card overlap\n- show assistant response previews on response cards from both reconstruction and live dispatch\n- wire integrated root/user-turn editing with stale propagation\n- make response-card refresh run the subtree so interior stale responses are actionable\n- compose sequential sink mutations so live response previews survive recordExecution -> clean transitions\n\nLive validation:\n- created a benign live attack against OpenAIChatTarget_gpt-4o_rr\n- opened it as a tree, edited root prompt/target, refreshed the response, and verified POST /api/attacks + POST /messages succeeded with the card updating in-session\n- verified dirty-edit guard, response preview rendering, controls/minimap, local dragging, and overlap checks in browser\n\nVerification:\n- npm run type-check\n- npm run type-check:contract\n- npm run lint\n- changed suites in-band: 10 passed / 232 tests\n- full Jest: 71 passed / 1403 tests (parallel worker teardown warning remains)\n\nOpen design follow-ups:\n- backend AttackSummary does not expose target_registry_name, so historical tree reload/open cannot recover a refreshable target without operator edit\n- edge handles currently sit visually inside cards; consider moving handles to card exterior in the visual polish pass\n- consider response truncation plus expanded/linear-detail view for long outputs\n- decide whether action rails should always show, remain hover/focus gated, or sit over the node edge
What ships:\n- wires response-card Add follow-up prompt action through TreeRunnerHost\n- wires response-card Fan out response attempts and Fan out converters actions\n- wires the existing edge insert menu callback into host-owned structural reducers\n- adds pure tree reducers for append-child, insert-between, and attempt/converter fan wrapping\n\nBehavior:\n- Add follow-up creates an edited UserTurn under the response\n- Fan out response attempts wraps the response edge in an attempt Fan and adds a fresh stale Send slot\n- Fan out converters wraps the response edge in a converter Fan and adds a new editable UserTurn -> stale Send branch\n- Edge insert menu now mutates the tree instead of remaining presentational\n\nValidation:\n- browser-verified Add follow-up prompt, attempt fan, and converter fan creation on a live tree\n- npm run type-check\n- npm run type-check:contract\n- npm run lint\n- changed suites in-band: 11 passed / 262 tests\n- full Jest: 71 passed / 1410 tests (parallel worker teardown warning remains)\n\nStill not wired:\n- converter catalog fetch + UserTurn converter palette host persistence\n- fan-child Pick/Unpick host persistence\n- clone/branch tree action\n- delete node/subtree action\n- open path in linear/chat view\n- refresh cost preview tooltip
What ships:\n- fetches converter instances in App and passes them into TreeRunnerHost\n- wires UserTurn converter palette selections through host state\n- wires fan child Pick/Unpick through host state\n- wires refresh cost preview labels using the subtree estimator that matches the Refresh action\n\nValidation:\n- changed suites in-band: 13 passed / 305 tests\n- npm run type-check\n- npm run type-check:contract\n- npm run lint\n\nStill intentionally not wired:\n- clone/branch tree action\n- delete node/subtree action\n- open path in linear/chat view\n- deeper visual polish for edge handles/action rail placement/long-message expansion
What ships:\n- moves React Flow connection handles to the card perimeter instead of visually inside the node content\n- styles handles with Fluent/brand-aware colors and a small outline\n- floats the action rail below the card edge as a compact toolbar while preserving hover/focus/selected reveal behavior\n\nValidation:\n- focused visual/component suites in-band\n- changed suites in-band: 14 passed / 334 tests\n- npm run type-check\n- npm run type-check:contract\n- npm run lint\n\nRemaining design follow-ups:\n- decide whether the action rail should eventually always show instead of hover/focus reveal\n- add long-message expand/detail behavior or explicit open-linear path
What ships:\n- wires Clone tree / Branch from here to create a new client-side tree with parentConversationTreeId\n- wires Delete for non-root subtrees\n- wires Open in linear view to a lightweight right-side path drawer\n- adds pure reducers for clone and subtree delete\n\nValidation:\n- changed suites in-band: 14 passed / 340 tests\n- npm run type-check\n- npm run type-check:contract\n- npm run lint\n\nRemaining follow-ups:\n- richer long-message expansion/detail behavior\n- fuller branch-from-node semantics beyond whole-tree clone\n- explicit delete confirmation modal if product wants destructive-action confirmation before V1.0
What ships:\n- hides Delete on root prompt cards\n- routes non-root Delete actions through a confirmation dialog\n- preserves the shared modal slot priority with cost and dirty-edit dialogs\n\nValidation:\n- changed suites in-band: 14 passed / 341 tests\n- npm run type-check\n- npm run type-check:contract\n- npm run lint
What ships:\n- adds Playwright coverage for Tree View greenfield\n- covers History Open as tree reconstruction with response previews\n- covers add follow-up prompt\n- covers response attempt fan creation\n- covers converter fan creation\n\nValidation:\n- VITE_ENABLE_TREE_UI=true playwright test e2e/tree.spec.ts --project mock: 4 passed\n- npm run type-check\n- npm run type-check:contract\n- npm run lint
What ships:\n- themes React Flow minimap, controls, and attribution with Fluent tokens\n- removes white minimap/control chrome in dark mode\n\nBrowser validation:\n- dark mode minimap: rgb(31, 31, 31)\n- dark mode controls: rgb(41, 41, 41)\n- light mode minimap/controls use light neutral surfaces\n\nVerification:\n- TreeCanvas focused tests\n- npm run type-check\n- npm run lint
Captures the next checkpoint after PR7 quality-gate work: target recovery, attempt fan count, branch-from-here subtree semantics, long-response inspection, converter fan UX, auto-layout, action rail/handle polish, delete details, past runs, wave toast, and browser coverage exit criteria.
@spencrr spencrr requested a review from romanlutz June 15, 2026 21:37
@spencrr spencrr closed this Jun 15, 2026
@spencrr spencrr reopened this Jun 15, 2026
@romanlutz

Copy link
Copy Markdown
Contributor
image

Why are there two plus signs? Would it be the same action?

@romanlutz

Copy link
Copy Markdown
Contributor

This could be amazing for TAP!

@romanlutz

Copy link
Copy Markdown
Contributor

Does it work for multimodal content?

@romanlutz

Copy link
Copy Markdown
Contributor

How do I switch back and forth from this to the single chat view?

@spencrr

spencrr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Why are there two plus signs? Would it be the same action?

Currently there are a lot of cases in the UI that really add to the complexity. In this case, we have the (converter/no converter) which really should be collapsed into an option on the converter fan to use/not use the nullconverter. (likely as a checkbox)

@spencrr

spencrr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

This could be amazing for TAP!

Yeah effectively this is manual TAP!! Curious to think about what we can do to adapt a whole scenario to a conversation and be able to play with the internal messages afterward. More thought required...

@spencrr

spencrr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Does it work for multimodal content?

MVP no, but no reason not to.

@spencrr

spencrr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

How do I switch back and forth from this to the single chat view?

Another shortcoming is how the tree is currently persisted (via labels which is very hacky). Ideally the backend just has this data model directly.

My long-term vision here is to unify the UIs into one tab with two splits (one side chat, the other tree) and of course could support hiding the tree view for linear chats or user preference. The current chat split in the tree tab (see on the right side) is not up to feature parity with the chat tab.

@spencrr

spencrr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

How do I switch back and forth from this to the single chat view?

My recommendation for now is to just use the chat split in the tree view for most things and if you cannot do a specific feature, then just fallback the entire conversation to the chat tab.

@spencrr

spencrr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the comments, @romanlutz !

@romanlutz

Copy link
Copy Markdown
Contributor

This could be amazing for TAP!

Yeah effectively this is manual TAP!! Curious to think about what we can do to adapt a whole scenario to a conversation and be able to play with the internal messages afterward. More thought required...

But a scenario implicitly means many different attacks that wouldn't be represented together(?)

@romanlutz

Copy link
Copy Markdown
Contributor

How do I switch back and forth from this to the single chat view?

Another shortcoming is how the tree is currently persisted (via labels which is very hacky). Ideally the backend just has this data model directly.

My long-term vision here is to unify the UIs into one tab with two splits (one side chat, the other tree) and of course could support hiding the tree view for linear chats or user preference. The current chat split in the tree tab (see on the right side) is not up to feature parity with the chat tab.

Labels? The prompt IDs and original prompt IDs are tracked on message pieces, right? And related conversations are tracked on attack result. What else is needed?

@romanlutz

Copy link
Copy Markdown
Contributor

How do I switch back and forth from this to the single chat view?

My recommendation for now is to just use the chat split in the tree view for most things and if you cannot do a specific feature, then just fallback the entire conversation to the chat tab.

Ok but what's the mechanism to switch? Is there a button?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants