[DRAFT] [FEAT]: GUI Conversation Tree UI by spencrr · Pull Request #2014 · microsoft/PyRIT

spencrr · 2026-06-15T21:37:52Z

Draft: Complete Tree UI MVP

Warning

This is more of a reference PR than a good starting point for a real implementation.
It contains many dead-end code paths and overbuilt complexity.

To try it locally, run: VITE_ENABLE_TREE_UI=true uv run dev.py

Summary

This PR wraps up the Tree UI MVP as a browser-usable operator experience for opening historical attacks as editable conversation trees, inspecting and extending paths, creating/pruning fan branches, applying converter transforms, and recovering from common edit/refresh mistakes.

The MVP keeps the tree model primarily client-owned while integrating with existing attack history, conversations, targets, converters, and refresh dispatch APIs.

High-Level Concepts

Conversation Trees

The UI represents an attack as a ConversationTree made of typed nodes:

root_prompt
user_turn
converter
send
fan
score
import_message

Edges encode parent/child relationships and fan slot indices. The client reducer owns structural edits, stale propagation, branch/clone semantics, fan creation, prune-to-picked behavior, and local recovery states.
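
As a rough illustration of the model described above, the sketch below types the listed node kinds as a string union and gives each edge a fan slot index. All names here (`TreeNode`, `TreeEdge`, `ConversationTreeSketch`, `childrenOf`) are hypothetical and simplified; the PR's real types carry more fields.

```typescript
// Hypothetical, simplified shapes; the PR's real types may differ.
type NodeKind =
  | "root_prompt"
  | "user_turn"
  | "converter"
  | "send"
  | "fan"
  | "score"
  | "import_message";

interface TreeNode {
  id: string;
  kind: NodeKind;
}

interface TreeEdge {
  parentId: string;
  childId: string;
  // Position under a fan parent; assumed to be 0 under non-fan parents.
  slotIndex: number;
}

interface ConversationTreeSketch {
  nodes: Record<string, TreeNode>;
  edges: TreeEdge[];
}

// Children of a node, ordered by fan slot index.
function childrenOf(tree: ConversationTreeSketch, parentId: string): TreeNode[] {
  return tree.edges
    .filter((e) => e.parentId === parentId)
    .sort((a, b) => a.slotIndex - b.slotIndex)
    .map((e) => tree.nodes[e.childId]);
}

const exampleTree: ConversationTreeSketch = {
  nodes: {
    root: { id: "root", kind: "root_prompt" },
    fan: { id: "fan", kind: "fan" },
    a1: { id: "a1", kind: "send" },
    a2: { id: "a2", kind: "send" },
  },
  edges: [
    { parentId: "root", childId: "fan", slotIndex: 0 },
    // Deliberately listed out of order to show that slotIndex drives ordering.
    { parentId: "fan", childId: "a2", slotIndex: 1 },
    { parentId: "fan", childId: "a1", slotIndex: 0 },
  ],
};

const fanChildIds = childrenOf(exampleTree, "fan").map((n) => n.id);
```

Keeping the slot index on the edge rather than on the child node means the same lookup works for both fan and non-fan parents.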

Path Chat

The tree canvas now pairs with a selected-path chat pane. Operators can focus a node/path, inspect the selected path as a linear conversation, add follow-up prompts, and see pending responses without losing the tree context.

Refresh Waves

Refresh runs through the existing runner wave model: the client selects dispatchable sends, partitions clean context versus stale suffix, builds labels, and dispatches through the attack API. The UI now makes wave state and preflight failures easier to understand before a backend call is attempted.
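
The clean-context versus stale-suffix split can be sketched as a single pass over the Sends on one root-to-leaf path. This is a minimal sketch, not the PR's code: `PathSend`, `isStale`, and `partitionPath` are hypothetical names, and the staleness predicate (state is edited, stale, failed, or cancelled, or there is no execution) follows the resolver rule described in the PR4b commit message.

```typescript
// Hypothetical minimal shape for a Send on a single root-to-leaf path.
interface PathSend {
  id: string;
  state: string;
  execution: { response: string } | null;
}

const STALE_STATES = ["edited", "stale", "failed", "cancelled"];

// A Send must be re-dispatched if its state is stale-like or it never ran.
function isStale(send: PathSend): boolean {
  return STALE_STATES.indexOf(send.state) !== -1 || send.execution === null;
}

// Everything before the first stale Send is reusable context;
// the first stale Send and everything after it must be re-sent.
function partitionPath(path: PathSend[]): {
  cleanPrefix: PathSend[];
  staleSuffix: PathSend[];
} {
  let firstStale = path.length;
  for (let i = 0; i < path.length; i++) {
    if (isStale(path[i])) {
      firstStale = i;
      break;
    }
  }
  return {
    cleanPrefix: path.slice(0, firstStale),
    staleSuffix: path.slice(firstStale),
  };
}

const examplePath: PathSend[] = [
  { id: "a", state: "clean", execution: { response: "hi" } },
  { id: "b", state: "edited", execution: { response: "old" } },
  { id: "c", state: "clean", execution: { response: "older" } },
];

const partitioned = partitionPath(examplePath);
```

Note that `c` lands in the stale suffix even though its own state is clean, because it sits downstream of the edited Send `b`.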

Historical Reconstruction

Opening an attack as a tree reconstructs tree shape from backend conversations and labels. Single-conversation attacks become linear trees; multi-conversation attacks can be merged into path branches. Target registry names are recovered from AttackSummary.target.target_registry_name when available so recovered trees can refresh without empty-target failures.
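
A minimal sketch of the target recovery step, assuming the summary exposes an optional `target.target_registry_name` as described above. The `AttackSummaryLike` shape and both function names are hypothetical; the preflight result only illustrates how a missing target could block a refresh before any backend call.

```typescript
// Hypothetical, trimmed-down view of the summary DTO.
interface AttackSummaryLike {
  target?: { target_registry_name?: string | null } | null;
}

// Returns the registry name when present and non-blank, otherwise null.
function recoverTargetRegistryName(summary: AttackSummaryLike): string | null {
  const name = summary.target?.target_registry_name;
  return typeof name === "string" && name.trim() !== "" ? name : null;
}

type PreflightResult =
  | { status: "ready"; target: string }
  | { status: "blocked"; reason: string };

// Blocks the refresh locally instead of dispatching with an empty target.
function preflightRefresh(summary: AttackSummaryLike): PreflightResult {
  const target = recoverTargetRegistryName(summary);
  if (target === null) {
    return { status: "blocked", reason: "No target selected for this tree." };
  }
  return { status: "ready", target };
}

const withTarget = preflightRefresh({
  target: { target_registry_name: "example_target" }, // illustrative value
});
const withoutTarget = preflightRefresh({ target: null });
```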

UI Captures

Merged tree with path chat

Shows the loaded historical attack opened as a merged tree, with the path-chat pane available for focused linear inspection.

Follow-up prompt creates pending response / Attempt fan pruned to picked path

Follow.Up.Prompts.mp4

Converter transform branch

Converters.mp4

Notable Details

Adds target_registry_name to target summary DTOs and frontend mirrors.
Hydrates root target registry names during Open-as-tree and reload reconstruction.
Shows an explicit No target chip and blocks refresh with a clear preflight modal when the target is missing.
Adds configurable attempt fan creation with a validated 2-50 count picker.
Adds true non-root Branch from here semantics: keep root-to-selected path plus selected subtree, exclude unrelated siblings.
Adds converter transform nodes and a ConverterCard, including direct baseline behavior for converter insertion.
Adds path-chat focus actions and visible path chat layout.
Adds prune-to-picked fan recovery.
Improves the detail drawer with full text, copy, execution metadata, target/converter metadata, and latest-error separation.
Updates Tree UI browser coverage, including a new MVP acceptance spec with screenshots.
Updates the V1 shipability plan and preserves PR7 design/review artifacts.
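
The non-root Branch from here rule in the list above (keep the root-to-selected path plus the selected subtree, drop unrelated siblings) can be sketched as two walks over a parent-pointer tree. `BranchNode` and `branchFromHere` are hypothetical names for illustration, and the sketch assumes the input is an acyclic tree.

```typescript
// Hypothetical minimal node shape: just an id and a parent pointer.
interface BranchNode {
  id: string;
  parentId: string | null;
}

// Returns the ids kept by a "Branch from here" on selectedId.
function branchFromHere(nodes: BranchNode[], selectedId: string): Set<string> {
  const byId = new Map<string, BranchNode>();
  const childrenOf = new Map<string, string[]>();
  nodes.forEach((n) => {
    byId.set(n.id, n);
    if (n.parentId !== null) {
      const siblings = childrenOf.get(n.parentId) ?? [];
      siblings.push(n.id);
      childrenOf.set(n.parentId, siblings);
    }
  });
  if (!byId.has(selectedId)) {
    throw new Error(`Unknown node: ${selectedId}`);
  }

  const keep = new Set<string>();

  // 1. Walk up: the root-to-selected path.
  let cursor: string | null = selectedId;
  while (cursor !== null && !keep.has(cursor)) {
    keep.add(cursor);
    const current: BranchNode | undefined = byId.get(cursor);
    cursor = current ? current.parentId : null;
  }

  // 2. Walk down: the whole subtree under the selected node.
  const stack: string[] = [selectedId];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    keep.add(id);
    (childrenOf.get(id) ?? []).forEach((childId) => stack.push(childId));
  }
  return keep;
}

const branchNodes: BranchNode[] = [
  { id: "root", parentId: null },
  { id: "a", parentId: "root" },
  { id: "b", parentId: "a" },
  { id: "c", parentId: "root" }, // unrelated sibling of "a"
  { id: "d", parentId: "a" },
];

const kept = branchFromHere(branchNodes, "a");
```

Here `kept` contains `root`, `a`, `b`, and `d`; the sibling `c` is excluded because it is neither an ancestor nor a descendant of the selected node.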

Tests

Validated locally with:

git diff --check
npm run type-check
npm test -- --runInBand --silent --json --outputFile=/tmp/tree-mvp-jest.json ...
- 14 suites passed
- 377 tests passed
uv run pytest tests/unit/backend/test_mappers.py::TestAttackResultToSummary
- 26 passed
VITE_ENABLE_TREE_UI=true npm run test:e2e -- e2e/tree.spec.ts e2e/tree-mvp.spec.ts --project=mock --workers=1
- 9 browser tests passed

Areas For Improvement

Backend Conversation Tree Persistence

The MVP still relies heavily on frontend-owned tree state plus attack labels for reconstruction. A stronger backend model would persist first-class conversation tree records:

conversation_tree
conversation_tree_node
conversation_tree_edge
node kind, params, state, version, and parent ids
fan slot ids and tombstones
parent tree/source conversation lineage
durable layout/manual position metadata if desired

Backend Wave Representation

Waves are currently client-orchestrated and reflected through labels/execution records. A future backend wave model could persist:

wave id
trigger kind
selected node set
operator and operation labels
dispatch start/completion timestamps
per-leaf outcomes
retry/cancel/queued state
parent conversation tree id
generated attack/conversation ids

Richer Reconstruction

Historical reconstruction now covers more MVP paths, but fully faithful reconstruction of nested fan topology, converter lineage, deleted slots, and older attack records should eventually be backend-assisted instead of inferred from labels and conversation messages.

Collaboration And Concurrency

The current model is suitable for a single operator session. Persisted node versions plus optimistic concurrency would make collaborative editing, cross-tab recovery, and stale write detection more reliable.
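
One way the version-based check could work is sketched below. This is an assumption about a future design, not something the PR implements: `VersionedNode`, `applyEdit`, and the result shape are hypothetical, and a real backend would perform the compare-and-bump atomically in storage.

```typescript
// Hypothetical persisted node with a monotonically increasing version.
interface VersionedNode {
  id: string;
  version: number;
  text: string;
}

type WriteResult =
  | { status: "applied"; node: VersionedNode }
  | { status: "stale_write"; currentVersion: number };

// Accept the edit only if the writer saw the latest version.
function applyEdit(
  stored: VersionedNode,
  expectedVersion: number,
  text: string
): WriteResult {
  if (stored.version !== expectedVersion) {
    return { status: "stale_write", currentVersion: stored.version };
  }
  return {
    status: "applied",
    node: { ...stored, text, version: stored.version + 1 },
  };
}

const storedNode: VersionedNode = { id: "n1", version: 3, text: "original" };

// Tab A read version 3 and writes first: accepted, version becomes 4.
const firstWrite = applyEdit(storedNode, 3, "edit from tab A");

// Tab B also read version 3, but the stored node has since moved to 4: rejected.
const latest = firstWrite.status === "applied" ? firstWrite.node : storedNode;
const secondWrite = applyEdit(latest, 3, "edit from tab B");
```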

Backend-Owned Tree Search And Audit

Once trees and waves are first-class backend entities, History could filter and display tree-level state directly: latest wave, failed leaves, parent tree lineage, target lineage, unresolved stale nodes, and branch ancestry.

Appendix

Prompt To Attempt Fan

flowchart TD
    P[Prompt] --> AF{{Attempt fan}}

    AF --> A1[Attempt 1]
    AF --> A2[Attempt 2]
    AF --> A3[Attempt N]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    classDef fanNode fill:#ffffff,stroke:#1f6b84,color:#1f6b84,stroke-width:2px;
    class P,A1,A2,A3 treeNode;
    class AF fanNode;

Prompt To Converter Fan

flowchart TD
    P[Prompt] --> CF{{Converter fan}}

    CF --> C1[Converter 1]
    CF --> C2[Converter 2]

    C1 --> R1[Response 1]
    C2 --> R2[Response 2]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    classDef fanNode fill:#ffffff,stroke:#1f6b84,color:#1f6b84,stroke-width:2px;
    class P,C1,C2,R1,R2 treeNode;
    class CF fanNode;

Prompt To Converter Settings Fan

flowchart TD
    P[Prompt] --> SF{{Converter settings fan}}

    SF --> S1["Base64Converter<br/>encoding = b64encode"]
    SF --> S2["Base64Converter<br/>encoding = urlsafe_b64encode"]
    SF --> S3["Base64Converter<br/>encoding = b64decode"]

    S1 --> R1[Response 1]
    S2 --> R2[Response 2]
    S3 --> R3[Response 3]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    classDef fanNode fill:#ffffff,stroke:#1f6b84,color:#1f6b84,stroke-width:2px;
    class P,S1,S2,S3,R1,R2,R3 treeNode;
    class SF fanNode;

Response Branches Into Follow-Ups

flowchart TD
    P[Prompt] --> R0[Response]

    R0 --> F1[Follow up 1]
    R0 --> F2[Follow up 2]

    F1 --> R1[Response]
    F2 --> R2[Response]

    classDef treeNode fill:#1f6b84,stroke:#0b2f3d,color:#fff,stroke-width:2px;
    class P,R0,F1,F2,R1,R2 treeNode;

Similar Projects

Captures the rev-18 design state across all three GUI tree-view docs before PR1 (backend operator-validation relocation) starts landing implementation.

Rev 18 closures (per 01 §0):
- Q.S.1 DECIDED: V1.0 ships without intra-wave memoization; Crescendo cost cliff is documented in §1.2 instead.
- Q.S.2 DECIDED: operator-as-tag (honor-system); §9.4.5 scaled back to relocation-only, no anonymous-rejection. Preserves the no-labels early-return.
- Q.S.3 remains a V1.0 gate item pending the Q.S.4 Crescendo experiment.
- Rubber-duck cheap wins: per-leaf ExecutionRecord timing fields, per-WaveEvent emittedAt, version: number on ConversationTreeNodeBase for V2 forward-compat, i18n string-registry V1.0 commitment, FanNode polymorphism honest naming, cost-preview tooltip on the refresh affordance, permanent failure class surfaced distinctly, client-side telemetry-vs-privacy line in §15.
- F7 mechanical sweep: backend .py:L<n> citations refreshed.

These docs are the contract that PR1-PR7 implement against.

V1.0 tree-UI PR1 per doc/gui/design/01_tree_primitives.md §9.4.5.

Problem
-------
Today's check at pyrit/backend/services/attack_service.py reads from piece.labels["operator"], which is written by an attack_mappers.py path marked `removed_in="0.16.0"`. After that deprecation lands, the piece-label check silently no-ops and the server-side operator isolation disappears for tree-UI traffic, leaving only the UI posture (client-side, bypassable via direct REST call).

Change
------
Relocate the source of truth to `AttackResult.labels["operator"]` so the check survives the 0.16.0 piece-label-write deprecation. The signature changes from `(conversation_id=, request=)` to `(ar=, request=)`: the caller (add_message_async) already has the AR in scope, so passing it directly avoids a duplicate lookup and makes the dependency explicit.

Backward-compat fallback
------------------------
When ar.labels has no operator key, fall back to reading from existing piece labels (matches the pre-relocation behavior for legacy ARs that were tagged at the piece level only). Bounded fallback: it dies with the deprecated piece-label write path in 0.16.0. The fallback was chosen over pure relocation per user input: if any production AR exists without an AR-level operator label, pure relocation would silently disable enforcement for those rows. The 3 LOC + 2 tests cost is well below the silent-disablement risk.

Honor-system preserved (Q.S.2)
-------------------------------
The no-labels early-return is preserved by design. Anonymous requests (request.labels absent / empty / missing "operator" key) pass unchallenged. The operator is a tag the operator picks, not an auth claim. The V1.0 posture defends against accidental mis-attribution and casual cross-operator extensions; motivated bypass is out of scope until V1.1+ multi-operator collaboration revisits.

TDD
---
Wrote 11 tests in `TestValidateOperatorMatch` + `TestAddMessageOperatorIntegration` covering:
- Three honor-system early-returns (no labels, empty labels, no operator key).
- Four AR-first paths (match passes; mismatch raises; precedence rule when both sources present; no-enforcement when nothing to compare).
- Three backward-compat fallback paths (mismatch raises; match passes; empty everywhere passes).
- One integration test through add_message_async confirming the call site is wired with `ar=ar`.

All 11 failed for the right reasons against the original implementation (TypeError on `ar=` kwarg; DID NOT RAISE on the AR-only mismatch case). All pass after the relocation. Backend suite green: 652 passed, 4 skipped, 0 regressed.

PR sequencing
-------------
Per design doc §9.4.5, this PR must merge before the V1.0 GUI PR. The PyRIT version bump that signals "tree-UI safe" is a separate concern tracked by the §9.4.5 startup version gate in the upcoming PR3.

…essagePiece DTO V1.0 tree-UI PR2 per doc/gui/design/01_tree_primitives.md §9.4.4 (b). Problem ------- The response DTO `MessagePiece` at pyrit/backend/models/attacks.py drops two domain fields the V1.0 tree-UI must read at reload time: 1. `original_prompt_id` — the lineage-root piece id, preserved across `Message.duplicate()` so descendants share the same root. The resolver in 03 §4.1 reads it from prepended pieces to keep lineage chains intact when re-prepending clean-prefix history. 2. `converter_identifiers` — the sequential converter pipeline that produced the piece's converted_value. Without it, reload- reconstruction (§9.4.1) renders UserTurnNodes with empty converter pipelines indistinguishable from "no converter ever applied", and the next Refresh silently fires without the operator's authored converters. Also load-bearing for `Fan(axis='converter')` variant-payload reconstruction (§9.3.1) which derives `variants[s].payload.converters` from the fan-child leaf's first user-turn `converter_identifiers`. Change ------ - Add `original_prompt_id: str | None = None` (defaults to None for defensive null-handling, though persisted domain pieces always have a non-null value via `_set_original_prompt_id_default`). - Add `converter_identifiers: list[ComponentIdentifierField] = []` (uses the same annotated alias the domain `MessagePiece` uses; the PlainSerializer flattens each ComponentIdentifier to the wire shape the frontend reads via `ComponentIdentifierField.model_dump()`). - Update `pyrit_messages_to_dto_async` to populate both. The `original_prompt_id` is cast to `str()` since the domain field is `uuid.UUID | None`; the `converter_identifiers` is a defensive `list(...)` copy. Reusing ComponentIdentifierField (rather than defining a parallel ComponentIdentifierDto) keeps the DTO surface honest: the wire shape matches the domain shape's `model_dump()` output, so the round-trip contract is structural rather than maintained-in-two-places. 
The PR2 contract spelled out in §9.4.4: empty list (not None) means "no converter applied"; the field being declared is what makes that distinguishable from "DTO missing the field" on the TypeScript side. TDD --- Wrote 6 tests: - 4 in `TestPyritMessagesToDto`: exposes original_prompt_id; handles None defensively; exposes converter_identifiers with round-trippable shape; empty list when domain has no converters. - 2 in new `TestMessagePieceDtoDefaults`: direct DTO instantiation asserting defaults `[]` and None, plus a JSON-round-trip via model_dump() proving the frontend gets the flat shape. All 6 failed for the right reasons against the original mapper (KeyError 'original_prompt_id' on dumped DTO; absent field on direct construction). All pass after the DTO + mapper changes. Backend suite green: 658 passed (6 new), 4 skipped, 0 regressed. PR sequencing ------------- Per §9.4.5 PR sequencing: PR2 ships before the GUI PR3 so the V1.0 frontend types can reference these fields. Build-time check (the auto-reverse code reads them; TS fails if absent) is the mandatory enforcement; PR2's landing makes the gate pass.

… DTO fields V1.0 tree-UI PR3a per doc/gui/design/01_tree_primitives.md §9.4.4 (a). The TS-side counterpart of backend PR1 (a22854cb1) + PR2 (d1bbeed0d): extends the frontend API surface to match the new wire shape. Types added / extended ---------------------- - `CreateAttackRequest.prepended_conversation?: PrependedMessageRequest[]` — the runner's central dispatch field; carries clean-prefix history per §4.1 / §3.3. Optional so the existing chat tab's `source_conversation_id` + `cutoff_index` path keeps working unchanged. - `PrependedMessageRequest` (new) — mirrors the backend's PrependedMessageRequest. `role` is the four-value ChatMessageRole literal; multimodal turns bundle multiple pieces into one message. - `ComponentIdentifier` (new) — flat shape from `ComponentIdentifier.model_dump()`; `class_name` + `class_module` + `params` are required (the V1.0 runner reads these for ConverterRef reconstruction in §9.3.1). `hash` / `pyrit_version` / `eval_hash` / `children` are declared optional so the wire payload type-checks regardless of which optional fields the backend chooses to populate. - `BackendMessagePiece.original_prompt_id: string | null` — required on every PR2-or-newer payload; null only on defensive-test inputs. - `BackendMessagePiece.converter_identifiers: ComponentIdentifier[]` — required; empty list = no converter applied (distinguishable from field-missing by being present). Contract tests -------------- `frontend/src/types/treeUi.contract.test.ts` — 13 tests using TS `satisfies` for compile-time shape verification + runtime sanity for the optional-field defaults. The "contract" is that backend payloads matching the documented PR2 shape produce usable typed values; if the frontend types drift, the satisfies clauses fail at compile time. Test infra bug fixed -------------------- Latent bug in tsconfig.test.json: the file `extends` tsconfig.json, which has `"exclude": ["src/**/*.test.ts", ...]`. 
Per TypeScript's extends semantics, the inherited `exclude` keeps applying unless the child overrides it — so the child's `include` patterns were no-ops and `tsc -p tsconfig.test.json` compiled only jest.config.ts. ts-jest doesn't type-check at run time, so test-file type drift went uncaught indefinitely. Fix: add `"exclude": []` to tsconfig.test.json so the `include` patterns take effect. The PR3a contract tests rely on this: running `npx tsc -p tsconfig.test.json` is what proves the `satisfies` clauses are actually checked. Pre-existing test type errors surfaced -------------------------------------- The fix surfaces pre-existing type errors in other test files (api.test.ts, AttackHistory.test.tsx, services/api.ts) that have been latent since the config bug was introduced. These are intentionally NOT fixed in this PR — they're unrelated to the tree-UI work and need their own focused fix. `npm run type-check` (the existing CI-wired script targeting tsconfig.json) is unchanged and continues to pass. The tree-UI test surface is validated via `npx tsc -p tsconfig.test.json | grep treeUi.contract` (clean). TDD --- Wrote treeUi.contract.test.ts first, ran type-check, watched 9 specific type errors fire ("Property 'X' does not exist on type Y" / "Module has no exported member 'Z'"). Added the types; type errors cleared; 13 runtime tests pass; existing 662 frontend tests unaffected; lint clean; main `npm run type-check` clean. PR sequencing ------------- PR3a is a pure frontend type extension; nothing else consumes the new types yet. PR3b (tree-UI domain types — ConversationTree, ConversationTreeNode, ExecutionRecord, WaveEvent, etc.) will follow, then PR4 (runner core) will consume both PR3a and PR3b types to implement the V1.0 dispatch loop.
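
The contract-test idea in this commit (use `satisfies` so payload literals are checked against the DTO types at compile time) can be sketched as follows. The field names come from the commit message; the field value types, the sample values, and the `converterNames` helper are assumptions for illustration. The `satisfies` operator requires TypeScript 4.9 or newer.

```typescript
// Field names follow the commit message; value types are assumed.
interface ComponentIdentifier {
  class_name: string;
  class_module: string;
  params: Record<string, unknown>;
  hash?: string;
  pyrit_version?: string;
}

interface BackendMessagePiece {
  original_prompt_id: string | null;
  converter_identifiers: ComponentIdentifier[];
}

// If BackendMessagePiece drifts (field renamed, type changed),
// these literals stop compiling, which is the whole contract.
const withConverter = {
  original_prompt_id: "00000000-0000-0000-0000-000000000001", // illustrative id
  converter_identifiers: [
    { class_name: "Base64Converter", class_module: "example.module", params: {} },
  ],
} satisfies BackendMessagePiece;

// Empty list means "no converter applied"; the field is still present.
const noConverter = {
  original_prompt_id: null,
  converter_identifiers: [],
} satisfies BackendMessagePiece;

// Hypothetical consumer that relies on the field always being an array.
function converterNames(piece: BackendMessagePiece): string[] {
  return piece.converter_identifiers.map((c) => c.class_name);
}

const namesWith = converterNames(withConverter);
const namesWithout = converterNames(noConverter);
```

Because ts-jest does not type-check at run time, these clauses only protect anything when the file is actually compiled by `tsc`, which is why the `exclude` fix described above matters.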

V1.0 tree-UI PR3b. Translates the design doc set into TypeScript: type model from doc/gui/design/01_tree_primitives.md §4–§6 / §13 (data model, lifecycle, propagation, workspace, undo) and runner interfaces from doc/gui/design/03_runner.md §2 / §6 (Runner, RunnerStateSink, CostGuardrail, CrossTabLockManager, WaveEvent). What ships ---------- `frontend/src/runner/treeTypes.ts` (single file; PR4 will add `runner.ts` + helpers alongside it): - Branded id types `ConversationTreeId`, `ConversationTreeNodeId`: type-level disambiguation with zero runtime cost. Catches "passed node id where tree id was expected" at compile time. - Lifecycle: `NodeState` (the 7 values from §6.1), `NodeFailureClass` (transient / rate_limited / permanent / blocked per §6.1 + §3.3a), `ApiErrorReason` (the structured `{message; failure_class}` sink reason from §3.3a `_format_api_error`). - Shared types: `PromptDataType`, `ConverterRef` (stored id or inline spec per §4.6), `PieceSpec`, `WaveTriggerKind` (the closed V1.0 + V1.1/V2 enum from §6.2), `ExecutionRecord` (with the rev-18 timing triple `dispatchedAt` / `targetFirstByteAt` / `completedAt`), `ReflogEntry` (per-tree pinned wrapper around immutable ExecutionRecord per §6.5 sharing semantics). - Node taxonomy: `ConversationTreeNodeBase` + six discriminated variants (`RootPromptNode`, `ImportMessageNode`, `UserTurnNode`, `SendNode`, `FanNode`, `ScoreNode`) → `ConversationTreeNode` union discriminated by `kind`. `NodeParams` helper alias for the undo system's snapshot needs. - FanNode surface: `FanAxis` (V1.0 attempt+converter, V1.1+ the four others), `FanVariant` (discriminated union over axis with per-axis payload shape). `promotedChildSlotIndex` + `deletedSlotIndices` tombstone array per §5.1 invariants 2 + 4. - Edge: `ConversationTreeEdge` with the slotIndex fan-discriminator that MUST be in the resolved-input hash per §5.1 #4. 
- Undo: `UndoOp` discriminated union per §6.9 with the rev-16 state-snapshot widening (`editParams` carries `priorState` + `priorDescendantStates` so undo restores the §6.3-cascade state, not just the named field; same for `makeCurrent` per §6.7 step 4). - Tree container: `ConversationTree` (with `parentConversationTreeId` / `parentSourceConversationId` / `undoStack`). - Workspace: `Workspace` (V1.0 minimal: `currentTree`, `recentTreeIds`, `settings`) + `WorkspaceSettings` (`reflogCapPerNode`, `confirmThresholdCount`, `suppressConfirmModalThisSession`). - Wave: `WaveEvent` discriminated union over `kind` (start / node_complete / complete / busy / queued / reflog_eviction / operator_tag_required) with required `emittedAt: string` per rev-18 Finding C.1. `complete.summary.failed` bucketed by class per rev-16 Findings 2+3; `blocked` is in-flight-cascade victims; `cancelled` is operator wave-abort; `reflog_evicted` rolls up wave-time evictions. - Runner interfaces: `Runner` (6 entry points: refresh* + cancelWave + cancelQueued + retryFailedNodes), `RunnerStateSink` (the runner's sole React-state mutation surface with the rev-15 reason semantics: string/ApiErrorReason/null/omitted, plus missing-node tolerance), `CostGuardrail`, `CrossTabLockManager` (BroadcastChannel advisory lock per §10.4). `frontend/src/runner/treeTypes.contract.test.ts` — 35 tests using `satisfies` for compile-time shape verification + runtime sanity for each variant. Validates discriminator narrowing for the 6 node kinds (switch-on-kind gives type-safe access to params[…]), the 6 FanVariant axes, the 7 WaveEvent kinds, and the 5 UndoOp kinds. TDD --- Wrote treeTypes.contract.test.ts first (importing 35 symbols from a nonexistent module). Type-check `tsc -p tsconfig.test.json` flagged "Cannot find module './treeTypes'" — the expected red. 
Created treeTypes.ts with each named symbol; cleared an unused-import slip (stray `./converterRef` reference); type-check returned clean for the treeTypes.contract test surface. Runtime suite: 35 passed. Aggregate frontend: 697 passed (+35), no regression; main type-check + lint clean. Scope discipline ---------------- This commit is type definitions + interface declarations only. No runner implementation; PR4 will land that in `runner.ts` alongside this file and consume both PR3a (API types) and PR3b (domain types). The deliberate split avoids a 4000-line PR4 by getting the type surface reviewer-stable first.
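
The branded id technique mentioned in this commit can be sketched as below. The two id type names come from the commit message; the `Brand` helper, the `asTreeId` / `asNodeId` constructors, and `describeNode` are hypothetical. The brand exists only in the type system, so at run time the values are plain strings.

```typescript
// Generic brand helper: intersects the base type with a phantom tag.
type Brand<T, B extends string> = T & { readonly __brand: B };

type ConversationTreeId = Brand<string, "ConversationTreeId">;
type ConversationTreeNodeId = Brand<string, "ConversationTreeNodeId">;

// Hypothetical constructors: the only places where a raw string is cast.
function asTreeId(raw: string): ConversationTreeId {
  return raw as ConversationTreeId;
}

function asNodeId(raw: string): ConversationTreeNodeId {
  return raw as ConversationTreeNodeId;
}

// Hypothetical consumer that must not mix the two id kinds up.
function describeNode(
  treeId: ConversationTreeId,
  nodeId: ConversationTreeNodeId
): string {
  return `${treeId}/${nodeId}`;
}

const treeId = asTreeId("tree-1");
const nodeId = asNodeId("node-7");

const description = describeNode(treeId, nodeId);

// Swapping the arguments is rejected by the compiler, not at run time:
// describeNode(nodeId, treeId);
// error: 'ConversationTreeNodeId' is not assignable to 'ConversationTreeId'
```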

…(PR4a) V1.0 tree-UI PR4a — first slice of the runner. Pure functions over the ConversationTree that compute which leaf Sends are dispatchable for a given wave, plus the retry-failed pre-readiness demotion that makes [Retry failed] waves work. What ships ---------- `frontend/src/runner/readiness.ts`: - `findLeafSends(tree)` — every SendNode with no SendNode descendant. UserTurn / Fan / Score descendants do NOT make a Send interior (per the §2 vocabulary). Orphan Sends (Send with no children) are leaves per 03 §3.2. - `isLeafSend(tree, nodeId)` — predicate counterpart to the above. - `computeReady(tree, S)` — the §3.1 readiness rule literally: leaf Sends in S whose every SEND ancestor has state in {edited, stale, running, clean}. Interior Sends never enter ready (they regenerate as part of their leaf's sequence per §3.2). Failed / cancelled SEND ancestors block the leaf — the rev-15 Finding 4 anti-amplification rule that prevents single-5xx-cascades-to-N- retries against rate-limited targets. - `buildSForTree` / `buildSForSubtree` / `buildSForNode` — S construction per the three refresh scopes from 03 §2.1. - `demoteRetryFailedNodes(tree, S, sink)` — §3.1 step 2b: for `waveTriggerKind === 'retry_failed'` only, flip every S-member {failed, cancelled} node back to `stale` and clear its execution BEFORE the readiness rule runs. Uses the `null` reason sentinel (per 03 §2.2) so `lastError` clears rather than lingers. This is the mechanism that lets the [Retry failed] toast button's wave actually re-dispatch failed leaves — without it the ancestor allowlist would silently exclude them and the wave would no-op. `frontend/src/runner/testHelpers.ts` — shared across all PR4 sub-PRs: - Builders per node kind (`mkRoot`, `mkUserTurn`, `mkSend`, `mkFan`, `mkScore`, `mkImport`) that fill boilerplate (timestamps, empty history, default state) so tests name only the fields under test. - `mkEdge`, `mkTree` for graph construction. 
`mkTree` derives edges from `parentId` if not supplied; tests needing explicit fan slotIndices pass `overrides.edges`. - `mkExecution` for ExecutionRecord fixtures. - `mkMockSink` — recording sink that captures every call. Helpers `callsOf(method)`, `events()`, `stateChanges(nodeId)` keep test assertions terse. `frontend/src/runner/readiness.test.ts` — 30 tests covering: - findLeafSends: 7 cases including the orphan-Send edge case, fan children, Crescendo-style interior detection, Score-only descendants not making Send interior, multi-depth trees. - computeReady: 11 cases including the rev-15 anti-amplification cases (failed/cancelled ancestor blocks; edited/stale/running/clean ancestor admits), Fan/Score-transparent ancestor walking, the leaf- not-in-S edge case, empty S. - buildSForTree / Subtree / Node: 5 cases including the running/draft state exclusion, subtree scope, single-node scope. - demoteRetryFailedNodes: 5 cases including the null reason sentinel (clears lastError vs leaves it stale), ignoring non-{failed, cancelled} S members, ignoring nodes outside S, and the integration case showing computeReady admits a previously-blocked leaf after demotion. - 1 fan-slot-aware traversal smoke test confirming explicit slotIndex edges don't perturb leaf detection. TDD + scope discipline ---------------------- Tests written first against a nonexistent ./readiness module (TS2307 + implicit-any cascade as the expected red). Implementation made the file resolve and the types narrow; all 30 pass. One real bug caught in the helpers: `as const` on the shared `base()` object made `executionHistory: []` resolve to `readonly []`, unassignable to the mutable `ReflogEntry[]` field shape — fixed in the same PR (since the helper is its own file landing here for the first time). Test infra-bug from PR3a stays surfaced (pre-existing test type errors in unrelated files); the runner directory type-checks clean under `tsc -p tsconfig.test.json`. 
Aggregate frontend: 727 tests pass (+30), no regression; main type-check + lint clean. Next slice ---------- PR4b: `resolvePathPartition` — the pure function that walks a leaf's root-to-leaf path and partitions Sends into clean prefix (load into prepended_conversation as historical context) and fresh suffix (the N add_message calls). Builds on testHelpers; no dependency on readiness.ts. Will exercise the §5.1 invariant 5 Fan/Score transparency on the path-walk side.
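
The leaf detection and readiness rule described in this commit can be sketched on a parent-pointer node list. This is a simplified illustration, not the PR's `readiness.ts`: the node shape and helper names are hypothetical, and the `State` union is an assumption assembled from the states named in these commit messages.

```typescript
type Kind = "root_prompt" | "user_turn" | "send" | "fan" | "score";
// Assumed state set, collected from the states the commit messages mention.
type State = "draft" | "edited" | "stale" | "running" | "clean" | "failed" | "cancelled";

interface ReadyNode {
  id: string;
  kind: Kind;
  state: State;
  parentId: string | null;
}

// Ancestor states that still allow a leaf to dispatch.
const ADMITTING_STATES: State[] = ["edited", "stale", "running", "clean"];

function indexById(nodes: ReadyNode[]): Map<string, ReadyNode> {
  const byId = new Map<string, ReadyNode>();
  nodes.forEach((n) => byId.set(n.id, n));
  return byId;
}

// Send ancestors only; fan, score, and user-turn ancestors are transparent.
function sendAncestors(byId: Map<string, ReadyNode>, id: string): ReadyNode[] {
  const found: ReadyNode[] = [];
  const start: ReadyNode | undefined = byId.get(id);
  let parentId: string | null = start ? start.parentId : null;
  while (parentId !== null) {
    const parent: ReadyNode | undefined = byId.get(parentId);
    if (!parent) break;
    if (parent.kind === "send") found.push(parent);
    parentId = parent.parentId;
  }
  return found;
}

// A leaf Send is a Send that is not a Send-ancestor of any other Send.
function findLeafSends(nodes: ReadyNode[]): ReadyNode[] {
  const byId = indexById(nodes);
  const interior = new Set<string>();
  nodes
    .filter((n) => n.kind === "send")
    .forEach((s) => sendAncestors(byId, s.id).forEach((a) => interior.add(a.id)));
  return nodes.filter((n) => n.kind === "send" && !interior.has(n.id));
}

// Leaf Sends in the selection whose every Send ancestor is in an admitting state.
function computeReady(nodes: ReadyNode[], selected: Set<string>): string[] {
  const byId = indexById(nodes);
  return findLeafSends(nodes)
    .filter((leaf) => selected.has(leaf.id))
    .filter((leaf) =>
      sendAncestors(byId, leaf.id).every((a) => ADMITTING_STATES.indexOf(a.state) !== -1)
    )
    .map((leaf) => leaf.id);
}

const exampleNodes: ReadyNode[] = [
  { id: "root", kind: "root_prompt", state: "clean", parentId: null },
  { id: "s1", kind: "send", state: "clean", parentId: "root" },
  { id: "ut1", kind: "user_turn", state: "clean", parentId: "s1" },
  { id: "s2", kind: "send", state: "stale", parentId: "ut1" },
  { id: "s3", kind: "send", state: "failed", parentId: "root" },
  { id: "ut2", kind: "user_turn", state: "clean", parentId: "s3" },
  { id: "s4", kind: "send", state: "stale", parentId: "ut2" },
];

const leafIds = findLeafSends(exampleNodes).map((n) => n.id);
const readyIds = computeReady(exampleNodes, new Set<string>(["s2", "s4"]));
```

In the example, `s4` is a selected leaf but stays out of the ready set because its Send ancestor `s3` has failed, which is the anti-amplification behavior the commit describes.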

…w (PR4a.1) Three concrete fixes from the post-PR4a rubber-duck pass (Opus 4.7 extra high reasoning). Each addresses a defect that would have bitten PR4b or rotted into bad test hygiene. 1. mkTree: auto-number fan-child edge slotIndex ---------------------------------------------- Before, every derived edge got `slotIndex: 0` — including all children of a FanNode. The readiness tests don't read edges so they passed, but PR4b's `resolvePathPartition` will read `edge.slotIndex` to drive the fan-child variant resolution. Fixtures built via `mkTree` would have handed PR4b bogus shapes (all attempt-fan siblings sharing slot 0, violating the §5.1 slot-stability invariant), producing test failures that look like resolver bugs but are actually fixture bugs. The fix tracks which parent ids are FanNodes and auto-numbers their child edges by ordinal. Non-fan parents stay on slot 0. Tests needing explicit slot indices still pass `overrides.edges`. 2. MockSink: delete stateChanges helper ------------------------------------- The `stateChanges(nodeId)` helper baked in a query shape — "give me the sequence of states for this node" — that tempted tests to assert exact transition sequences (`expect(stateChanges('s')).toEqual(['running', 'clean'])`). That kind of assertion locks the runner into a specific transition order and breaks the moment a legitimate intermediate transition is added. `events()` was also dropped: thin sugar for `callsOf('emitWaveEvent')` with no payoff. Callers can compose `callsOf('setNodeState').filter(...)` when they need a per-node view, which is rare enough to not deserve a helper. Also dropped the unused `ALL_WAVE_TRIGGER_KINDS` re-export (kitchen sink waiting to grow). 3. 
demoteRetryFailedNodes test: actually compose with computeReady ---------------------------------------------------------------- The previous "integration check" test built TWO trees by hand — one with failed states and one with the desired post-demotion states — then ran computeReady against the second. The demoter could have written 'staale' (typo) and the test would still pass because the hand-built tree had the correct 'stale' state. Test-that-passes, not test-that-proves. The new version runs the demoter against a recording sink, projects the recorded `setNodeState` calls onto a copy of the original tree, then runs computeReady over the projection. If the demoter writes anything but `stale`, the projected tree differs from the hand-built one and computeReady's result diverges. The composition is honest now. Verification: 30 readiness tests pass, 727 frontend tests pass, no regression. lint + type-check clean. Other rubber-duck items not in this commit (filed for later): - Trim contract-test runtime expect boilerplate (PR4a.2, this cycle). - Add narrow CI gate for contract-test type-check (PR4a.3, this cycle). - Tighten DTO original_prompt_id to non-nullable str + add contract test proving the validator guarantee (separate decision; the doc spec'd nullable). - Q.S.1 cost-cliff regression test (lands in PR4c). - Drop new doc citations starting PR4b; full citation strip at end of V1.0.
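
The fixture fix from item 1 above (auto-number the edges under a fan parent, keep slot 0 under non-fan parents) can be sketched like this. `FixtureNode`, `FixtureEdge`, and `deriveEdges` are hypothetical names; the real `mkTree` helper takes more inputs and supports `overrides.edges`.

```typescript
interface FixtureNode {
  id: string;
  kind: string;
  parentId: string | null;
}

interface FixtureEdge {
  parentId: string;
  childId: string;
  slotIndex: number;
}

// Derive edges from parentId, numbering fan children by their order of appearance.
function deriveEdges(nodes: FixtureNode[]): FixtureEdge[] {
  const fanIds = new Set<string>(
    nodes.filter((n) => n.kind === "fan").map((n) => n.id)
  );
  const nextSlot = new Map<string, number>();
  const edges: FixtureEdge[] = [];

  nodes.forEach((node) => {
    if (node.parentId === null) return;
    let slotIndex = 0; // non-fan parents always use slot 0
    if (fanIds.has(node.parentId)) {
      slotIndex = nextSlot.get(node.parentId) ?? 0;
      nextSlot.set(node.parentId, slotIndex + 1);
    }
    edges.push({ parentId: node.parentId, childId: node.id, slotIndex });
  });
  return edges;
}

const fixtureNodes: FixtureNode[] = [
  { id: "root", kind: "root_prompt", parentId: null },
  { id: "fan", kind: "fan", parentId: "root" },
  { id: "a1", kind: "send", parentId: "fan" },
  { id: "a2", kind: "send", parentId: "fan" },
  { id: "a3", kind: "send", parentId: "fan" },
];

const derivedSlots = deriveEdges(fixtureNodes).map(
  (e) => `${e.childId}:${e.slotIndex}`
);
```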

… (PR4a.2) Two coupled changes from the rubber-duck review. 1. Trim `treeTypes.contract.test.ts` runtime-expect boilerplate ------------------------------------------------------------ The previous file was 694 lines with 35 tests, most matching the shape `const x = { ... } satisfies SomeType; expect(x.field).toBe(...)`. The `satisfies` clause was doing all the real work; the `expect` was asserting a literal typed two lines up — type theater that would never fail unless someone deliberately broke the literal. Rewrote to 11 tests focused on what genuinely catches future bugs: - Four discriminator-narrowing tests (ConversationTreeNode kind, FanVariant axis, WaveEvent kind, UndoOp kind). Each exercises the switch-on-discriminator pattern the runner / UI rely on; lost narrowing here would silently degrade dispatch sites to `any`. - Three default-value contracts where null-vs-absent matters at runtime (SendNode empty params → inherits target; FanNode promotedChildSlotIndex `null` vs `0` for Pick semantics; ExecutionRecord timing-triple nullability for pre-target-call failures). - Three interface-shape stubs (Runner, RunnerStateSink with all three reason shapes, CostGuardrail + CrossTabLockManager). - One forward-compat assertion (WaveTriggerKind admits V1.1+ markers). Plus a short block of type-only assertions (`type _Assert... = X extends Y ? true : never`) for structural drift that runtime tests don't reach (branded id types on ConversationTree, ReflogEntry shape, ConverterRef union, Workspace shape). Net: -614 / +383 lines. Trim ratio ~50%, but the kept tests are genuinely load-bearing. 2. Narrow CI gate for the satisfies clauses ----------------------------------------- PR3a added `"exclude": []` to tsconfig.test.json so the contract tests' satisfies clauses type-check. But CI (frontend_tests.yml) only runs `npx tsc --noEmit` against tsconfig.json — the satisfies clauses were unenforced in CI. Type theater. 
Per-user direction (narrow gate), added tsconfig.contract.json that includes ONLY the tree-UI contract files + their type dependencies (treeUi.contract.test.ts, treeTypes.contract.test.ts, treeTypes.ts, testHelpers.ts, types/index.ts). Wired: - new `npm run type-check:contract` script for local DX - new CI step `Run tree-UI contract type check` in frontend_tests.yml The pre-existing test type errors that PR3a surfaced (126 errors across 6 component-test files) stay latent — they predate this work and fixing them would balloon scope. They are not gated and not regressed by this change. 3. Real bug found by the trim --------------------------- The new "ExecutionRecord timing triple admits null per-field" test failed against `mkExecution` because `overrides.dispatchedAt ?? ISO_FIXED` collapses explicit `null` overrides into the default ISO value. Fixed mkExecution to use spread-merge (`{ ...defaults, ...overrides }`), which preserves `null` distinct from `undefined`. The kind of bug no operator would have seen until PR4c started constructing failure- path execution records — caught at the right layer. Verification: 703 frontend tests pass (was 727; trim cut 24 trivial tests), lint clean, both type-check + type-check:contract green. Other rubber-duck items still pending: - DTO original_prompt_id nullability tightening (separate concern; the doc spec'd nullable, deferred to a focused discussion). - Q.S.1 cost-cliff regression test (lands in PR4c). - Stop adding new doc citations starting PR4b. - Full citation strip at end of V1.0.
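
The `mkExecution` bug from item 3 above comes down to how `??` treats an explicit `null`. The sketch below contrasts the two merge styles; the three timing field names and `ISO_FIXED` appear in the commit messages, while the constant's value, the `ExecutionTiming` type, and both function names are assumptions for illustration.

```typescript
// Only the timing triple is modeled here; the real record has more fields.
interface ExecutionTiming {
  dispatchedAt: string | null;
  targetFirstByteAt: string | null;
  completedAt: string | null;
}

const ISO_FIXED = "2026-01-01T00:00:00.000Z"; // illustrative value

const timingDefaults: ExecutionTiming = {
  dispatchedAt: ISO_FIXED,
  targetFirstByteAt: ISO_FIXED,
  completedAt: ISO_FIXED,
};

// Buggy style: `??` treats null like undefined, so an explicit null is lost.
function mkExecutionWithNullish(
  overrides: Partial<ExecutionTiming> = {}
): ExecutionTiming {
  return {
    dispatchedAt: overrides.dispatchedAt ?? ISO_FIXED,
    targetFirstByteAt: overrides.targetFirstByteAt ?? ISO_FIXED,
    completedAt: overrides.completedAt ?? ISO_FIXED,
  };
}

// Fixed style: spread-merge copies whatever the caller passed, null included.
function mkExecutionWithSpread(
  overrides: Partial<ExecutionTiming> = {}
): ExecutionTiming {
  return { ...timingDefaults, ...overrides };
}

// A failure before the target call: nothing was ever dispatched.
const viaNullish = mkExecutionWithNullish({ dispatchedAt: null });
const viaSpread = mkExecutionWithSpread({ dispatchedAt: null });
```

With these inputs `viaNullish.dispatchedAt` falls back to `ISO_FIXED`, while `viaSpread.dispatchedAt` stays `null`. One trade-off of the spread form is that an explicitly passed `undefined` also overwrites the default, so callers should omit a key rather than pass `undefined`.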

…lker (PR4b) Second slice of the runner. Pure function over the tree: walks a leaf Send's root-to-leaf path and produces the dispatch plan that PR4c will turn into one `create_attack` + N `add_message` calls. What ships ---------- `frontend/src/runner/partition.ts`: - `rootToLeafPath(tree, leafId)` — walks parents back to the root, reverses. Throws if `leafId` is not in the tree. - `isStaleForResolver(send)` — predicate: state in {edited, stale, failed, cancelled} OR execution is null. The null clause is the safety net for failed/cancelled (which null their execution by contract) and for freshly-added draft Sends that have no execution yet. - `resolvePathPartition(tree, leafId)` — the main walker. Output: - `prepended: PrependedMessageRequest[]` — turns from the clean prefix (Sends whose params still match their executions). Each clean Send contributes its input UserTurn message + its stored assistant response message; both load into `prepended_conversation`. - `freshSuffix: FreshSuffixEntry[]` — the first stale Send and everything after, down to the leaf. Each entry is (userTurn, fanVariant, sendNode); PR4c turns each into one `add_message`. - `treePathSegments: Array<[FanAxis, number]>` — (axis, slotIndex) pairs for every Fan ancestor, in topo order. PR4c's _build_labels will JSON-encode this into the `tree_path` AR label. - `target: string` — the resolved target_registry_name (per-Send override wins over root's value). Notable shape decisions ----------------------- - Synthetic UserTurn from Root. When a Send's first non-Fan, non-Score ancestor is the RootPromptNode (i.e., no operator-authored UserTurn between root and Send), the walker synthesizes a UserTurn-shaped object carrying root's text + attachments. Marked with `synthetic: true` so PR4c can detect and avoid double-wrapping; the dispatch code can read uniform fields (role, text, attachments) regardless. - `fanVariant` on FreshSuffixEntry vs. `treePathSegments`. 
Two different things on purpose: - `treePathSegments` records every Fan ancestor on the path. Used for the `tree_path` label (round-trips fan structure for reload). - `fanVariant` carries the variant the runner needs at dispatch time *for this specific Send*. Set when a Fan sits between the Send's input UserTurn and the Send itself (the attempt-axis- directly-above-Send pattern). Cleared by the intervening UserTurn for the converter-axis-with-per-child-UserTurn pattern, where the variant data lives on the child UserTurn's authored `converterPipeline` (fan-child materialization at create time). Convergent: tree_path is always present; fanVariant is only present when the runner needs it to disambiguate the Send beyond what's already on the input UserTurn. - Clean-prefix assistant message carries `original_prompt_id` per piece but defers the actual piece content to PR4c's piece cache. The partition is pure tree-walking; piece data lives in a separate cache the dispatcher hydrates from `GET /attacks/{id}/messages` at wave start. - ImportMessageNode on the dispatch path throws. V1.0 does not extend imported context via the runner's dispatch loop; if a future caller wires that path, this throw makes the gap loud rather than silent. TDD --- Tests written first against a nonexistent ./partition module (TS2307 + implicit-any cascade). 25 cases covering: - rootToLeafPath: topo order, Fan/Score ancestors included, error on unknown id. - isStaleForResolver: each of {edited, stale, failed, cancelled}, no-execution case, clean-with-execution case, running-with-null. - Single-Send chain (all-stale). - Root-promoted-to-UserTurn (no UT between root and first Send). - systemPrompt → leading system message; absent → no system message. - All-clean upstream + edited leaf: prepended = (user-turn + assistant- response), freshSuffix = [leaf]. - Stale interior + leaf: prefix empty, both stales in freshSuffix. 
- Clean prefix + stale interior + leaf: prefix loaded, both stales in freshSuffix. - Defensive: failed-with-execution still goes to freshSuffix (state check wins, retries always re-dispatch). - Fan(attempt)-directly-above-Send: input UT is from above the Fan, fanVariant captures (axis, slot), tree_path has one segment. - Fan(converter)-with-per-child-UT: input UT is the per-child UT, fanVariant null (data on child UT's pipeline), tree_path has one segment. - Score ancestor: pass-through; UT and variant unchanged. - Nested fans: tree_path accumulates in topo order. - SendNode target override wins; absence inherits from root. - Throws on non-Send target. - Throws on interior-Send target (the runner's dispatch loop is documented as only calling this for leaves; precondition fails loud). - tree_path produces JSON-serializable shape; empty array for fan- less leaf. One real design clarification surfaced during TDD: the converter-fan test originally asserted `freshSuffix.fanVariant = (converter, slot)`, which contradicts the spec's variant-clearing-on-UserTurn rule. Fixed the test to reflect the right semantics (variant data lives on the materialized child UserTurn's pipeline; fanVariant is null at that Send; tree_path still captures the fan ancestor for label round-trip). Verification: 728 frontend tests pass (+25), no regression. lint, type-check, type-check:contract all clean. Next slice ---------- PR4c will wire the dispatch sequence (one `create_attack` + N `add_message` calls against a mock API client) consuming the partition output, plus `_build_labels` (with the tree_path JSON-encoding from treePathSegments), `_format_api_error` failure classification, the 200-message cap short-circuit, the labels-divergence invariant test, and the Q.S.1 cost-cliff regression test that the rubber-duck flagged.
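As a rough illustration of the walker described under "What ships", the sketch below shows a parent-walk plus `tree_path` segment collection on a toy tree. It is not the code in `partition.ts`: the `SketchNode` shape, the `parentId` and `slotIndex` fields, and the two `FanAxis` values are assumptions made for the example.

```typescript
// Assumed axis names; the real FanAxis union may differ.
type FanAxis = "attempt" | "converter";

// Hypothetical flattened node: the real tree keeps edges separately.
interface SketchNode {
  id: string;
  kind: "root_prompt" | "user_turn" | "fan" | "send" | "score";
  parentId: string | null;
  axis?: FanAxis; // set on fan nodes
  slotIndex?: number; // set on children of a fan
}
type SketchTree = Map<string, SketchNode>;

// Walk parents back to the root, then reverse. Assumes the tree is acyclic.
function rootToLeafPath(tree: SketchTree, leafId: string): SketchNode[] {
  let current = tree.get(leafId);
  if (!current) throw new Error(`unknown node id: ${leafId}`);
  const path: SketchNode[] = [];
  while (current) {
    path.push(current);
    current = current.parentId === null ? undefined : tree.get(current.parentId);
  }
  return path.reverse();
}

// One (axis, slotIndex) pair per fan ancestor, in root-to-leaf order.
function treePathSegments(path: SketchNode[]): Array<[FanAxis, number]> {
  const segments: Array<[FanAxis, number]> = [];
  let pendingAxis: FanAxis | null = null;
  for (const node of path) {
    if (pendingAxis !== null) {
      // The node right after a fan is the child occupying one of its slots.
      segments.push([pendingAxis, node.slotIndex ?? 0]);
      pendingAxis = null;
    }
    if (node.kind === "fan" && node.axis !== undefined) pendingAxis = node.axis;
  }
  return segments;
}

const sketchTree: SketchTree = new Map<string, SketchNode>([
  ["root", { id: "root", kind: "root_prompt", parentId: null }],
  ["ut1", { id: "ut1", kind: "user_turn", parentId: "root" }],
  ["fan1", { id: "fan1", kind: "fan", parentId: "ut1", axis: "attempt" }],
  ["send1", { id: "send1", kind: "send", parentId: "fan1", slotIndex: 2 }],
]);
const sketchPath = rootToLeafPath(sketchTree, "send1");
const sketchLabel = JSON.stringify(treePathSegments(sketchPath));
```

For this toy tree the encoded label is `[["attempt",2]]`, and a fan-less path would encode as `[]`, matching the "empty array for fan-less leaf" case in the test list.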

Pure helpers the dispatcher (PR4c2) calls per leaf. Split out so the orchestrator stays small and these are testable without any API-client mocking. No I/O; no React. What ships ---------- `frontend/src/runner/dispatchHelpers.ts`: - `buildLabels(args)` — the source of the labels-divergence invariant. Produces the Record<string, string> attached to every create_attack and add_message call in one leaf's dispatch sequence. The dispatcher calls this once per dispatch and reuses the result; identical labels across all N+1 calls fall out by construction. Schema: operator (required; hard-asserts non-empty) operation (empty string permitted) conversation_tree_id (stringified) wave_id (uuid from the wave) wave_trigger_kind (the closed-enum value) tree_path (JSON-encoded array of [axis, slot]) parent_conversation_tree_id (OMITTED when null; written only when the tree is a clone) The "omit-on-null" rule for parent_conversation_tree_id is a real contract: writing the empty string would surface a self-parent in History "Open clones of T" — actively wrong. Omission is the honest "no parent" signal. The empty-operator hard-assert is defense-in-depth. The runner's entry-point shim's tag-hygiene gate is the load-bearing check; the assert exists so a future refactor that bypasses the gate panics rather than silently writing operator:'' ARs (which destroy audit attribution). - `parseTreePathLabel(label)` + `isTreePathLabelValid(label)` — JSON decoder + validator for the tree_path round-trip. Fail-soft on malformed input (empty array) so older clients encountering a future encoding don't hard-crash. The validator is for tests / defensive code paths that want to distinguish well-formed empty from malformed. - `formatApiError(error, callName)` — failure-class classification. 
Maps an ApiError (already-normalized by services/errors.ts) into one of the four NodeFailureClass values: transient : 5xx, network, timeout (auto-retry-eligible) rate_limited : HTTP 429 + provider-specific (Anthropic 529, detail-body strings 'rate_limit_exceeded' or 'overloaded_error'). Retry-gated in UX until the operator manually re-triggers. permanent : 4xx other than 429. Operator-fix required. blocked : runner-synthesized for in-flight cascade victims; NOT produced by formatApiError (cascade lives in the dispatcher, PR4d). Defaults to `transient` for unclassifiable shapes. A wrongly- classified transient gives an unhelpful but harmless Retry click; a wrongly-classified permanent silently locks the operator out of recovery. Safer default. Provider-specific detection lives in a small private registry. The V1.x plan is to push detection to the backend (which knows each target's provider) once token-bucket throttling lands; until then, client-side registry keeps the runner self-contained. TDD --- Tests written first against a nonexistent ./dispatchHelpers module (TS2307 expected). 26 cases covering: - buildLabels: required keys present, treePath JSON encoded, operation='' permitted, parent_conversation_tree_id omitted-on-null vs written, empty-operator throws, identical inputs produce deep-equal labels (the divergence invariant source). - tree_path encoding round-trip: build → parse → equal; '[]' (not absent) for fan-less; fail-soft on absent/empty/malformed/ wrong-typed input. - formatApiError: transient (network, timeout, 5xx), rate_limited (429, 529, detail-body strings), permanent (4xx, operator-mismatch body, 401/403), unknown-status defaults to transient, message shape includes callName/status/detail. 26 pass first try (one lint nit fixed: unused NodeFailureClass type import — the value-returning function uses the inferred ApiErrorReason type, not the underlying class union). Aggregate frontend: 754 tests pass (+26), no regression. 
lint, type-check, type-check:contract all clean.

Next slice
----------
PR4c2: the dispatch orchestrator. Consumes the partition output (PR4b) + these helpers + a mocked API client to drive one leaf's `create_attack` + N `add_message` sequence. Tests the 200-cap short-circuit, the labels-divergence invariant at the call site, the Q.S.1 cost-cliff regression (60-leaf attempt-fan with 10-deep shared stale prefix = 600 add_message calls under V1.0's no-intra-wave-memoization rule), and the mid-chain partial-commit failure semantics.
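The label schema and the failure-class rules above can be sketched as follows. This is a minimal illustration under assumptions, not the contents of `dispatchHelpers.ts`: the argument names, the `ApiErrorLike` shape, and the function name `classifyFailure` are hypothetical, and only the label keys and the status/detail rules are taken from the description.

```typescript
type FanAxis = "attempt" | "converter"; // assumed axis names

// Hypothetical argument shape for illustration.
interface BuildLabelsArgs {
  operator: string;
  operation: string;
  conversationTreeId: string | number;
  waveId: string;
  waveTriggerKind: string;
  treePathSegments: Array<[FanAxis, number]>;
  parentConversationTreeId: string | number | null;
}

function buildLabels(args: BuildLabelsArgs): Record<string, string> {
  // Defense-in-depth: an empty operator would destroy audit attribution.
  if (args.operator === "") throw new Error("buildLabels: operator must be non-empty");
  const labels: Record<string, string> = {
    operator: args.operator,
    operation: args.operation, // empty string is permitted
    conversation_tree_id: String(args.conversationTreeId),
    wave_id: args.waveId,
    wave_trigger_kind: args.waveTriggerKind,
    tree_path: JSON.stringify(args.treePathSegments),
  };
  // Omit-on-null: the key is written only when the tree is a clone.
  if (args.parentConversationTreeId !== null) {
    labels["parent_conversation_tree_id"] = String(args.parentConversationTreeId);
  }
  return labels;
}

type SketchFailureClass = "transient" | "rate_limited" | "permanent";
interface ApiErrorLike { status?: number; detail?: string } // assumed normalized shape

function classifyFailure(error: ApiErrorLike): SketchFailureClass {
  const detail = error.detail ?? "";
  const providerRateLimit =
    detail.includes("rate_limit_exceeded") || detail.includes("overloaded_error");
  // Rate-limit checks run first so that 529 is not treated as a generic 5xx.
  if (error.status === 429 || error.status === 529 || providerRateLimit) return "rate_limited";
  if (error.status !== undefined && error.status >= 400 && error.status < 500) return "permanent";
  // 5xx, network, timeout, and unclassifiable shapes default to transient.
  return "transient";
}
```

Because the dispatcher is described as calling `buildLabels` once and reusing the result, identical labels across the `create_attack` and every `add_message` follow from sharing one object rather than from a runtime comparison.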

…ssages (PR4c2) The heart of one leaf's HTTP lifecycle. Consumes PR4b's resolvePathPartition output + PR4c1's helpers + a mocked-or-real RunnerAttacksApi to drive the backend interaction. The only place in the runner that talks to the API. What ships ---------- `frontend/src/runner/dispatch.ts`: - `RunnerAttacksApi` interface — the 2-method slice of the existing `attacksApi` the runner depends on. Production wires this to services/api.ts; tests pass a recording mock. - `dispatchLeaf(args)` — orchestrates one leaf: 1. Defense-in-depth: throws on empty operator (the entry-point shim's tag-hygiene gate is upstream; this fires if a future refactor bypasses the gate). 2. Calls `resolvePathPartition` to compute clean prefix + fresh suffix + tree_path segments + resolved target. 3. 200-message cap short-circuit. Fails the leaf with a permanent failure-class and a recovery-pointing message; NO backend call fires. Recovery is branch-from-midpoint. 4. Builds labels ONCE (the labels-divergence invariant source). 5. Marks every fresh-suffix Send `running` atomically so siblings observing in-flight state see them together. 6. `create_attack` with prepended_conversation + labels. 7. For each fresh-suffix entry, `add_message` (send=true) with the same labels + the resolved converter_ids; extracts new assistant pieces by turn_number diff; records ExecutionRecord on the Send; flips to `clean`. 8. On API error mid-sequence: failing Send → failed (execution cleared); later Sends roll back to `stale` (executions cleared); earlier successful Sends keep their clean state + executions. - `LeafDispatchOutcome` discriminated union: `{ kind: 'success', leafId, callsIssued }` `{ kind: 'failed', leafId, failedNodeId, failureClass, partialAttackResultId }` The cascade-on-failure layer (PR4d) will consume this to drop sibling leaves whose paths include the failed ancestor. Notable shape decisions ----------------------- - `asApiError` private helper. 
The shared `toApiError` from services/ errors normalizes axios + Error + string throws but treats an already-normalized ApiError as an unknown object (falling into the "anything else" branch that loses the status code). The dispatcher catches both raw axios errors (production) and pre-normalized ApiErrors (tests, upstream layers that re-throw). The passthrough fixes both without modifying the shared helper. - New pieces extracted via turn_number diff. `AddMessageResponse.messages.messages` carries the ENTIRE conversation; the dispatcher holds a per-sequence `priorMaxTurnNumber` watermark and collects assistant pieces from turns strictly above it. Bounded O(messages-per-AR) per call, hard-capped at 200 by the backend; cheap. - ExecutionRecord built in-runner. The runner mints the executionId via `crypto.randomUUID()` (with a Math.random() fallback for very old browsers); records dispatchedAt / targetFirstByteAt / completedAt as the same timestamp (single-turn synchronous dispatch; full per-call timing is a streaming-target enhancement out of V1.0 scope). - The 200-cap is enforced on `prepended_conversation` ONLY, matching the backend Pydantic max_length=200. add_messages extend the AR's conversation past 200 messages cleanly; only the clean-prefix load trips the cap. TDD --- Tests written first against a nonexistent ./dispatch module. 16 tests: - Happy path (single-Send chain): 1 create_attack + 1 add_message; correct request shapes; running→clean transition; ExecutionRecord carries waveId/waveTriggerKind/AR id/conversation id/pieceIds. - Multi-Send chain with clean prefix: prepended carries (user, assistant) pairs from each clean Send; one add_message per stale Send in path order; clean Sends are untouched. - Multi-stale chain: each stale Send becomes its own add_message. - Labels-divergence invariant at the CALL SITE: 4-stale chain produces 5 requests; all five labels dicts are deep-equal. Plus parent_- conversation_tree_id written only on clones. 
- tree_path label populated from fan ancestors (axis, slot).
- 200-cap short-circuit: 101 clean Sends → 202 prepended turns; leaf fails with permanent class and recovery-pointing message; no backend call fires.
- create_attack failure: every fresh-suffix Send fails; no add_message.
- Mid-chain add_message failure: failed Send → failed; later Sends → stale; earlier Sends stay clean; partialAttackResultId surfaced.
- Classification round-trip: 429 → rate_limited; 400+operator-mismatch → permanent.
- Tag-hygiene defense-in-depth: empty operator throws synchronously.
- The Q.S.1 cost-cliff REGRESSION: 60 sibling leaves with a 10-deep shared stale prefix produce 60 create_attacks + 660 add_messages = 720 total calls. Pins V1.0's no-intra-wave-memoization invariant via test rather than absence-of-code; a well-meaning future contributor who adds memoization sees this drop to ~71 and the assertion fires loudly.
- Outcome shape: success carries leafId + callsIssued; failure carries failedNodeId + failureClass + partialAttackResultId (null when create_attack failed before any AR was created).

Three real defects surfaced during TDD:
1. `toApiError` doesn't recognize already-normalized ApiError throws → all classification tests returned 'transient'. Fixed via asApiError passthrough.
2. The Q.S.1 test originally had `Fan → Send` directly (no UserTurn between), which violates the §5.1 #5 invariant and makes the resolver throw. Real test bug: rewrote to put a UserTurn above the Fan (the correct attempt-fan shape where all siblings share an input UT).
3. mkTree's auto-numbered fan-child edges (PR4a.1) earn their keep here: without that fix, all 60 fan-children would have shared slot 0 and the partition's tree_path would have been wrong.

Verification: 770 frontend tests pass (+16), no regression. lint, type-check, type-check:contract all clean.

Next slice
----------
PR4d: the in-flight cascade.
When `dispatchLeaf` returns `kind: 'failed'`, the dispatch loop drops every sibling leaf in `ready` whose root-to-leaf path includes the failed Send → `stale` with `failure_class: 'blocked'`. Plus `cancelWave(treeId)` (flips a per-wave flag at `ready.popNext()` boundaries) and `cancelQueued(treeId)` (drops queued waves without aborting the active one).
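The turn_number diff from "Notable shape decisions" can be illustrated with a small watermark helper. This is a sketch under assumptions: the camelCase field names and the `SketchMessage` shape are hypothetical stand-ins for whatever `AddMessageResponse` actually carries.

```typescript
// Hypothetical shapes; the real response types live in the API client.
interface SketchPiece { pieceId: string }
interface SketchMessage {
  turnNumber: number;
  role: "system" | "user" | "assistant";
  pieces: SketchPiece[];
}

// The response carries the ENTIRE conversation, so only assistant pieces from
// turns strictly above the watermark count as new for this add_message call.
function extractNewAssistantPieces(
  messages: SketchMessage[],
  priorMaxTurnNumber: number,
): { newPieces: SketchPiece[]; maxTurnNumber: number } {
  const newPieces: SketchPiece[] = [];
  let maxTurnNumber = priorMaxTurnNumber;
  for (const message of messages) {
    if (message.turnNumber > maxTurnNumber) maxTurnNumber = message.turnNumber;
    if (message.turnNumber > priorMaxTurnNumber && message.role === "assistant") {
      newPieces.push(...message.pieces);
    }
  }
  // The caller stores maxTurnNumber as the watermark for the next call.
  return { newPieces, maxTurnNumber };
}

const conversation: SketchMessage[] = [
  { turnNumber: 1, role: "user", pieces: [{ pieceId: "p1" }] },
  { turnNumber: 1, role: "assistant", pieces: [{ pieceId: "p2" }] },
  { turnNumber: 2, role: "user", pieces: [{ pieceId: "p3" }] },
  { turnNumber: 2, role: "assistant", pieces: [{ pieceId: "p4" }] },
];
const afterSecondCall = extractNewAssistantPieces(conversation, 1);
```

With a watermark of 1, only the turn-2 assistant piece (`p4`) is collected and the watermark advances to 2. The scan is linear in the conversation length, which is what the commit means by "bounded O(messages-per-AR) per call".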

Wraps PR4c2's per-leaf dispatcher in the wave-level loop. Owns: - the concurrency cap (maxParallel, default 4) - in-flight cascade: when a leaf's dispatch fails on a shared interior Send, not-yet-dispatched siblings drop to `blocked` rather than independently retrying the same failure - operator cancellation via a per-wave controller checked at each ready-pop boundary; in-flight HTTP completes (V1.0 UI-level cancel contract), not-yet-dispatched leaves transition to `cancelled` - wave-event emission (start with estimated call count, one node_complete per dispatched leaf, complete with bucketed summary) What ships ---------- `frontend/src/runner/wave.ts`: - `WaveDispatchController` + `createWaveController()` — factory for the per-wave cancel handle. PR4e's entry-point shim creates one per wave and registers it in the active-wave map so `runner.cancelWave(treeId)` can flip it. - `WaveSummary` — the wave-complete tally shape. Mirrors `WaveEvent.complete.summary` (succeeded, failed bucketed by class, blocked, cancelled, reflog_evicted) and is also the return value of runWave so callers can read it without subscribing to the event stream. - `runWave(args)` — the loop: 1. Compute initial ready set via PR4a's computeReady; estimate calls upfront for the start event. 2. Emit `start` event with estimatedCalls + wave metadata. 3. Drain ready into inflight up to maxParallel; wait on Promise.race; on each completion tally outcome + emit node_complete; on failure run the cascade. 4. On cancellation: skip further picks; let in-flight complete naturally (their natural outcomes are tallied, not clobbered with `cancelled` — the execution-clobber gate). 5. Cancel-tally: leaves still in `remaining` after the loop exits via cancellation get marked cancelled in the sink + outcome map. 6. Emit `complete` event with the bucketed summary; return it. Notable shape decisions ----------------------- - Cascade is "drop from ready", not "drop from running". 
When a leaf's dispatch fails on an interior Send, only siblings still in the remaining-set get blocked. Siblings already in flight continue and may independently fail on the same Send (counted as their own per-leaf transient failures, not as cascade-blocked). - The cascade walks `remaining`, not the whole tree. Each leaf in `remaining` is checked against the failedSendId via rootToLeafPath containment. Bounded by remaining-leaf count × path depth; cheap at V1.0 sizes. - Cancel uses a controller flag, not an AbortController. JS native AbortController is bigger surface than needed and doesn't compose naturally with `dispatchLeaf`'s in-flight contract (which is "let the HTTP complete"). The controller's role is purely "don't pick more leaves"; in-flight ones decide their own fate. - The cancel-tally step uses `'transient'` as the placeholder failure_class in lastError because NodeFailureClass doesn't carry a 'cancelled' variant (cancelled is a NodeState, not a failure class). The tally bucket separately accounts for cancelled leaves; the failure_class in lastError is informational only for cancelled state. - The summary is returned directly AND emitted via the complete event. PR4e's shim will use the return value; the UI subscribes to the event stream. Both paths see the same data; no divergence risk because there's one source of truth (the outcomes map). Test scaffolding ---------------- A controllable mock API client (`mkControllableApi`) gives tests precise control over per-leaf timing via deferred Promise resolutions (`releaseNext` / `failNext` / `pendingCount`). The wave loop is deterministic given the resolution order; tests use a poll-based `waitFor` helper to wait for specific predicates rather than fixed microtask-flush counts (which proved racy when the dispatchLeaf chain has more await hops than the flush counter — caught and fixed during this PR's TDD cycle). TDD --- Tests written first against a nonexistent ./wave module (TS2307 + missing-module cascade). 
16 tests covering: - Empty S: no dispatches; zero summary; start + complete only. - Single leaf happy path: one dispatch; summary.succeeded=1; events in order (start → node_complete → complete) with the right metadata (waveId, triggerKind, estimatedCalls, treeId). - 3-leaf fan all succeed: summary.succeeded=3. - Concurrency cap: maxParallel=2 with 5 ready leaves; inflight.max=2 throughout the wave. - Default maxParallel=4 when omitted. - Wave-event ordering: start first; one node_complete per leaf with 'success' or 'failure' outcome; complete last with bucketed summary. - In-flight cascade (maxParallel=1, shared interior Send fails): sibling leaves drop to `blocked` with failure_class='blocked' on lastError; no further API calls fire; summary 1 failed + 2 blocked. - In-flight cascade ONLY blocks not-yet-dispatched siblings: with maxParallel=3 and 3 sibling leaves all dispatching, each leaf's failure is independent; no blocked count; summary 3 failed. - Mixed cascade across two fan subtrees sharing one ancestor Send: cascade walks across fan boundaries; both subtrees' siblings get blocked. - Cancel before any dispatch: all leaves cancelled; no API calls. - Cancel mid-wave: in-flight completes; remaining → cancelled. - Execution-clobber gate: in-flight leaves that complete AFTER cancel still record their executions; tallied as succeeded, not cancelled. - Controller omitted defaults to never-cancelled. - Summary failure-class bucketing across mixed outcomes (1 each of transient + rate_limited + permanent). Two real test-infrastructure defects surfaced during TDD: 1. `flushMicrotasks(4)` was too few hops for the dispatchLeaf chain's await sequence (await createAttack → await addMessage → record → wrap → race → loop → pick next → await createAttack → await addMessage). Bumped default to 32 and added a poll-based `waitFor` helper for tests that need a specific predicate satisfaction. 2. 
The original concurrency-cap test tried to interleave releases and pending-count assertions, which raced the loop. Simplified to drain all 5 deferreds + assert inflight.max at the end, which is the invariant the test was actually trying to prove.

Verification: 786 frontend tests pass (+16), no regression. lint, type-check, type-check:contract all clean.

Next slice
----------
PR4e: 5-step entry-point shim. Wraps runWave with:
1. Tag-hygiene gate (operator non-empty)
2. Cross-tab lock acquire (mock for now; real BroadcastChannel in PR4f)
3. Cost guardrail modal
4. Per-tree wave queue check
5. Wave start + try/finally lock release

Plus the active-wave map so `runner.cancelWave(treeId)` can look up the controller created by the shim. Then PR4f wires the real BroadcastChannel lock + queue drain, completing the runner.
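The drain-and-race loop that `runWave` is described as running can be sketched in a few lines. This is a simplified illustration, not `wave.ts`: it leaves out the cascade and event emission, assumes leaf ids are unique, assumes the dispatch function resolves with an outcome rather than rejecting, and all names (`runWaveSketch`, `SketchController`, `SketchOutcome`) are hypothetical.

```typescript
interface SketchOutcome { leafId: string; ok: boolean }
interface SketchController { cancelled: boolean } // a plain flag, not an AbortController
interface SketchWaveSummary { succeeded: number; failed: number; cancelled: number; maxInflight: number }

async function runWaveSketch(
  leafIds: string[],
  dispatchLeaf: (leafId: string) => Promise<SketchOutcome>,
  maxParallel = 4,
  controller: SketchController = { cancelled: false },
): Promise<SketchWaveSummary> {
  const remaining = [...leafIds];
  const inflight = new Map<string, Promise<SketchOutcome>>();
  const summary: SketchWaveSummary = { succeeded: 0, failed: 0, cancelled: 0, maxInflight: 0 };

  while (remaining.length > 0 || inflight.size > 0) {
    // Ready-pop boundary: the cancel flag only stops NEW picks.
    while (!controller.cancelled && remaining.length > 0 && inflight.size < maxParallel) {
      const leafId = remaining.shift() as string;
      inflight.set(leafId, dispatchLeaf(leafId));
    }
    summary.maxInflight = Math.max(summary.maxInflight, inflight.size);
    if (inflight.size === 0) break; // cancelled with nothing left in flight

    // In-flight work always completes and is tallied by its natural outcome.
    const outcome = await Promise.race(Array.from(inflight.values()));
    inflight.delete(outcome.leafId);
    if (outcome.ok) summary.succeeded += 1;
    else summary.failed += 1;
  }

  // Cancel-tally: whatever was never picked counts as cancelled.
  summary.cancelled = remaining.length;
  return summary;
}
```

A caller would pass a shared controller object and set `controller.cancelled = true` to stop further picks. Leaves already dispatched still finish and land in `succeeded` or `failed`, which mirrors the execution-clobber gate described above.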

…V1.0 (PR4d.1)

Critical fix. Rubber-duck reviewer caught: partition.ts's clean-prefix branch was writing the literal placeholder string `'<deferred: resolved from piece cache>'` as the `original_value` of every assistant piece loaded into prepended_conversation. The piece cache the design assumes (per 03 §3.3a `_load_piece_as_request`) doesn't exist. Every clean-prefix dispatch in production would send fabricated assistant history to the LLM: either backend-rejected (validation) or model-accepted and reasoned-against (silently corrupting target responses). Tests passed because no assertion looked at `pieces[i].original_value`.

The fix
-------
Two options were viable:

A. Build the piece cache (~150 LOC: new module, wave-start integration, GET /attacks/{id}/messages per distinct source AR, cache by piece_id, partition reads from cache). Closes the design's contract; preserves the per-leaf-edit cost optimum.

B. Drop the clean-prefix branch entirely. Every Send on the path enters freshSuffix and re-fires against the target. ~10 LOC change. Operators pay full re-dispatch cost on every wave; correctness is restored because every assistant message the target sees was actually generated by the target.

V1.0 ships option B per operator's explicit choice, which aligns with the Q.S.1 "dumb but correct" V1.0 discipline. The clean-prefix optimization returns in V1.x with the piece cache.

Operator-visible cost: editing only a leaf at the bottom of a 10-deep clean chain now costs 11 calls (1 create_attack + 10 add_messages) instead of the 2 calls the clean-prefix model would have produced. ~5× hot-path regression on edit-leaf-only. The wave-summary's call count reflects this honestly; the cost-guardrail modal intercepts expensive refreshes.

What ships
----------
- `partition.ts`: clean-prefix branch removed. Every Send on the path enters freshSuffix unconditionally. `prepended` carries at most the system message (when RootPromptNode.params.systemPrompt is set).
Dead helpers deleted: `userTurnMessage`, `assistantResponseMessage`, the placeholder-string code. File-header docstring rewritten to document V1.0 reality and the V1.x migration path. - `isStaleForResolver` kept but re-documented: it's no longer on the resolver's hot path under V1.0; retained for defensive callers (UI cost preview, V1.x cache layer). - `partition.test.ts`: three test cases updated to assert V1.0 behavior ("all-clean upstream", "clean Send + stale interior", "failed Send with execution"). All now expect empty prepended + every Send in freshSuffix. The implementation defect that the original tests silently tolerated (placeholder string in original_value) is now unreachable. - `dispatch.test.ts`: "multi-Send chain with clean prefix" rewritten to assert V1.0's empty-prepended behavior + N add_messages for the full chain. "200-message cap" rewritten — the cap is unreachable in V1.0 normal traffic (prepended ≤ 1), so the test now documents contract (100-deep chain dispatches successfully because prepended is empty) + leaves a placeholder for the V1.x cache-layer expansion. Design doc updates ------------------ - `01 §1.2 known limitations` gains a new entry: "No clean-prefix optimization in V1.0 — every dispatch re-fires the full chain from the root." Spells out the ~5× hot-path cost, the V1.x migration, and the correctness rationale. Updates the 200-turn-cap entry to note the cap is unreachable in V1.0 by construction. - `03 §4.1` gains a "V1.0 implementation reality" subsection before the resolver pseudocode. Names the two options considered, the choice rationale, the cost trade-off, and the V1.x migration shape. The pseudocode below is left intact as the eventual V1 model. Verification: 787 frontend tests pass, no regression. lint, type-check, type-check:contract all clean. 
The Q.S.1 cost-cliff test in dispatch.test.ts remains green at 720 backend calls per 60-leaf attempt-fan wave — the shared prefix was already stale, so it was already in freshSuffix under the prior buggy behavior; no math change. The rubber-duck's other items (asApiError move, synthetic discriminator, dead helper deletion, cancel-tally reason, reflog_evicted TODO, dispatcher tag-gate redundancy) land in PR4d.2 to keep this commit focused on the single load-bearing fix.
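The call counts quoted in this commit follow from simple arithmetic once nothing is memoized or prepended. The helper below is only an accounting sketch, not runner code. It assumes a "10-deep chain" means ten Sends on the path including the edited leaf, and that each fan leaf adds one Send below the shared prefix; both function names are hypothetical.

```typescript
// With no clean prefix and no intra-wave memoization, one leaf costs
// one create_attack plus one add_message per Send on its root-to-leaf path.
function callsForLeaf(sendsOnPath: number): number {
  return 1 + sendsOnPath;
}

// An attempt-fan of `leafCount` sibling leaves that share `sharedPrefixSends`
// stale Sends above the fan; each leaf contributes its own Send as well.
function callsForAttemptFan(leafCount: number, sharedPrefixSends: number) {
  const createAttacks = leafCount;
  const addMessages = leafCount * (sharedPrefixSends + 1);
  return { createAttacks, addMessages, total: createAttacks + addMessages };
}

const editLeafCost = callsForLeaf(10); // 11 calls for the 10-deep chain
const fanCost = callsForAttemptFan(60, 10); // 60 + 660 = 720 calls
```

Under those assumptions the edit-leaf case comes to 11 calls against the 2 that a clean-prefix dispatch would need, and the 60-leaf fan comes to the 720 calls the Q.S.1 regression test pins.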

Six smaller items from the rubber-duck review bundled into one focused cleanup. None are behavior changes; each is a layer / discipline fix. 1. Move ApiError pass-through to services/errors.ts ------------------------------------------------ The dispatcher had a private `asApiError` helper that duck-checked the ApiError shape and passed it through `toApiError`. Reviewer caught: the shape is OWNED by services/errors.ts; the dispatcher was re-implementing a check at the wrong layer. Moved the passthrough into `toApiError` itself as an early-return: any already-normalized ApiError gets returned verbatim (idempotent). Two new test cases in errors.test.ts pin (a) the referential-equality pass-through and (b) that plain objects lacking the ApiError shape still fall into the unknown-throw branch. The dispatcher's `asApiError` + the private `isAlreadyApiError` helper are deleted; dispatcher catches just call `toApiError(raw)` now. 2. Synthetic UserTurn → real discriminator ---------------------------------------- `SyntheticUserTurnFromRoot` carried a `synthetic: true` field that the dispatcher's `piecesForUserTurn` / `resolvedConverterIds` ducked against via `(ut as { synthetic?: boolean }).synthetic === true`. That's a runtime check across a declared union — the worst of both worlds. Fixed: the synthetic shape now uses `kind: 'synthetic_user_turn_from_root'`, which matches the `kind` field on real `UserTurnNode` (`'user_turn'`), making the union a real discriminated union. Consumers narrow via `if (ut.kind === 'synthetic_user_turn_from_root') ...` and TypeScript narrows the rest. All `as`-casts deleted. 3. Delete dead helpers -------------------- - `parseTreePathLabel` and `isTreePathLabelValid` in dispatchHelpers.ts: no production callers; the reviver was speculative for unrealized future needs. Tests for both also deleted; one kept-and-rewritten test pins `buildLabels` output shape ('[]' for fan-less leaves, JSON-encoded for nested fans) — that's the actual contract. 
- `cryptoRandomUuid` Math.random() fallback in dispatch.ts: jsdom + modern Node both have `crypto.randomUUID`; the fallback was an unreachable RNG dependency adding coverage gaps. Replaced with a direct call. 4. Cancel-tally honest reason --------------------------- `runWave`'s cancel-tally was writing `{ message: '...', failure_class: 'transient' }` on cancelled leaves' `lastError`. State='cancelled' is what consumers actually read; the structured failure_class on a not-failed node was confusing. Switched to a plain string reason `'wave cancelled by operator'`. The sink normalizes string reasons to transient internally, but consumers reading lastError.failure_class in a cancelled-state node now see the documented "string was passed" path rather than a hand-coded structured form. 5. Reflog_evicted TODO -------------------- `runWave`'s summary hardcodes `reflog_evicted: 0` until the reflog GC layer lands. Added a TODO(reflog) comment explaining the migration shape so the placeholder doesn't rot into a forgotten zero. 6. Drop dispatcher's redundant tag-gate assertion ----------------------------------------------- `dispatchLeaf` had its own `if (!args.operator) throw` check. But `buildLabels` (called downstream) also asserts the same. Two defense- in-depth checks for the same precondition is one too many — pick the source. Kept `buildLabels`'s assert (it's the actual label-build site where the consequence of a missing operator would silently destroy audit attribution). Dropped dispatcher's; tests covering the synchronous throw on empty operator still pass because the throw happens via buildLabels at the same callsite, returns the same error class. Verification: 787 frontend tests pass (same count as PR4d.1 — the new errors.test.ts pass-through tests offset the deleted dead-code tests). Coverage: 93.22%/85.71%/92.23%/95.2% globally; runner directory 93.93/85.2/94.2/94.98 — all above the 85/85/90/90 thresholds. 
The two remaining gaps in dispatch.ts (200-cap branch + the inline-converter inner if) are defensible: the cap is unreachable in V1.0 by construction (per PR4d.1) and the inline branch fires only for ConverterRef shapes that don't carry a `converterId` (rare in V1.0). Lint, type-check, type-check:contract all clean.

Open rubber-duck items still pending:
- DTO original_prompt_id nullability tightening (V1.0 ships nullable per the spec; not yet relitigated).
- Citation-strip discipline: still in-progress; partition.ts and wave.ts source comments retain a few inline section refs that should clear at end-of-V1.0.
- The `reconcileTransformStates` / `reconcileAllTransforms` calls (03 §3.3 try/finally + §3.1 step 6) are NOT implemented. PR4e or a dedicated follow-up needs them; otherwise UserTurn/Score nodes that need state reconciliation after the wave stay stuck `stale`.
- `WaveEvent.operator_tag_required` is in the type union but never emitted; PR4e's shim wires the gate.
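Item 2's move from a `synthetic: true` flag to a `kind` discriminator is easiest to see side by side with the narrowing it enables. The sketch below uses cut-down, hypothetical versions of the two shapes (only `kind` and the two literal values come from the commit; the other fields and the function name are assumptions).

```typescript
// Simplified stand-ins for the real node types.
interface UserTurnNode {
  kind: "user_turn";
  id: string;
  text: string;
}
interface SyntheticUserTurnFromRoot {
  kind: "synthetic_user_turn_from_root";
  text: string;
}
type InputUserTurn = UserTurnNode | SyntheticUserTurnFromRoot;

// Switching on `kind` lets the compiler narrow each branch without `as` casts.
function describeInput(ut: InputUserTurn): string {
  switch (ut.kind) {
    case "user_turn":
      return `authored:${ut.id}`; // `id` is only visible on the authored shape
    case "synthetic_user_turn_from_root":
      return "synthetic:root";
    default: {
      // Exhaustiveness check: adding a new kind makes this assignment fail to compile.
      const unreachable: never = ut;
      return unreachable;
    }
  }
}

const authored = describeInput({ kind: "user_turn", id: "ut-7", text: "hello" });
const synthesized = describeInput({ kind: "synthetic_user_turn_from_root", text: "hello" });
```

The `never` assignment in the default branch is an optional extra: it turns a forgotten case into a compile error, which is the same class of drift the contract type-check gate is meant to catch.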

The runner's entry-point shim that wraps runWave with the canonical five-step ordering from 03 §2.1: tag-hygiene gate → cross-tab lock acquire → cost-guardrail modal → per-tree wave-queue check → wave start. Steps 2–5 sit inside try/finally so the lock releases on every exit path; drain runs OUTSIDE the lock so each drained wave can acquire its own. This commit also lands the wave-end reconcileAllTransforms pass (§3.1 step 6) and the buildSForRetry helper that the retry-failed scope requires.

What ships

- frontend/src/runner/shim.ts
  - createRunnerShim(deps) → RunnerShim with refreshNode / refreshSubtree / refreshTree / retryFailedNodes / cancelWave / cancelQueued. Per-tree active-wave + queue maps live in the closure so cancel{Wave,Queued} can look up controllers and dropped requests.
  - The five steps in order:
    1. Operator tag missing → emit operator_tag_required, no acquire, no release. Returns.
    2. lockManager.acquire returns 'busy' → emit busy event, return (no release; we don't hold the lock).
    3. costGuardrail.approve(estimatedCalls, triggerKind) → reject returns via outer finally (lock released).
    4. currentWaveByTree.has(treeId) → enqueue {waveId, triggerKind, scope, leafCount}, emit queued event with queueDepth, return via outer finally.
    5. createWaveController + runWaveStarter → set currentWaveByTree → await settled → reconcileAllTransforms → delete currentWaveByTree → outer finally releases lock → drain queue OUTSIDE the lock.
  - cancelWave(treeId): looks up active controller, .cancel(), then awaits settled (swallowing rejection) so the public contract "resolves when the wave fully settles" holds.
  - cancelQueued(treeId): splices the queue, emits a synthetic complete event with summary.cancelled = leafCount per dropped wave (the §10.3 contract: operator sees the queued banner reconcile).
  - Dependencies are injected (operatorProvider, treeProvider, sink, lockManager, costGuardrail, runWaveStarter, uuid, optional now) so tests mock every boundary and the shim's orchestration is the only thing under test.
- frontend/src/runner/reconcile.ts
  - reconcileAllTransforms(tree, treeId, sink) walks every transform-class node (user_turn / fan / score) once and flips stale → clean when the parent is clean. Single pass; transforms whose parent flipped this pass stay stale for a follow-up wave (catches the operator-typical Score-as-Send-sibling case the per-dispatch path-scoped reconcile cannot reach).
- frontend/src/runner/readiness.ts
  - buildSForRetry(tree, nodeIds): S = {nodeIds} ∪ {failed/cancelled Send ancestors on each input's root-to-leaf path}, deduped. Walks transparently through UserTurn / Fan / Score ancestors (only Send state counts). Silently skips missing nodeIds (UI race tolerance).
  - demoteRetryFailedNodes signature evolved from void to return ConversationTree: fires sink calls (React-state side) AND returns a new tree with demoted nodes flipped to stale + execution null + lastError null (the pure-data side runWave's computeReady reads). Returns the input tree by identity on no-op so caller memoization stays valid.

Notable shape decisions

- Drain runs OUTSIDE the outer lock-release. The spec's literal pseudocode in §2.1 nested the drain inside the outer try, which would deadlock on real BroadcastChannel-keyed locks (re-entered drained waves would block trying to acquire the same lock). The correct read is: each drained wave gets its own acquire-release cycle. Drain reachability after the outer try is the right signal: every early-exit path (tag-gate, busy, missing tree, cost-cancel, enqueue) returns from inside the try (bypassing the drain block); a step-5 exception propagates through the finally and exits the function before drain runs. No bookkeeping flag needed.
- waveId minted per shim entry, not per queue request. The §10.3 spec mints one waveId per entry; for an enqueued wave the queued event carries it. When the queued wave later drains, its re-entered shim invocation mints a fresh waveId for the dispatch itself. That's what the literal spec does and what the V1.0 UI needs (the queue banner reads queueDepth, not per-wave tracking).
- Retry-failed shim DEMOTES via demoteRetryFailedNodes and uses the RETURNED tree for runWaveStarter. The §3.1 step 2b demotion is spec'd as inside the dispatch loop, but doing it in the shim keeps runWave dumb (it doesn't need to know about waveTriggerKind === 'retry_failed' semantics). The dual sink-call + returned-tree shape on demoteRetryFailedNodes is the source of truth for both surfaces.
- cancelQueued emits the complete event itself. The shim is the only place that has the per-queued-wave waveId + leafCount; the mocked runWaveStarter for queued waves never runs, so no other layer can emit the wave's lifecycle complete event. This matches the §10.3 contract literally.
- Missing-tree path: silent return. The UI is the only legitimate caller of these entry points and always passes a treeId that matches the current tree. If the operator races a tree-delete against a refresh click, the shim no-ops rather than crashes. The lock IS released (we acquired before the lookup, per the spec's step ordering).
- The shim doesn't depend on runWave's wave.ts directly; it takes a RunWaveStarter dependency (the function signature mirrors RunWaveArgs minus the sink/api/operation, which production wires in via a thin adapter). This is how the shim tests assert the five-step ordering without mocking the whole dispatcher.

TDD narrative

Started with readiness.test.ts: added 8 cases for buildSForRetry (the entry point's pre-S helper) + 4 cases for the demoteRetryFailedNodes return-tree contract. RED was tsc TS2724 ('buildSForRetry' not exported) + TS2339/TS2345 on the void return. Implemented in readiness.ts; 43 readiness tests pass (28 prior + 15 new).

reconcile.test.ts next: 12 cases pinning the transform-flips-when-parent-clean rule, the Send-untouched invariant, idempotency on clean transforms, and the wide-tree walk that catches sibling-of-Send Scores. RED was TS2307 on './reconcile'. Implemented reconcile.ts; 12/12 green.

shim.test.ts is the bulk of the PR: 39 cases across 12 sections (tag-gate, lock-busy, cost-cancel, queue enqueue/drain, wave-start args, lock release on every exit path (success / starter throws / cost cancel / queue / tag abort / busy abort), S construction per entry point, waveTriggerKind mapping, cancelWave behavior, cancelQueued behavior, wave-end reconcile, per-tree isolation, missing-tree no-op). RED was TS2307 on './shim'. Implemented; 38/39 green on first run.

Defects surfaced during TDD

- "cancelQueued does NOT affect the active wave" timed out on first pass because shim.refreshTree synchronously returns a Promise that suspends at the first await (lock acquire); the second invocation hadn't reached the enqueue step by the time cancelQueued ran, so the queue was empty and the no-op cancel didn't prevent the second wave from later draining normally, but the test only resolveNext'd once, leaving the drained second starter pending forever. Fix: waitFor 'queued' event before cancelQueued. Real bug surfaced: the race exists in production too, and operators who fire cancel-queued before the queued event has emitted will see the queued wave drain anyway. Acceptable for V1.0 (the queued event is the UI's signal to enable the cancel chip), but worth a note for PR4f when real BroadcastChannel-keyed locks change the timing.
- Initial implementation carried a `waveDispatched` flag through the outer try/finally to guard the drain. ESLint's no-useless-assignment rule correctly flagged the initialization as dead: the drain block is only reachable on the step-5 success path (returns inside the outer try propagate through finally and exit; exceptions propagate through both finallys and exit). The flag was a cargo-culted pattern from C-style exit-code tracking; JavaScript's try/finally semantics already encode the right signal in reachability. Dropped the flag.
- Wave-end reconcile uses treeProvider for a fresh lookup rather than the tree object we built S against. Reason: in production the wave's sink writes flow into the React state container that treeProvider reads, so the post-wave snapshot reflects the dispatcher's Send state-flips and the reconciler walks the correct world. In tests with closed-over fixed trees, this is a no-op (same reference returned); same answer either way.

Verification

Tests: 851 frontend passing (787 prior + 64 new: 15 readiness + 12 reconcile + 39 shim; see test counts below per file). Backend unchanged (~658 passing).
Lint: clean.
Type-check: clean (main + contract).
Coverage: src/runner directory 94.88 / 87.29 / 94.04 / 96.05 against the 85/85/90/90 thresholds: shim.ts at 98.7/93.33/90.9/100, reconcile.ts at 94.73/90/100/100, readiness.ts at 94.53/89.04/92.3/96.15. Pre-existing 126 latent test type errors untouched (the narrow contract-test gate stays green).

Per file:
- readiness.test.ts: 43 tests (28 prior + 15 new)
- reconcile.test.ts: 12 tests (new file)
- shim.test.ts: 39 tests (new file)

Next slice

PR4f: replace mock CrossTabLockManager with real BroadcastChannel('pyrit-runner') keyed on conversation_tree_id (per 01 §9.4.3 / 03 §10.4). Real-lock timing will shift the cancelQueued race noted above; worth retesting the drain semantics with the polyfilled BroadcastChannel under jsdom. Together PR4e + PR4f complete the runner core; the rubber-duck reviewer fires after PR4f per the template.

Open rubber-duck items still pending

- DTO original_prompt_id nullability (since PR3a; not yet re-litigated).
- Citation-strip discipline (partition.ts + wave.ts inline section refs still present; end-of-V1.0 strip).
- reconcileTransformStates (path-scoped per-dispatch variant) NOT implemented in this PR; wave-end reconcileAllTransforms covers the canvas-stale-transform-after-wave bug; path-scoped variant is a perf optimization for the incremental UI update that V1.0 can defer.
- CI gate for the 126 latent test type errors (deferred; narrow contract-test gate stays in place).
- PR1 backward-compat fallback corpus verification (needs prod DB access).
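
The five-step ordering and the "drain is reachable only after a successful wave" property described in this commit can be sketched roughly as below. This is a simplified illustration, not the shim itself: `SketchDeps`, `createShimSketch`, and the counter-based queue are hypothetical stand-ins, the dependency list is much smaller than the real one, and `acquire` is assumed to be re-entrant within one tab.

```typescript
// Hypothetical, reduced dependency set; the real shim also takes a sink, treeProvider, uuid, etc.
interface SketchDeps {
  operator: () => string | null;
  acquire: (treeId: string) => Promise<boolean>; // false means another tab holds the lock
  release: (treeId: string) => void;
  approve: () => Promise<boolean>;
  runWave: (treeId: string) => Promise<void>;
  emit: (event: string) => void;
}

function createShimSketch(deps: SketchDeps) {
  const active = new Set<string>();         // trees with a wave in flight
  const queued = new Map<string, number>(); // pending refresh requests per tree

  async function refreshTree(treeId: string): Promise<void> {
    // Step 1: tag-hygiene gate. No acquire, no release.
    if (!deps.operator()) { deps.emit('operator_tag_required'); return; }
    // Step 2: lock acquire. On busy we never held the lock, so there is nothing to release.
    if (!(await deps.acquire(treeId))) { deps.emit('busy'); return; }
    try {
      // Step 3: cost guardrail. Returning here still runs the finally below.
      if (!(await deps.approve())) return;
      // Step 4: a wave is already active for this tree, so enqueue and return.
      if (active.has(treeId)) {
        queued.set(treeId, (queued.get(treeId) ?? 0) + 1);
        deps.emit('queued');
        return;
      }
      // Step 5: start the wave and wait for it to settle.
      active.add(treeId);
      try {
        await deps.runWave(treeId);
      } finally {
        active.delete(treeId);
      }
    } finally {
      deps.release(treeId);
    }
    // Reached only on step-5 success: every early exit above returned or threw.
    // Each drained request re-enters and takes its own acquire/release cycle.
    while ((queued.get(treeId) ?? 0) > 0) {
      queued.set(treeId, (queued.get(treeId) ?? 1) - 1);
      await refreshTree(treeId);
    }
  }

  return { refreshTree };
}
```

Because the drain loop sits after the outer `try`, no flag is needed to tell the success path from the early exits; a refactor that turns one of the guarded `return`s into an `else` branch would break that property.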

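The single-pass rule for the wave-end reconcile ("flip stale → clean when the parent is clean; transforms whose parent flipped this pass stay stale") can be illustrated with a small pure function. This is a sketch under simplifying assumptions: `SketchNode` and `reconcileSketch` are hypothetical, the tree is a flat array with `parentId` links, and the result is returned as a new array rather than written through a sink as the commit describes.

```typescript
// Hypothetical minimal node shape; the real ConversationTree carries much more.
type NodeState = 'clean' | 'stale' | 'failed' | 'cancelled';
type NodeKind = 'root_prompt' | 'user_turn' | 'send' | 'fan' | 'score';

interface SketchNode {
  id: string;
  kind: NodeKind;
  parentId: string | null;
  state: NodeState;
}

// Transform-class kinds as listed in the commit.
const TRANSFORM_KINDS: ReadonlySet<NodeKind> = new Set<NodeKind>(['user_turn', 'fan', 'score']);

// Single pass: every decision reads the snapshot taken before the pass, so a transform
// whose parent only became clean during this pass stays stale until the next call.
function reconcileSketch(nodes: ReadonlyArray<SketchNode>): SketchNode[] {
  const before = new Map<string, NodeState>();
  for (const n of nodes) before.set(n.id, n.state);

  return nodes.map((n) => {
    const isStaleTransform = TRANSFORM_KINDS.has(n.kind) && n.state === 'stale';
    const parentWasClean = n.parentId !== null && before.get(n.parentId) === 'clean';
    // Send nodes are never touched here; only the dispatcher changes their state.
    return isStaleTransform && parentWasClean ? { ...n, state: 'clean' as const } : n;
  });
}
```

With a clean Send, a stale Score under it, and a stale UserTurn under the Score, one call flips only the Score; the UserTurn flips on the following call.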
…mantics (PR4f)

The cross-tab advisory lock implementation per 01 §9.4.3 / 03 §10.4. Replaces the mock CrossTabLockManager that the shim consumed in PR4e with a real BroadcastChannel('pyrit-runner')-keyed lock; adds the `broadcast-channel` npm polyfill for jsdom (simulate mode) so two LockManager instances in the same jest process talk to each other the same way two browser tabs talk through the native BroadcastChannel. Also adds the queue-drain stale-set regression test that pins the §10.3 "stale-set recomputed at wave-start" contract.

What ships

- frontend/src/runner/crossTabLock.ts
  - createBroadcastChannelLockManager(options) returns BroadcastChannelLockManager: a CrossTabLockManager + close() + exposed tabId. Options: channelName (default 'pyrit-runner'), tabId (auto-mint via crypto.randomUUID), acquireTimeoutMs (default 50 per §9.4.3), logger (default console), uuid (replaceable for tests).
  - Wire format on the channel (per §9.4.3 rev-10 correctness note):
      { type: 'lock_request', treeId, requestId, tabId }
      { type: 'lock_busy', requestId, holderTabId }
      { type: 'lock_released', treeId }
    Request/reply correlation rides on `requestId`; MessagePort transfer-list does NOT work with BroadcastChannel.
  - Single onmessage dispatcher with a set of subscribers; the persistent holder-response handler and the per-acquire busy listener both register through it. One handler = consistent behavior across native and the npm polyfill (which have different onmessage calling conventions; see Defects below).
  - Same-tab reacquire short-circuits: heldLocks.has(treeId) ⇒ { acquired: true, holderTabId: null } immediately, no message round-trip. Per the §9.4.3 protocol; otherwise we'd race our own holder-response handler.
  - Graceful degradation when BroadcastChannel is undefined (Safari ≤15.3): warn once, then always-acquired. Operators on legacy Safari accept the V1.0 fork-bomb risk; everything else keeps working. acquire/release/close all no-op safely.
- frontend/src/runner/treeTypes.ts
  - CrossTabLockManager.acquire return type changed from 'acquired' | 'busy' to a discriminated union:
      type LockAcquireResult =
        | { acquired: true; holderTabId: null }
        | { acquired: false; holderTabId: string }
    so the busy reply's holderTabId flows through the shim into the WaveEvent { kind: 'busy', holderTabId } the UI consumes.
- frontend/src/runner/shim.ts
  - Consumes the new LockAcquireResult shape: on !acquired, emit busy with lockResult.holderTabId (was hard-coded '' in PR4e).
- frontend/src/runner/shim.test.ts
  - Updated mkControllableLockManager to take ReadonlyArray<LockAcquireResult> and default to the acquired shape.
  - Tightened the busy-event test to assert holderTabId propagation.
  - NEW test: "drained re-entry recomputes S from the LATEST tree state, not the snapshot at enqueue": flips treeProvider's tree between enqueue and drain; asserts the drained wave's starter call carries S computed from the post-edit tree (per §10.3 stale-set-recomputed-at-wave-start contract). Pins the PR4e shim's correctness against the rev-15 reviewer Finding 5 concern that prompted the §10.3 → §2.1 unification.
- frontend/src/setupTests.ts
  - Loads the `broadcast-channel` npm polyfill globally with `enforceOptions({ type: 'simulate' })`. Simulate mode keeps the transport in-process; this is required for jest's parallel test workers, which would otherwise step on each other via the polyfill's file-RPC default (broadcast-channel/methods/node.js).
- frontend/package.json + package-lock.json
  - broadcast-channel ^7.3.0 added as devDependency per spec §9.4.3 ("V1.0 commits to polyfilling via the broadcast-channel npm package (~5 KB) loaded in the jest setup file"). Production bundles use the browser's native BroadcastChannel; the polyfill is dev-only.

Notable shape decisions

- Discriminated union for LockAcquireResult instead of a bare { acquired: boolean; holderTabId: string | null }. The narrowing `if (!r.acquired) { r.holderTabId is string }` falls out automatically; no nullability dance at every callsite.
- Single subscriber-set dispatcher rather than calling addEventListener/removeEventListener directly on the channel. Two reasons: (a) the native API supports addEventListener but the npm polyfill's `onmessage` handler calling convention differs from native (raw data vs MessageEvent; see Defects). Normalizing in one place at the top-level onmessage = setter handles both backends; (b) the protocol has one persistent handler (lock-request responses) + N ephemeral handlers (per-acquire busy listeners). The subscriber-set is the natural fit for that structure.
- simulate-mode polyfill instead of the polyfill's default `node` mode. The `node` method uses file-based RPC under /tmp, fine for cross-process tests but a leak vector for jest's parallel-worker layout. `simulate` is in-process, deterministic, and ~5ms per message hop (the polyfill's SIMULATE_DELAY_TIME constant, comfortably below the test's 20ms acquireTimeoutMs).
- The lock manager exposes `tabId` directly (not behind a getter) so the busy modal's "another tab (id: …)" label can read it without an extra API. Production passes the manager's tabId to the modal's "this tab" hint.
- `acquireTimeoutMs` defaults to 50ms per §9.4.3. Configurable for tests + a future operator preference if the latency proves to be annoying (V1.x). Imperceptible vs a typical 10+ second wave.
- close() removes the onmessage handler and clears subscribers before calling channel.close(). The polyfill throws on double-close; we swallow that defensively (try/catch around channel.close()).
- The release wire message (`lock_released`) is posted but the spec's "Wait" auto-acquire flow is NOT wired in V1.0; that's a UI affordance (per §9.4.3 "Wait listens for the lock_released message"). The spec also notes "Refresh anyway" as the operator override path; both are UI-layer work that lands with PR5/PR6. The runner's lock-side of the protocol (post on release) is correct today; the listener side wires in with the UI.

TDD narrative

Started with the shim interface update (LockAcquireResult shape) to let the lock tests reference the new type. Updated treeTypes.ts + shim.ts + shim.test.ts mock + the busy-event assertion in one pass. Shim suite stayed 39/39 green through the interface change.

Then wrote crossTabLock.test.ts: 13 cases across single-instance lifecycle, two-instance contention (busy reply, release-then-reacquire, three-way contention), per-tree isolation, same-tab reacquire, BroadcastChannel-absent degradation, close-stops-responding, auto-mint tabId. RED was TS2307 on './crossTabLock'.

Implemented crossTabLock.ts. First run: 9/13 green. All single-instance + degradation + close tests passed; all four cross-instance tests failed. The reason was the polyfill calls `onmessage(data)` with the raw user payload, but the native API calls `onmessage(MessageEvent)` with a wrapper. My initial implementation treated the argument as MessageEvent always and read `.data`, which on the polyfill returned undefined. Fixed by normalizing at the dispatcher (`instanceof MessageEvent` test decides whether to unwrap). All 13 green.

Added the drain-stale-set regression test to shim.test.ts. First run: failed on a test bug (used `sink.calls` instead of destructured `callsOf` helper). Fixed; 40/40 shim green.

Defects surfaced during TDD

- The `broadcast-channel` npm polyfill's onmessage handler is called with the RAW user data (not a MessageEvent). Native BroadcastChannel calls it with a MessageEvent that wraps the data in `.data`. This is a polyfill bug or design choice (the polyfill's documentation doesn't surface it loudly), and the contention tests caught it immediately: the runner's filter `data.type === 'lock_request'` was reading `.type` off a MessageEvent (which has `.type === 'message'`, not the wire `.type === 'lock_request'`), so the holder-response handler never fired and busy replies never went out. The fix is a one-liner normalization at the onmessage setter; the runner code stays clean. Documented in the BroadcastChannelLike type comment.
- The polyfill's simulate transport adds a 5ms delay per postMessage. Two hops (A→B request, B→A response) is 10ms. Our default 50ms acquireTimeoutMs is fine for production, and tests use 20ms which is also comfortable. Worth noting if future tests get flaky on slower CI: bump the test-mode timeout, don't lower the polyfill delay.
- The polyfill's `BroadcastChannel.close()` may throw on double-close. The shim's try/finally pattern could trigger a double close if a lock manager is reused across shim invocations (today it isn't, but defensively the close() wraps channel.close in try/catch). Silent swallow: closing an already-closed channel has no observable effect.
- Same-tab reacquire (heldLocks.has) is a load-bearing short-circuit, not an optimization. Without it, the holder-response handler would fire for the tab's own lock_request and the acquire would resolve to busy against itself. The §9.4.3 pseudocode includes this check ("if (heldLocks.has(treeId)) return 'acquired'"); the test "same-tab reacquire" pins it.

Verification

Tests: 865 frontend passing (851 prior + 14 new: 13 lock + 1 shim drain). Backend unchanged (~658 passing).
Lint: clean.
Type-check: clean (main + contract).
Coverage: src/runner directory 94.5 / 86.57 / 93.87 / 96.11 against the 85/85/90/90 thresholds:
  crossTabLock.ts at 91.66/81.39/92.85/96.61
  shim.ts at 98.7/93.33/90.9/100
  (others unchanged)

Two uncovered crossTabLock.ts lines:
- line 93 (release no-op in degraded path) is covered by the added release call in the BroadcastChannel-absent test.
- line 207 (defaultUuid's crypto-randomUUID-absent fallback) is intentional dead branch for non-Node environments.

Next slice

PR4f closes the runner core (PR4a-f). Per the rubber-duck template, fire the reviewer now before starting PR5 (react-flow scaffold). Reviewer scope: PR4e + PR4f + the readiness/reconcile/lock/shim files; specific concerns:

A. shim's drain-outside-lock decision (worth revisiting against the §10.3 spec literal?)
B. CrossTabLockManager interface change (discriminated union vs other shapes; premature?)
C. polyfill choice (npm vs in-process shim; leaner alternative?)
D. wave-end reconcile via fresh treeProvider lookup (correctness against the React state container's update timing)
E. queue-drain stale-set test honesty (does it prove what it claims, or is it a test-that-passes?)
F. coverage gaps (defensible vs material)
G. spec drift since PR4d.1
H. citation-strip discipline (still pending end-of-V1.0)

Plus the standard rubber-duck items (J. anything else, hidden time-bombs, etc.).

Open rubber-duck items still pending

- DTO original_prompt_id nullability (since PR3a; not yet re-litigated).
- Citation-strip discipline (partition.ts + wave.ts inline section refs still present; end-of-V1.0 strip).
- reconcileTransformStates (path-scoped per-dispatch variant) not implemented; wave-end reconcileAllTransforms covers the canvas-stale-transform-after-wave bug.
- CI gate for the 126 latent test type errors (deferred).
- PR1 backward-compat fallback corpus verification (needs prod DB access).
- PR4e race: cancelQueued fired before the queued event has emitted would let the queued wave drain anyway. Acceptable for V1.0 (the queued event is the UI's signal to enable the cancel chip); document for PR5/PR6 UX work.
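
The `requestId` correlation and the discriminated-union narrowing described in this commit can be sketched with two pure functions. The `WireMessage` and `LockAcquireResult` shapes follow the commit text as of PR4f; `holderReply` and `resolveAcquire` are hypothetical helpers that leave out the channel, the timeout, and the same-tab short-circuit.

```typescript
// Wire messages as listed in the commit; the helpers below are an illustrative sketch.
type WireMessage =
  | { type: 'lock_request'; treeId: string; requestId: string; tabId: string }
  | { type: 'lock_busy'; requestId: string; holderTabId: string }
  | { type: 'lock_released'; treeId: string };

type LockAcquireResult =
  | { acquired: true; holderTabId: null }
  | { acquired: false; holderTabId: string };

// Holder side: answer a request for a tree this tab holds; stay silent otherwise.
function holderReply(
  msg: WireMessage,
  heldLocks: ReadonlySet<string>,
  myTabId: string,
): WireMessage | null {
  if (msg.type !== 'lock_request') return null;
  if (!heldLocks.has(msg.treeId)) return null;
  // Echo the requestId so the requester can match this reply to its own acquire call.
  return { type: 'lock_busy', requestId: msg.requestId, holderTabId: myTabId };
}

// Requester side: only a busy reply carrying our requestId counts. If none arrived
// before the timeout (not modelled here), the lock is treated as free.
function resolveAcquire(
  requestId: string,
  replies: ReadonlyArray<WireMessage>,
): LockAcquireResult {
  for (const reply of replies) {
    if (reply.type === 'lock_busy' && reply.requestId === requestId) {
      return { acquired: false, holderTabId: reply.holderTabId };
    }
  }
  return { acquired: true, holderTabId: null };
}
```

At a call site, `if (!result.acquired)` narrows `result.holderTabId` to `string`, which is the property the shim relies on when it emits the busy event.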

Address must-fix items from the post-PR4e+PR4f rubber-duck review. The runner core remains feature-complete; this commit hardens the read-back semantics, kills a maintenance hazard, tightens the lock interface, and adds defense-in-depth tests for the labels-divergence invariant. What ships (per reviewer finding) Finding D — wave-end reconcile read-back race [LOAD-BEARING] The PR4e implementation re-read deps.treeProvider AFTER runWave settled to get the post-wave tree snapshot for reconcileAllTransforms. Reviewer flagged: production wires treeProvider to React state, and React 19's setState commits are queued at microtask boundaries — `await settled` may resume before React has committed the wave's setState calls, leaving treeProvider returning the STALE pre-wave tree. Reconcile would then walk the wrong world and miss every transform the wave's Send-completions just unblocked. Fix: the shim now wraps deps.sink in a per-wave recorder (createStateRecorder) and passes the wrapped sink to runWaveStarter via the new `sink` field on RunWaveStarterArgs. The recorder captures every setNodeState call into a per-wave Map<NodeId, NodeState>; after `await settled`, the shim constructs the post-wave tree via applyStateRecorder (input tree + overlay of captured states) and feeds THAT to reconcileAllTransforms. No React-state read-back; no timing dependency. The recorder forwards every other sink method (recordExecution / clearExecution / setReflogPinned / emitWaveEvent) untouched so React state stays in sync. Tests pin the new contract: - "reconcile reads POST-WAVE state via the recording sink, not a treeProvider snapshot" — tree with a stale Send + stale Score child; the test starter writes setNodeState(Send, clean) through args.sink; reconcile flips the Score to clean. Without the recorder, the treeProvider (fixed-tree closure) would return Send=stale and Score would stay stale. 
- "starter receives a sink that is a wrapper (not the bare deps.sink reference)" — defense-in-depth against a future refactor that reverts the wrapping. - "recording sink forwards every sink method to the underlying deps.sink" — pins that recordExecution / clearExecution / setReflogPinned / emitWaveEvent are NOT swallowed. Finding C — polyfill swap (npm → in-process) The broadcast-channel npm package's onmessage(data) calling convention differs from native onmessage(MessageEvent), forcing a normalization shim in crossTabLock.ts. Reviewer flagged: (a) the normalization is a maintenance hazard (a future browser MessageEvent subclass would break the `instanceof` check silently); (b) simulate mode bypasses structured-clone serialization, so non-JSON-serializable fields added to WireMessage later would pass tests but fail in production; (c) 7 transitive deps for a 25-line problem. Fix: 25-line in-process BroadcastChannel shim in setupTests.ts that delivers a real MessageEvent on postMessage, matching native semantics exactly. Removes the broadcast-channel devDep (and its 7 transitive packages), removes the eventOrData normalization shim from crossTabLock.ts, and tightens BroadcastChannelLike back to `onmessage: (event: MessageEvent) => void`. Production code unchanged. Spec note: 01 §9.4.3 says "V1.0 commits to polyfilling via the broadcast-channel npm package." Spec wins on settled commitments by default; reviewer's case for the in-process shim ranged across three orthogonal concerns (test fidelity, maintenance burden, dep surface) and was strong enough to override. Spec amendment for end-of-V1.0 doc pass. Finding B — LockAcquireResult union simplification PR4f's discriminated union was: { acquired: true; holderTabId: null } | { acquired: false; holderTabId: string } Reviewer flagged the `holderTabId: null` on the acquired-true variant as junk data — invites a reader to think it carries meaning when it doesn't. `acquired: true` already says "no holder." 
Fix: tightened to `{ acquired: true } | { acquired: false; holderTabId: string }`. Shim consumer unchanged (already only reads holderTabId on the !acquired branch). All call-sites updated (crossTabLock.ts, shim.test.ts mock, crossTabLock.test.ts assertion, treeTypes.contract.test.ts). Finding J — acquire-after-close throws Reviewer flagged: PR4f's closed-manager acquire returned { acquired: true } silently. A closed manager has no holder-response handler and no peer subscription — returning "acquired" is a silent lie that produces a phantom cross-tab race. Fail-loud over silent-lie. Fix: acquire on a closed manager throws Error('cross-tab lock manager is closed'). Both the native-BC path and the BroadcastChannel-absent degraded path track a `closed` flag and throw consistently. Release stays best-effort no-op on closed managers (release is the cleanup path; throwing there would surface inside the shim's outer finally and cascade with the wave settlement). Tests added: "acquire after close throws" + "release after close is a no-op (NOT throw)." Finding J (also) — parentConversationTreeId preservation contract Reviewer flagged: the shim reads `tree.parentConversationTreeId` off the post-demotion tree returned by demoteRetryFailedNodes and forwards it to runWaveStarter for label-divergence compliance. If a future demoteRetryFailedNodes refactor lost the spread of tree-level fields, parentConversationTreeId becomes undefined and the labels-round-trip integration test would fail in CI — but no unit test would catch the regression on its own. Fix: contract test in readiness.test.ts — "preserves tree-level fields on the returned tree (id, edges, parentConversationTreeId, undoStack, etc.)" — verifies the demoted tree carries id / parentConversationTreeId / parentSourceConversationId / displayName / rootId / createdAt / edges / undoStack identical to the input. Pin lands before any regression. 
Finding A — drain-block structural invariant comment Reviewer reading: the drain-outside-the-lock decision is correct, but the commit body's "prevents a deadlock" argument is wrong (the lock manager's same-tab reacquire short-circuits would handle that). The real reason is cross-tab fairness — drain-inside makes other tabs wait for N-deep queues; drain- outside lets them interleave. Reviewer also flagged the "no bookkeeping flag needed" property as load-bearing on the structural invariant "every early-exit uses `return`" — a future refactor that replaces a guarded return with an else-branch would silently start draining on the early-exit path. Fix: extended the comment block on shim.ts's drain block to state the cross-tab-fairness rationale (corrects the original commit body's deadlock argument) AND name the return-on-early-exit structural invariant explicitly. The next refactor reader has explicit guidance. Notable shape decisions - createStateRecorder is a local helper (not exported). The shim is the only producer/consumer of the recorder lifecycle; making it a public utility would invite reuse in places where the React-state-read-back race doesn't apply. - applyStateRecorder returns the input tree by identity when states.size === 0 (no-op waves). Preserves caller-side identity- based memoization the same way demoteRetryFailedNodes does on no-op demotions. - The recording sink wraps deps.sink but does NOT track recordExecution. Reason: state transitions are always via setNodeState; recordExecution attaches the execution record THEN setNodeState flips state to 'clean'. Capturing setNodeState only is sufficient for the reconcile use case AND keeps the recorder small. If a future caller needs to see the execution map, extend then; don't speculate. - The in-process BroadcastChannel polyfill uses queueMicrotask for delivery (matching native sync-emit-but-async-receive semantics). Tests await two microtask hops to settle a round-trip (A request → B response). 
Determinism in parallel-worker jest is preserved because state is purely in-memory per-process. - Acquire-after-close throws but release-after-close is no-op. The asymmetry is intentional and documented inline: release runs from the shim's outer finally; throwing there would cascade with the wave settlement and turn one issue into two. Acquire runs from a fresh shim entry; throwing there surfaces the bug at the right callsite. - Drop the broadcast-channel npm dep + its 7 transitives entirely (was added in PR4f, removed in PR4e+f.1). Net dep change for PR4e+f vs PR4d.2: zero. Defects surfaced during TDD - Initial D-fix attempt forgot to update RunWaveStarterArgs's public type, causing the test's custom starter to read args.sink from a type that didn't declare it. Caught at type-check; added `sink` to the args interface. - The shim's import block accidentally pulled in ApiErrorReason after a copy/paste from treeTypes. ESLint's no-unused-vars caught it; cleaned up. - When swapping the polyfill, the existing "after close, the closed manager no longer responds as holder" test continued to pass without the in-process shim's listeners-set teardown behaving correctly. Verified by running the test BEFORE removing the npm dep; both implementations satisfy the contract identically. - Section 11 comment header in shim.test.ts got dropped during the section-10 edit. Restored. Verification Tests: 871 frontend passing (865 prior + 6 new: 3 shim D-proving + 1 lock close-throws + 1 lock release-after-close + 1 readiness parentConversationTreeId contract). Backend unchanged (~658 passing). Lint: clean. Type-check: clean (main + contract). Coverage: src/runner directory 95.13 / 87.42 / 96.26 / 96.58 (was 94.5 / 86.57 / 93.87 / 96.11). All modules above the 85/85/90/90 thresholds. 
crossTabLock.ts at 94.66/85.36/100/98.36 (functions to 100)
shim.ts at 98.91/96.87/95/100 (branches improved)
readiness.ts at 94.53/89.04/92.3/96.15 (unchanged)
reconcile.ts at 94.73/90/100/100 (unchanged)
wave.ts at 100/93.1/100/100 (unchanged)
partition.ts at 93.75/85.1/100/94.59 (unchanged)
dispatchHelpers.ts at 97.43/89.47/100/97.22 (unchanged)
dispatch.ts at 85.33/50/88.88/88.23 (UNCHANGED; pre-existing low-branch debt from PR4c2; tracked as open item)

Open rubber-duck items still pending

Carried forward (not addressed in PR4e+f.1):
- DTO original_prompt_id nullability (since PR3a; not re-litigated).
- Citation-strip discipline (partition.ts + wave.ts inline section refs; end-of-V1.0 strip).
- CI gate for the 126 latent test type errors.
- PR1 backward-compat fallback corpus verification (needs prod DB access).
- PR4e cancelQueued-race: if cancelQueued fires before the queued event is emitted, the queued wave drains anyway. Acceptable for V1.0; PR5/PR6 UX should document.

New from PR4e+f.1 review:
- dispatch.ts branch coverage at 50% (pre-existing from PR4c2, not introduced by PR4e/f). dispatch.ts owns create_attack + add_message sequencing and partial-commit semantics; the low branch coverage on mid-chain failure paths is genuine debt. Tracked for follow-up; not gating PR5.
- Spec drift docs (drain-outside-the-lock, retry-failed-in-shim, LockAcquireResult DU): three sentence-length amendments owed to 01 §9.4.3, 03 §2.1, 03 §3.1 step 2b. End-of-V1.0 doc pass.
- shim drain loop serializes ALL queued waves under the first shim invocation's call stack. If the operator clicks Refresh 5 times, the FIRST call's promise doesn't resolve until ALL 5 waves complete. Contract decision deferred to PR6 (which adds the wave-status banner that consumes this).

Next slice

PR5: react-flow UI scaffold. Sub-PRs PR5a-g per plan. Runner core (PR4a-f + PR4e+f.1) is now feature-complete and the rubber-duck checkpoint is closed. The runner shape that PR5 depends on is frozen: ConversationTree as a plain object, RunnerShim's six-method surface, and the RunnerStateSink + WaveEvent stream contracts.
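
The acquire/release asymmetry after close described in this commit can be illustrated with a minimal sketch. The class and method names here (`InProcessLock`, `acquire`, `release`, `close`) are hypothetical and are not the crossTabLock.ts API; the sketch only assumes a purely in-memory, per-process holder.

```typescript
// Hypothetical in-process lock; illustrative only, not the crossTabLock.ts API.
type AcquireResult = { acquired: true } | { acquired: false; holder: string };

class InProcessLock {
  private holder: string | null = null;
  private closed = false;

  // Acquire runs from a fresh entry point, so throwing after close
  // surfaces the misuse at the callsite that caused it.
  acquire(owner: string): AcquireResult {
    if (this.closed) {
      throw new Error("acquire() called after close()");
    }
    if (this.holder !== null && this.holder !== owner) {
      return { acquired: false, holder: this.holder };
    }
    this.holder = owner;
    return { acquired: true };
  }

  // Release usually runs from a `finally` block. Throwing here would stack a
  // second error on top of whatever the wave already reported, so release
  // after close is a silent no-op.
  release(owner: string): void {
    if (this.closed) return;
    if (this.holder === owner) this.holder = null;
  }

  close(): void {
    this.closed = true;
    this.holder = null;
  }
}
```

Under these assumptions, a caller that wraps work in `try { ... } finally { lock.release(owner) }` never sees a second exception from the `finally` arm, while a stale reference that tries to acquire after teardown fails loudly.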

…pter (PR5a) First slice of PR5 (the react-flow UI). Adds the pure adapter that maps a domain ConversationTree onto react-flow's Node[]+Edge[] shape, plus a minimal `TreeCanvas` scaffold component that mounts ReactFlow with the adapter's output. Per-node components (PR5b), action rails (PR5c), edge `+` chip (PR5d), Stack rendering (PR5e), Pick/Unpick (PR5f), and layout (PR5g) land in subsequent slices. What ships - frontend/src/components/Tree/conversationTreeToReactFlow.ts - conversationTreeToReactFlow(tree) → { treeId, nodes, edges } - One react-flow Node per ConversationTreeNode (1:1, no restructuring). Discriminated-union typing on the result (TreeFlowNode = Node<{node:RootPromptNode},'root_prompt'> | ... | Node<{node:ScoreNode},'score'>) so PR5b's node components can register by kind and narrow params via a switch. - Each node's `data.node` is the SAME ConversationTreeNode reference (not a clone) so downstream useMemo hooks in node components can identity-check against unchanged-tree renders. - Each node carries a placeholder `{ x: 0, y: 0 }` position; PR5g's d3-hierarchy layout pass overrides on render. - One react-flow Edge per ConversationTreeEdge with type='smoothstep' (orthogonal routing, per 02 §4.4). - Edge data carries `slotIndex` so the PR5e Fan-Children Stack predicate and PR5f Pick/Unpick (writes promotedChildSlotIndex) can read it directly without walking back to the source ConversationTreeEdge. - Stable edge id from the source `ConversationTreeEdge.id` (load-bearing for react-flow's reconciler — id changes force unmount/remount and kill the PR5d edge-hover state). - Exhaustive kind switch with `never`-typed default arm — compile-time guard against adding a new ConversationTreeNode kind without an adapter arm. - frontend/src/components/Tree/TreeCanvas.tsx - Scaffold component: takes a ConversationTree, calls the adapter, mounts ReactFlow + ReactFlowProvider. 
- `useMemo` on the adapter call so tree-prop identity changes (not just shape changes) drive re-adaption. - `data-tree-id` + `data-testid="tree-canvas"` on the wrapper div for test introspection AND so PR5b+ can route action- rail callbacks back to the runner shim by reading the ancestor tree id. - nodeTypes + edgeTypes deliberately not registered yet — the scaffold uses react-flow's default node renderer (shows the node id) until PR5b's per-kind components land. Commented stubs flag where PR5b/5d will plug in. - fitView prop on for the scaffold so single-tree-fits-canvas works without manual zoom. Notable shape decisions - The adapter is a pure function over tree shape; no react, no hooks, no closures over runner state. PR5g's layout pass will wrap this output and add positions; per-node interactivity (action rail callbacks, edge `+` chip) lands in the node / edge components and routes through props supplied at the TreeCanvas boundary. - Discriminated-union node typing rather than a single Node<{node: ConversationTreeNode}>. Without the discrimination, every node component would need an internal `if (node.kind === ...)` narrow before reading params. The kind-discriminated union lets PR5b register node components as `nodeTypes: { root_prompt: RootPromptCard, ... }` and have TypeScript narrow `data.node` to the right type at the component boundary. - TreeFlowEdgeData is `{ slotIndex: number }`, NOT the full `ConversationTreeEdge`. Reasons: (a) slotIndex is the only field a Stack / Pick consumer reads; (b) extending TreeFlowEdgeData later is a non-breaking type change (edge consumers read specific keys, not the whole object); (c) keeps the adapter output minimal. - Placeholder positions at (0,0) instead of computing a quick initial layout. react-flow tolerates same-position nodes (they overlap at the origin). PR5g's layout pass owns positioning end-to-end; computing a throwaway layout here would be wasted work. 
The scaffold's `fitView` keeps the overlap from breaking the canvas mount visually until layout lands. - TreeCanvas doesn't take a sink, runner, or any callbacks yet — pure-render shell. PR5b will widen the prop set as action-rail callbacks land; defining the prop surface speculatively would invite over-specifying before we know what the components need. TDD narrative conversationTreeToReactFlow.test.ts: 17 cases pinning node 1:1 mapping, kind → type passthrough, ImportMessageNode-as-root, data.node identity preservation, placeholder positions, edge 1:1 mapping, edge id stability, slotIndex on edge data, smoothstep edge type, fan-children edges with auto-numbered slot indices, explicit edges with non-default slotIndex, root-only tree (zero edges), input-mutation safety, wide multi-fan-path tree, treeId on result, kind-discriminated type narrowing. RED was TS2307 on './conversationTreeToReactFlow'. Implemented; 17/17 green. TreeCanvas.test.tsx: 4 cases — node count per tree, treeId attribute on wrapper, tree-swap survives without losing nodes, wide tree mounts cleanly. RED was TS2307 on './TreeCanvas'. Defects surfaced during TDD - First TreeCanvas test pass used `[data-id]` as the node-card selector. react-flow tags connection handles with `data-id` too (e.g., "1-r-null-target"), so the selector matched 6 DOM nodes per card (1 wrapper + 2 handles + handles on the handles). Switched to `[data-testid^="rf__node-"]` which matches only the card wrapper. 3/4 tests went green to 4/4. - First implementation pass used `): JSX.Element` as the TreeCanvas return type. Main tsconfig (vs tsconfig.test.json) doesn't include the legacy global JSX namespace; eslint no-undef caught it. Dropped the explicit return type to match the existing component convention (ConnectionBanner, ErrorBoundary, etc.). Verification Tests: 892 frontend passing (871 prior + 21 new: 17 adapter + 4 scaffold). Backend unchanged. Lint: clean. Type-check: clean (main + contract). 
Coverage: src/components/Tree directory 90.9 / 85.71 / 100 / 90.47, clear of the 85/85/90/90 global thresholds.
conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71 (uncovered lines: the `never`-typed default arm of the kind switch; compile-time-unreachable, but istanbul still counts it)
TreeCanvas.tsx: 100/100/100/100

Dependencies
- @xyflow/react ^12.11.0 added as a runtime dep (replacement for the older `reactflow` package; v12+ ships under the @xyflow scope). Adds 19 transitive packages.

Next slice

PR5b: per-kind node components (RootPromptCard, UserTurnCard, SendCard, FanCard, ScoreCard, ImportMessageCard). Each card reads `data.node` (typed to its kind), renders a Fluent UI card, and exposes hooks for the PR5c action rail. The scaffold's nodeTypes prop wires up once components are ready.

Open rubber-duck items still pending (unchanged from PR4e+f.1)
- DTO original_prompt_id nullability.
- Citation-strip discipline (partition.ts + wave.ts + shim.ts have inline section refs; end-of-V1.0 strip).
- CI gate for the 126 latent test type errors.
- PR1 backward-compat fallback corpus verification.
- PR4e cancelQueued-race.
- dispatch.ts branch coverage at 50% (pre-existing).
- Spec drift docs (drain-outside-the-lock, retry-failed-in-shim, LockAcquireResult DU).
- shim drain loop call-stack serialization.
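
The adapter shape this commit describes (1:1 nodes, shared `data.node` references, placeholder positions, stable edge ids, `slotIndex` on edge data, and a `never`-typed default arm) can be sketched without react-flow. Everything below is a simplified stand-in: the three-kind union, the field names other than `slotIndex`, and the function name `sketchTreeToFlow` are assumptions for illustration, not the actual conversationTreeToReactFlow.ts source.

```typescript
// Simplified stand-ins for the domain types and react-flow's Node/Edge shapes.
type RootPromptNode = { id: string; kind: "root_prompt"; prompt: string };
type SendNode = { id: string; kind: "send" };
type ScoreNode = { id: string; kind: "score" };
type TreeNode = RootPromptNode | SendNode | ScoreNode;

type TreeEdge = { id: string; source: string; target: string; slotIndex: number };
type Tree = { id: string; nodes: TreeNode[]; edges: TreeEdge[] };

type FlowNode<N extends TreeNode> = {
  id: string;
  type: N["kind"];
  position: { x: number; y: number };
  data: { node: N };
};
type TreeFlowNode = FlowNode<RootPromptNode> | FlowNode<SendNode> | FlowNode<ScoreNode>;

type TreeFlowEdge = {
  id: string;
  source: string;
  target: string;
  type: "smoothstep";
  data: { slotIndex: number };
};

function toFlowNode(node: TreeNode): TreeFlowNode {
  const position = { x: 0, y: 0 }; // placeholder; a later layout pass would overwrite it
  switch (node.kind) {
    case "root_prompt":
      return { id: node.id, type: "root_prompt", position, data: { node } };
    case "send":
      return { id: node.id, type: "send", position, data: { node } };
    case "score":
      return { id: node.id, type: "score", position, data: { node } };
    default: {
      // Adding a kind to TreeNode without a case above makes this assignment fail to compile.
      const unreachable: never = node;
      throw new Error(`Unhandled node kind: ${JSON.stringify(unreachable)}`);
    }
  }
}

function sketchTreeToFlow(tree: Tree): { treeId: string; nodes: TreeFlowNode[]; edges: TreeFlowEdge[] } {
  return {
    treeId: tree.id,
    nodes: tree.nodes.map((n) => toFlowNode(n)),
    edges: tree.edges.map((e) => ({
      id: e.id, // reuse the source edge id so the renderer can reconcile instead of remounting
      source: e.source,
      target: e.target,
      type: "smoothstep" as const,
      data: { slotIndex: e.slotIndex },
    })),
  };
}
```

Because `data.node` is the same object reference as the input node, a memoized consumer can compare by identity to detect an unchanged tree.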

Six per-kind node cards (RootPromptCard, ImportMessageCard, UserTurnCard, SendCard, FanCard, ScoreCard) registered against the TreeCanvas's `nodeTypes` prop. Each card displays the kind-specific summary fields a tree-viewing operator needs to see at a glance — prompt text + target on the root, role + text + converter count on user turns, target override + lastError on sends, axis + variant count + pick indicator on fans, scorer type + V1.0 render-only hint on scores. Action rails (PR5c), edge `+` chip (PR5d), Stack rendering (PR5e), Pick/Unpick (PR5f), and layout (PR5g) land separately. What ships - frontend/src/components/Tree/nodeCards.tsx - RootPromptCard: prompt text (truncated body) + target chip; no target handle (root has no parent). - ImportMessageCard: source conversation id + cutoff index; no target handle. - UserTurnCard: text body + role + optional converter-count chip; both handles. Body carries the full text on the `title` attribute so hover discoverability works even when the body is line-clamp truncated. - SendCard: optional target override + lastError surface (visible only when state is 'failed' AND lastError !== null); both handles. - FanCard: axis + variant count + Pick indicator (when promotedChildSlotIndex is non-null); both handles. - ScoreCard: scorer type + V1.0 render-only hint (per 02 §2.2 ScoreNode rail: configure-scorer is V1.1). - Shared CardFrame component renders the kind label, the state badge (7-state color map: draft / clean / edited / stale / running / failed / cancelled), and the react-flow Handles (target on top, source on bottom; toggleable via props since root + import don't have parents). - Shared MetaRow for "label: value" key-value pairs in a consistent layout. - frontend/src/components/Tree/treeNodeTypes.ts - Kind → component map: `{ root_prompt, import_message, user_turn, send, fan, score }`. 
Lives in its own module so eslint's react-refresh/only-export-components rule stays happy (mixing the registry with components defeats HMR). - frontend/src/components/Tree/TreeCanvas.tsx - Wired `nodeTypes={treeNodeTypes}` into ReactFlow. The scaffold's commented-out stub from PR5a is now live; cards render in place of react-flow's default node renderer. - frontend/src/components/Tree/nodeCards.test.tsx - 34 tests across six per-card describes + a registry integration describe. Each card describe asserts: - kind label rendered - kind-specific fields rendered - state badge renders the current state - handle visibility per parent expectations Registry tests mount cards via TreeCanvas to prove the `nodeTypes` wiring is intact end-to-end (missing entries would fall back to react-flow's default node renderer, which only shows the node id — the test asserts specific card content like "Root prompt" + "Send", which only render when the registry is properly registered). Notable shape decisions - Cards are read-only display only in PR5b. No callbacks, no edit handlers, no action rail. PR5c widens the prop set as action-rail callbacks land; defining the surface speculatively would over-specify before we know what the components need. - State badge uses a 7-state inline color map rather than Fluent UI's intent-based MessageBar. Reason: state colors are a domain-specific visual language (clean=green, edited=yellow, stale=orange, running=blue, failed=red) that doesn't map onto Fluent's success/warning/error intents. The inline map keeps the color choices in one place and visible in the test grep. - CardFrame takes showTargetHandle + showSourceHandle as explicit props with defaults of true. Root + import set showTargetHandle={false} explicitly. The visual reads correctly without the top handle (no half-edge stub) and the type-system enforcement falls out of CardFrame's prop signature. - SendCard's lastError block renders inline (red panel) rather than as a popover or tooltip. 
Operators scanning a failed fan want the failure reason at a glance; clicking through to a tooltip adds friction without value. The PR6 wave- complete toast covers the wave-level summary; the per-card lastError covers the per-leaf detail. - FanCard's Pick indicator renders as a MetaRow ("pick: slot N") rather than a special chip. The §3.3 Pick semantic is one of several fan-card facts (axis, count, pick); rendering them uniformly as MetaRows keeps the card visually consistent. PR5f's Pick/Unpick interaction lands here, but it's affordance — not display — so the visual stays this PR. - ScoreCard's "V1.0: displays scores attached to upstream pieces" hint prevents the V1.0 operator from expecting to click and configure (the affordance is V1.1 per 02 §2.2 + the spec's render-only contract per 01 §4.5 / 03 §3.2). Surfacing the limitation on the card is cheaper than a tooltip on the (currently-not-rendered) configure icon. - NodeProps generic uses `Extract<TreeFlowNode, { type: 'X' }>` so each card's props type is exactly the corresponding adapter output node. This is what makes `data.node: RootPromptNode` (vs `ConversationTreeNode`) without a runtime narrow inside the card — the discriminated-union shape PR5a set up pays off here. TDD narrative Single test file (nodeCards.test.tsx) with 34 cases organized into seven describe blocks (one per card + the registry). RED was TS2307 on './nodeCards'. Implementation took two passes: first pass put treeNodeTypes in nodeCards.tsx; lint flagged react-refresh/only-export-components because the file mixed component exports with a non-component constant. Moved treeNodeTypes into its own module (treeNodeTypes.ts); 55/55 green. Defects surfaced during TDD - Initial registry placement in nodeCards.tsx triggered the react-refresh/only-export-components warning. Mixing exports breaks HMR for the components (the bundler can no longer fast-refresh just the changed component). 
Split into a separate treeNodeTypes.ts module: net +1 file, +0 LOC of logic, and HMR works.
- First lint pass also caught that `): JSX.Element` is not valid under the main tsconfig (no global JSX). PR5a hit the same issue; convention is to omit the explicit return type on React components.

Verification

Tests: 926 frontend passing (892 prior + 34 new). Backend unchanged.
Lint: clean.
Type-check: clean (main + contract).
Coverage: src/components/Tree directory 96.07 / 93.1 / 100 / 96, clear of the 85/85/90/90 thresholds.
nodeCards.tsx: 100/95.45/100/100 (one uncovered branch is the inline color map's defensive fallback for a state not in the union; unreachable)
treeNodeTypes.ts: 100/100/100/100
TreeCanvas.tsx: 100/100/100/100
conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71 (unchanged from PR5a; the never-typed default arm)

Next slice

PR5c: per-node action rail (icons + tooltips per 02 §2.2). Adds onRefresh, onBranch, onDelete, onOpenLinear callback props to TreeCanvas → cards. The rail itself is a small floating row positioned by react-flow's NodeToolbar component (or a custom absolute-positioned div if NodeToolbar's defaults don't fit). Action wiring routes through props supplied at the TreeCanvas boundary; the actual runner calls land in PR7 when persistence + auto-reverse make a complete-cycle integration possible.

Open rubber-duck items still pending (unchanged from PR5a)
- DTO original_prompt_id nullability.
- Citation-strip discipline.
- CI gate for the 126 latent test type errors.
- PR1 backward-compat fallback corpus verification.
- PR4e cancelQueued-race.
- dispatch.ts branch coverage at 50% (pre-existing).
- Spec drift docs (drain-outside-the-lock, retry-failed-in-shim, LockAcquireResult DU).
- shim drain loop call-stack serialization.
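
The `Extract<TreeFlowNode, { type: 'X' }>` narrowing mentioned in this commit can be shown in isolation. The sketch below is an assumption-laden stand-in: it uses a two-kind union, returns plain strings instead of rendering React components, and the helper names (`CardProps`, `sendCardError`, `fanCardPickRow`) are hypothetical. Only `state`, `lastError`, and `promotedChildSlotIndex` come from the commit text.

```typescript
type NodeState = "draft" | "clean" | "edited" | "stale" | "running" | "failed" | "cancelled";

type SendNode = { kind: "send"; state: NodeState; lastError: string | null };
type FanNode = { kind: "fan"; state: NodeState; promotedChildSlotIndex: number | null };

// Discriminated union keyed on `type`, mirroring the adapter output shape.
type TreeFlowNode =
  | { id: string; type: "send"; data: { node: SendNode } }
  | { id: string; type: "fan"; data: { node: FanNode } };

// Each card's props are exactly the union member for its kind.
type CardProps<K extends TreeFlowNode["type"]> = Extract<TreeFlowNode, { type: K }>;

// Returns the error text the card would show, or null when the panel stays hidden.
function sendCardError(props: CardProps<"send">): string | null {
  // `props.data.node` is already SendNode here; no runtime kind check is needed.
  const { state, lastError } = props.data.node;
  return state === "failed" && lastError !== null ? lastError : null;
}

// Returns the "pick" row text, or null when no child slot is promoted.
function fanCardPickRow(props: CardProps<"fan">): string | null {
  const slot = props.data.node.promotedChildSlotIndex;
  // Compare against null explicitly so that slot 0 still renders.
  return slot !== null ? `pick: slot ${slot}` : null;
}
```

Passing a fan node to `sendCardError` is a compile-time error in this sketch, which is the property the discriminated union is meant to provide.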

Address must-fix + should-fix items from the post-PR5a+PR5b rubber-duck review. The Tree UI scaffold and per-kind cards stay feature-equivalent; this commit fixes a load-bearing theme bug, removes coupling to react- flow internals, threads the `selected` prop through every card so PR5c's action rail can read it without rewriting card prop surfaces, and strips spec citations that regressed past the PR4b directive. What ships (per reviewer finding) Finding B.1 — dark-theme-only baked colors [LOAD-BEARING] Reviewer flagged: PR5b's `STATE_COLORS` constant hardcoded dark hexes (#1e3a1e for clean-green, #3a1e1e for failed-red, etc.). App.tsx toggles webLightTheme ↔ webDarkTheme; switching to light theme rendered cards as dark-on-near-white stripes — clean-green looked like a smudge, failed-red was unreadable. Real runtime defect; the cards were the only component in the workspace ignoring the theme contract. Fix: replace `STATE_COLORS` with `STATE_BADGE_TOKENS: Record< NodeState, { background, foreground }>` keyed on Fluent palette tokens (colorPaletteGreenBackground2 / Foreground2 pairs, etc.). Both light + dark themes auto-adapt. Matches the rest of the codebase's status-chip convention (Chat/TargetBadge.styles.ts). Finding C / J.3 — CardFrame inline styles + missing `selected` Reviewer flagged: (a) PR5b cards used inline `style={{...}}` for every visual decision. PR5c's action rail will need `:hover` + `[data-selected]` pseudo-classes that inline styles can't express, forcing a useState(hover) + onMouseEnter retrofit on every card. (b) react-flow passes `selected: boolean` to every node component, but PR5b cards dropped it on the floor (the test helpers even passed `selected: false` already). PR5c's selection-visual + action-rail-visibility-when- selected would have rewritten every card's prop surface. Fix: new `nodeCards.styles.ts` companion module using `makeStyles` + Fluent tokens for every card visual. 
CardFrame now accepts `selected?: boolean` (default false at the destructure site — one default site, not per-card), threads it through to `data-selected` on the wrapper, and applies the `frameSelected` class for the brand-color outline. Every card receives + forwards `selected` from NodeProps. PR5c becomes "add the action rail," not "rewrite six cards." Finding E — data-testid="rf__node-*" couples to react-flow internals Reviewer flagged: PR5a's TreeCanvas test used `container.querySelectorAll('[data-testid^="rf__node-"]')` — a private testid scheme that could silently shift with a @xyflow minor-version bump. Fix: CardFrame emits `data-tree-node-id={nodeId}` on the wrapper. TreeCanvas tests now select via `[data-tree-node-id]`, which is under our control and immune to react-flow renames. Also enables the PR5c action rail to find cards by node id without walking react-flow's DOM tree. Finding A — compile-time guard on the kind → component registry Reviewer flagged: PR5b's `treeNodeTypes` registry was just an `as const`. The "every kind has a registry entry" guarantee relied on the test alone — a developer adding a new ConversationTreeNodeKind without a registry entry would only discover it at jest time. Fix: `as const satisfies Record<ConversationTreeNodeKind, ComponentType<never>>` on treeNodeTypes. Compile-time completeness: tsc fails the moment a new kind lands without a registry arm. The existing runtime test is now defense-in-depth. Finding I — citation discipline regressed by 8 instances PR5a + PR5b added 8 new `02 §...` references in JSDoc + test comments after the PR4b directive ("no new citations in code"). Stripped all 8 — 5 in module-header JSDoc, 3 in test comments and titles. The doc/gui/design/ files remain the source of truth; the code just no longer mirrors specific section refs. 
Finding H.4 — operator-facing V1.0/V1.1 text in ScoreCard Reviewer flagged: ScoreCard's body copy was "V1.0: displays scores attached to upstream pieces (configuration is V1.1)." Operators don't know what V1.0/V1.1 means; release labels in operator-facing copy are a smell. The "scorer configuration coming later" detail belongs on the PR5c action rail's disabled `✏` icon tooltip, not the card body. Fix: strip the V1.0/V1.1 text. The ScoreCard footer now reads just "Read-only display" — enough to signal non-interactivity without naming an internal version label. Finding F — implementer's coverage rationalization was wrong Reviewer flagged: PR5b commit body claimed the missing branch was "the inline color map's defensive fallback for a state not in the union" — but there was no `??` fallback in the code; an invalid state would throw on `.background` access. The rationalization didn't match the source. Fix: actual uncovered branches found and exercised: - SendCard's `state==='failed' && lastError===null` quadrant (test "does NOT render the error panel when state is 'failed' but lastError is null") - UserTurnCard's singular-converter ternary (test "uses singular 'converter' for a one-converter pipeline") - CardFrame's `selected = false` default (test "cards default to unselected when `selected` is undefined") Finding D — adapter ↔ registry alignment defense-in-depth Reviewer flagged: the registry test could be the single point of failure for adapter/registry alignment. If the adapter emitted `type: 'rootPrompt'` and the registry keyed on `root_prompt`, only the registry test would catch it; per-card tests would still pass. Fix: new test "every kind emitted by the adapter has a registry entry (adapter ↔ registry alignment)" — builds a tree with every kind, runs the adapter, asserts every result `node.type` is a registry key. Defense-in-depth alongside the `satisfies` compile-time guard from Finding A. 
Finding J.2 — `mockNodeProps` generic stub builder Reviewer flagged: PR5b had six near-identical `as unknown as Parameters<typeof X>[0]` cast helpers. The `as unknown as` is a "trust me" the type-checker can't validate; adding a mandatory field to NodeProps would silently break tests. Fix: one generic `mockNodeProps<T>(id, data, selected?)` helper; the six card-specific wrappers (`rootPromptProps(node)`, etc.) are now thin invocations that supply the appropriate `T`. Notable shape decisions - `makeStyles` was chosen over emotion or styled-components because Chat/TargetBadge.styles.ts is the existing in-codebase pattern. Mixing CSS-in-JS systems would be churn for no benefit. - Griffel (Fluent's CSS engine) rejects the `borderColor` CSS shorthand for theme-token reuse reasons (so individual sides can be overridden via longhand). The `frameSelected` slot uses `borderTopColor` / `borderRightColor` / `borderBottomColor` / `borderLeftColor` instead. Same visual; lint-clean. - The inline state-badge color comes from STATE_BADGE_TOKENS at render time (one `<span style={{ background, color }}>` per card). makeStyles can't easily produce 7 dynamic-key combinations without static slot generation; the runtime lookup is cheaper and reads more naturally. Token references resolve at render against the current theme, so dark/light still both work. - The `selected` default lives in CardFrame's destructure (`selected = false`), not per-card (`selected ?? false`). One branch to cover, not six. The cards forward whatever NodeProps handed them and rely on CardFrame's default for the absent case. - Stripped the SendCard inline lastError panel? No — kept it. Reviewer flagged it as spec drift (H.2: the spec pins failure summarization at the wave-status banner, not the per-card panel). Decision: keep the inline surface because operator "why did this leaf fail" is a real read-this-card moment, and the action-rail `💬` (PR5c, drawer) is the "full raw response" surface — different need. 
The spec should grow an acknowledgment sentence (deferred to end-of-V1.0 doc pass with the other spec drift notes). TDD narrative Worked finding-by-finding starting from the load-bearing one (B.1 theme bug). Each finding: - identified the regression-prone surface - landed the fix - added or tightened a test that proves the fix sticks Two test failures caught regressions: 1. The "failed-with-null-lastError" test used a text-content regex that matched the state badge's "failed" string — false-positive. Fixed by switching to a class-substring scan for `errorPanel` (makeStyles preserves slot names in dev mode for debuggability, so the class contains 'errorPanel'). 2. The Griffel `borderColor` shorthand warning surfaced during the makeStyles refactor. Fixed via four longhand properties. Defects surfaced during TDD - Griffel's CSS-shorthand rejection list includes `borderColor` but NOT `border` (the full shorthand). The distinction is "shorthand that lets later rules override specific sides via longhand" — only the partial-override family is rejected. Documented inline in nodeCards.styles.ts. - The "doesn't render error panel" test originally used a text-substring scan that false-matched against the state badge's literal "failed" text. Lesson: don't text-scan for state words when the state itself is a literal element on the card. - Coverage on nodeCards.tsx's branch count moved with the `selected` consolidation. Originally each card had its own `?? false` (6 branches); consolidation to CardFrame's destructure-default left 1 branch. The "defaults to unselected" test covers it, but istanbul still scores the default-parameter as two arms. Verification Tests: 932 frontend passing (926 prior + 6 net: 4 new + 2 renamed + 0 removed). Backend unchanged. Lint: clean. Type-check: clean (main + contract). Coverage: src/components/Tree directory 96.66 / 94.28 / 100 / 96.61 (was 96.07 / 93.1 / 100 / 96 in PR5b). 
nodeCards.tsx: 100/96.42/100/100 (one uncovered branch is the default-parameter arm of `selected = false`; istanbul over-counts default-parameter coverage)
nodeCards.styles.ts: 100/100/100/100
treeNodeTypes.ts: 100/100/100/100
TreeCanvas.tsx: 100/100/100/100
conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71 (unchanged; the never-typed default arm of the kind switch)

Open rubber-duck items still pending

Carried forward (not addressed in PR5a+b.1):
- DTO original_prompt_id nullability.
- Citation-strip discipline (partition.ts + wave.ts + shim.ts still have section refs; this PR did NOT strip those, only the new ones introduced by PR5a+b).
- CI gate for the 126 latent test type errors.
- PR1 backward-compat fallback corpus verification.
- PR4e cancelQueued-race.
- dispatch.ts branch coverage at 50% (pre-existing).
- Spec drift docs (drain-outside-the-lock, retry-failed-in-shim, LockAcquireResult DU); PR5b.1 adds another: SendCard's inline lastError surface is spec drift kept by design.
- shim drain loop call-stack serialization.

Deferred from this review:
- Card decomposition split point (six cards + three shared primitives in one 270-line file). Reviewer J.4: "split when CardFrame's hover/select state lives in its own hook, somewhere mid-PR5c." Honored: not splitting now.
- PR5g layout memo keying. Reviewer J.1 noted PR5g will be tempted to re-layout per-leaf state ping. Not addressable in PR5b; tracked for PR5g.

Next slice

PR5c: per-node action rail (icons + tooltips). Callback props on cards (onRefresh, onBranch, onDelete, onOpenLinear), threaded via TreeCanvas. CardFrame's `selected`-driven `data-selected` attribute is what PR5c's rail-visibility-on-hover-or-selected CSS reads. The makeStyles shell from PR5b.1 is where the hover pseudo-class lands.
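
The compile-time completeness guard from Finding A and the runtime alignment check from Finding D can be sketched together. This is a minimal illustration that assumes TypeScript 4.9 or later (for `satisfies`); `CardRenderer` is a string-returning stand-in for a React component type, and `cardRegistry` / `missingRegistryEntries` are hypothetical names rather than the contents of treeNodeTypes.ts.

```typescript
const NODE_KINDS = ["root_prompt", "import_message", "user_turn", "send", "fan", "score"] as const;
type NodeKind = (typeof NODE_KINDS)[number];

// Stand-in for a component type; the real registry would hold React components.
type CardRenderer = (nodeId: string) => string;

// `satisfies` checks that every NodeKind has an entry while keeping the
// literal key types. Removing an entry, or adding a kind to NODE_KINDS
// without a matching entry here, is a compile error.
const cardRegistry = {
  root_prompt: (id: string) => `Root prompt ${id}`,
  import_message: (id: string) => `Import ${id}`,
  user_turn: (id: string) => `User turn ${id}`,
  send: (id: string) => `Send ${id}`,
  fan: (id: string) => `Fan ${id}`,
  score: (id: string) => `Score ${id}`,
} as const satisfies Record<NodeKind, CardRenderer>;

// Runtime defense-in-depth: given the `type` strings an adapter emitted,
// report any that have no registry entry (for example a camelCase typo).
function missingRegistryEntries(emittedTypes: readonly string[]): string[] {
  return emittedTypes.filter(
    (t) => !Object.prototype.hasOwnProperty.call(cardRegistry, t),
  );
}
```

The compile-time guard catches a missing registry arm; the runtime check catches the separate case where the adapter and the registry disagree on spelling, which the type system cannot see if the adapter's `type` is widened to `string`.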

…xt (PR5c) The common-to-every-node action rail (Refresh / Branch / Branch-as- subtree-stub / Delete / Open-in-linear) wired through TreeCanvas → CardFrame → ActionRail via a React context. Per-callback opt-in: undefined callbacks hide their buttons so PR5c lands the wiring without forcing every runner integration to land in the same PR. Kind-specific actions (✏ edit, ⚡ converter, ≡ role, ↻×N re-run, 📎 attachment, 🎯 target-override, etc.) defer to later sub-PRs — each needs its own state machine + dialog and is poorly served by the same minimum-viable callback bag. What ships - frontend/src/components/Tree/actionRail.tsx - ActionRail({ nodeId, callbacks, branchLabel }) — small Fluent UI Button row with five action slots: ↻ Refresh (ArrowSyncRegular) 🌿 Branch (BranchRegular; label varies by node kind) ⫝ Branch-subtree (BranchForkRegular, ALWAYS disabled — V1.1 placeholder; reserved slot per the operator-facing convention that V1.1 enablement is a state flip, not a new button) 🗑 Delete (DeleteRegular) 🔍 Open-in-linear (OpenRegular) - Each callback is optional; undefined hides the button entirely so PR5c ships before per-action runner wiring. - Wrapper emits `data-tree-action-rail` + `data-tree-node-id` for DOM scoping (PR5d's edge `+` chip will use the rail element's position as an anchor reference). - frontend/src/components/Tree/actionRail.styles.ts - Fluent makeStyles for the rail row layout (horizontal flex, tokens-spaced gap, top border via stroke2 token). Visibility defaults to always-visible in PR5c; the hover/focus visibility-flip per the design wires alongside Stack rendering (PR5e) when CardFrame grows a hover handler. - frontend/src/components/Tree/actionCallbacksContext.ts - ActionCallbacksContext (React context, default null). - useActionCallbacks() returns ActionCallbacks | null. - Lives in its own module so the adapter stays pure and a callbacks-only-change render doesn't re-run conversationTreeToReactFlow. 
- frontend/src/components/Tree/TreeCanvas.tsx (modified) - Added optional `actionCallbacks?: ActionCallbacks` prop. When supplied, wraps ReactFlow in <ActionCallbacksContext.Provider value={actionCallbacks}>. When omitted, provider value is null and cards skip the rail entirely (preserves the PR5a/PR5b "display only" use). - The adapter is NOT in the actionCallbacks dependency list — changes to callbacks don't perturb the tree's adapter output, preserving identity-stable nodes/edges for react-flow's reconciler. - frontend/src/components/Tree/nodeCards.tsx (modified) - CardFrame consumes useActionCallbacks(); when non-null, renders <ActionRail nodeId={nodeId} callbacks={callbacks} branchLabel={branchLabel} />. - Every card forwards its kind-appropriate `branchLabel`: RootPromptCard → "Clone tree" (the operator-facing language for a root clone), every other card → "Branch from here". - frontend/src/components/Tree/actionRail.test.tsx - 20 tests across four sections: 1. ActionRail in isolation — opt-in/opt-out rendering per callback presence, disabled Branch-subtree slot, click invocations with the right nodeId 2. TreeCanvas integration — rail per card, Refresh-click on a specific card invokes onRefresh with that card's id, root vs non-root branchLabel, callbacks-omitted renders zero rails (back-compat) 3. Accessibility — aria-label on every button, tooltip on the disabled V1.1 button, data attributes for DOM scoping 4. CardFrame integration — rail doesn't break data-tree-node-id wrapper attribute (PR5a/PR5b selector contract survives) Notable shape decisions - Context vs prop-threading for callbacks. The adapter (PR5a) is pure and identity-stable; threading callbacks through `data` on every adapter node would force re-adaption on every callback-prop change. Context lets the rail consume callbacks where it renders, leaving the adapter untouched. 
Trade-off: cards become non-pure (they consume context); accepted because the alternative breaks the adapter's identity contract for react-flow's reconciler. - branchLabel is a CardFrame prop, not derived inside ActionRail. The card knows the kind ("root prompt" → "Clone tree"); the rail doesn't and shouldn't. Pushing branchLabel into ActionRail via the cards keeps the rail kind-agnostic so PR5d/PR5e can reuse it on the Stack card without inventing new render rules. - Tooltip `relationship="description"` rather than `relationship="label"`. The Fluent label-relationship pattern uses aria-labelledby pointing at the tooltip content; the tooltip only renders the content when shown, so the accessible name comes from the hover tooltip alone. Under jsdom (and arguably under screen-reader-without-hover), this means the button has no name. Switched to `description` and kept an explicit aria-label on every button so the accessible name is permanent. - V1.1 Branch-subtree button always renders (disabled). Per the operator-facing convention: V1.1 enablement is a state flip, not a new affordance appearing. Keeping the slot reserved prevents an operator's muscle memory from forming around four buttons that suddenly become five. The `title` attribute carries the operator-friendly explanation ("coming in a future release") — no V1.0/V1.1 release labels in operator-facing copy. - Five callbacks, not nine. Per the rubber-duck principle of "don't speculate," PR5c ships the common-to-every-node rail only. The seven kind-specific actions (per-card edit, converter, role, re-run-N, attachment, target-override, view- raw-response) each carry their own UI work (palette, role cycler, count-prompt, etc.) and don't share a callback surface. Each lands when its full interaction is ready. - Callbacks accept nodeId only, not (treeId, nodeId). 
The host that mounts TreeCanvas already knows the treeId — the callbacks close over it at the consumer's call site: `onRefresh={(id) => runner.refreshNode(treeId, id)}`. Adding treeId to the rail's callback signature would be speculative complexity for a single-tree V1.0 canvas. TDD narrative Started with actionRail.test.tsx — 20 cases pinning callback invocations, opt-in rendering, the disabled V1.1 slot, and the rail's behavior when threaded through TreeCanvas. RED was TS2307 on './actionRail'. Implementation took three corrective passes: 1. Initial ActionRail used `<Tooltip relationship="label">` — caused all buttons except the first to lose their accessible name in jsdom. Switched to `relationship="description"` and kept explicit aria-label on every button. 2. userEvent.click on a button inside a react-flow node card throws "Cannot read properties of null (reading 'document')" — react-flow's pointerdown handler dereferences a null window owner inside jsdom. Switched click tests to fireEvent.click which dispatches a single MouseEvent without pointer-event machinery. 3. screen.getByRole filters out visibility-hidden elements; react-flow renders nodes with `visibility: hidden` until its layout pass runs (which jsdom never triggers). Switched integration tests to container.querySelectorAll scoped by data-tree-node-id, matching the PR5a TreeCanvas test pattern. Defects surfaced during TDD - Fluent Tooltip's `relationship="label"` is operator-hostile in non-interactive renders: the button's only accessible name is the tooltip content, but the tooltip content isn't in the DOM until hover. Screen readers without hover navigation lose the name. Documented inline in actionRail.tsx that the `description` relationship is the right choice when buttons also carry an explicit aria-label. - react-flow's `visibility: hidden` on un-laid-out nodes confuses testing-library's role-based queries. 
Tests using role queries inside the canvas need to use the data-tree-node-id wrapper-scoped pattern instead. Worth a note for PR5d-g test authors.
- userEvent.click → react-flow pointerdown handler → TypeError ("Cannot read properties of null (reading 'document')") in jsdom. fireEvent.click is the workaround for interactive testing inside the canvas. Both patterns are documented inline at the test callsites.

Verification

Tests: 952 frontend passing (932 prior + 20 new). Backend unchanged.
Lint: clean.
Type-check: clean (main + contract).
Coverage: src/components/Tree directory 97.53 / 95.74 / 100 / 97.5 (was 96.66 / 94.28 / 100 / 96.61 in PR5b.1).
  actionRail.tsx: 100/100/100/100
  actionRail.styles.ts: 100/100/100/100
  actionCallbacksContext.ts: 100/100/100/100
  TreeCanvas.tsx: 100/100/100/100
  nodeCards.tsx: 100/96.66/100/100 (the rail-render branch adds a single unmeasured combination: context-null vs non-null × per-card; defensible)
  conversationTreeToReactFlow.ts: 85.71/85.71/100/85.71 (unchanged; never-typed default arm)

Open rubber-duck items still pending (unchanged from PR5b.1)

- DTO original_prompt_id nullability.
- Citation-strip discipline (partition.ts + wave.ts + shim.ts legacy refs; new code stays clean).
- CI gate for the 126 latent test type errors.
- PR1 backward-compat fallback corpus verification.
- PR4e cancelQueued-race.
- dispatch.ts branch coverage at 50% (pre-existing).
- Spec drift docs (drain-outside-the-lock, retry-failed-in-shim, LockAcquireResult DU, SendCard inline lastError).
- shim drain loop call-stack serialization.

Next slice

PR5d: per-edge `+` chip + insert-on-edge popover. Adds a custom edgeTypes entry to TreeCanvas, an onEdgeInsert callback to the ActionCallbacks surface, and a Fluent Popover with kind-aware insert options per the upstream-node-kind contract. Reuses the data-tree-node-id wrapper-scoping pattern from PR5b.1 for position anchoring.
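The two callback decisions in this slice (nodeId-only signatures that close over treeId at the host, and per-callback opt-in rendering) can be sketched without React. This is a minimal illustration, not the code in actionRail.tsx: only `onRefresh` and `runner.refreshNode(treeId, id)` come from the text above, while `onDelete`, `Runner`, `bindCallbacks`, `visibleActions`, and the brand shapes are hypothetical names introduced here.

```typescript
// Hypothetical branded ids; the real brand definitions live in the Tree module.
type TreeId = string & { readonly __brand: "TreeId" };
type NodeId = string & { readonly __brand: "NodeId" };

// Rail callbacks take a nodeId only. `onRefresh` is named in the text;
// `onDelete` is a stand-in for the other common-to-every-node actions.
interface ActionCallbacks {
  onRefresh?: (nodeId: NodeId) => void;
  onDelete?: (nodeId: NodeId) => void;
}

// Hypothetical runner surface that still needs both ids.
interface Runner {
  refreshNode(treeId: TreeId, nodeId: NodeId): void;
}

// The host closes over treeId once, so the rail never has to carry it.
function bindCallbacks(treeId: TreeId, runner: Runner): ActionCallbacks {
  return {
    onRefresh: (id) => runner.refreshNode(treeId, id),
    // onDelete is left undefined, so under the opt-in rule its button is hidden.
  };
}

// Names of the buttons a rail would render for a given callback set.
function visibleActions(callbacks: ActionCallbacks): string[] {
  const entries: Array<[string, unknown]> = [
    ["refresh", callbacks.onRefresh],
    ["delete", callbacks.onDelete],
  ];
  return entries.filter(([, cb]) => cb !== undefined).map(([name]) => name);
}
```

If a multi-tree canvas ever arrives, only `bindCallbacks` would need to change under this shape; the rail's signatures stay as they are.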

The per-edge `+` chip + insert popover per the operator-facing insert-on-edge affordance. Adds a custom react-flow edge component (InsertEdge) that wraps SmoothStepEdge with a midpoint chip; clicking the chip opens a Fluent Menu whose options vary by the upstream node's kind (only legal next-node types render, hiding illegal ones is cheaper than enabling-with-error). Selecting an option fires the host-supplied onEdgeInsert callback with a discriminant (EdgeInsertKind) naming the chosen action. What ships - frontend/src/components/Tree/InsertEdge.tsx - Custom react-flow edge component (replaces 'smoothstep' as the default edge type for Tree adapter output). - Renders BaseEdge for the orthogonal stroke + an absolute- positioned chip at the midpoint via EdgeLabelRenderer (with a fallback inline render for test environments where the EdgeLabelRenderer portal target isn't mounted). - Chip suppression rules: - onEdgeInsert callback absent → no chip - parent kind is `score` (terminal) → no chip - parent kind is `fan` (children managed via FanCard +) → no chip - menuForParent() builds the kind-aware option list: - root_prompt / import_message: Follow-up + Inject + Send + Fan attempt + Fan converter (+ V1.1 disabled stubs) - user_turn: Send + Append converter + Fan converter (+ V1.1 disabled stubs) - send: Follow-up + Inject + Score + Fan attempt + Fan converter (+ V1.1 disabled stubs) - Branded source/target back to ConversationTreeNodeId at the onEdgeInsert callback boundary so hosts receive the same brand the runner uses everywhere. - frontend/src/components/Tree/insertEdge.styles.ts - makeStyles for the chip wrapper (absolute + pointer-events:all) and the chip button (20px circular Fluent Button). - frontend/src/components/Tree/treeEdgeTypes.ts - Edge-type registry: `{ insert: InsertEdge }`. Passed to ReactFlow's `edgeTypes` prop. Sibling to treeNodeTypes.ts. 
- frontend/src/components/Tree/actionRail.tsx (modified) - Added `EdgeInsertKind` exported type (discriminant for onEdgeInsert). - Added `onEdgeInsert?: (parentId, childId, kind) => void` to `ActionCallbacks`. Per-callback opt-in pattern from PR5c preserves: undefined hides the chip. - frontend/src/components/Tree/conversationTreeToReactFlow.ts (modified) - Edge data now carries `parentKind: ConversationTreeNodeKind` so InsertEdge can pick the kind-aware menu without a tree lookup at render. Built once per adapter call (O(nodes) hash). - Edges now emit `type: 'insert'` (was `'smoothstep'`) so ReactFlow routes them to the InsertEdge component. - Added PLACEHOLDER_WIDTH/HEIGHT (260×80) on every node so the reconciler treats nodes as measured ahead of layout. PR5g's layout pass overrides positions; the dims unblock edge rendering in environments without real ResizeObserver. - frontend/src/components/Tree/conversationTreeToReactFlow.test.ts (modified) - Renamed the edge-type assertion to reflect the 'insert' type (was 'smoothstep' in PR5a). - frontend/src/components/Tree/TreeCanvas.tsx (modified) - Wired `edgeTypes={treeEdgeTypes}` alongside the existing `nodeTypes={treeNodeTypes}` registration. - frontend/src/components/Tree/edgeInsert.test.tsx - 23 tests across six sections: 1. chip presence/suppression (callback present/absent, context null, score/fan parents) 2. kind-aware menu options (one test per parent kind + V1.1 disabled axes) 3. callback invocation (one test per kind discriminant + disabled-items-don't-fire) 4. accessibility (aria-label "Insert after X") 5. adapter parentKind contract (edge data carries source kind; fan parents emit parentKind='fan') 6. registry smoke test (treeEdgeTypes.insert === InsertEdge) Notable shape decisions - Test approach: direct InsertEdge mount inside ReactFlowProvider, not TreeCanvas → react-flow → edge render. 
Reason: react-flow's edge layer is gated on full node measurement (handleBounds populated via ResizeObserver), which jsdom can't simulate cleanly without invasive setupTests changes (DOMMatrixReadOnly stub, ResizeObserver fire-on-observe, getBoundingClientRect override). Per PR5c's rubber-duck lesson "don't fight react- flow internals," the integration is covered by the adapter's `edge.type === 'insert'` assertion + the registry's `treeEdgeTypes.insert === InsertEdge` assertion; the InsertEdge component itself is tested via direct mount. - EdgeLabelRenderer fallback. Production wraps the chip in EdgeLabelRenderer (portals out of the SVG into a fixed layer above the canvas so the HTML chip renders over the SVG path). In tests the portal target (`.react-flow__edgelabel-renderer` div) doesn't exist — InsertEdge checks `useStore(s => Boolean(s.domNode?.querySelector(...)))` and falls back to inline render. Visual is identical in jsdom (no layout); in production the portal path is taken. - parentKind on edge data, not derived at render. The adapter computes parentKind once per edge (O(nodes) hash lookup); the InsertEdge consumes data.parentKind without any tree-side re-query. Keeps the edge component pure and avoids context lookups for what's essentially adapter-state. - One callback (onEdgeInsert), one discriminant (EdgeInsertKind). The host receives (parentId, childId, kind) and decides how to splice the new node. Alternative: one callback per kind (`onInsertFollowUp`, `onInsertSend`, etc.). Rejected because the host's tree-edit logic is typically a single function `insertBetween(parent, child, kindToBuild)` — splitting forces seven near-identical wrappers. - V1.1 fan axes (`fan_prompt`, `fan_target`) reserve menu slots as DISABLED items, not absent. Same operator-facing convention as PR5c's Branch-as-subtree button: V1.1 enablement is a state flip, not a new affordance appearing. Disabled-stub strings use "(coming later)" — no "V1.1" release labels in operator copy. 
- PLACEHOLDER_WIDTH/HEIGHT (260×80) on adapter output nodes. React-flow won't render edges until source + target nodes have measured dimensions; supplying them up-front (via node.width/height per the NodeBase interface) lets edges render before the ResizeObserver loop completes. Production cards report their real size via ResizeObserver on mount; these placeholders are the until-then value. PR5g's layout pass owns positions, not dimensions. - Fan-child edges DO emit parentKind='fan' so InsertEdge suppresses the chip even when an operator selects a fan-child edge. Adding a chip there would be operator-hostile — the + button next to a fan-child edge would compete with the FanCard's own `+ Add variant` button. TDD narrative Started with edgeInsert.test.tsx — 23 cases pinning chip presence/suppression, the per-parent menu options, callback invocation, and the adapter contract. RED: TS2305 on EdgeInsertKind + TS2353 on onEdgeInsert (member missing from ActionCallbacks). Implementation took three corrective passes: 1. Initial assumption: edge components would render inside TreeCanvas via the canvas integration. jsdom + react-flow's handleBounds gate kept edges out of the DOM entirely. Pivoted to direct-mount tests; covered the canvas-level contract via the adapter test + a registry smoke test. 2. EdgeLabelRenderer's portal target only exists inside <ReactFlow>, not the bare ReactFlowProvider used by direct mount. Added a portal-target check via useStore + an inline render fallback. Production keeps the portal path. 3. Two type-check failures from the main tsconfig (not the test config): readonly fanAxes mismatch with the InsertMenu interface (fixed: typed fanAxes as ReadonlyArray); plain string source/target passed to a branded-id callback (fixed: cast at the callback boundary). 
Defects surfaced during TDD - jsdom + react-flow edge measurement: the standard no-op ResizeObserver mock in setupTests.ts prevents react-flow from populating node.internals.handleBounds, which gates edge rendering entirely. Tried upgrading the ResizeObserver mock to fire on observe(); revealed DOMMatrixReadOnly is also absent from jsdom (react-flow's transform-decoder throws); reverted. The right pattern is to test components that depend on full canvas state via direct mount, not via TreeCanvas. - EdgeLabelRenderer's portal target is mounted by <ReactFlow>, not ReactFlowProvider. Tests using direct mount need either a portal-target fallback in the component (chose this) or a full ReactFlow mount with the handleBounds workaround above (rejected as too invasive). - Main tsconfig is stricter than tsconfig.test.json for branded- id types. The test passes plain strings as source/target on the synthetic EdgeProps; in production react-flow emits plain strings too, so the cast at the callback boundary is the permanent shape. Verification Tests: 975 frontend passing (952 prior + 23 new). Backend unchanged. Lint: clean. Type-check: clean (main + contract). Coverage: src/components/Tree directory 97.14 / 89.87 / 100 / 97.08 (was 97.53 / 95.74 / 100 / 97.5 in PR5c). InsertEdge.tsx: 95.65/83.33/100/95.55 (two uncovered branches live in parentLabel() switch fall-throughs that the suppress-chip-on-score/fan rule make unreachable — they exist for type exhaustiveness) insertEdge.styles.ts: 100/100/100/100 treeEdgeTypes.ts: 100/100/100/100 actionRail.tsx: 100/100/100/100 (onEdgeInsert added to ActionCallbacks; no rail-side change) conversationTreeToReactFlow.ts: 90.9/77.77/100/90.47 (one uncovered branch is the `nodeKindById.get(parentId) ?? 'root_prompt'` fallback for an orphan edge — unreachable in a well-formed tree) All other modules: 100/100/100/100 Open rubber-duck items still pending (unchanged from PR5c) - DTO original_prompt_id nullability. 
- Citation-strip discipline (legacy partition.ts + wave.ts + shim.ts refs; new code stays clean). - CI gate for the 126 latent test type errors. - PR1 backward-compat fallback corpus verification. - PR4e cancelQueued-race. - dispatch.ts branch coverage at 50% (pre-existing). - Spec drift docs. - shim drain loop call-stack serialization. New from PR5d: - Test pattern: TreeCanvas-level integration tests can't drive react-flow's edge layer in jsdom. Documented in the edgeInsert.test.tsx header so PR5e/PR5f authors take the direct-mount approach for any test that depends on edge rendering. Next slice PR5e — Fan-Children Stack rendering. FanCard + adapter conspire to render N identical-subtree fan children as a single stacked card. Builds on the slotIndex field already on edge data (PR5a) + parentKind=fan suppression (this PR).
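The kind-aware menu and chip suppression rules from this slice can be sketched as plain functions. This is an illustration under assumptions, not the contents of InsertEdge.tsx: the text names `EdgeInsertKind`, `menuForParent`, `onEdgeInsert`, and the V1.1 axes `fan_prompt` / `fan_target`, but the other discriminant strings, the label wording, `showChip`, and `selectItem` are hypothetical. Converter parents are not described in the text, so the sketch returns no options for them.

```typescript
type NodeKind =
  | "root_prompt" | "user_turn" | "converter" | "send"
  | "fan" | "score" | "import_message";

// Hypothetical member names, apart from fan_prompt and fan_target.
type EdgeInsertKind =
  | "follow_up" | "inject" | "send" | "score" | "append_converter"
  | "fan_attempt" | "fan_converter" | "fan_prompt" | "fan_target";

interface MenuItem {
  kind: EdgeInsertKind;
  label: string;
  disabled: boolean;
}

const on = (kind: EdgeInsertKind, label: string): MenuItem => ({ kind, label, disabled: false });

// V1.1 axes keep their menu slots as disabled items with "(coming later)" copy.
const STUBS: ReadonlyArray<MenuItem> = [
  { kind: "fan_prompt", label: "Fan prompt (coming later)", disabled: true },
  { kind: "fan_target", label: "Fan target (coming later)", disabled: true },
];

function menuForParent(parentKind: NodeKind): ReadonlyArray<MenuItem> {
  switch (parentKind) {
    case "root_prompt":
    case "import_message":
      return [on("follow_up", "Follow-up"), on("inject", "Inject"), on("send", "Send"),
        on("fan_attempt", "Fan attempt"), on("fan_converter", "Fan converter"), ...STUBS];
    case "user_turn":
      return [on("send", "Send"), on("append_converter", "Append converter"),
        on("fan_converter", "Fan converter"), ...STUBS];
    case "send":
      return [on("follow_up", "Follow-up"), on("inject", "Inject"), on("score", "Score"),
        on("fan_attempt", "Fan attempt"), on("fan_converter", "Fan converter"), ...STUBS];
    default:
      // score and fan parents never show a chip; converter is unspecified in the text.
      return [];
  }
}

type OnEdgeInsert = (parentId: string, childId: string, kind: EdgeInsertKind) => void;

// Chip suppression: no callback, or a score/fan parent, means no chip.
function showChip(parentKind: NodeKind, onEdgeInsert?: OnEdgeInsert): boolean {
  return onEdgeInsert !== undefined && parentKind !== "score" && parentKind !== "fan";
}

// Disabled items keep their slot but never fire the callback.
function selectItem(item: MenuItem, parentId: string, childId: string, onEdgeInsert: OnEdgeInsert): void {
  if (item.disabled) return;
  onEdgeInsert(parentId, childId, item.kind);
}
```

A single `selectItem` path mirrors the one-callback, one-discriminant decision: the host can route every kind through one `insertBetween`-style function.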

The Fan-Children Stack: when an attempt-axis fan has N >= 2 structurally-identical children, the FanCard renders a single inline summary ("Send ×10, 9 ✓, 1 ⚠") instead of N separate child cards on the canvas. Auto-collapse for N > 3 by default; operator-toggleable via a ⊞/⊟ button on the FanCard. Builds on the slotIndex + parentKind edge data from PR5a/d and the chip-suppress-on-parent=fan rule from PR5d. What ships - frontend/src/components/Tree/fanStack.ts - `isStackable(tree, fanId)` — predicate: fan kind + attempt axis + N >= 2 + all children's subtrees structurally identical (recursive shape + kinds; params + state may differ per the design's "execution may differ" note). - `defaultCollapsedFanIds(tree)` — subset of stackable fans with N > 3 (the auto-collapse threshold). Smaller stackable fans render expanded by default but can be manually collapsed. - `computeStackAggregate(tree, fanId)` → `{ childKind, total, byState: Record<NodeState, number> }`. Total + per-state counts feed the FanCard's collapsed body. - All functions are pure tree-walkers — no react, no DOM, no side effects. Indexed via a per-call children-by-parent map so the predicate is O(tree-size) even on deeply-nested fans. - frontend/src/components/Tree/stackCollapseContext.ts - `StackCollapseContext` (React context, default null). - `StackCollapseValue = { collapsedFanIds: ReadonlySet<NodeId>; toggleStack: (fanId) => void }`. - `useStackCollapse()` returns the value or null when no provider is mounted — cards rendered outside a TreeCanvas (per-card tests) skip the toggle entirely. - frontend/src/components/Tree/conversationTreeToReactFlow.ts (modified) - New `TreeFlowAdapterOptions { collapsedFanIds? }` parameter. 
- When `collapsedFanIds` is supplied, the adapter: (a) walks each collapsed fan's subtree and collects all descendant ids (`collectHiddenDescendants` BFS) (b) drops those descendants from `nodes` AND drops every edge whose source or target is hidden (c) attaches `stackedSummary: StackAggregate` to the collapsed fan's `data` so FanCard renders the stack body - The FanNode discriminant's data type widened to `{ node: FanNode; stackedSummary?: StackAggregate }`. Backwards- compatible: existing per-kind component types are unchanged. - Omitted/empty `collapsedFanIds` behaves identically to PR5d (no collapse, full tree). The EMPTY_SET sentinel skips the filter entirely for the no-op case. - frontend/src/components/Tree/nodeCards.tsx (modified) - FanCard reads `data.stackedSummary` and renders an inline `StackSummaryBody` component when present: shows "send ×10" + a status line `4 ✓, 1 ●, 1 ⚠, 5 ⧖` built from the aggregate's byState counts. - FanCard renders a `StackToggleButton` when `useStackCollapse()` returns a non-null value. Button icon flips between ArrowMinimizeRegular (currently expanded → click to collapse) and ArrowMaximizeRegular (currently collapsed → click to expand). - aria-label flips ("Collapse to stack" / "Expand stack") so the accessible name matches the action the click will perform. - frontend/src/components/Tree/nodeCards.styles.ts (modified) - Added `stackSummary` slot (dashed-border body inside the fan card; flex-col with kind label + status line), `stackKindLabel` (semibold), `stackStatusLine` (monospace, subdued color). - frontend/src/components/Tree/TreeCanvas.tsx (modified) - Owns the `collapsedFanIds: Set<NodeId>` state via useState, seeded from `defaultCollapsedFanIds(tree)`. - `lastTreeId` sentinel watches `tree.id`; when the operator swaps to a different tree, the collapse state is reseeded from that tree's default-collapsed set. (The runner mutates trees in place during waves, so we watch the id, not the reference.) 
- `toggleStack(fanId)` flips the fan id's membership; passed into the context value. - StackCollapseContext.Provider wraps ReactFlow alongside the existing ActionCallbacksContext.Provider. - The adapter is now called with `{ collapsedFanIds }`; the useMemo deps include the set so toggles re-adapt. - frontend/src/components/Tree/fanStack.test.ts - 20 tests across three describes: `isStackable` (positive, negative, kind/axis/N edge cases), `defaultCollapsedFanIds` (auto-collapse threshold, multi-fan), `computeStackAggregate` (mixed states, empty/non-fan defaults). - frontend/src/components/Tree/conversationTreeToReactFlow.test.ts (modified) - 7 new tests for the `collapsedFanIds` adapter option: child filtering, edge filtering, recursive subtree filter, stackedSummary attachment, no-summary-when-uncollapsed, omitted-options backwards-compat, multi-fan independence. - frontend/src/components/Tree/fanStackCanvas.test.tsx - 15 tests covering: - FanCard summary body rendering (kind × count + status line with ✓/●/⚠/⧖ counts) - FanCard toggle button presence / context-null hiding - toggle click → toggleStack(fanId) - aria-label flips between Collapse/Expand - TreeCanvas auto-collapses N>3 stackable fans - N=3 NOT auto-collapsed (boundary) - converter-axis NOT auto-collapsed (predicate excludes) - toggle round-trip: collapse → expand → collapse - tree-id-change reseeds the collapse state Notable shape decisions - Stack state lives at TreeCanvas, not on the domain tree. The collapse decision is a UI affordance, not authoring state — mutating ConversationTree to track it would leak through the runner contract + persistence layer. TreeCanvas-internal state means the host doesn't see it, and a tree-id swap correctly reseeds without needing collapse-state migration. - Adapter takes `collapsedFanIds` as an OPTION, not a required param. Existing callers (PR5a-d tests, any future caller that doesn't care about stacks) keep working. 
Empty set = no-op fast path; the adapter doesn't even build the hidden-id set when nothing is collapsed. - `stackedSummary` lives on `data`, not as a separate prop. The adapter attaches it to the fan node it emits; the FanCard consumes via NodeProps. This keeps the prop surface for cards stable (they all share the `NodeProps<TreeFlowNode>` shape) and the summary is automatically available via the same discriminated-union narrowing PR5b set up. - childKind in StackAggregate is nullable (returns null on empty/non-fan). Defensive against the orphan-fan case (a FanNode with zero children — which the stackable predicate rejects, so this is unreachable in practice, but the type allows a clean default). - Auto-collapse threshold is N > 3 (matches the design doc). Below threshold the stack renders expanded by default but the operator CAN collapse manually via the toggle. This matches the spec's "Collapse to Stack is auto-applied when N>3 and all children are structurally identical; otherwise expanded" language. - Status line uses ✓ / ● / ⚠ / ⧖ glyphs matching the operator-facing convention from the wave-status banner spec (PR6 will reuse these). Lumps `failed + cancelled` into the ⚠ count because both are operator-visible problem states; lumps `draft + edited + stale` into the ⧖ pending count because all three mean "not yet executed." Counts that are zero are omitted from the status line (no noise like "4 ✓, 0 ●, 0 ⚠"). - Toggle icon flip: ArrowMinimizeRegular when expanded (the action is "minimize / collapse"), ArrowMaximizeRegular when collapsed (the action is "maximize / expand"). aria- label matches: "Collapse to stack" or "Expand stack." The icons match the action operators will take, not the current state — operators click the button to do a thing, not to describe a state. - StackSummaryBody renders inside the FanCard body, not as a sibling card or a dedicated stack-card node. 
The spec's ASCII art shows the stack summary nested inside the fan card's border, which matches this implementation. Avoids an extra react-flow node + edge for the stack, which would double the canvas DOM cost. TDD narrative Three test files in sequence: 1. fanStack.test.ts — pure predicate + aggregate (20 tests). RED was TS2307 on './fanStack'. Implementation straightforward: recursive subtree structural-equality walk + per-axis filter. 2. conversationTreeToReactFlow.test.ts additions (7 tests) for the adapter's new option. Implementation involved widening TreeFlowNode's fan arm to carry the optional summary + adding the descendant-filter pass. 3. fanStackCanvas.test.tsx (15 tests) — the FanCard and TreeCanvas wiring. RED was on the new ⊞/⊟ button + stack-summary body (cards don't render either today). Implementation: extended FanCard with StackSummaryBody + StackToggleButton helpers; TreeCanvas owns the state + provider. All three suites green on the first implementation run after the type-check pass (one lint warning for an unused `styles` binding in FanCard — eliminated by deleting the binding). Defects surfaced during TDD - First FanCard implementation pass had both `const styles = useNodeCardStyles()` at the FanCard level AND inside the helper components, with the parent's binding unused. Lint caught it (no-unused-vars). Removed the parent's binding; each helper calls the hook itself. - Stray .github/workflows/frontend_tests.yml diff appeared in git status (a one-character whitespace change someone made out-of-band). Reverted before commit so this PR's diff is Tree-component-only. Verification Tests: 1017 frontend passing (975 prior + 42 new: 20 fanStack + 7 adapter + 15 fanStackCanvas). Backend unchanged. Lint: clean. Type-check: clean (main + contract). Coverage: src/components/Tree directory 96.91 / 90.47 / 100 / 98.05 (was 97.14 / 89.87 / 100 / 97.08 in PR5d — branch + line coverage both improved). 
fanStack.ts: 95.83/90.9/100/98 (the one uncovered line is the orphan-fan `total === 0` early-return in computeStackAggregate — unreachable when called via the adapter because the predicate filters first) stackCollapseContext.ts: 100/100/100/100 TreeCanvas.tsx: 100/100/100/100 conversationTreeToReactFlow.ts: 94.73/89.28/100/96 (improved from 90.9/77.77/100/90.47; the lone uncovered branch is the `collapsedFanIds.size === 0` fast-path skip of `collectHiddenDescendants` — exercised by every existing test but istanbul under-counts default-param branches) nodeCards.tsx: 98.3/92.3/100/100 Open rubber-duck items still pending (unchanged from PR5d) - DTO original_prompt_id nullability. - Citation-strip discipline (legacy partition.ts + wave.ts + shim.ts refs; new code stays clean). - CI gate for the 126 latent test type errors. - PR1 backward-compat fallback corpus verification. - PR4e cancelQueued-race. - dispatch.ts branch coverage at 50% (pre-existing). - Spec drift docs. - shim drain loop call-stack serialization. Next slice PR5f — Pick / Unpick. The FanCard's MetaRow "pick: slot N" display from PR5b lands its interactive twin: clicking a fan child (in the expanded view, or a member of the stack summary) fires `onPickFanChild(fanId, slotIndex)` and writes `FanNode.params.promotedChildSlotIndex`. V1.0 visual is "dim non-promoted children to ~40% opacity" per the spec's V1.0-simplification of §3.3.
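The status-line rules from this slice (failed + cancelled lumped into ⚠, draft + edited + stale lumped into ⧖, zero counts omitted, auto-collapse when N > 3) are easy to check in isolation. The sketch below is a simplified stand-in for fanStack.ts and StackSummaryBody: it aggregates from a flat list of child states rather than walking a tree, the seven `NodeState` members are assumed from the states named in this PR, and mapping ✓ to `clean` and ● to `running` is an assumption. `aggregate`, `formatStatusLine`, and `autoCollapses` are hypothetical names.

```typescript
// Assumed state set; the real NodeState union may contain other members.
type NodeState = "clean" | "running" | "failed" | "cancelled" | "draft" | "edited" | "stale";

interface StackAggregate {
  childKind: string | null; // null for the empty (orphan-fan) case
  total: number;
  byState: Record<NodeState, number>;
}

const AUTO_COLLAPSE_THRESHOLD = 3; // stacks auto-collapse only when N > 3

function emptyCounts(): Record<NodeState, number> {
  return { clean: 0, running: 0, failed: 0, cancelled: 0, draft: 0, edited: 0, stale: 0 };
}

// Simplified aggregate: counts states from a flat list of fan children.
function aggregate(childKind: string, states: ReadonlyArray<NodeState>): StackAggregate {
  const byState = emptyCounts();
  for (const s of states) byState[s] += 1;
  return { childKind: states.length > 0 ? childKind : null, total: states.length, byState };
}

// Builds the "9 ✓, 1 ⚠" style line; groups with a zero count are dropped.
function formatStatusLine(byState: Record<NodeState, number>): string {
  const groups: Array<[string, number]> = [
    ["✓", byState.clean],
    ["●", byState.running],
    ["⚠", byState.failed + byState.cancelled],
    ["⧖", byState.draft + byState.edited + byState.stale],
  ];
  return groups
    .filter(([, count]) => count > 0)
    .map(([glyph, count]) => `${count} ${glyph}`)
    .join(", ");
}

function autoCollapses(total: number): boolean {
  return total > AUTO_COLLAPSE_THRESHOLD;
}
```

With nine clean children and one failed child, `formatStatusLine` returns "9 ✓, 1 ⚠", and `autoCollapses(3)` is false, which matches the N=3 boundary test listed above.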

Interactive Pick / Unpick for fan children. Per-child toggle icon on the action rail (CheckmarkCircle outline = pickable, filled = currently picked). Click toggles: picking switches to the clicked slot; clicking the currently-picked icon unpicks (passes null). When the fan is in the collapsed Stack state (PR5e), the FanCard renders a "Pick…" dropdown that lists each member by slot + state; clicking an item picks (or unpicks if already promoted) without having to expand the stack first. Visual: the promoted child gets a brand-color outline (similar to selection); siblings dim to 40% opacity per the V1.0-simplification of the design's §3.3. V1.0 Pick is visual only: the runner doesn't read `promotedChildSlotIndex` yet; V1.1+ scopes Refresh and Stack-edit to the picked attempt.

What ships

- frontend/src/components/Tree/fanStack.ts (modified)
  - StackAggregate gains a `members: StackMember[]` field built in slot-index order from the tree's edges. The collapsed-stack Pick popover (PR5f) reads it without doing a tree walk at render time. Exported new `StackMember` interface.
- frontend/src/components/Tree/conversationTreeToReactFlow.ts (modified)
  - New `FanChildInfo` interface (exported): `parentFanId`, `slotIndex`, `promoted`, `dimmed`.
  - Every TreeFlowNode kind's data widened with optional `fanChildInfo?: FanChildInfo`. Cards consume it; non-fan-children carry undefined.
  - Adapter builds a `fanChildIndex: Map<childId, {parentFan, slotIndex}>` from tree.edges (one O(edges) pass), then for each node looks up the entry + computes `promoted = parentFan.params.promotedChildSlotIndex === slot` and `dimmed = parentFan.params.promotedChildSlotIndex !== null && parentFan.params.promotedChildSlotIndex !== slot`. No per-render tree walks.
- frontend/src/components/Tree/actionRail.tsx (modified)
  - `ActionCallbacks` gains `onPickFanChild?: (fanNodeId, slotIndex | null) => void`. Null = unpick.
  - `ActionRailProps` gains optional `fanChildInfo?: { parentFanId, slotIndex, promoted }`.
When supplied AND `onPickFanChild` is wired, the rail renders a CheckmarkCircle toggle button (outline = pickable, filled = picked). Click: - promoted → invokes onPickFanChild(parentFanId, null) - not promoted (or sibling picked) → invokes onPickFanChild(parentFanId, slotIndex) - aria-label flips between "Pick this attempt" and "Unpick this attempt" to match the action the click will perform. - frontend/src/components/Tree/nodeCards.tsx (modified) - CardFrame gains optional `fanChildInfo?: FanChildInfo` threaded from every card's `data.fanChildInfo`. CardFrame: - emits `data-dimmed` + `data-promoted` attributes on the wrapper for DOM scoping - applies `frameDimmed` (40% opacity) when dimmed=true - applies `framePromoted` (brand-color outline + shadow) when promoted=true - passes `fanChildInfo` into ActionRail - Every card (Root, Import, UserTurn, Send, Fan, Score) now forwards `data.fanChildInfo` into CardFrame. - FanCard's "pick: slot N" MetaRow gains a `title` attr clarifying "Visual focus only. Future releases will scope Refresh and Stack-edit to the picked attempt." (per the rubber-duck's E directive — sets correct operator expectation against the cherry-pick-metaphor disappointment). - MetaRow gains an optional `title?: string` prop. - New `StackPickButton` helper component (inside nodeCards.tsx): Fluent Menu with "Pick…" trigger that lists each stack member by slot + state. Currently-picked item shows `✓ (picked)` and clicking unpicks. Renders ONLY when the fan is collapsed (stack state) AND onPickFanChild is wired. - frontend/src/components/Tree/nodeCards.styles.ts (modified) - New `frameDimmed` (opacity: 0.4) and `framePromoted` (brand-color border + 2px shadow) slots. - frontend/src/components/Tree/fanPick.test.tsx (NEW) - 25 tests across seven describes: 1. adapter — fanChildInfo on fan children (4 tests) - present on fan children, absent on non-fan children - promoted/dimmed flags reflect promotedChildSlotIndex 2. 
computeStackAggregate: members list (2 tests)
  3. CardFrame: fan-child dim / promoted (3 tests)
  4. ActionRail: Pick toggle (7 tests)
    - presence / suppression (callback absent, non-fan child, callback wired)
    - aria-label "Pick this attempt" vs "Unpick"
    - click semantics (pick own slot; unpick when promoted)
  5. FanCard: pick MetaRow tooltip (2 tests)
  6. FanCard: collapsed-stack Pick popover (5 tests)
    - presence/suppression, menu opens with N items, click invokes onPickFanChild, promoted item shows "(picked)" + unpicks on click
  7. TreeCanvas: Pick round-trip via per-child icon (2 tests)
    - click invokes callback with slot index
    - promoted child renders data-promoted=true; siblings render data-dimmed=true
- frontend/src/components/Tree/fanStack.test.ts (modified)
  - Two existing strict-equality assertions on computeStackAggregate updated to include the new `members` field.

Notable shape decisions (with reviewer rationale)

Per a rubber-duck pre-implementation review:

- VERDICT FROM REVIEWER: ship-with-these-specific-changes.
- Toggle on the per-child icon, NOT a separate Unpick affordance on the FanCard. Reviewer (B): "The operator's mental model is 'I picked child #3.' The reversal is 'unpick child #3,' not 'go to the parent and clear its pick field.'" Filled-when-picked / outline-when-not is the radio-with-clear primitive every operator already knows.
- Collapsed-stack Pick popover NON-NEGOTIABLE per reviewer (C): "The most common workflow is run-N-attempts, pick the best. Hiding Pick when the stack is collapsed → four clicks per decision → feature gets used twice then abandoned." The popover is two clicks (open, pick) and keeps the canvas stable.
- `CheckmarkCircleRegular` / `CheckmarkCircleFilled` glyph pair per reviewer (F): "The most boring choice and the most honest one: 'this one is selected as the chosen attempt.' Doesn't oversell the V1.0 semantics, pairs cleanly as filled/outline for toggle state, doesn't collide with the V1.1 SendCard stubs (🎯 Change-target, ★ Pin-as-main-path)."
- Both `promoted` AND `dimmed` on adapter output (D): two flags emitted from the adapter, CardFrame reads both. Reviewer: "Don't derive promoted at render time from data.dimmed === false && parent.promotedChildSlotIndex !== null; just emit both flags from the adapter."
- MetaRow tooltip with explicit V1.0-visual-only note (E): "Pick (V1.0): visual focus only. Future releases will use this to scope Refresh and Stack-edit." Sets correct operator expectation against the cherry-pick-metaphor disappointment.
- Single callback `onPickFanChild(fanId, slotIndex | null)` instead of a per-action pair (Pick/Unpick). Host's tree-edit logic is one function; splitting would force two wrappers. Null = unpick is the explicit signal.
- Spec confirms single-select: "Promotion stays single-valued per FanNode; branching is the answer when the operator wants 'but I also want to see what attempt #7 leads to.'" V1.1 doesn't change this; the gesture model carries forward.

Other decisions:

- StackAggregate.members in slot-index order (sorted by adapter) so the popover displays attempts in the order operators expect ("attempt #0, #1, #2…"). Slot indices match what the runner uses for hashing and what the FanCard MetaRow displays.
- StackMember includes `state` so the popover items can show the per-attempt status (clean / failed / running). The operator's "pick the best" workflow needs to see which attempts succeeded before clicking. PR5f shows the state name; PR5g+ may swap to a per-state glyph.
- Auto-clear on promoted-child delete (per reviewer G3) is documented as a host-side contract in the onPickFanChild JSDoc. The UI doesn't have to know about it; the host's tree-mutation layer enforces it.
If the host violates, the FanCard MetaRow shows "pick: slot N" referring to a missing slot but doesn't crash. - The collapsed-stack popover trigger uses the CheckmarkCircleRegular icon AND a "Pick…" label (with ellipsis). The ellipsis follows the standard convention that a click opens a menu rather than committing an action. TDD narrative Single test file (fanPick.test.tsx) with 25 cases drove the whole PR. RED was a stack of TS errors: - 'fanChildInfo' missing from data types (every kind variant) - 'members' missing from StackAggregate - 'onPickFanChild' missing from ActionCallbacks Implementation in order: 1. Extend StackAggregate with members + StackMember 2. Add FanChildInfo + widen TreeFlowNode data 3. Build fanChildIndex in adapter + compute per-node info 4. Add onPickFanChild to ActionCallbacks 5. Extend ActionRail with fanChildInfo prop + Pick toggle 6. Extend CardFrame with fanChildInfo (data attrs + classes + passthrough to ActionRail) 7. Thread data.fanChildInfo through every card's CardFrame call 8. Add StackPickButton helper + wire into FanCard 9. Add MetaRow title prop + V1.0 tooltip text on FanCard All 25 tests green on first run. Two existing fanStack.test.ts strict-equality tests broke because of the new `members` field. Fixed with one multi-replace. Defects surfaced during TDD - The strict `toEqual<StackAggregate>` pattern in pre-existing fanStack tests is brittle against future StackAggregate extensions. Considered loosening to `toMatchObject` but kept strict — the strict shape forces every future contributor to explicitly think about what's in the aggregate. Cheap to update; loud about new fields. - The original PR5f sketch had Unpick as a separate ✕ button on the FanCard MetaRow. Reviewer's B finding rerouted to a toggle on the per-child icon. The mental-model argument was decisive. - The original PR5f sketch hid Pick entirely when the stack was collapsed. Reviewer's C finding flagged this as the dominant- workflow killer. 
Added the popover, which costs ~50 LOC and saves operators ~3 clicks per Pick decision. Verification Tests: 1042 frontend passing (1017 prior + 25 new). Backend unchanged. Lint: clean. Type-check: clean (main + contract). Coverage: src/components/Tree directory 96.99 / 91.78 / 100 / 98.29 (was 96.91 / 90.47 / 100 / 98.05 in PR5e — branch coverage improved). actionRail.tsx: 94.11/95.45/100/100 (one uncovered line is the showPick-false early-return of onPickClick — a no-op guard that runs only if React invokes a stale handler) conversationTreeToReactFlow.ts: 96/92.1/100/96.92 (the uncovered branches are the FanChildInfo orphan-fallback + the no-op-when-not-fan-child path) fanStack.ts: 96.2/89.58/100/98.24 (unchanged uncovered line is the orphan-fan total=0 early-return) nodeCards.tsx: 98.55/94.66/100/100 (uncovered branches live in defensive null-coalesce on optional props — defensible) All other modules: 100/100/100/100 Open rubber-duck items still pending (unchanged from PR5e) - DTO original_prompt_id nullability. - Citation-strip discipline (legacy partition.ts + wave.ts + shim.ts refs; new code stays clean). - CI gate for the 126 latent test type errors. - PR1 backward-compat fallback corpus verification. - PR4e cancelQueued-race. - dispatch.ts branch coverage at 50% (pre-existing). - Spec drift docs. - shim drain loop call-stack serialization. New from PR5f review (deferred): - PR5e Stack expand-on-Pick-attempt vs popover-instead. Reviewer flagged that auto-expand-on-Pick-intent would also work; we chose the popover instead because it keeps the canvas stable. If operators report "I want to see the expanded children when I pick," revisit then. - Keyboard accelerator for Pick (tabIndex + 'P' shortcut). Reviewer noted this is genuinely good for power operators; deferred to PR5f.1 or a follow-up sub-PR since it's additive to the visible affordance. Next slice PR5g — Buchheim-Walker layout via d3-hierarchy. 
Overrides the PR5a (0,0) placeholder positions with computed coordinates so the tree renders top-to-bottom without manual zoom. The layout pass also needs to account for the collapsed-stack FanCard height (taller when stack summary + Pick popover are present) so siblings don't overlap.
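
The single-callback Pick/Unpick decision described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FanState`, `nextPickArg`, and `applyPick` are hypothetical names, and only `onPickFanChild(fanId, slotIndex | null)` and `promotedChildSlotIndex` come from the commit text.

```typescript
// Hypothetical minimal fan shape; only promotedChildSlotIndex is named in the PR text.
interface FanState {
  readonly id: string;
  readonly promotedChildSlotIndex: number | null;
}

// The argument a per-child toggle would pass to onPickFanChild:
// clicking the already-picked slot yields null (unpick), any other slot yields that slot.
function nextPickArg(fan: FanState, clickedSlot: number): number | null {
  return fan.promotedChildSlotIndex === clickedSlot ? null : clickedSlot;
}

// Host-side handler: one function covers both Pick and Unpick.
function applyPick(
  fans: readonly FanState[],
  fanId: string,
  slotIndex: number | null,
): FanState[] {
  return fans.map((f) =>
    f.id === fanId ? { id: f.id, promotedChildSlotIndex: slotIndex } : f,
  );
}
```

Because `null` is the unpick signal, the host needs no second wrapper; promotion also stays single-valued per fan by construction.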

V1.0 layout: plain `d3-hierarchy.tree()` over the adapter's node + edge output. Overrides the (0,0) placeholder positions emitted by the adapter (PR5a/d) with computed coordinates so the canvas renders top-down without manual zoom. Main-path pinning is V1.1 and not part of this PR. The collapsed-fan filter (PR5e) already trims hidden descendants from the layout input, so d3-hierarchy never sees them. What ships - frontend/src/components/Tree/layoutTree.ts (NEW) - `layoutTree(nodes, edges, options?)` → `Map<nodeId, {x, y}>`. Pure function over the adapter's output; no React, no DOM, no hidden state. - Builds the parent→child relation from `edges` rather than from each node's domain-level `parentId`. This makes the function consume the SAME filtered view react-flow gets — collapsed-fan descendants were already dropped by the adapter. - Uses `d3-hierarchy.stratify()` to build the hierarchy from the flat node + edge lists, then `tree().nodeSize([w, h])` for Buchheim-Walker layout with configurable per-node block size. - Defaults: horizontalSpacing=220 (matches the card min-width), verticalSpacing=140 (generous room for card height + action rail + meta rows). Both overridable via the options bag. - Defensive fallbacks: - empty input → empty Map - cycle / orphan-only input (no roots) → every node at (0,0) so the canvas doesn't crash; bug surfaces visually - `LayoutNode`, `LayoutOptions` types exported for the TreeCanvas consumer + future testing. - frontend/src/components/Tree/layoutTree.test.ts (NEW) - 17 tests across eight describes: 1. coverage — every node gets a coord; filtered subset only; single-node tree; zero-node defensive 2. top-down orientation — root at smallest y; siblings share y 3. linear chain — vertical line (every node shares x) 4. sibling placement — distinct x; middle child centered over parent; left-to-right ordering matches insertion 5. determinism — identical input → identical output 6. 
nested subtree separation — sibling subtrees do not collide 7. configurable spacing — vertical + horizontal scale correctly 8. TreeCanvas integration probe — produces non-(0,0) coords - frontend/src/components/Tree/TreeCanvas.tsx (modified) - New `useMemo` over `layoutTree(rawNodes, edges)`. Maps each adapter-emitted node through the position lookup; falls back to the adapter's placeholder for any node the layout pass didn't cover (defensive — should never happen). - The memo's deps are the adapter-output references, NOT the tree/collapsedFanIds — so a re-render that doesn't change shape (e.g., a callback-prop change) doesn't re-layout. - frontend/jest.config.ts (modified) - Added `moduleNameMapper` entry redirecting `d3-hierarchy` to its UMD bundle at /node_modules/d3-hierarchy/dist/. The npm package ships ESM as its main entry; ts-jest's transform only hits `.tsx?` and Jest's CJS require trips on the ESM `import` statements. The UMD bundle works under CJS without any transformer. Production (Vite) keeps the ESM path — only the jest transform sidesteps it. - frontend/package.json (modified) - `d3-hierarchy` ^3.1.2 as a runtime dep. - `@types/d3-hierarchy` ^3.1.7 as a devDep. - One new transitive (d3-hierarchy itself; no further chain). Notable shape decisions - Edges as the parent-relation source. The adapter's node has a domain `node.parentId` deeply nested in `data.node`, but the edge list is the post-filter view (collapsed-fan descendants are absent from BOTH nodes AND edges). Consuming edges keeps the layout pass aligned with the adapter's filter rules without needing the layout to re-implement them. - `tree().nodeSize([w, h])` instead of `tree().size([W, H])`. `nodeSize` makes the per-node block fixed and lays out within that — the canvas can grow to fit the tree. `size` would scale the whole layout to fit a fixed bounding box, which clips on large trees. The spec calls for "tight packing"; nodeSize is the right primitive. - No main-path pinning (V1.1). 
Per spec §4.3, V1.0 ships layer 2 (plain Buchheim-Walker) only. Layer 1 (main-path centerline pinning) requires the SendCard's ★ Pin icon which is also V1.1. Both land together in V1.1 — adding layer 1 here would create a V1.0 surface that nothing else V1.0 reaches. - No adaptive collapse (the third layer per §4.3). PR5e already ships the Fan-Children Stack collapse (the only "adaptive" layer V1.0 needs); the adapter pre-filters its descendants before this layout pass sees them. - Defensive cycle fallback (no roots → every node at origin) rather than throwing. The spec doesn't require the canvas to crash on a malformed view; visual overlap is more diagnostic than a runtime exception that breaks the whole tree view. - moduleNameMapper redirect for d3-hierarchy instead of widening transformIgnorePatterns. The transform path required adding `d3-.*` to the negative-lookahead AND extending the `transform` config to cover `.jsx?` (ts-jest doesn't transform JS by default). The mapper is one config line and doesn't perturb other modules' transform pipeline. - Multi-root forest path explicitly dropped. The original sketch had a "lay each root out, translate them so they don't overlap" loop. The adapter never produces multi-root trees (V1.0 domain contract: one root per tree), so the path was dead code. The cycle fallback covers the "no roots" case; the impossible "multiple roots in a well-formed tree" case is treated identically (each root laid out at the origin; visual overlap surfaces the malformation). TDD narrative Single test file (layoutTree.test.ts) with 16 cases drove the implementation. RED was TS2307 on './layoutTree'. Implementation had two pivots: 1. First implementation pass used `import { stratify, tree } from 'd3-hierarchy'`. Jest barfed: d3-hierarchy is ESM-only and ts-jest's transform only hits TypeScript files. Tried widening transformIgnorePatterns to allow-list `d3-.*` — didn't help because ts-jest still doesn't have a `.js` transform. 
Switched to moduleNameMapper redirecting to the UMD bundle. 2. The "asymmetric subtree separation" test as originally written expected u2 to sit fully OUTSIDE u1's subtree x-range. Wrong: Buchheim-Walker keeps siblings at the same depth (u2 at depth 1, u1's grandchildren at depth 4); u2's x can sit within the x-range of u1's grandchildren because they're at different y's. Rewrote the test to assert disjoint-x at the SAME depth (u1 vs u2 at depth 1). Added the malformed-cycle coverage test after the initial implementation to push branch coverage on the no-roots fallback above the 85% threshold. Defects surfaced during TDD - The "different y means non-colliding" insight from the second pivot is important for the V1.1 main-path-pinning layer (different y is exactly what main-path will exploit). Worth a note for the V1.1 implementer. - d3-hierarchy's ESM-only publishing pattern WILL recur every time we add a d3-* package. The moduleNameMapper indirection is per-package; documented inline so the next d3 package gets the same treatment. - The first-write of the layout pass took ~1ms for a 60-node tree (measured ad-hoc via `console.time` during local bring-up). Well within budget; the V1.0 1000-node soft cap is a non-issue at this perf level. Worth re-measuring if PR6's wave-status banner forces frequent re-layouts. Verification Tests: 1059 frontend passing (1042 prior + 16 new + 1 extra for coverage = 17 new layoutTree tests). Backend unchanged. Lint: clean. Type-check: clean (main + contract). Coverage: src/components/Tree directory 96.49 / 90.57 / 100 / 98.55 (was 96.99 / 91.78 / 100 / 98.29 in PR5f; lines slightly higher, statements and branches dipped because of new defensive fallbacks). layoutTree.ts: 93.33/82.6/100/100 (the uncovered branches are defensive guards: visibleIds.has() short-circuits + the out.has() dup-suppression, both unreachable in well-formed input). All other modules: unchanged from PR5f. 
Global coverage gate: passes (no threshold-fail at end of coverage run). Open rubber-duck items still pending (unchanged from PR5f) - DTO original_prompt_id nullability. - Citation-strip discipline (legacy refs in runner files). - CI gate for the 126 latent test type errors. - PR1 backward-compat fallback corpus verification. - PR4e cancelQueued-race. - dispatch.ts branch coverage at 50% (pre-existing). - Spec drift docs. - shim drain loop call-stack serialization. - PR5f deferred items (auto-expand on Pick-attempt, keyboard accelerator). Next slice PR5 is now feature-complete (PR5a-g + b.1 review). Per the rubber-duck schedule, fire the reviewer on the full UI surface (PR5a-g) before starting PR6. PR6 wires: - cost-guardrail modal (intercepts Refresh at confirmThreshold) - ↻ cost-preview tooltip - wave-status ribbon (canvas-level) - wave-complete toast with the 5-bucket summary - [Retry failed] button + reflog drawer Before PR6, the integration question to settle: does the layout pass need to re-run on per-leaf state pings (e.g., a Send flipping to "running" during a wave)? Today it does NOT — the layout memo deps are the adapter output references, and state changes don't change shape (so the adapter returns the same reference). Verify this stays true when PR6 wires the runner-state-sink integration into TreeCanvas.
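
The edge-derived parent relation and the defensive fallbacks described for `layoutTree` can be sketched as below. This is a stand-in under stated assumptions: the real pass uses `d3-hierarchy`'s `stratify()` and `tree().nodeSize()`, whereas this sketch places nodes with a naive depth/column rule (not Buchheim-Walker) so it stays dependency-free. `sketchLayout`, `FlowNode`, `FlowEdge`, and `Point` are hypothetical names; the 220/140 defaults are the ones quoted in the commit message.

```typescript
interface FlowNode { id: string }
interface FlowEdge { source: string; target: string }
interface Point { x: number; y: number }

function sketchLayout(
  nodes: FlowNode[],
  edges: FlowEdge[],
  horizontalSpacing = 220,
  verticalSpacing = 140,
): Map<string, Point> {
  const out = new Map<string, Point>();
  if (nodes.length === 0) return out; // empty input -> empty Map

  // Parent relation comes from the (already filtered) edge list, not from node data.
  const visible = new Set<string>(nodes.map((n) => n.id));
  const parentOf = new Map<string, string>();
  edges.forEach((e) => {
    if (visible.has(e.source) && visible.has(e.target)) parentOf.set(e.target, e.source);
  });

  // No roots (cycle-only input): park every node at the origin instead of throwing.
  const roots = nodes.filter((n) => !parentOf.has(n.id));
  if (roots.length === 0) {
    nodes.forEach((n) => out.set(n.id, { x: 0, y: 0 }));
    return out;
  }

  // Depth = number of parent hops, capped at nodes.length so a stray cycle cannot loop forever.
  const depthOf = (id: string): number => {
    let depth = 0;
    let cur: string | undefined = parentOf.get(id);
    while (cur !== undefined && depth < nodes.length) {
      depth += 1;
      cur = parentOf.get(cur);
    }
    return depth;
  };

  // Naive stand-in placement: one row per depth, columns in insertion order.
  const countAtDepth = new Map<number, number>();
  nodes.forEach((n) => {
    const d = depthOf(n.id);
    const col = countAtDepth.get(d) ?? 0;
    countAtDepth.set(d, col + 1);
    out.set(n.id, { x: col * horizontalSpacing, y: d * verticalSpacing });
  });
  return out;
}
```

The function is pure over its inputs, which is what lets a caller memoize it on the adapter output without worrying about hidden state.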

…hape (PR5h.1 review) Per PR5 rubber-duck reviewer findings B + D (bundled fix): the adapter mixed pure shape mapping with render-time policy (collapse filter + stackedSummary), and the TreeCanvas layout pass re-ran on every tree ref change. The latter becomes a 60-leaf-wave layout cliff once PR6 wires the runner sink — every state flip would re-layout the canvas. - `applyStackCollapse(shape, tree, collapsedFanIds): TreeFlowAdapterResult` new module owning the Fan-Children Stack collapse policy: - filters descendants of collapsed fans + edges into/out of them - clones collapsed fan nodes with `stackedSummary` attached (no input mutation) - identity behaviour: empty `collapsedFanIds` returns the input shape - `conversationTreeToReactFlow(tree)` simplified to a pure shape pass: - `TreeFlowAdapterOptions` interface removed; no `collapsedFanIds` option; no `stackedSummary` attachment; no `computeStackAggregate` import - still attaches `parentKind` per edge and `fanChildInfo` per fan- child (these are stable derivations from the input, not UI policy) - `TreeCanvas` new pipeline: `adapter → applyStackCollapse → layout`, with `useShapeMemoizedLayout` hook keying layout on a derived shape-key (`nodes.length:ids|edges.length:ids`). Cached positions are returned by reference across renders where shape is unchanged. - `fanChildInfo` stays in the adapter (not in collapse pass). It depends on `promotedChildSlotIndex` (per-Pick state), so the adapter is NOT shape-reference-stable across Pick clicks. That's fine: the layout memo uses a shape-key (string), so even when the adapter returns a new ref, layout doesn't re-run unless the shape-key changes. Putting `fanChildInfo` in collapse would force a deeper split (separate decoration pass) for no operator-observable benefit. - `applyStackCollapse` early-returns the input shape on empty `collapsedFanIds` — preserves shape identity for the common case, saves a defensive clone-and-filter pass. 
- `useShapeMemoizedLayout` uses `useRef`+conditional-assignment-during- render rather than `useMemo` over the shape-key. `useMemo` would recompute when the underlying `nodes`/`edges` refs change (state flips); the ref cache returns the cached Map even when callers pass new arrays. - `hidden as unknown as ReadonlySet<string>` cast at the call site: react-flow's `Edge.source/target` are plain strings; `hidden` is branded `ConversationTreeNodeId`. Membership is structural at runtime; local cast avoids per-call brand annotations. - RED: created `applyStackCollapse.test.ts` (10 tests covering identity, single + multiple fan collapse, descendant filtering, edge filtering, `stackedSummary` aggregation, and explicit purity on shape and tree). Confirmed TS2307 cannot-find-module via `tsc --noEmit -p tsconfig.test.json`. - GREEN: implemented `applyStackCollapse.ts`. 10/10 pass. - RED for memoization: added 2 tests to `TreeCanvas.test.tsx` spying on `layoutTree`. The "runs once across state-only re-renders" test asserts call count stays constant after 2 re-renders with new tree refs but same shape (id, kind, edge structure). The "re-runs when shape changes" test asserts call count grows when a node is added. - GREEN: refactored `TreeCanvas` to the new pipeline + extracted `useShapeMemoizedLayout`. Both memoization tests pass. - Migrated the moved collapse-option describe block out of `conversationTreeToReactFlow.test.ts` (those tests now live in `applyStackCollapse.test.ts`). Updated `layoutTree.test.ts`'s one callsite that used the old `{ collapsedFanIds }` option to call through `applyStackCollapse` instead. - Initial type-check failure: my `hidden.has(e.source)` and `hidden.has(n.id)` calls in `applyStackCollapse.ts` were branded- vs-string mismatches (react-flow's `Edge.source` is `string`; `hidden` is `ReadonlySet<ConversationTreeNodeId>`). The original adapter avoided this because it filtered `tree.edges` directly (branded). 
Resolved with a local `hiddenStrings` view cast at the function head, keeping the brand discipline at the boundary. - `npm test --no-coverage`: 1064 passed, 1064 total (was 1059; +10 new applyStackCollapse tests; -7 moved out; +2 new TreeCanvas memo tests) - `npm run type-check`: 0 - `npm run type-check:contract`: 0 - `npm run lint`: 0 warnings - `npm run test:coverage`: - `TreeCanvas.tsx`: 100/90/100/100 (was 100/100/100/100; the new hook has a one-branch first-call path uncovered) - `applyStackCollapse.ts`: 97.43/95/100/100 - `conversationTreeToReactFlow.ts`: 95.12/89.47/100/94.59 (slightly down from 96.49 because the dropped collapse code lost some branches it used to cover) - `layoutTree.ts`: 93.33/82.6/100/100 (unchanged) - All exceed 85/85/90/90 threshold. PR5h.2: `InsertEdge` discriminant-lie fix — replace `kind: 'fan_attempt'` typed-literal-for-disabled-items with a discriminated `InsertMenuOption` union (`{ disabled: false; kind; label }` vs `{ disabled: true; label; reason }`). Reviewer Finding H#1; ~10 LOC plus tests. From PR5 review: - PR5h.2: InsertEdge discriminant safety (H#1) -- NEXT - PR5h.3: edgeInsert.test.tsx vacuous-pass guard (J#4) - PR5h.4: action rail hover-gate CSS (J#6) - PR5h.5-7: editing affordances (UserTurn Edit + RootPrompt Edit + UserTurn Converter palette) -- per user Q1 hybrid decision - Per-child Pick toggle drift -- amend spec §2.4 end-of-V1.0 doc pass (user Q2 = keep + amend) - V1.1 disabled stubs skipped (user Q3 = c; spec §2.4 amendment) - `💬 View raw response` dropped from V1.0 (user Q1 hybrid) - TreeCanvas synchronous-setState-during-render (rev H#2) -- track - Auto-collapse seed only re-runs on tree.id change, not shape change within same tree (rev J#1) -- track for PR6 - `toFlowEdge`'s `parentKind ?? 
'root_prompt'` silent default (rev J#2) - Tooltip `relationship="description"` on icon-only buttons (rev J#3) - Card decomposition (450 LOC; rev J#5) -- defer to start of PR6a - Spec drift docs (drain-outside-lock, retry-failed-in-shim, LockAcquireResult DU, SendCard inline lastError) -- end-of-V1.0 - shim drain loop call-stack serialization -- PR6 wave-status surface - Stack-collapse persistence across un-stackable transitions -- PR6 - dispatch.ts 50% branch coverage -- pre-existing PR4c2; not gating
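
The shape-keyed layout cache from PR5h.1 can be sketched without React as a closure. This is an illustration only: the PR's `useShapeMemoizedLayout` uses `useRef` with conditional assignment during render, while `shapeKey` and `createShapeCache` here are hypothetical names, and the exact id separator inside the key is an assumption (the commit only gives the `nodes.length:ids|edges.length:ids` outline).

```typescript
interface ShapeNode { id: string }
interface ShapeEdge { id: string }

// String key derived from shape only; state flips on a node do not change it.
function shapeKey(nodes: ShapeNode[], edges: ShapeEdge[]): string {
  const nodeIds = nodes.map((n) => n.id).join(",");
  const edgeIds = edges.map((e) => e.id).join(",");
  return `${nodes.length}:${nodeIds}|${edges.length}:${edgeIds}`;
}

// Returns a function that recomputes only when the shape key changes and
// otherwise hands back the previously computed value by reference.
function createShapeCache<T>(
  compute: (nodes: ShapeNode[], edges: ShapeEdge[]) => T,
): (nodes: ShapeNode[], edges: ShapeEdge[]) => T {
  let cached: { key: string; value: T } | null = null;
  return (nodes, edges) => {
    const key = shapeKey(nodes, edges);
    let entry = cached;
    if (entry === null || entry.key !== key) {
      entry = { key, value: compute(nodes, edges) };
      cached = entry;
    }
    return entry.value;
  };
}
```

Keying on a string rather than on array identity is the point: callers may pass fresh `nodes`/`edges` arrays every render and still get the cached result.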

@ts-expect-error

…ty (PR5h.2 review)

Per PR5 reviewer Finding H#1: the V1.1-disabled "Fan out: prompt (coming later)" and "Fan out: target (coming later)" items each carried `kind: 'fan_attempt' as const`, a typed lie ("discriminant is unused on disabled items"). The compiler stayed happy because the old `InsertMenuOption` shape required `kind: EdgeInsertKind` and treated `disabled` as optional. When V1.1 adds `'fan_prompt'` and `'fan_target'` to `EdgeInsertKind`, a flag-flip from `disabled: true` to `disabled: false` would silently dispatch the wrong axis until someone notices the literal `'fan_attempt'` left behind.

## What ships

- `InsertMenuOption` is now exported and a discriminated union:

      type InsertMenuOption =
        | { readonly disabled: false; readonly kind: EdgeInsertKind; readonly label: string }
        | { readonly disabled: true; readonly label: string; readonly disabledReason: string }

  The disabled arm has NO `kind` field. Enabling a disabled item forces a same-commit choice of a real `kind`; an editor that merely flips the discriminant gets a type error.
- All `menuForParent` callsites updated to construct each option with the explicit `disabled: false | true` discriminant. Disabled items lose their stale `'fan_attempt' as const` placeholder.
- Render handlers narrow via `if (opt.disabled === false) handleSelect(opt.kind)` in both basic and fanAxes loops, replacing the previous truthy-short-circuit `() => !opt.disabled && handleSelect(opt.kind)`. The explicit narrowing satisfies the discriminated union; the short-circuit form does not (TS sees `opt.kind` as possibly absent on the disabled arm).

## TDD narrative

- RED: created `insertMenuOption.test.ts` with 5 cases:
  1. disabled item without kind compiles.
  2. enabled item with kind compiles.
  3. disabled item with kind fails to compile (asserted via `@ts-expect-error`).
  4. enabled item without kind fails to compile (asserted via `@ts-expect-error`).
  5. narrowing: `opt.kind` accessible after `opt.disabled === false`.

  `tsc --noEmit -p tsconfig.test.json` reported TS2459 (InsertMenuOption not exported) + TS2578 unused @ts-expect-error directives (because the unresolved import collapsed the type to `any`, swallowing the errors the directives expected).
- GREEN: refactored `InsertEdge.tsx` to export the union + update all construction + render-time narrowing. 5/5 new tests pass.

## Verification

- `npm test --no-coverage`: 1069 passed (was 1064; +5 insertMenuOption)
- `npm run type-check`: 0
- `npm run type-check:contract`: 0
- `npm run lint`: 0 warnings

## Next slice

PR5h.3: strengthen `edgeInsert.test.tsx` "disabled V1.1 items do NOT invoke onEdgeInsert when clicked" by adding `expect(disabledFan).toBeDefined()` before the click so the test stops passing vacuously when no disabled item is rendered (reviewer Finding J#4).

## Open rubber-duck items still pending

(unchanged from PR5h.1 commit body; tracking continues to PR5h.7)
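
The discriminated `InsertMenuOption` union and its explicit narrowing from PR5h.2 can be exercised in isolation like this. The union is copied from the commit message; the `EdgeInsertKind` members shown are a stand-in subset (only `'fan_attempt'` is named in the text, `'converter'` is assumed), and `selectOption` is a hypothetical helper standing in for the render handlers.

```typescript
// Stand-in subset; the real EdgeInsertKind union lives in the PR.
type EdgeInsertKind = "fan_attempt" | "converter";

type InsertMenuOption =
  | { readonly disabled: false; readonly kind: EdgeInsertKind; readonly label: string }
  | { readonly disabled: true; readonly label: string; readonly disabledReason: string };

// Explicit narrowing on the discriminant: `opt.kind` is only reachable on the enabled arm.
function selectOption(
  opt: InsertMenuOption,
  handleSelect: (kind: EdgeInsertKind) => void,
): boolean {
  if (opt.disabled === false) {
    handleSelect(opt.kind);
    return true;
  }
  return false; // disabled arm carries no `kind`, so there is nothing to dispatch
}
```

With this shape, flipping `disabled: true` to `disabled: false` on an existing item no longer compiles until a real `kind` is supplied.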

…PR5h.3 review)

Per PR5 reviewer Finding J#4: two tests in edgeInsert.test.tsx had vacuous-pass shapes that would survive a regression removing the V1.1 disabled stubs.

## What ships

- "V1.1 axes (Fan prompt, Fan target) render disabled": the old for-loop body fired `expect()` only when an item matched the regex; an empty filter result passed the test silently. New shape filters first, asserts `v11Items.length >= 2`, then loops.
- "disabled V1.1 fan-axis items do NOT invoke onEdgeInsert when clicked": the old `if (disabledFan !== undefined) { ... }` skipped the assertion when no disabled item rendered. New shape asserts `expect(disabledFan).toBeDefined()` before the click and non-null asserts at the click site.

## TDD narrative: proving the strengthening is meaningful

After landing the change, I temporarily stripped both V1.1 disabled items (the V1_0_FAN_AXES "prompt"/"target" entries + the user_turn fan-prompt entry) from `InsertEdge.tsx` and re-ran the edge-insert suite. Both strengthened tests failed honestly:

    Tests: 2 failed, 21 passed, 23 total
    > expect(disabledFan).toBeDefined()
      ^
      at src/components/Tree/edgeInsert.test.tsx:288:25
    > expect(v11Items.length).toBeGreaterThanOrEqual(2)

Before the strengthening, both would have passed silently with the same production change.

## Verification

- Reverted the temp strip via `git checkout`; original 23/23 green again.
- `npm test -- --testPathPatterns=edgeInsert`: 23 passed.

## Next slice

PR5h.4: action rail hover-gate CSS. Match spec §2.2's "rail floats below each node card on hover/focus" by adding the visibility-on-hover gate via `:hover [data-tree-action-rail]` + `[data-selected="true"] [data-tree-action-rail]` selectors. Reviewer Finding J#6; ~5 LOC + 2 tests (default opacity 0, hover/selected → opacity 1).

## Open rubber-duck items still pending

(unchanged from PR5h.1/h.2 commit bodies; tracking continues to PR5h.7)

…4 review)

Per PR5 reviewer Finding J#6: spec §2.2 says "rail floats below each node card on hover/focus" but PR5c shipped with `opacity: 1` always ("always-visible until PR5e/PR5f wire the hover behavior"). PR5e/f landed without that follow-up. Operator-visible drift; cheap to fix.

## What ships

- `actionRail.styles.ts`: rail default `opacity: 0` + 120ms opacity transition. The fade is gentle enough to feel intentional (no hard-snap on hover) and short enough to not lag a keyboard-walker.
- `nodeCards.styles.ts`: frame style adds three nested selectors via Griffel's `&` reference:
  - `&:hover [data-tree-action-rail]` → opacity 1
  - `&:focus-within [data-tree-action-rail]` → opacity 1 (keyboard walker hits a rail button → frame's focus-within fires → rail stays visible while the button has focus)
  - `&[data-selected="true"] [data-tree-action-rail]` → opacity 1 (selected card keeps its rail visible regardless of pointer)
- Two new behavioral tests in `actionRail.test.tsx`:
  - default opacity is 0 when card is unselected
  - opacity flips to 1 when frame carries `data-selected="true"`

  jsdom + Griffel honor the attribute-selector branch; `:hover` and `:focus-within` are CSS-only and verified via code review + Playwright follow-up (jsdom can't simulate either reliably).

## Notable shape decisions

- Hover-gate lives on the FRAME style (parent), not the rail style (child). Griffel can't write "if ancestor :hover" from the child; the parent's `&:hover [data-tree-action-rail]` selector is the only honest expression.
- `transitionProperty: 'opacity'` + `transitionDuration: '120ms'` is a visual nicety, not a behavior. It costs nothing in jsdom (no transitions fire) and avoids a jarring snap when hovering away.
- Did NOT touch the rail's `[data-tree-action-rail]` attribute or any other DOM contract; only the visual opacity changed.

## TDD narrative

- Tests written first: 2 new cases in `actionRail.test.tsx §5`. Default-hidden + data-selected-visible. Both fail before the CSS change (opacity is `'1'` for both cases under the old `opacity: 1` rule).
- GREEN after applying the styles change.

## Verification

- `npm test --no-coverage`: 1071 passed (was 1069; +2 hover-gate)
- `npm run type-check`: 0
- `npm run type-check:contract`: 0
- `npm run lint`: 0 warnings

## Next slice

PR5h.5: UserTurn `✏ Edit text inline`, the first of three V1.0 editing affordances per the Q1 hybrid scope. Inline `<Textarea>` swap on the user-turn card, sets `node.state = 'edited'`, propagates downstream via the runner sink. Open question: the sink path is NOT already wired through the ActionCallbacks `onEditParams` slot; PR5c shipped `onEditParams` only as part of the common rail design, so the inline-edit path needs its own callback. Confirm before writing.

## Open rubber-duck items still pending

PR5h mechanical fixes complete after this. Remaining V1.0 work:

- PR5h.5: UserTurn ✏ Edit text inline
- PR5h.6: RootPrompt ✏ Edit prompt + target + system prompt
- PR5h.7: UserTurn ⚡ Converter palette
- All Q-decision spec amendments (per-child Pick, dropped 💬 raw response, no V1.1 stubs) → end-of-V1.0 doc pass
- Other carry-forwards unchanged

Wires the last PR7-review dead seam — useAutoReverse — to a real consumer: the History tab's "Open as tree" action (spec §5.12). This completes the §13.1a/§5.12 host-integration surface. Per spec line 18 + §5.12, V1.0 "Open as tree" ships linear+converter reconstruction (multi-conversation fanout detection is V1.1) — so this path uses useAutoReverse (single-AR linear), distinct from the reload path's fan-aware reconstruction (PR7g). What ships: - useAutoReverse: preserves the AR's conversation_tree_id as the reconstructed tree's id (spec §13.1). A V1.0+ AR (label present) keeps that id so subsequent reload/refresh stay consistent; a pre-V1.0 AR (no label) keeps the freshly-minted id from linearChainFromMessages. - TreeRunnerHost: new `openFromAttackResultId?` prop drives useAutoReverse; an effect pushes the reconstructed tree via onTreeChange, gated on the tree id (applied once per reconstructed AR, so the inline onTreeChange re-firing each render doesn't re-push). `autoReverseApi?` test-injectable defaults to the module attacksApi. - AttackTable: optional `onOpenAttackAsTree?` → a BranchRegular icon button next to the existing "Open attack" button. Renders only when the callback is provided (implicitly flag-gated). Threaded through AttackHistory. - App (thin glue): `openTreeFromArId` state + `handleOpenAttackAsTree` (sets the AR id + switches to the tree view); passes onOpenAttackAsTree to AttackHistory only when treeUiEnabled, and openFromAttackResultId to the host. Notable shape decisions: - The id-preservation lives in useAutoReverse (which already fetches the AR) rather than the host, so the hook is self-consistent and unit- tested. Without it, opening a V1.0+ AR would mint a fresh id that disagrees with the AR's labels → reload would find no rows and fall to greenfield. That's a real bug, fixed here. - The host reuses its existing onTreeChangeRef mirror (not a render-time ref write) for the push effect; lastOpenedTreeIdRef gates re-pushes. 
- "Open as tree" intentionally uses linear reconstruction for ALL ARs per §5.12 V1.0 scope; the fan-aware path is reserved for in-session reload (§9.4.1 / PR7g). Attempt-fan trees opened from History show the linear base — acceptable + spec-sanctioned for V1.0. TDD narrative: 1. useAutoReverse: +2 tests (preserve conversation_tree_id; mint fresh when absent). RED → GREEN (override tree.id from ar.labels). 2. TreeRunnerHost: +2 tests (auto-reverse emits tree via onTreeChange; no-op when openFromAttackResultId null). RED → GREEN (prop + effect). 3. AttackTable: +2 tests (button absent without callback; present + fires, row-open suppressed). RED → GREEN (prop + button). 4. App + AttackHistory: thin glue (untested per the agreed approach). Verification: - jest AttackTable|AttackHistory|useAutoReverse|TreeRunnerHost: 91/91. - npm test: 1374 pass / 70 suites (was 1368). - type-check / lint clean. This closes both PR7-review dead seams (onGuardedSwapReady in PR7i.3a, useAutoReverse here). Remaining: converter + nested fan-aware reload reconstruction (deferred V1.1 follow-up).
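
The id-preservation rule in `useAutoReverse` (keep the AR's `conversation_tree_id` label when present, otherwise keep the freshly minted id) reduces to a small pure function. The sketch below is an assumption-laden illustration: `resolveTreeId` is a hypothetical name, the labels are modeled as a plain string dictionary, and treating an empty label as absent is a choice of this sketch rather than something the commit states.

```typescript
// Pick the reconstructed tree's id: the AR label wins when present,
// otherwise fall back to the id minted by the linear reconstruction.
function resolveTreeId(
  labels: { [key: string]: string } | undefined,
  mintedId: string,
): string {
  const labelled = labels ? labels["conversation_tree_id"] : undefined;
  // Assumption of this sketch: an empty-string label counts as "no label".
  return labelled !== undefined && labelled.length > 0 ? labelled : mintedId;
}
```

Keeping the labelled id is what lets a later reload find the rows that carry the same label instead of falling through to a greenfield tree.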

…ter fans (PR7g slice 3) Extends fan-aware reload (PR7g slice 2 did attempt fans) to a single root-level `converter` fan, using the flattened topology the operator chose. Nested / multi-axis fans remain on the honest degraded-banner path. Investigation first (Explore subagent, cited): the converter-fan reverse shape is NOT uniquely determined by the forward tree — V1.0's no-cache dispatch sends prepended_conversation=[], so the authored `u_above` level is never persisted. Operator decision: flatten (attach the fan directly to root); re-execution is equivalent (each slot re-dispatches its converter). Per-leaf message fetches accepted for V1.0. What ships: - autoReverse.reconstructTreeWithFans gains `converterResolver?` + `onConverterDivergence?`. When the single fan is a root-level converter fan AND a resolver is supplied, it assembles: root_prompt(text) → fan(converter, variants) → [user_turn(text, converterPipeline=slot converters) → send] × N - root/user_turn text comes from the base leaf's first-turn ORIGINAL value (the authored prompt, not the converted gibberish). - per-slot converters come from reconstructVariantPayloads via the resolver (most-frequent consensus; onConverterDivergence per disagreeing slot). - bails to degraded (returns null → linear) when the base lacks a leading user turn or slots aren't contiguous 0..N-1 (no tombstone guessing in V1.0). - useReloadReconstruction: when the leaf set is a single root-level converter fan, pre-fetches each member leaf's messages (buildConverterResolver) and reads each one's first-user-turn converter_identifiers into a resolver keyed by AR id. Non-converter shapes skip the N fetches. Surfaces a console.warn (not the degraded banner) when reconstructed-but-divergent slots exist. Notable shape decisions: - The pure function stays pure: it takes a synchronous resolver; the async N-leaf fetch lives in the hook. This keeps reconstructTreeWithFans unit-testable without mocking an API. 
- Slot contiguity gate (0..N-1) avoids guessing deleted-slot tombstone semantics; non-contiguous → degraded. - Corrected the prior PR7g comment that claimed the base leaf's converter "survives via linearChainFromMessages" — it does NOT (root_prompt has no converterPipeline field; a first-turn converter is dropped by linear promotion). The converter now survives via this fan-aware path; the linear-drop for non-fan converted first-turns remains a documented V1.x gap (root prompts don't model converters). TDD narrative: 1. autoReverse: +5 tests — converter fan WITH resolver reconstructs the flattened shape; original_value (not converted) used for text; per-slot divergence → most-frequent + callback; non-contiguous slots degrade; (kept) no-resolver degrades. 2. RED: 3 new tests failed (still degrading); 2 already passed. 3. GREEN: assembleRootConverterFan; 46/46 autoReverse pass. 4. useReloadReconstruction: flipped the old "converter degrades" test to "converter reconstructs (no banner) + fetches each member leaf"; added a nested-fan test to preserve degraded-banner coverage. 55/55. Verification: - jest autoReverse|useReloadReconstruction: 55/55 pass. - npm test: 1379 pass / 70 suites (was 1374). - type-check / lint clean. Remaining fan-reconstruction work (deferred V1.1): nested fans, multi-axis-at-same-position fans, and the linear converter-on-root-prompt modeling gap.
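
Two of the gates described for converter-fan reconstruction, slot contiguity (0..N-1) and most-frequent consensus, can be sketched as pure helpers. These are hypothetical stand-ins: `slotsAreContiguous` and `mostFrequent` are not names from the PR, converters are reduced to string keys for illustration, and the tie-break (first value to reach the top count wins) is an assumption since the commit does not specify one.

```typescript
// True only when the slot indices are exactly 0..N-1 with no gaps or duplicates.
function slotsAreContiguous(slotIndices: number[]): boolean {
  const sorted = slotIndices.slice().sort((a, b) => a - b);
  return sorted.every((slot, i) => slot === i);
}

// Most-frequent consensus over string keys; returns null for an empty input.
// Ties go to whichever value reached the top count first (sketch assumption).
function mostFrequent(values: string[]): string | null {
  const counts = new Map<string, number>();
  let best: string | null = null;
  let bestCount = 0;
  for (let i = 0; i < values.length; i++) {
    const count = (counts.get(values[i]) ?? 0) + 1;
    counts.set(values[i], count);
    if (count > bestCount) {
      best = values[i];
      bestCount = count;
    }
  }
  return best;
}
```

A caller following the commit's description would degrade to the linear path when `slotsAreContiguous` is false, and would report a divergence for any slot whose observed value differs from the `mostFrequent` result.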

Code-review fixes:
- resolve pending cost-guardrail approvals on unmount so waves do not hang behind a vanished modal
- prevent cost and dirty-edit dialogs from stacking in the shared modal slot
- cancel active waves on TreeRunnerHost unmount and keep StrictMode/Fast Refresh from closing the live lock manager
- route Open-as-tree through the dirty-edit guard and suppress stale fragment reload while explicit AR reconstruction is in flight
- page reload reconstruction with backend-safe limit=100 and fail soft when one converter leaf fetch fails

Browser-driven fixes:
- mount Tree View directly when the URL carries conversation_tree_id
- add React Flow controls and minimap
- enable local node dragging while keeping ad-hoc connecting disabled and edge tab stops suppressed
- increase default vertical layout spacing to avoid editor/response-card overlap
- show assistant response previews on response cards from both reconstruction and live dispatch
- wire integrated root/user-turn editing with stale propagation
- make response-card refresh run the subtree so interior stale responses are actionable
- compose sequential sink mutations so live response previews survive recordExecution -> clean transitions

Live validation:
- created a benign live attack against OpenAIChatTarget_gpt-4o_rr
- opened it as a tree, edited root prompt/target, refreshed the response, and verified POST /api/attacks + POST /messages succeeded with the card updating in-session
- verified dirty-edit guard, response preview rendering, controls/minimap, local dragging, and overlap checks in browser

Verification:
- npm run type-check
- npm run type-check:contract
- npm run lint
- changed suites in-band: 10 passed / 232 tests
- full Jest: 71 passed / 1403 tests (parallel worker teardown warning remains)

Open design follow-ups:
- backend AttackSummary does not expose target_registry_name, so historical tree reload/open cannot recover a refreshable target without operator edit
- edge handles currently sit visually inside cards; consider moving handles to card exterior in the visual polish pass
- consider response truncation plus expanded/linear-detail view for long outputs
- decide whether action rails should always show, remain hover/focus gated, or sit over the node edge
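
The "editing with stale propagation" and "refresh runs the subtree" items above can be pictured with a small sketch. It assumes a simplified tree where each node lists its children and only `send` nodes carry a `stale` flag; `editText` and `staleSendsUnder` are hypothetical names, not the reducer functions in this PR.

```typescript
type NodeKind = "root_prompt" | "user_turn" | "send";

// Simplified, hypothetical node shape for illustration.
interface TreeNode {
  id: string;
  kind: NodeKind;
  children: string[];
  text?: string;
  stale?: boolean;
}

type Tree = Record<string, TreeNode>;

/** Edits a prompt's text and marks every send below it as stale (no mutation of `tree`). */
function editText(tree: Tree, nodeId: string, text: string): Tree {
  const edited = tree[nodeId];
  if (!edited) return tree;
  const next: Tree = { ...tree, [nodeId]: { ...edited, text } };
  const stack = [...edited.children];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    const node = next[id];
    if (!node) continue;
    if (node.kind === "send") next[id] = { ...node, stale: true };
    stack.push(...node.children);
  }
  return next;
}

/** Collects stale sends in a subtree, parent before child, as a refresh would need them. */
function staleSendsUnder(tree: Tree, nodeId: string): string[] {
  const node = tree[nodeId];
  if (!node) return [];
  const own: string[] = node.kind === "send" && node.stale ? [node.id] : [];
  return node.children.reduce(
    (acc, childId) => acc.concat(staleSendsUnder(tree, childId)),
    own,
  );
}
```

Under these assumptions, editing the root prompt marks every response on every path as stale, while editing a later user turn leaves earlier responses clean.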

What ships:
- wires response-card Add follow-up prompt action through TreeRunnerHost
- wires response-card Fan out response attempts and Fan out converters actions
- wires the existing edge insert menu callback into host-owned structural reducers
- adds pure tree reducers for append-child, insert-between, and attempt/converter fan wrapping

Behavior:
- Add follow-up creates an edited UserTurn under the response
- Fan out response attempts wraps the response edge in an attempt Fan and adds a fresh stale Send slot
- Fan out converters wraps the response edge in a converter Fan and adds a new editable UserTurn -> stale Send branch
- Edge insert menu now mutates the tree instead of remaining presentational

Validation:
- browser-verified Add follow-up prompt, attempt fan, and converter fan creation on a live tree
- npm run type-check
- npm run type-check:contract
- npm run lint
- changed suites in-band: 11 passed / 262 tests
- full Jest: 71 passed / 1410 tests (parallel worker teardown warning remains)

Still not wired:
- converter catalog fetch + UserTurn converter palette host persistence
- fan-child Pick/Unpick host persistence
- clone/branch tree action
- delete node/subtree action
- open path in linear/chat view
- refresh cost preview tooltip
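
As a rough illustration of what pure append-child, insert-between, and attempt-fan reducers could look like, here is a sketch over a parent-pointer tree. The node shape, the function names, and the caller-supplied ids are all assumptions made for the example; the PR's reducers operate on the real ConversationTree model.

```typescript
type NodeKind = "root_prompt" | "user_turn" | "send" | "fan";

// Hypothetical minimal node: a parent pointer instead of explicit edges.
interface TreeNode {
  id: string;
  kind: NodeKind;
  parentId: string | null;
  stale?: boolean;
  edited?: boolean;
}

type Tree = Record<string, TreeNode>;
type NewNode = Omit<TreeNode, "parentId">;

/** Append-child: returns a new tree with `child` attached under `parentId`. */
function appendChild(tree: Tree, parentId: string, child: NewNode): Tree {
  if (!tree[parentId] || tree[child.id]) return tree; // unknown parent or id clash
  return { ...tree, [child.id]: { ...child, parentId } };
}

/** Insert-between: splices `middle` into the existing edge parentId -> childId. */
function insertBetween(tree: Tree, parentId: string, childId: string, middle: NewNode): Tree {
  const child = tree[childId];
  if (!child || child.parentId !== parentId || tree[middle.id]) return tree;
  return {
    ...tree,
    [middle.id]: { ...middle, parentId },
    [childId]: { ...child, parentId: middle.id },
  };
}

/** Attempt fan: wrap the edge into a response in a fan, then add a fresh stale send slot. */
function fanOutAttempts(tree: Tree, sendId: string, fanId: string, newSendId: string): Tree {
  const send = tree[sendId];
  if (!send || send.parentId === null) return tree;
  if (tree[fanId] || tree[newSendId]) return tree; // refuse id clashes up front
  const wrapped = insertBetween(tree, send.parentId, sendId, { id: fanId, kind: "fan" });
  return appendChild(wrapped, fanId, { id: newSendId, kind: "send", stale: true });
}
```

Each function returns the input tree unchanged when its preconditions fail, which keeps the reducers easy to compose and to test without a running UI.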

What ships:
- fetches converter instances in App and passes them into TreeRunnerHost
- wires UserTurn converter palette selections through host state
- wires fan child Pick/Unpick through host state
- wires refresh cost preview labels using the subtree estimator that matches the Refresh action

Validation:
- changed suites in-band: 13 passed / 305 tests
- npm run type-check
- npm run type-check:contract
- npm run lint

Still intentionally not wired:
- clone/branch tree action
- delete node/subtree action
- open path in linear/chat view
- deeper visual polish for edge handles/action rail placement/long-message expansion

What ships:
- moves React Flow connection handles to the card perimeter instead of visually inside the node content
- styles handles with Fluent/brand-aware colors and a small outline
- floats the action rail below the card edge as a compact toolbar while preserving hover/focus/selected reveal behavior

Validation:
- focused visual/component suites in-band
- changed suites in-band: 14 passed / 334 tests
- npm run type-check
- npm run type-check:contract
- npm run lint

Remaining design follow-ups:
- decide whether the action rail should eventually always show instead of hover/focus reveal
- add long-message expand/detail behavior or explicit open-linear path

What ships:
- wires Clone tree / Branch from here to create a new client-side tree with parentConversationTreeId
- wires Delete for non-root subtrees
- wires Open in linear view to a lightweight right-side path drawer
- adds pure reducers for clone and subtree delete

Validation:
- changed suites in-band: 14 passed / 340 tests
- npm run type-check
- npm run type-check:contract
- npm run lint

Remaining follow-ups:
- richer long-message expansion/detail behavior
- fuller branch-from-node semantics beyond whole-tree clone
- explicit delete confirmation modal if product wants destructive-action confirmation before V1.0
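
A minimal sketch of the clone and subtree-delete reducers described above, assuming a flat node list with parent pointers. Only `parentConversationTreeId` comes from the commit message; the other field names and both function names are hypothetical.

```typescript
// Hypothetical shapes for illustration only.
interface TreeNode {
  id: string;
  kind: string;
  parentId: string | null; // null marks the root prompt
}

interface ConversationTree {
  id: string;
  parentConversationTreeId?: string;
  nodes: TreeNode[];
}

/** Whole-tree clone: new tree id, a back-reference to the source, copied nodes. */
function cloneTree(tree: ConversationTree, newTreeId: string): ConversationTree {
  return {
    id: newTreeId,
    parentConversationTreeId: tree.id,
    nodes: tree.nodes.map((n) => ({ ...n })),
  };
}

/** Removes `nodeId` and all of its descendants; the root is never deleted. */
function deleteSubtree(tree: ConversationTree, nodeId: string): ConversationTree {
  const target = tree.nodes.filter((n) => n.id === nodeId)[0];
  if (!target || target.parentId === null) return tree;
  const doomed: Record<string, boolean> = { [nodeId]: true };
  // Sweep until no new descendants are found; fine for small client-side trees.
  let grew = true;
  while (grew) {
    grew = false;
    for (const n of tree.nodes) {
      if (n.parentId !== null && doomed[n.parentId] && !doomed[n.id]) {
        doomed[n.id] = true;
        grew = true;
      }
    }
  }
  return { ...tree, nodes: tree.nodes.filter((n) => !doomed[n.id]) };
}
```

Because both functions return new objects, the original tree stays available, for example to back out of a delete if a confirmation step is added later.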

What ships:
- hides Delete on root prompt cards
- routes non-root Delete actions through a confirmation dialog
- preserves the shared modal slot priority with cost and dirty-edit dialogs

Validation:
- changed suites in-band: 14 passed / 341 tests
- npm run type-check
- npm run type-check:contract
- npm run lint

What ships:
- adds Playwright coverage for Tree View greenfield
- covers History Open as tree reconstruction with response previews
- covers add follow-up prompt
- covers response attempt fan creation
- covers converter fan creation

Validation:
- VITE_ENABLE_TREE_UI=true playwright test e2e/tree.spec.ts --project mock: 4 passed
- npm run type-check
- npm run type-check:contract
- npm run lint

What ships:
- themes React Flow minimap, controls, and attribution with Fluent tokens
- removes white minimap/control chrome in dark mode

Browser validation:
- dark mode minimap: rgb(31, 31, 31)
- dark mode controls: rgb(41, 41, 41)
- light mode minimap/controls use light neutral surfaces

Verification:
- TreeCanvas focused tests
- npm run type-check
- npm run lint

Captures the next checkpoint after PR7 quality-gate work: target recovery, attempt fan count, branch-from-here subtree semantics, long-response inspection, converter fan UX, auto-layout, action rail/handle polish, delete details, past runs, wave toast, and browser coverage exit criteria.

romanlutz · 2026-06-15T21:56:36Z

Why are there two plus signs? Would it be the same action?

romanlutz · 2026-06-15T21:57:51Z

This could be amazing for TAP!

romanlutz · 2026-06-15T21:58:44Z

Does it work for multimodal content?

romanlutz · 2026-06-15T21:59:05Z

How do I switch back and forth from this to the single chat view?

spencrr · 2026-06-15T22:47:20Z

> Why are there two plus signs? Would it be the same action?

Currently the UI has a lot of cases that add to the complexity. In this case, we have the converter / no-converter pair, which really should be collapsed into a single option on the converter fan to use or not use the nullconverter (likely as a checkbox).

spencrr · 2026-06-15T22:48:07Z

> This could be amazing for TAP!

Yeah effectively this is manual TAP!! Curious to think about what we can do to adapt a whole scenario to a conversation and be able to play with the internal messages afterward. More thought required...

spencrr · 2026-06-15T22:48:40Z

> Does it work for multimodal content?

Not in the MVP, but there is no reason it couldn't.

spencrr · 2026-06-15T22:50:31Z

> How do I switch back and forth from this to the single chat view?

Another shortcoming is how the tree is currently persisted (via labels which is very hacky). Ideally the backend just has this data model directly.

My long-term vision here is to unify the UIs into one tab with two splits (one side chat, the other tree) and of course could support hiding the tree view for linear chats or user preference. The current chat split in the tree tab (see on the right side) is not up to feature parity with the chat tab.

spencrr · 2026-06-15T22:51:26Z

> How do I switch back and forth from this to the single chat view?

My recommendation for now is to use the chat split in the tree view for most things and, if a specific feature is missing there, fall back to the chat tab for the entire conversation.

spencrr · 2026-06-15T22:51:38Z

Thanks for the comments, @romanlutz !

romanlutz · 2026-06-16T02:00:14Z

> This could be amazing for TAP!

> Yeah effectively this is manual TAP!! Curious to think about what we can do to adapt a whole scenario to a conversation and be able to play with the internal messages afterward. More thought required...

But a scenario implicitly means many different attacks that wouldn't be represented together(?)

romanlutz · 2026-06-16T02:01:59Z

> How do I switch back and forth from this to the single chat view?

> Another shortcoming is how the tree is currently persisted (via labels which is very hacky). Ideally the backend just has this data model directly.

> My long-term vision here is to unify the UIs into one tab with two splits (one side chat, the other tree) and of course could support hiding the tree view for linear chats or user preference. The current chat split in the tree tab (see on the right side) is not up to feature parity with the chat tab.

Labels? The prompt IDs and original prompt IDs are tracked on message pieces, right? And related conversations are tracked on attack result. What else is needed?

romanlutz · 2026-06-16T02:02:22Z

> How do I switch back and forth from this to the single chat view?

> My recommendation for now is to just use the chat split in the tree view for most things and if you cannot do a specific feature, then just fallback the entire conversation to the chat tab.

Ok but what's the mechanism to switch? Is there a button?

spencrr added 30 commits June 15, 2026 13:43

[DOCS]: Initial GUI Tree Plan

b374627

spencrr added 14 commits June 15, 2026 13:43

feat(frontend): improve tree UI MVP

d33ddc4

fixup(ruff): formatting

232f8a9

fixup(tests): fixup failing frontend/backend unit tests

dceeafe

spencrr requested a review from romanlutz June 15, 2026 21:37

spencrr closed this Jun 15, 2026

spencrr reopened this Jun 15, 2026



