From 8deaccc2ecbf7e76b23d3ff8d7193e882cccc476 Mon Sep 17 00:00:00 2001
From: os-zhuang <jack@objectstack.ai>
Date: Thu, 11 Jun 2026 08:23:23 +0500
Subject: [PATCH 1/3] docs(adr): ADR-0037 token/scope-tree execution (proposed)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Proposes generalizing the engine's single-program-counter (SuspendedRun.nodeId,
one position) to a token/scope-tree runtime model so a run can pause in more
than one place at once — unblocking durable pause inside parallel branches
(parallel approvals) and loop iterations (batch approvals), which ADR-0019 M1
explicitly forbids today.

The token/scope tree is the runtime dual of ADR-0031's structured regions
(a region instance is a scope) and the established BPMN-engine model
(Camunda/Flowable); replay-based models (Temporal) are rejected because
ADR-0018's open node registry breaks their determinism precondition. The
authoring model and the DAG invariant are explicitly preserved (tokens are
runtime-only); the single-token degenerate case keeps today's flows bit-for-bit
unchanged.

Status: Proposed. This is a decision doc for review before the core-engine
work. It includes a phased roadmap (2a internal refactor → 2b parallel pause →
2c loop pause → 2d cancellation/boundary-event foundation) and adds forward
references from ADR-0019 and ADR-0031.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/0019-approval-as-flow-node.md        |   2 +-
 ...31-advanced-flow-node-executors-and-dag.md |   4 +
 docs/adr/0037-token-scope-tree-execution.md   | 235 ++++++++++++++++++
 3 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 docs/adr/0037-token-scope-tree-execution.md

diff --git a/docs/adr/0019-approval-as-flow-node.md b/docs/adr/0019-approval-as-flow-node.md
index d94ee07bd..eddcf2f61 100644
--- a/docs/adr/0019-approval-as-flow-node.md
+++ b/docs/adr/0019-approval-as-flow-node.md
@@ -176,7 +176,7 @@ The open-source / enterprise split is **not** an architectural concern and is **
 A pausing node inside a **subflow** now suspends the whole chain instead of failing the parent.
 Model: **linked runs** (the inter-flow half of the long-term execution-state architecture —
 cf. Step Functions nested executions / Temporal child workflows; the intra-flow half, a
-token/scope tree replacing the single-program-counter continuation, is a separate future ADR).
+token/scope tree replacing the single-program-counter continuation, is [ADR-0037](./0037-token-scope-tree-execution.md)).
 
 - The child's continuation persists under its **own run id** (run identity keeps per-flow version
   pinning, run logs, and `$runId`-based approval/wait correlation intact). The parent suspends at
diff --git a/docs/adr/0031-advanced-flow-node-executors-and-dag.md b/docs/adr/0031-advanced-flow-node-executors-and-dag.md
index 3f048c5f0..4d6f203d2 100644
--- a/docs/adr/0031-advanced-flow-node-executors-and-dag.md
+++ b/docs/adr/0031-advanced-flow-node-executors-and-dag.md
@@ -196,6 +196,10 @@ with best-effort folding of foreign BPMN gateways. BPMN 2.0 **XML**
 - Relaxing the DAG invariant to allow arbitrary cycles (loops are structured
   containers instead).
 - Runtime BPMN boundary events (timer/signal) — interop representation retained.
+- Durable pause *inside* a `parallel` branch or `loop` iteration — the structured
+  constructs run their regions synchronously here. Generalizing the engine's
+  single-program-counter to a token/scope tree (the runtime dual of these
+  regions) is [ADR-0037](./0037-token-scope-tree-execution.md).
 
 ## Already shipped this line of work
 
diff --git a/docs/adr/0037-token-scope-tree-execution.md b/docs/adr/0037-token-scope-tree-execution.md
new file mode 100644
index 000000000..e0e4f26b7
--- /dev/null
+++ b/docs/adr/0037-token-scope-tree-execution.md
@@ -0,0 +1,235 @@
+# ADR-0037: Token / scope-tree execution — durable pause inside parallel branches and loop iterations
+
+**Status**: Proposed (2026-06-11)
+**Deciders**: ObjectStack Protocol Architects
+**Builds on**: [ADR-0019](./0019-approval-as-flow-node.md) (durable-pause node via suspend/resume — *between*-flow chaining added in its 2026-06-10 addendum), [ADR-0031](./0031-advanced-flow-node-executors-and-dag.md) (structured `loop` / `parallel` / `try_catch` constructs, DAG invariant), [ADR-0018](./0018-unified-node-action-registry.md) (open node/executor registry)
+**Consumers**: `@objectstack/services/service-automation` (engine core — `executeNode` / `traverseNext` / `resume`, `SuspendedRun`, `sys_automation_run`), `@objectstack/spec` (`automation/execution.zod.ts`), `../objectui` (Runs panel, flow runner)
+
+---
+
+## TL;DR
+
+The engine tracks a paused run with a **single program counter** — `SuspendedRun.nodeId`,
+one position. That is enough for a linear pause (`approval` / `screen` / `wait`
+on the main path) but cannot represent **two pauses at once**. So the engine
+**forbids pausing inside a `parallel` branch or a `loop` iteration** (ADR-0019 M1
+scope note, `engine.ts`: "durable pause across parallel gateways is out of
+scope"). That blocks the most-requested real shapes: *parallel approvals*
+("finance AND legal sign off concurrently") and *batch approvals* ("route each
+line item over $10k").
+
+This ADR adopts a **token / scope-tree** runtime model — the established
+BPMN-engine representation (Camunda / Flowable). A run's live state becomes a
+**set of tokens** (execution positions) organized in a **scope tree** (the root
+flow, each parallel branch, each loop iteration, each try region is a scope).
+Any token can pause independently; a scope's join is a barrier that completes
+when its child tokens arrive. The flow **authoring model is unchanged** — token
+tree is a pure runtime representation, invisible to the flow JSON, so ADR-0031's
+AI-authored structured constructs and the DAG invariant both stand.
+
+This is a **core-engine rewrite** of traversal + suspend/resume + persistence —
+deliberately phased, with the single-program-counter case becoming the
+degenerate *one-token* tree so today's flows are bit-for-bit unchanged.
+
+## Context — current state (verified 2026-06-11)
+
+- **One position per run.** `SuspendedRun = { runId, flowName, nodeId,
+  variables, steps, context, … }` — a single `nodeId`. Resume restores that one
+  position and calls `traverseNext` from it.
+- **Suspend unwinds the stack.** A pausing node throws `FlowSuspendSignal`,
+  caught at `execute()` / `resume()`, which snapshots the one position. There is
+  no place to record a *second* live position.
+- **Structured constructs run their bodies synchronously.** `loop` / `parallel`
+  / `try_catch` execute their region(s) to completion in-line (ADR-0031). A
+  pausing node inside a region throws the suspend signal up through the
+  container, discarding the other branches / iterations — hence the hard ban.
+- **`parallel` is concurrent but not pausable.** `traverseNext` already runs
+  unconditional out-edges via `Promise.all`, and the `parallel` block joins at
+  block end — but a suspend inside one branch unwinds that branch and the
+  siblings already in flight are **not** cancelled or persisted. Correctness
+  holds only because pause-in-branch is forbidden.
+- **Between-flow pause already works.** ADR-0019's addendum (subflow linked
+  runs, #1693) chains *separate* runs across the subflow boundary. That is the
+  *inter*-flow half and is orthogonal to this ADR — it keeps working unchanged.
+
+The gap is strictly **intra-flow concurrency + pause**: one run, several live
+positions.
+
+## The reframing — why a token tree, and why not the alternatives
+
+A flow run is a **token game** on a graph (BPMN's own mental model): a token
+sits on a node; executing the node moves the token along its out-edges; a split
+turns one token into several; a join consumes several and emits one. The
+engine's "single `nodeId`" is the special case of *exactly one token that never
+splits*. Generalizing to *a set of tokens* is the minimal change that makes
+concurrent pause representable.
+
+- **Why not "serialize the interpreter stack" (Salesforce-Flow style).**
+  Snapshotting the call stack inlines child state into the parent and destroys
+  per-branch run identity; it also does not naturally express *N* independent
+  paused positions. Rejected.
+- **Why not "event-sourced deterministic replay" (Temporal style).** Replay
+  requires every node to be deterministic / idempotent. ADR-0018's **open node
+  registry** lets third-party executors run arbitrary side effects — the replay
+  precondition does not hold for this platform. Rejected. (This is a
+  *generative-ecosystem* constraint, not a taste call: low-code + open plugins ⇒
+  the Camunda branch, not the Temporal branch.)
+- **Why the token/scope tree.** It is the runtime dual of ADR-0031's structured
+  regions: a region instance *is* a scope, a scope's tokens *are* its live
+  positions, and the join *is* the scope barrier. It is locally composable,
+  statically bounded (no back-edges — ADR-0031's DAG invariant is preserved),
+  and is the proven model in every BPMN engine. We are not inventing a runtime;
+  we are adopting the standard one.
+
+## Decision
+
+### D1 — A run's live state is a set of **tokens** in a **scope tree**
+
+- A **token** is one execution position: `{ tokenId, scopeId, nodeId, status }`
+  where `status ∈ { running, paused, completed, cancelled }`.
+- A **scope** is a region instance: the **root** flow, or an instance of a
+  `parallel` branch / `loop` iteration / `try` or `catch` region. Scopes nest by
+  containment → a tree. Each scope records `{ scopeId, parentScopeId, kind,
+  iteration?, joinState }`.
+- A linear flow with no concurrency is a **one-token, one-scope** tree —
+  identical behavior to today (the back-compat anchor).
+
+### D2 — Split / join are scope operations
+
+- Entering a `parallel` block creates one **child scope per branch**, each
+  seeded with a token at its region entry; the block's join barrier records how
+  many branch tokens must arrive.
+- Entering a `loop` creates one child scope per iteration (sequential by default;
+  the model permits concurrent iterations behind a flag — out of scope for v1).
+- A scope **completes** when all its tokens reach its single exit (ADR-0031
+  single-entry/single-exit regions make this well-defined); completion emits one
+  token into the parent scope at the container's ordinary out-edge — the
+  existing "after-block / after-loop continuation."
+
+### D3 — Any token may pause; the scope persists partial progress
+
+- A pausing node (`approval` / `screen` / `wait`) sets **its token** to `paused`
+  and snapshots the **whole tree** (all tokens, all scope join states, the
+  variable scoping per D5). Sibling tokens keep running; the run is `paused`
+  while **any** token is paused or running-then-pausable.
+- The run is `completed` only when the root scope completes with no paused
+  tokens; `failed` per D6.
+
+### D4 — Resume targets a token
+
+- `resume(runId, signal)` gains an optional `tokenId`. With exactly one paused
+  token (today's universal case) it resolves unambiguously — **the existing
+  single-argument resume is unchanged**. Approval/wait/screen already carry a
+  correlation key; the engine maps `correlation → tokenId`.
+- Resuming a token continues traversal **within its scope**; reaching the
+  scope's exit decrements the parent join barrier. When the last branch token
+  arrives, the join fires and the parent continues — possibly itself pausing
+  again elsewhere.
+
+### D5 — Variable scoping is copy-on-write per scope
+
+- ADR-0031 keeps region bodies in the **enclosing variable scope**. With
+  concurrent *paused* branches that share an enclosing map, a naive shared map
+  lets one paused branch's later resume clobber another's reads. The model makes
+  per-scope variable writes **copy-on-write**: a scope sees the enclosing values
+  but its writes are isolated to its scope frame until it joins, at which point a
+  defined **merge policy** folds them back (last-writer-wins by default;
+  loop accumulation via explicit output variables, as today). The merge policy
+  is a named decision point, not left implicit.
+
+### D6 — Failure and cancellation are scope-scoped
+
+- A token failing terminally **fails its scope**; by default the scope's failure
+  **cancels its sibling tokens** in the same parent (interrupt semantics) and
+  propagates up — matching the intuition "if the parallel block can't finish,
+  the block fails." `try_catch` is the structured opt-out: a `try`-scope failure
+  routes to the `catch` scope instead of propagating (ADR-0031, unchanged).
+- Cancellation of a *running* token is cooperative (checked at node boundaries);
+  cancellation of a *paused* token consumes its continuation and records it
+  cancelled. This cancellation primitive is what later unlocks **boundary
+  events / timers** (a separate follow-up ADR builds on it — see Non-goals).
+
+### D7 — Authoring model and DAG invariant unchanged
+
+- The flow JSON does **not** change. Tokens/scopes are runtime-only;
+  `flow.zod.ts` and the designer are untouched. ADR-0031's structured constructs
+  remain the authoring surface and the AI design center (ADR-0010/0011).
+- No back-edges are introduced. Scopes are acyclic single-entry/single-exit
+  regions; iteration stays the loop container's job. The DAG invariant holds.
+
+## Representation — persistence evolution (additive)
+
+`SuspendedRun` / `sys_automation_run` evolve **additively**:
+
+- Keep `nodeId` as the **primary token's** position (the first/only paused token)
+  so existing readers, the Runs panel, and one-pause flows keep working with no
+  change.
+- Add `tokens_json` (and the scope tree) as a new JSON column / field carrying
+  the full set when there is more than one. A row with no `tokens_json` is a
+  one-token run — rehydrated as today. This mirrors the ADR-0019 discipline of
+  not breaking the suspended-run table; the single new additive column is the
+  deliberate exception this feature requires.
+- `resume` continuation, correlation→token mapping, and the cold-boot
+  wait-timer re-arm (#1687) all extend to address a token instead of the run.
+
+`execution.zod.ts` gains a `tokens[]` / `scopes[]` shape on the run log;
+`ExecutionStepLogSchema` already tags steps with `parentNodeId` / `iteration` /
+`regionKind` (ADR-0031 #1505), which the token model formalizes as scope ids.
+
+## Consequences
+
+- **Unlocks** parallel approvals, batch (per-iteration) approvals, concurrent
+  waits, and lays the cancellation primitive for boundary timers/events.
+- **Core risk.** Traversal, suspend/resume, and persistence are the engine's
+  heart; this is the largest change to it since ADR-0019. Mitigated by phasing
+  (below) with the one-token degenerate case as a behavior-preserving anchor and
+  the full existing suite as the regression gate at every phase.
+- **Observability improves**: the Runs panel can show a tree of live positions
+  ("branch ① paused at approval, branch ② done") instead of one node.
+- **No authoring or migration cost** for existing flows — they are one-token
+  trees; the JSON, the designer, and stored runs are untouched.
+- **Subflow linked-runs (ADR-0019 addendum) composes**: a subflow token whose
+  child run pauses stays `paused` in its scope exactly like any other pausing
+  node — inter-flow chaining and intra-flow tokens stack cleanly.
+
+## Sequencing (roadmap)
+
+Each phase ships behind tests; the suite stays green throughout.
+
+1. **2a — Internal token model, zero behavior change.** Represent today's
+   single program counter as a one-token / one-scope tree inside the engine.
+   `executeNode` / `traverseNext` / `resume` operate on tokens; structured
+   containers create child scopes but still run synchronously. No new capability;
+   pure refactor that de-risks the rewrite. *Gate: full suite unchanged.*
+2. **2b — Pause inside `parallel` branches.** The most-requested case (parallel
+   approvals). Join barrier persists partial completion; branch tokens pause and
+   resume independently; D5 copy-on-write + merge lands here.
+3. **2c — Pause inside `loop` iterations.** Batch approvals. Sequential
+   iterations first; the per-iteration scope is the unit of pause.
+4. **2d — Cancellation / interrupt (D6).** Sibling cancellation on scope
+   failure; cooperative running-token cancellation. Unblocks a follow-up ADR for
+   **boundary events / timers** (BPMN interrupting boundaries map onto this).
+
+## Non-goals / deferred
+
+- **Distributed token execution** across workers/nodes. v1 keeps one claimer per
+  run (today's model); tokens are concurrent *within* a process, not sharded
+  across machines.
+- **Parallel I/O speedup as a goal.** Concurrency here is about independent
+  *pause*, not throughput; any wall-clock win is incidental.
+- **Full BPMN boundary-event / event-subprocess semantics.** The cancellation
+  primitive (2d) is the foundation; the node-type surface is a separate ADR.
+- **Concurrent loop iterations** (fan-out map). The model permits it behind a
+  flag; v1 ships sequential iteration only.
+- **Changing the authoring model.** Out of scope by D7 — tokens are runtime-only.
+
+## Relationship to prior ADRs
+
+- **ADR-0019** gave durable pause for a *single* position and (in its addendum)
+  *between*-flow chaining. This ADR generalizes the *within*-flow position from
+  one to a tree. The resume contract and `sys_automation_run` extend additively.
+- **ADR-0031** defined the structured regions; this ADR is their **runtime
+  dual** — a region instance is a scope. The DAG invariant and AI-authoring
+  center are explicitly preserved.
+- **ADR-0018**'s open registry is the reason replay-based models are rejected
+  (D-reframing) and why the token/scope model is the right fit.

From 2d3c6bf6ce519aa574cc0c64543d01b0200eb94d Mon Sep 17 00:00:00 2001
From: os-zhuang <jack@objectstack.ai>
Date: Thu, 11 Jun 2026 08:45:32 +0500
Subject: [PATCH 2/3] docs(adr): revise ADR-0037 after code + industry
 self-review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A self-review checked the first draft against the actual engine (flat shared
variable Map; suspend = a thrown FlowSuspendSignal unwinding the stack;
runRegion bans region pause; traverseNext already fans out unconditional edges
via Promise.all) and against Camunda/Zeebe/Flowable/Step Functions. It found
that the draft adopted the token tree as a data structure while missing the
execution model that makes it work:

- recursion + throw-to-suspend must become an explicit token scheduler (a
  thrown exception cannot pause branch A while branch B keeps running);
- the flat shared variable Map must become hierarchical scope variables (read
  up the tree) — NOT the copy-on-write/merge scheme the first draft invented,
  which no major engine uses;
- sibling token resumes must serialize per run (Camunda's per-instance
  optimistic locking) — so the concurrency is logical, not parallel execution.
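
A minimal TypeScript sketch of the first point, assuming a queue-driven
scheduler in which a wait state is an ordinary return value rather than a
thrown signal. Every name here (`TokenScheduler`, `StepResult`, the node ids)
is hypothetical and not taken from the engine; persistence, scopes, and joins
are left out:

```typescript
// Hypothetical sketch: these names do not exist in the engine.
type TokenStatus = "runnable" | "paused" | "completed";

interface Token {
  tokenId: string;
  nodeId: string;
  status: TokenStatus;
}

// A node either names its successor (null = branch finished) or asks to wait.
// Waiting is an ordinary return value, not a thrown FlowSuspendSignal.
type StepResult = { kind: "next"; nodeId: string | null } | { kind: "wait" };
type NodeHandler = (token: Token) => StepResult;

class TokenScheduler {
  readonly trace: string[] = [];
  private readonly queue: Token[] = [];
  private readonly handlers: Map<string, NodeHandler>;

  constructor(handlers: Map<string, NodeHandler>) {
    this.handlers = handlers;
  }

  spawn(tokenId: string, nodeId: string): Token {
    const token: Token = { tokenId, nodeId, status: "runnable" };
    this.queue.push(token);
    return token;
  }

  // Called when the awaited event arrives (e.g. an approval decision).
  resume(token: Token, nextNodeId: string): void {
    if (token.status !== "paused") {
      throw new Error(`token ${token.tokenId} is not paused`);
    }
    token.nodeId = nextNodeId;
    token.status = "runnable";
    this.queue.push(token);
  }

  // Pop a runnable token, advance it one step, repeat until none is runnable.
  // A real engine would persist state after each step; omitted here.
  runUntilIdle(): void {
    for (let t = this.queue.shift(); t !== undefined; t = this.queue.shift()) {
      const handler = this.handlers.get(t.nodeId);
      if (handler === undefined) throw new Error(`no handler for ${t.nodeId}`);
      this.trace.push(`${t.tokenId}@${t.nodeId}`);
      const result = handler(t);
      if (result.kind === "wait") {
        t.status = "paused"; // the token just stops being runnable
      } else if (result.nodeId === null) {
        t.status = "completed";
      } else {
        t.nodeId = result.nodeId;
        this.queue.push(t);
      }
    }
  }
}

const demoHandlers = new Map<string, NodeHandler>([
  ["approveFinance", () => ({ kind: "wait" })],
  ["notifyLegal", () => ({ kind: "next", nodeId: "archive" })],
  ["archive", () => ({ kind: "next", nodeId: null })],
]);
const tokenScheduler = new TokenScheduler(demoHandlers);
const branchA = tokenScheduler.spawn("A", "approveFinance");
const branchB = tokenScheduler.spawn("B", "notifyLegal");
tokenScheduler.runUntilIdle(); // A pauses at its approval; B runs to its end
const afterFirstRun = `${branchA.status},${branchB.status}`;
tokenScheduler.resume(branchA, "archive"); // the finance decision arrived
tokenScheduler.runUntilIdle();
const afterResume = `${branchA.status},${branchB.status}`;
```

Branch A stops being runnable at its approval while branch B still runs to
its end, which a thrown exception unwinding a shared stack cannot express.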

Given that true cost, the revision splits the decision into two tracks and
recommends Track A first:
- Track A (now, no engine-core change): multi-instance / aggregating nodes —
  parallel approvals as one `approval` node aggregating N decisions, batch
  approvals as a `map` node. Camunda multi-instance / Step Functions Map shape.
- Track B (deferred, recorded): the general token/scope tree + scheduler +
  hierarchical scoping, started only when a flow needs arbitrary-position
  concurrent pause that multi-instance can't express.
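
A rough TypeScript sketch of Track A's aggregating node, assuming an
`approval` node that keeps its own decision tally so the run still has a
single paused position. The names and the unanimity policy below are
assumptions for illustration, not the shipped node contract:

```typescript
// Hypothetical sketch: names and the unanimity policy are assumptions.
type Decision = "approved" | "rejected";

interface AggregateApproval {
  required: string[];               // approvers who must decide
  decisions: Map<string, Decision>; // decisions recorded so far
  result: Decision | null;          // null while the node is still waiting
}

type NodeOutcome =
  | { status: "waiting"; pending: string[] }
  | { status: "done"; result: Decision };

function startApproval(required: string[]): AggregateApproval {
  return { required: [...required], decisions: new Map(), result: null };
}

// Record one decision; the node itself decides whether the run may move on.
// Assumed policy: every approver must approve; the first rejection ends it.
function recordDecision(
  node: AggregateApproval,
  approver: string,
  decision: Decision,
): NodeOutcome {
  if (node.result !== null) {
    throw new Error("approval already completed");
  }
  if (!node.required.includes(approver)) {
    throw new Error(`unknown approver: ${approver}`);
  }
  if (node.decisions.has(approver)) {
    throw new Error(`duplicate decision from ${approver}`);
  }
  node.decisions.set(approver, decision);
  const pending = node.required.filter((id) => !node.decisions.has(id));
  if (decision === "rejected") {
    node.result = "rejected";
  } else if (pending.length === 0) {
    node.result = "approved";
  }
  return node.result === null
    ? { status: "waiting", pending }
    : { status: "done", result: node.result };
}

// The run stays paused at this one node until the outcome is "done".
const signOff = startApproval(["finance", "legal"]);
const afterFinance = recordDecision(signOff, "finance", "approved");
const afterLegal = recordDecision(signOff, "legal", "approved");
```

The fan-out and the aggregation live inside the node, so the engine's single
program counter is untouched. A `map` node for batch approvals would
presumably follow the same pattern with one tally entry per item.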

Keeps the correct parts (reject Temporal replay per ADR-0018 open registry;
authoring model + DAG invariant preserved; single-token = back-compat anchor).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/0037-token-scope-tree-execution.md | 408 ++++++++++----------
 1 file changed, 199 insertions(+), 209 deletions(-)

diff --git a/docs/adr/0037-token-scope-tree-execution.md b/docs/adr/0037-token-scope-tree-execution.md
index e0e4f26b7..2effdfea4 100644
--- a/docs/adr/0037-token-scope-tree-execution.md
+++ b/docs/adr/0037-token-scope-tree-execution.md
@@ -1,235 +1,225 @@
-# ADR-0037: Token / scope-tree execution — durable pause inside parallel branches and loop iterations
+# ADR-0037: Concurrent durable pause — multi-instance nodes now, token/scope-tree later
 
-**Status**: Proposed (2026-06-11)
+**Status**: Proposed (2026-06-11) — revised after a code + industry self-review
 **Deciders**: ObjectStack Protocol Architects
 **Builds on**: [ADR-0019](./0019-approval-as-flow-node.md) (durable-pause node via suspend/resume — *between*-flow chaining added in its 2026-06-10 addendum), [ADR-0031](./0031-advanced-flow-node-executors-and-dag.md) (structured `loop` / `parallel` / `try_catch` constructs, DAG invariant), [ADR-0018](./0018-unified-node-action-registry.md) (open node/executor registry)
-**Consumers**: `@objectstack/services/service-automation` (engine core — `executeNode` / `traverseNext` / `resume`, `SuspendedRun`, `sys_automation_run`), `@objectstack/spec` (`automation/execution.zod.ts`), `../objectui` (Runs panel, flow runner)
+**Consumers**: `@objectstack/services/service-automation` (engine core — `executeNode` / `traverseNext` / `runRegion` / `resume`, `SuspendedRun`, `sys_automation_run`), `@objectstack/spec` (`automation/execution.zod.ts`), `../objectui` (Runs panel, flow runner)
 
 ---
 
 ## TL;DR
 
 The engine tracks a paused run with a **single program counter** — `SuspendedRun.nodeId`,
-one position. That is enough for a linear pause (`approval` / `screen` / `wait`
-on the main path) but cannot represent **two pauses at once**. So the engine
-**forbids pausing inside a `parallel` branch or a `loop` iteration** (ADR-0019 M1
-scope note, `engine.ts`: "durable pause across parallel gateways is out of
-scope"). That blocks the most-requested real shapes: *parallel approvals*
-("finance AND legal sign off concurrently") and *batch approvals* ("route each
-line item over $10k").
-
-This ADR adopts a **token / scope-tree** runtime model — the established
-BPMN-engine representation (Camunda / Flowable). A run's live state becomes a
-**set of tokens** (execution positions) organized in a **scope tree** (the root
-flow, each parallel branch, each loop iteration, each try region is a scope).
-Any token can pause independently; a scope's join is a barrier that completes
-when its child tokens arrive. The flow **authoring model is unchanged** — token
-tree is a pure runtime representation, invisible to the flow JSON, so ADR-0031's
-AI-authored structured constructs and the DAG invariant both stand.
-
-This is a **core-engine rewrite** of traversal + suspend/resume + persistence —
-deliberately phased, with the single-program-counter case becoming the
-degenerate *one-token* tree so today's flows are bit-for-bit unchanged.
-
-## Context — current state (verified 2026-06-11)
-
-- **One position per run.** `SuspendedRun = { runId, flowName, nodeId,
-  variables, steps, context, … }` — a single `nodeId`. Resume restores that one
-  position and calls `traverseNext` from it.
-- **Suspend unwinds the stack.** A pausing node throws `FlowSuspendSignal`,
-  caught at `execute()` / `resume()`, which snapshots the one position. There is
-  no place to record a *second* live position.
-- **Structured constructs run their bodies synchronously.** `loop` / `parallel`
-  / `try_catch` execute their region(s) to completion in-line (ADR-0031). A
-  pausing node inside a region throws the suspend signal up through the
-  container, discarding the other branches / iterations — hence the hard ban.
-- **`parallel` is concurrent but not pausable.** `traverseNext` already runs
-  unconditional out-edges via `Promise.all`, and the `parallel` block joins at
-  block end — but a suspend inside one branch unwinds that branch and the
-  siblings already in flight are **not** cancelled or persisted. Correctness
-  holds only because pause-in-branch is forbidden.
+one position — and suspend is implemented as a **thrown exception that unwinds
+the call stack** (`FlowSuspendSignal`). That cannot represent **two pauses at
+once**, so the engine **forbids pausing inside a `parallel` branch or `loop`
+iteration** (`runRegion` converts a suspend inside a region into a hard error).
+This blocks **parallel approvals** ("finance AND legal sign off concurrently")
+and **batch approvals** ("route each line item over $10k").
+
+The tempting answer, adopting a BPMN-style **token / scope tree** (Camunda), is
+the right *long-term* runtime model but is a **full engine-core rewrite**. It is
+not just a data structure: it forces replacing three coupled mechanisms the
+current engine relies on (see [Why the token tree is expensive](#why-the-token-tree-is-expensive-the-real-cost)).
+A code review (below) shows the cost is much larger than the data-structure
+change implies.
+
+**Decision: two tracks.** Ship **Track A — multi-instance / aggregating nodes**
+first: model the actual demand (parallel approvals, batch approvals) as *single
+nodes* that wait for N decisions, the way Camunda multi-instance and AWS Step
+Functions `Map` do. This needs **no change to the engine's execution model** —
+the node owns the fan-out and the aggregation, the run still has one program
+counter. Defer **Track B — the general token/scope tree** until demand exceeds
+what multi-instance covers; this ADR records its design so Track A is built
+toward it, not away from it.
+
+## Context — current state (verified 2026-06-11, against the code)
+
+- **One position per run.** `SuspendedRun.nodeId` is a single node id. `resume`
+  restores that one position and calls `traverseNext` from it.
+- **Suspend is a thrown exception.** A pausing node throws `FlowSuspendSignal`;
+  `executeNode` unwinds, `execute()` / `resume()` catch it and snapshot the one
+  position. The JS call stack *is* the continuation while running; on resume the
+  engine re-derives traversal from the single `nodeId` (it does not restore a
+  stack).
+- **`runRegion` bans pause structurally.** `parallel` / `loop` / `try_catch` run
+  their region(s) through `runRegion`, which catches a `FlowSuspendSignal` and
+  rethrows it as `Error("durable pause inside a structured region … is not
+  supported")`. That is where the ban lives.
+- **Two concurrency sources, not one.** Besides the structured `parallel` node,
+  `traverseNext` already runs a node's **multiple unconditional out-edges
+  concurrently via `Promise.all`** — raw graph fan-out. A suspend in either path
+  unwinds and the siblings are not cancelled; correctness holds only because
+  pause-in-branch is banned.
+- **Variables are one flat shared `Map`.** `Map<string, unknown>` is shared by
+  the whole run *and* every region/branch/iteration — there is **no scoping**.
+  Loop iterations overwrite the iterator var in place; node output is written as
+  `variables.set('${nodeId}.${key}', …)`. ADR-0031 deliberately runs regions "in
+  the enclosing variable scope," i.e. on this same flat map.
 - **Between-flow pause already works.** ADR-0019's addendum (subflow linked
-  runs, #1693) chains *separate* runs across the subflow boundary. That is the
-  *inter*-flow half and is orthogonal to this ADR — it keeps working unchanged.
+  runs, #1693) chains *separate* runs across the subflow boundary — orthogonal
+  to this ADR and unchanged by either track.
 
 The gap is strictly **intra-flow concurrency + pause**: one run, several live
 positions.
 
-## The reframing — why a token tree, and why not the alternatives
-
-A flow run is a **token game** on a graph (BPMN's own mental model): a token
-sits on a node; executing the node moves the token along its out-edges; a split
-turns one token into several; a join consumes several and emits one. The
-engine's "single `nodeId`" is the special case of *exactly one token that never
-splits*. Generalizing to *a set of tokens* is the minimal change that makes
-concurrent pause representable.
-
-- **Why not "serialize the interpreter stack" (Salesforce-Flow style).**
-  Snapshotting the call stack inlines child state into the parent and destroys
-  per-branch run identity; it also does not naturally express *N* independent
-  paused positions. Rejected.
-- **Why not "event-sourced deterministic replay" (Temporal style).** Replay
-  requires every node to be deterministic / idempotent. ADR-0018's **open node
-  registry** lets third-party executors run arbitrary side effects — the replay
-  precondition does not hold for this platform. Rejected. (This is a
-  *generative-ecosystem* constraint, not a taste call: low-code + open plugins ⇒
-  the Camunda branch, not the Temporal branch.)
-- **Why the token/scope tree.** It is the runtime dual of ADR-0031's structured
-  regions: a region instance *is* a scope, a scope's tokens *are* its live
-  positions, and the join *is* the scope barrier. It is locally composable,
-  statically bounded (no back-edges — ADR-0031's DAG invariant is preserved),
-  and is the proven model in every BPMN engine. We are not inventing a runtime;
-  we are adopting the standard one.
+## Why the token tree is expensive (the real cost)
+
+A self-review against Camunda/Zeebe/Flowable and the actual code found that the
+token/scope tree is a *data structure* whose value only appears when paired with
+**three execution-model changes the current engine does not have**. Adopting the
+tree without these (as a first draft of this ADR did) is adopting the noun
+without the verb.
+
+1. **Recursion + throw  →  an explicit token scheduler.** Today execution is
+   recursive `executeNode` and suspend is a thrown unwind. You cannot
+   simultaneously "pause branch A" and "keep branch B running" with a thrown
+   exception — `Promise.all` rejects on A's throw while B keeps mutating the
+   shared map *after* the snapshot. Camunda/Zeebe instead run a **command/job
+   queue**: pop a runnable token, advance it one step, persist; a token that
+   hits a wait state simply stops being runnable (no exception). Concurrent pause
+   *requires* this scheduler — it is the core rewrite, not a refactor.
+
+2. **Flat shared map  →  hierarchical scope variables.** Camunda resolves a
+   variable by walking **up the execution tree** (token scope → parent → … →
+   process instance); a *local* write stays in the current scope and ends with
+   it, while a plain write lands in the nearest scope that already defines the
+   name (else the root). (The first draft's "copy-on-write + merge-on-join"
+   scheme is one **no major engine uses**; it is harder and semantically
+   surprising.) Moving from one flat `Map` to scope-chained resolution touches
+   **every** `variables.get`/`set`, template interpolation, and CEL evaluation.
+
+3. **Per-run serialization.** Two sibling tokens (e.g. two parallel approvals
+   decided at the same instant) would resume concurrently and race on shared run
+   state and the join barrier. Camunda serializes commands **per process
+   instance** (optimistic locking). v1 of Track B would likewise need to
+   serialize token advances within a run — which means the concurrency is
+   *logical* (independent pause points), not *parallel execution*. That is a
+   real, honest limitation to state up front.
+
+The token tree is correct long-term, but its cost is "rebuild the engine's
+execution model," not "add a tree to `SuspendedRun`."
 
 ## Decision
 
-### D1 — A run's live state is a set of **tokens** in a **scope tree**
-
-- A **token** is one execution position: `{ tokenId, scopeId, nodeId, status }`
-  where `status ∈ { running, paused, completed, cancelled }`.
-- A **scope** is a region instance: the **root** flow, or an instance of a
-  `parallel` branch / `loop` iteration / `try` or `catch` region. Scopes nest by
-  containment → a tree. Each scope records `{ scopeId, parentScopeId, kind,
-  iteration?, joinState }`.
-- A linear flow with no concurrency is a **one-token, one-scope** tree —
-  identical behavior to today (the back-compat anchor).
-
-### D2 — Split / join are scope operations
-
-- Entering a `parallel` block creates one **child scope per branch**, each
-  seeded with a token at its region entry; the block's join barrier records how
-  many branch tokens must arrive.
-- Entering a `loop` creates one child scope per iteration (sequential by default;
-  the model permits concurrent iterations behind a flag — out of scope for v1).
-- A scope **completes** when all its tokens reach its single exit (ADR-0031
-  single-entry/single-exit regions make this well-defined); completion emits one
-  token into the parent scope at the container's ordinary out-edge — the
-  existing "after-block / after-loop continuation."
-
-### D3 — Any token may pause; the scope persists partial progress
-
-- A pausing node (`approval` / `screen` / `wait`) sets **its token** to `paused`
-  and snapshots the **whole tree** (all tokens, all scope join states, the
-  variable scoping per D5). Sibling tokens keep running; the run is `paused`
-  while **any** token is paused or running-then-pausable.
-- The run is `completed` only when the root scope completes with no paused
-  tokens; `failed` per D6.
-
-### D4 — Resume targets a token
-
-- `resume(runId, signal)` gains an optional `tokenId`. With exactly one paused
-  token (today's universal case) it resolves unambiguously — **the existing
-  single-argument resume is unchanged**. Approval/wait/screen already carry a
-  correlation key; the engine maps `correlation → tokenId`.
-- Resuming a token continues traversal **within its scope**; reaching the
-  scope's exit decrements the parent join barrier. When the last branch token
-  arrives, the join fires and the parent continues — possibly itself pausing
-  again elsewhere.
-
-### D5 — Variable scoping is copy-on-write per scope
-
-- ADR-0031 keeps region bodies in the **enclosing variable scope**. With
-  concurrent *paused* branches that share an enclosing map, a naive shared map
-  lets one paused branch's later resume clobber another's reads. The model makes
-  per-scope variable writes **copy-on-write**: a scope sees the enclosing values
-  but its writes are isolated to its scope frame until it joins, at which point a
-  defined **merge policy** folds them back (last-writer-wins by default;
-  loop accumulation via explicit output variables, as today). The merge policy
-  is a named decision point, not left implicit.
-
-### D6 — Failure and cancellation are scope-scoped
-
-- A token failing terminally **fails its scope**; by default the scope's failure
-  **cancels its sibling tokens** in the same parent (interrupt semantics) and
-  propagates up — matching the intuition "if the parallel block can't finish,
-  the block fails." `try_catch` is the structured opt-out: a `try`-scope failure
-  routes to the `catch` scope instead of propagating (ADR-0031, unchanged).
-- Cancellation of a *running* token is cooperative (checked at node boundaries);
-  cancellation of a *paused* token consumes its continuation and records it
-  cancelled. This cancellation primitive is what later unlocks **boundary
-  events / timers** (a separate follow-up ADR builds on it — see Non-goals).
-
-### D7 — Authoring model and DAG invariant unchanged
-
-- The flow JSON does **not** change. Tokens/scopes are runtime-only;
-  `flow.zod.ts` and the designer are untouched. ADR-0031's structured constructs
-  remain the authoring surface and the AI design center (ADR-0010/0011).
-- No back-edges are introduced. Scopes are acyclic single-entry/single-exit
-  regions; iteration stays the loop container's job. The DAG invariant holds.
-
-## Representation — persistence evolution (additive)
-
-`SuspendedRun` / `sys_automation_run` evolve **additively**:
-
-- Keep `nodeId` as the **primary token's** position (the first/only paused token)
-  so existing readers, the Runs panel, and one-pause flows keep working with no
-  change.
-- Add `tokens_json` (and the scope tree) as a new JSON column / field carrying
-  the full set when there is more than one. A row with no `tokens_json` is a
-  one-token run — rehydrated as today. This mirrors the ADR-0019 discipline of
-  not breaking the suspended-run table; the single new additive column is the
-  deliberate exception this feature requires.
-- `resume` continuation, correlation→token mapping, and the cold-boot
-  wait-timer re-arm (#1687) all extend to address a token instead of the run.
-
-`execution.zod.ts` gains a `tokens[]` / `scopes[]` shape on the run log;
-`ExecutionStepLogSchema` already tags steps with `parentNodeId` / `iteration` /
-`regionKind` (ADR-0031 #1505), which the token model formalizes as scope ids.
+### Track A (now) — multi-instance / aggregating nodes
+
+Model the concrete demand as **single nodes** that internally fan out and
+aggregate, leaving the engine's one-program-counter model intact:
+
+- **`approval` gains multi-approver aggregation** (it largely has this already —
+  `behavior: 'unanimous' | 'first_response'`): one `approval` node opens N
+  approval requests and **stays suspended at that one node** until the
+  aggregation rule is met, then resumes down `approve` / `reject`. "Finance AND
+  legal" is `unanimous` over two approver groups — **one node, one program
+  counter, already paused once**. No engine-core change.
+- **A `map` / multi-instance node** for "do X for each item, then continue":
+  one node owns the collection, opens a child unit per item (e.g. an approval or
+  a subflow per row), and suspends at that single node until all units settle.
+  The aggregation (all / any / threshold) is node config. This is the Step
+  Functions `Map` shape and Camunda's multi-instance activity.
+- These compose with ADR-0019 durable pause exactly as today: the node suspends
+  once, at one position; resume targets the run, not a token. `sys_automation_run`
+  is unchanged. The node executor tracks its own per-unit state (it already may,
+  via the approvals service's `sys_approval_request` rows).
+
+Track A covers parallel approvals and batch approvals — the demand that
+motivated this ADR — at a fraction of Track B's cost and risk.
+
+### Track B (deferred) — the general token / scope tree
+
+When a flow genuinely needs to pause at **arbitrary, independent positions** that
+multi-instance cannot express (e.g. two unrelated long-running waits on different
+branches that each continue into different downstream logic), adopt the full
+model:
+
+- **Token** = `{ tokenId, scopeId, nodeId, status }`,
+  `status ∈ { running, paused, completed, cancelled }`.
+- **Scope** = a region instance (root flow, parallel branch, loop iteration, try
+  region), nested by containment into a tree. A linear flow is a one-token /
+  one-scope tree — the back-compat anchor (today's behavior unchanged).
+- **Execution** is the scheduler of [§1 above](#why-the-token-tree-is-expensive-the-real-cost),
+  not recursion. **Variables** are scope-hierarchical (§2). **Resume** targets a
+  `tokenId` (defaulting to the sole paused token for back-compat) and is
+  **serialized per run** (§3). **Split/join** are scope operations; a scope's
+  join is a barrier that fires when its child tokens reach its single exit
+  (ADR-0031 single-entry/single-exit makes this well-defined). **Failure**
+  fails the scope and cancels siblings (interrupt) unless caught by a `try_catch`
+  scope; this cancellation primitive is what later unlocks boundary events/timers.
+- **Persistence is additive**: keep `nodeId` as the primary token's position so
+  existing readers and one-pause flows are unchanged; add `tokens_json` for the
+  full tree when there is more than one.
+- **Authoring and DAG unchanged** (D7 below): tokens are runtime-only; the flow
+  JSON, the designer, and the AI design center (ADR-0010/0011) are untouched, and
+  no back-edges are introduced.
+
+### D7 — invariants that hold on both tracks
+
+- The flow JSON, the structured-construct authoring surface (ADR-0031), the AI
+  design center, and the DAG invariant are **unchanged**. Concurrency is a
+  runtime concern, never an authoring one.
+- The single-position / single-token case stays bit-for-bit today's behavior.
+- Subflow linked-runs (ADR-0019 addendum) compose with either track.
+
+## Why not the other models
+
+- **Serialize the interpreter stack** (Salesforce-Flow style): inlines child
+  state into the parent, destroys per-branch run identity, and still cannot
+  express N independent pauses. Rejected.
+- **Event-sourced deterministic replay** (Temporal/Zeebe-internals style):
+  requires every node to be deterministic/idempotent. ADR-0018's **open node
+  registry** lets third-party executors run arbitrary side effects — the replay
+  precondition does not hold here. This is a generative-ecosystem constraint, not
+  a taste call. Rejected as the engine model.
+- **Jump straight to the general token tree** (first draft of this ADR):
+  correct long-term but over-built for the near-term demand, and its true cost
+  (the three execution-model changes above) is not yet justified. Deferred to
+  Track B.
 
 ## Consequences
 
-- **Unlocks** parallel approvals, batch (per-iteration) approvals, concurrent
-  waits, and lays the cancellation primitive for boundary timers/events.
-- **Core risk.** Traversal, suspend/resume, and persistence are the engine's
-  heart; this is the largest change to it since ADR-0019. Mitigated by phasing
-  (below) with the one-token degenerate case as a behavior-preserving anchor and
-  the full existing suite as the regression gate at every phase.
-- **Observability improves**: the Runs panel can show a tree of live positions
-  ("branch ① paused at approval, branch ② done") instead of one node.
-- **No authoring or migration cost** for existing flows — they are one-token
-  trees; the JSON, the designer, and stored runs are untouched.
-- **Subflow linked-runs (ADR-0019 addendum) composes**: a subflow token whose
-  child run pauses stays `paused` in its scope exactly like any other pausing
-  node — inter-flow chaining and intra-flow tokens stack cleanly.
-
-## Sequencing (roadmap)
-
-Each phase ships behind tests; the suite stays green throughout.
-
-1. **2a — Internal token model, zero behavior change.** Represent today's
-   single program counter as a one-token / one-scope tree inside the engine.
-   `executeNode` / `traverseNext` / `resume` operate on tokens; structured
-   containers create child scopes but still run synchronously. No new capability;
-   pure refactor that de-risks the rewrite. *Gate: full suite unchanged.*
-2. **2b — Pause inside `parallel` branches.** The most-requested case (parallel
-   approvals). Join barrier persists partial completion; branch tokens pause and
-   resume independently; D5 copy-on-write + merge lands here.
-3. **2c — Pause inside `loop` iterations.** Batch approvals. Sequential
-   iterations first; the per-iteration scope is the unit of pause.
-4. **2d — Cancellation / interrupt (D6).** Sibling cancellation on scope
-   failure; cooperative running-token cancellation. Unblocks a follow-up ADR for
-   **boundary events / timers** (BPMN interrupting boundaries map onto this).
+- **Track A unblocks the real demand now** (parallel + batch approvals) with no
+  engine-core rewrite, no persistence change, and no new concurrency hazards.
+- **Track B is recorded, not started.** The team avoids a premature core rewrite
+  while keeping a coherent target; Track A's multi-instance node is designed so
+  its per-unit state could later be re-expressed as scoped tokens.
+- **Honest limitation of Track A**: it does not allow pausing at a *free* point
+  inside a hand-drawn parallel/loop region — only the structured aggregating node
+  pauses. If a flow needs that, it is the signal to start Track B.
+- **Observability**: Track A shows N per-unit rows under one node (e.g. the
+  approvals list); Track B would show a tree of live positions. The Runs panel
+  extends additively either way.
+
+## Sequencing
+
+1. **A1 — specify the aggregating `approval` contract** (formalize
+   `unanimous` / `first_response` / threshold over approver *groups*; today's
+   `unanimous` already tallies — make N-group explicit and tested).
+2. **A2 — `map` / multi-instance node**: collection in, per-item child unit
+   (approval or subflow), aggregation policy, single suspend/resume at the node.
+   Showcase example: per-line-item approval over an invoice's lines.
+3. **B-gate** — only if a concrete flow needs arbitrary-position concurrent
+   pause: open a follow-up ADR to start Track B at §1's scheduler, with the
+   one-token refactor as the first, behavior-preserving step.
 
 ## Non-goals / deferred
 
-- **Distributed token execution** across workers/nodes. v1 keeps one claimer per
-  run (today's model); tokens are concurrent *within* a process, not sharded
-  across machines.
-- **Parallel I/O speedup as a goal.** Concurrency here is about independent
-  *pause*, not throughput; any wall-clock win is incidental.
-- **Full BPMN boundary-event / event-subprocess semantics.** The cancellation
-  primitive (2d) is the foundation; the node-type surface is a separate ADR.
-- **Concurrent loop iterations** (fan-out map). The model permits it behind a
-  flag; v1 ships sequential iteration only.
-- **Changing the authoring model.** Out of scope by D7 — tokens are runtime-only.
+- The general token/scope tree and its scheduler (Track B) — recorded, not
+  scheduled.
+- Distributed token execution across workers/nodes (one claimer per run stands).
+- Concurrent loop iterations / true parallel I/O speedup (logical concurrency
+  only; not a throughput feature).
+- Full BPMN boundary-event / event-subprocess semantics (built on Track B's
+  cancellation primitive; separate node-type ADR).
+- Any change to the authoring model (D7).
 
 ## Relationship to prior ADRs
 
-- **ADR-0019** gave durable pause for a *single* position and (in its addendum)
-  *between*-flow chaining. This ADR generalizes the *within*-flow position from
-  one to a tree. The resume contract and `sys_automation_run` extend additively.
-- **ADR-0031** defined the structured regions; this ADR is their **runtime
-  dual** — a region instance is a scope. The DAG invariant and AI-authoring
-  center are explicitly preserved.
-- **ADR-0018**'s open registry is the reason replay-based models are rejected
-  (D-reframing) and why the token/scope model is the right fit.
+- **ADR-0019** gave durable pause for a single position and (addendum)
+  between-flow chaining. Track A reuses that pause as-is (the aggregating node
+  pauses once); Track B would generalize the within-flow position to a tree.
+- **ADR-0031** defined the structured regions. Track A's multi-instance node is a
+  new structured construct alongside them; Track B's scopes are their runtime
+  dual. The DAG invariant and AI-authoring center are preserved on both.
+- **ADR-0018**'s open registry is why replay models are rejected and why, when
+  Track B comes, the Camunda-style scheduler (not Temporal replay) is the fit.

From ae35d1e64bd29b176c4e3f2b359291bf2668c984 Mon Sep 17 00:00:00 2001
From: os-zhuang <jack@objectstack.ai>
Date: Thu, 11 Jun 2026 09:11:31 +0500
Subject: [PATCH 3/3] =?UTF-8?q?docs(adr):=20correct=20ADR-0037=20=E2=80=94?=
 =?UTF-8?q?=20the=20map=20node=20is=20NOT=20engine-free=20(A1=20vs=20A2)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Building A2 surfaced a real error in the ADR: it claimed Track A's `map` /
multi-instance node needs "no change to the engine's execution model." Examined
against the resume/bubble code, that is false for any map that serves batch
approval (each item can pause):

- concurrent map needs durable N:1 aggregation + per-parent serialization +
  completion-ordering handling (part of Track B's hard concurrency, confined to
  one node);
- sequential map needs resume-INTO-the-node (next item) instead of the engine's
  resume-past-the-node default (the DAG has no back-edge to loop the node);
- only a synchronous, non-pausing map is engine-free, and that does not serve
  batch approval.

Splits Track A into A1 (aggregating approval — truly free, shipped #1708) and
A2 (map node — a bounded, separately-justified engine task, design-first). A1
covers parallel approvals at zero engine cost; A2 is not a free rider on it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/0037-token-scope-tree-execution.md | 93 +++++++++++++--------
 1 file changed, 60 insertions(+), 33 deletions(-)
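
Reviewer note (illustrative, not part of the patch): the three `map` flavors
listed in the message above differ mainly in what has to happen after one item
settles. The following is a minimal TypeScript sketch of that decision, assuming
an all-must-approve policy and assuming the caller already serializes decisions
per parent run. `MapNodeState`, `settleItem`, and `decide` are hypothetical
names chosen for this note, not engine APIs.

```typescript
// Hypothetical sketch: none of these types or functions exist in the engine.
type ItemStatus = 'pending' | 'approved' | 'rejected';

interface MapNodeState {
  mode: 'sequential' | 'concurrent';
  items: ItemStatus[]; // durable per-item state, one slot per collection item
}

type NextStep =
  | { kind: 'stay_suspended' } // concurrent: siblings are still pending
  | { kind: 'resume_into_node'; nextIndex: number } // sequential: re-enter for the next item
  | { kind: 'resume_past_node'; outcome: 'approve' | 'reject' }; // all settled: leave the node

// Decide what the engine would have to do given the current per-item tally.
function decide(state: MapNodeState): NextStep {
  const firstPending = state.items.indexOf('pending');
  if (firstPending === -1) {
    // Assumed policy: every item must be approved (the "all" aggregation).
    const outcome = state.items.every((s) => s === 'approved') ? 'approve' : 'reject';
    return { kind: 'resume_past_node', outcome };
  }
  return state.mode === 'sequential'
    ? { kind: 'resume_into_node', nextIndex: firstPending }
    : { kind: 'stay_suspended' };
}

// Record one item's decision and report the next step.
// Assumes calls for the same parent run are serialized by the caller.
function settleItem(
  state: MapNodeState,
  index: number,
  decision: 'approved' | 'rejected',
): { state: MapNodeState; next: NextStep } {
  if (index < 0 || index >= state.items.length) {
    throw new RangeError(`item index ${index} is out of range`);
  }
  if (state.items[index] !== 'pending') {
    // A duplicate or late decision leaves the tally unchanged.
    return { state, next: decide(state) };
  }
  const items = state.items.slice();
  items[index] = decision;
  const updated: MapNodeState = { ...state, items };
  return { state: updated, next: decide(updated) };
}
```

In this sketch the concurrent flavor is the only one that returns
`stay_suspended` (the N:1 aggregation), and the sequential flavor is the only
one that returns `resume_into_node` (the re-entry the DAG cannot express with an
edge). A synchronous, non-pausing map would never need either result.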

diff --git a/docs/adr/0037-token-scope-tree-execution.md b/docs/adr/0037-token-scope-tree-execution.md
index 2effdfea4..ec202ed08 100644
--- a/docs/adr/0037-token-scope-tree-execution.md
+++ b/docs/adr/0037-token-scope-tree-execution.md
@@ -25,13 +25,16 @@ A code review (below) shows the cost is much larger than the data-structure
 change implies.
 
 **Decision: two tracks.** Ship **Track A — multi-instance / aggregating nodes**
-first: model the actual demand (parallel approvals, batch approvals) as *single
-nodes* that wait for N decisions, the way Camunda multi-instance and AWS Step
-Functions `Map` do. This needs **no change to the engine's execution model** —
-the node owns the fan-out and the aggregation, the run still has one program
-counter. Defer **Track B — the general token/scope tree** until demand exceeds
-what multi-instance covers; this ADR records its design so Track A is built
-toward it, not away from it.
+first: model the demand as *single nodes* that wait for N decisions, the way
+Camunda multi-instance and AWS Step Functions `Map` do. Track A splits into a
+**free** tier and a **bounded** tier — a distinction worth stating up front:
+**A1 (parallel approval — one `approval` node aggregating N decisions) needs no
+engine change and is shipped (#1708)**; **A2 (a `map` / multi-instance node for
+batch approval) is NOT free** — because each item can pause, it needs a bounded
+extension of the engine's resume path (N:1 aggregation or node re-entry), so it
+is a separately-justified increment, not a free rider on A1. Defer **Track B —
+the general token/scope tree** until demand exceeds what multi-instance covers;
+this ADR records its design so Track A is built toward it, not away from it.
 
 ## Context — current state (verified 2026-06-11, against the code)
 
@@ -107,24 +110,41 @@ execution model," not "add a tree to `SuspendedRun`."
 Model the concrete demand as **single nodes** that internally fan out and
 aggregate, leaving the engine's one-program-counter model intact:
 
-- **`approval` gains multi-approver aggregation** (it largely has this already —
-  `behavior: 'unanimous' | 'first_response'`): one `approval` node opens N
-  approval requests and **stays suspended at that one node** until the
-  aggregation rule is met, then resumes down `approve` / `reject`. "Finance AND
-  legal" is `unanimous` over two approver groups — **one node, one program
-  counter, already paused once**. No engine-core change.
-- **A `map` / multi-instance node** for "do X for each item, then continue":
-  one node owns the collection, opens a child unit per item (e.g. an approval or
-  a subflow per row), and suspends at that single node until all units settle.
-  The aggregation (all / any / threshold) is node config. This is the Step
-  Functions `Map` shape and Camunda's multi-instance activity.
-- These compose with ADR-0019 durable pause exactly as today: the node suspends
-  once, at one position; resume targets the run, not a token. `sys_automation_run`
-  is unchanged. The node executor tracks its own per-unit state (it already may,
-  via the approvals service's `sys_approval_request` rows).
-
-Track A covers parallel approvals and batch approvals — the demand that
-motivated this ADR — at a fraction of Track B's cost and risk.
+Track A has **two tiers of cost** — a distinction the first revision of this ADR
+got wrong by lumping them together. They are not equal.
+
+**A1 — aggregating `approval` node (truly free; shipped #1708).** One `approval`
+node with `behavior: 'unanimous'` over N approver groups opens **one**
+`sys_approval_request` whose `pending_approvers` lists all groups (notified in
+parallel) and stays suspended until every group approves, then resumes down
+`approve` / `reject`. "Finance AND legal" is exactly this — **one node, one
+program counter, paused once**. This needed **no engine change**: the
+unanimous-over-N aggregation already exists in the approvals service and is
+unit-tested; A1 added a showcase (`showcase_invoice_signoff`) and docs,
+browser-verified. The aggregation state lives in the plugin's own
+`sys_approval_request` row, not the engine.
+
+**A2 — `map` / multi-instance node (NOT free — engine-adjacent).** A correction:
+a `map` node that serves **batch approval** (each item can pause) **cannot** be
+"no engine change," contrary to this ADR's first revision. Examined against the
+code, every flavor needs a bounded extension of the engine's resume/bubble path:
+  - *concurrent* map (N items pause at once) needs **durable N:1 aggregation +
+    per-parent serialization + completion-ordering handling** — i.e. part of
+    Track B's hard concurrency, just confined to one node;
+  - *sequential* map (one item at a time) needs **resume-into-the-node** (process
+    the next item) instead of the engine's resume-past-the-node default — the DAG
+    has no back-edge to loop the node;
+  - only a *synchronous, non-pausing* map is engine-free, and that does not serve
+    batch approval (which pauses).
+
+The map node reuses ADR-0019's linked-runs (#1693) for the 1:1 bubble but
+extends it to N:1 / re-entry. It is a bounded engine task, smaller than the full
+Track B scheduler but, unlike A1, **not** free. Build it only against concrete
+batch-approval demand, with the aggregation / re-entry semantics designed first.
+
+So Track A as shipped (**A1**) covers *parallel* approvals at zero engine cost.
+*Batch* approvals (**A2**) are a deliberate, separately-justified increment, not
+a free rider on A1.
 
 ### Track B (deferred) — the general token / scope tree
 
@@ -192,15 +212,22 @@ model:
 
 ## Sequencing
 
-1. **A1 — specify the aggregating `approval` contract** (formalize
-   `unanimous` / `first_response` / threshold over approver *groups*; today's
-   `unanimous` already tallies — make N-group explicit and tested).
-2. **A2 — `map` / multi-instance node**: collection in, per-item child unit
-   (approval or subflow), aggregation policy, single suspend/resume at the node.
-   Showcase example: per-line-item approval over an invoice's lines.
+1. **A1 — aggregating `approval` node. ✅ Shipped (#1708).** The
+   `unanimous`-over-N-approver-groups aggregation already existed and was
+   unit-tested; #1708 added the `showcase_invoice_signoff` worked example
+   (finance AND legal, browser-verified) and docs. No engine change. Threshold /
+   quorum (M-of-N) stays enterprise-tier per `approval.zod.ts`.
+2. **A2 — `map` / multi-instance node (design-first; not started).** Collection
+   in, per-item child unit, aggregation, single suspend at the node. **Cost
+   correction**: because items can pause, this needs a bounded engine resume-path
+   extension (durable N:1 aggregation for concurrent, or resume-into-node for
+   sequential) — it is *not* the zero-engine-change item A1 was, so it is gated on
+   concrete batch-approval demand and a design note that nails the aggregation /
+   re-entry + serialization semantics first.
 3. **B-gate** — only if a concrete flow needs arbitrary-position concurrent
-   pause: open a follow-up ADR to start Track B at §1's scheduler, with the
-   one-token refactor as the first, behavior-preserving step.
+   pause that a multi-instance node cannot express: open a follow-up ADR to start
+   Track B at the scheduler, with the one-token refactor as the first,
+   behavior-preserving step.
 
 ## Non-goals / deferred