Duplicate copilot processes per agent with mismatched --session-id (resolver-vs-runtime drift) #2

Description

@dfrysinger

Symptom

After running for a while, 4 of 15 agent tmux panes (echo, india, lima, november) had two stacked copilot processes with different --session-id values in the pane's process tree. The displayed pane content always matched the inner process's session-id; the outer process held a stale session-id whose events.jsonl hadn't been touched in days.

Discovered during pre-migration verification. The ca <agent> resolver picks the session matching the displayed content for all 15 agents, so there is no resume bug. The orphan procs are still wasting resources, and they would have caused real confusion if the resolver had favored the orphan.

Process tree pattern (echo example)

tmux pane shell pid=44072
└─ pid=44130  copilot --session-id=39c8d55c… --name=echo --remote --allow-all   ← orphan, mtime 2026-06-22
   └─ pid=24792  copilot --name=echo --remote --yolo --session-id 333facd1…    ← displayed, mtime 2026-06-24

Same shape on india, lima, november. Other 11 agents either had a single copilot process or a normal parent/child pair sharing the same --session-id.
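
The two processes in the tree above pass the flag in different forms (--session-id=VALUE on the outer one, --session-id VALUE on the inner one), so any script comparing them has to accept both. A minimal sketch of that extraction follows; the helper name extract_sid is hypothetical and not part of bin/copilot-agent.

```shell
# extract_sid: read copilot command lines on stdin, print each --session-id value.
# Accepts both "--session-id=VALUE" and "--session-id VALUE".
# Hypothetical helper for illustration only.
extract_sid() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^--session-id=/) { sub(/^--session-id=/, "", $i); print $i; next }
      if ($i == "--session-id" && i < NF) { print $(i + 1); next }
    }
  }'
}
```

Feeding it the args column of ps for every copilot process under one pane and checking whether more than one distinct value comes back would flag the stacked case.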

Per-agent fingerprint

Agent     Displayed sid (resolver pick)  Orphan sid  Last events.jsonl (displayed)  Last events.jsonl (orphan)
echo      333facd1…                      39c8d55c…   2026-06-24 17:52               2026-06-22 18:31
india     3a45f59a…                      49dbdaa2…   2026-06-25 10:36               2026-06-12 11:34
lima      00cef069…                      86d02bf0…   2026-06-25 10:02               2026-06-19 16:02
november  7c875aff…                      75aef5fc…   2026-06-25 00:27               2026-06-20 10:56
foxtrot   176e090d…                      79b517ff…   2026-06-18 23:41               2026-06-12 11:34
golf      4d91dd65…                      b331c3a6…   2026-06-18 21:07               2026-06-12 11:34

(foxtrot/golf had the same single-stack shape — the "orphan" sid showed up only as the outer --name=NAME proc with the resolver-pick session living on events.jsonl.)

Pane content matched the displayed sid via grep of unique phrases against events.jsonl:

  • echo: "Which, in the end, is all that any road is for" → 1 hit in 333facd1, 0 in 39c8d55c
  • foxtrot: "Savings & Efficacy panel spec" → 1 hit in 176e090d, 0 in 79b517ff
  • golf: "CopilotLimiter" → 169 hits in 4d91dd65, 0 in b331c3a6
  • lima: "xattr -dr com.apple.quarantine" → 38 hits in 00cef069, 0 in 86d02bf0
  • november: "google-re2" → 4 hits in 7c875aff, 0 in 75aef5fc
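
The phrase check above can be scripted roughly as follows. This is a sketch: count_hits is a hypothetical helper, the caller supplies the events.jsonl paths (their location on disk is not stated here), and grep -c counts matching lines rather than individual occurrences, so the numbers may differ slightly from a per-occurrence count.

```shell
# count_hits PHRASE FILE...: print "FILE<TAB>N" for each file, where N is the
# number of lines containing PHRASE as a fixed string (no regex interpretation).
# Hypothetical helper for illustration only.
count_hits() {
  phrase=$1
  shift
  for f in "$@"; do
    # "--" keeps phrases that start with "-" from being parsed as options.
    printf '%s\t%s\n' "$f" "$(grep -cF -- "$phrase" "$f")"
  done
}
```

Calling it with a phrase copied from the pane and the two candidate events.jsonl files should show a nonzero count for the displayed session and 0 for the orphan.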

Likely causes (speculation)

A second copilot --session-id=… got spawned in the agent's tmux pane after the original launch, without killing the prior child. Candidate triggers:

  1. bin/copilot-agent invoked twice for the same --name. The resolver would pick a different sid the second time (older "real" session vs newly created "empty" session, or vice versa, depending on the tier+updated_at state at that moment). tmux new-session -A then reattaches to the existing pane, and a new copilot --session-id=… command is sent as keystrokes into the prompt. End result: two stacked copilot procs in the same shell.
  2. Mailbox poke (MAILBOX_POKE env path) racing with a user-initiated ca <name> relaunch.
  3. A /resume action inside copilot that forks a child with the chosen session-id instead of re-exec'ing in place.

Worth instrumenting bin/copilot-agent to log the resolver decision and check for an already-attached copilot in the pane before sending the launch keystroke.
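
One way the "already-attached copilot" check could look, as a sketch under assumptions: copilot_descendants is a hypothetical helper, it reads a "PID PPID ARGS" process table on stdin, and it assumes the process's command name is literally copilot (if the CLI runs under a wrapper such as an interpreter, the match would need adjusting).

```shell
# copilot_descendants ROOT_PID: read "PID PPID ARGS..." rows on stdin and print
# the PID of every descendant of ROOT_PID whose command is named copilot.
# Hypothetical helper for illustration only.
copilot_descendants() {
  awk -v root="$1" '
    { pid[NR] = $1; ppid[NR] = $2; cmd[NR] = $3 }
    END {
      inside[root] = 1
      # Rows may arrive in any order, so repeat until no new descendant is found.
      do {
        grew = 0
        for (i = 1; i <= NR; i++)
          if ((ppid[i] in inside) && !(pid[i] in inside)) { inside[pid[i]] = 1; grew = 1 }
      } while (grew)
      for (i = 1; i <= NR; i++)
        if ((pid[i] in inside) && pid[i] != root && cmd[i] ~ /(^|\/)copilot$/) print pid[i]
    }'
}
```

In the launcher, the pane's shell pid can be read with tmux list-panes -t "$NAME" -F '#{pane_pid}', and the table can come from ps -A -o pid= -o ppid= -o args= piped into copilot_descendants. A non-empty result means a copilot is already running in the pane, which is also a natural point to log the resolver's pick next to the running sid.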

Suggested fix candidates

  • In bin/copilot-agent, before exec tmux new-session -s "$NAME" …\; send-keys "$CMD" Enter, detect whether the named tmux session already has a live copilot descendant. If yes, just attach (tmux attach -t "$NAME") and skip the send-keys. This makes re-invocation idempotent.
  • Or: track the chosen UUID in a sticky ~/.copilot/agents/<name>.sid file written on first launch and re-read on subsequent ones, instead of re-running the workspace.yaml resolver every time. The resolver's tier+updated_at logic flipped the pick at least once for echo, india, lima, november because those agents have multiple sessions with matching cwd: and one of them got summarizer activity that the other didn't.
  • Or: kill any pre-existing copilot proc in the pane's process tree before sending Enter, so we never end up stacked.
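
The sticky-file option could be sketched as below. Assumptions: sticky_sid and the SID_DIR override are hypothetical names, the resolver is passed in as a command that prints one session id, and the default directory is the ~/.copilot/agents path proposed above.

```shell
# sticky_sid NAME RESOLVER_CMD...: print the session id pinned for NAME.
# First call: run the resolver command, store its output in NAME.sid, print it.
# Later calls: print the stored value and never run the resolver again.
# Hypothetical helper; SID_DIR exists only so the directory can be overridden.
sticky_sid() {
  name=$1
  shift
  dir=${SID_DIR:-"$HOME/.copilot/agents"}
  file="$dir/$name.sid"
  if [ -s "$file" ]; then
    cat "$file"
    return 0
  fi
  sid=$("$@") || return 1
  [ -n "$sid" ] || return 1
  mkdir -p "$dir"
  printf '%s\n' "$sid" > "$file"
  printf '%s\n' "$sid"
}
```

With this shape a later flip in the resolver's pick no longer changes which session a relaunch targets. The trade-off is that deliberately switching sessions then requires deleting or rewriting the .sid file.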

Reproduction posture

I'm about to drain all 15 agent sessions for a Dropbox→OneDrive workspace migration, so the immediate fingerprint will be gone. Capturing this issue now while the diagnostic context is fresh. After migration I can intentionally re-trigger by running ca echo twice with a fresh session creation in between, if useful.

Migration safety status

All 15 agents will resume to the on-screen session via the resolver — verified via process-tree + events.jsonl mtime + unique-phrase grep. Orphan procs lose nothing on tmux kill-server.
