feat(droid-control): add desktop-control atom backed by cua-driver #27
Open
factory-ain3sh wants to merge 1 commit into
… cua-driver (CLI-900)
droid-control routed terminals and web/Electron but had no atom for native desktop GUI apps; that gap was papered over by a stale, macOS-only standalone mac-control skill. The new atom is a thin overlay over upstream's maintained cua-driver skill pack: it owns droid-control integration (routing, RUN_ID -> cua-session isolation, delegation, evidence handoff) and defers the fast-moving tool surface upstream, so it cannot go stale the way mac-control did.
Platform files carry verified mechanics: macOS TCC/LaunchServices flow plus patterns absorbed from mac-control (now retired); Windows UIA + Session 0 + PS 5.1 quoting, smoke-tested end-to-end on a Win11 KVM VM (lossless UIA type_text with before/after evidence); Linux included at upstream's pre-release tier with empirically verified caveats (Qt/GTK4 silently drop XSendEvent input, lossy typing -> verify-and-repair loop).
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Description
droid-control routes terminals (tuistory/true-input) and web/Electron (agent-browser), but native desktop GUI apps had no atom -- the gap was papered over by a standalone, hand-maintained mac-control skill that was macOS-only and already stale against upstream. This PR adds a desktop-control driver atom backed by upstream trycua/cua cua-driver (cross-platform Rust rewrite, v0.5.x): one routing row, an atom owning the droid-control integration (RUN_ID -> cua-session isolation, delegation, evidence handoff), and per-platform mechanics files. The atom is deliberately a thin overlay: the deep tool reference defers to upstream's versioned skill pack (cua-driver skills install), which updates with the binary -- the anti-staleness property mac-control lacked.
Deviation from the ticket's written scope: Linux ships as a supported third platform (user call mid-implementation) at upstream's pre-release tier, with empirically verified caveats instead of the originally planned hard exclusion.
Out of scope: filing the Linux input-loss/AT-SPI findings upstream; teaching the command docs (/qa-test etc.) to enumerate native desktop targets.
Related Issue
Closes CLI-900
Reviewer Guide
Read order: skills/desktop-control/SKILL.md > platforms/linux.md > skills/droid-control/SKILL.md (routing hunks) > platforms/macos.md > platforms/windows.md > capture/ARCHITECTURE/README touches.
Review depth: Standard -- docs-only diff, but the platform files encode operational claims; linux.md's are the ones to scrutinize (each is backed by a smoke observation).
Open for pushback: linux.md recommends pixel-first + a probe-one-keystroke pattern on the pre-release tier rather than refusing the Act stage outright (skills/desktop-control/platforms/linux.md, toolkit-boundary section).
Risk & Impact
Markdown-only; no code paths change.
(1) Routing drift: Linux droids may route native-GUI work to desktop-control and hit pre-release limits -- contained by the caveat in the routing note plus explicit per-symptom fallbacks to agent-browser/true-input in linux.md.
(2) Upstream drift: the atom pins only stable CLI verbs (verified against upstream cli.rs) and defers the volatile tool surface to the auto-updating upstream pack.
Retiring the superseded mac-control skill happened in a personal skills repo, not this diff.
Verification
Behavior verified. Windows (production tier): end-to-end smoke on a Win11 KVM VM -- install via upstream install.ps1, daemon via autostart kick, then start_session -> launch_app notepad -> get_window_state (19-element UIA tree) -> type_text -> re-snapshot; all 46 chars written losslessly via UIA ValuePattern, before/after PNGs + tree value confirm (verified @ 99720f4). Linux (pre-release tier): same loop locally on CachyOS Plasma Wayland -- lifecycle, doctor, sessions, window discovery, per-window screenshots all green; the smoke also disproved the initially drafted clipboard mitigation and surfaced the real constraint (Qt/GTK4 silently drop XSendEvent input; alacritty accepts it but drops trailing chars) -- linux.md was rewritten claim-by-claim to observed behavior.
Regression coverage. The repo's CI gate (check-skills.yml: every SKILL.md symlinked or .skillsrc-listed) mirrored locally: PASS. No other test harness exists for skill prose.
Not tested. platforms/macos.md -- no Mac in the loop this session; content derives from upstream MACOS.md semantics plus patterns carried over from the battle-tested mac-control skill. Follow-up: run the recipe below on real macOS.
Standard validators. Markdown-only repo: format/lint/typecheck/tests genuinely unconfigured (no signal); check-skills clean.
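The Linux smoke found typing to be lossy (alacritty accepts synthetic input but drops trailing characters), which is the case a verify-and-repair loop targets. The following is a minimal Python sketch of that idea, not the text of linux.md. It assumes the target field starts empty and that loss only truncates the tail. `send_text` and `read_text` are hypothetical callables standing in for whatever types into the window and reads its value back; they are not cua-driver APIs, and `LossyField` is an in-memory stand-in used only to exercise the loop.

```python
from typing import Callable


def type_with_repair(
    text: str,
    send_text: Callable[[str], None],  # hypothetical: types a chunk into the target
    read_text: Callable[[], str],      # hypothetical: reads back what actually landed
    max_rounds: int = 5,
) -> bool:
    """Type `text`, read back what landed, and resend only the missing tail."""
    for _ in range(max_rounds):
        landed = read_text()
        if landed == text:
            return True
        if not text.startswith(landed):
            # The field diverged from the target, so appending cannot repair it.
            return False
        send_text(text[len(landed):])
    return read_text() == text


class LossyField:
    """Demo stand-in: drops the last character of any multi-character chunk."""

    def __init__(self) -> None:
        self.value = ""

    def send(self, chunk: str) -> None:
        self.value += chunk[:-1] if len(chunk) > 1 else chunk

    def read(self) -> str:
        return self.value
```

A target that silently drops every event never converges, so the loop returns `False` after `max_rounds` rather than retrying forever; that is the point at which a fallback driver would be the sensible next step.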
Repro Recipe
Implementation Notes
Source: .agents/specs/2026-06-10-droid-control-desktop-control-atom-backed-by-cua-driver.notes.md
Deviations from spec: Linux promoted from excluded to supported mid-implementation (user decision); the ticket AC "does NOT route Linux there" is intentionally superseded. platforms/linux.md added; routing row lost its macOS/Windows qualifier.
Tradeoffs: thin-overlay atom over upstream's skill pack instead of a self-contained tool reference -- duplicating the fast-moving surface recreates exactly the staleness that killed mac-control.
Discovered constraints: upstream Linux input is XSendEvent; Qt/GTK4 drop send_event-flagged events entirely (silent no-op), and paste hotkeys/middle-click don't land even where typing does -- linux.md documents the probe + verify-and-repair patterns instead of a clipboard workaround. Upside: XID-targeted events cannot leak into the user's focused app. Windows installer corrections (binary in %LOCALAPPDATA%\Programs\Cua, install.ps1 path) came out of the VM smoke.
Follow-ups not in this PR: file the Linux findings upstream (optional per ticket); macOS smoke on real hardware; command docs could enumerate native desktop targets.
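Because Qt/GTK4 targets drop synthetic events as a silent no-op, one cheap keystroke can tell the two cases apart before any real typing is attempted. The sketch below shows one way the probe pattern could look in Python; linux.md is not reproduced here, so the shape is an assumption. `send_key`, `read_text`, and `undo` are hypothetical callables supplied by the caller, not cua-driver APIs.

```python
from enum import Enum
from typing import Callable


class Probe(Enum):
    ACCEPTED = "accepted"  # the keystroke landed in the target
    DROPPED = "dropped"    # nothing changed: input was silently discarded


def probe_one_keystroke(
    send_key: Callable[[str], None],  # hypothetical: sends one key to the target
    read_text: Callable[[], str],     # hypothetical: reads the target's current text
    undo: Callable[[], None],         # hypothetical: removes the probe character
    key: str = "x",
) -> Probe:
    """Send a single key and compare the target's text before and after."""
    before = read_text()
    send_key(key)
    if read_text() == before:
        return Probe.DROPPED
    undo()  # leave the target as it was found
    return Probe.ACCEPTED
```

On `Probe.DROPPED` a caller would stop and take a fallback route (the description names agent-browser and true-input as per-symptom fallbacks); on `Probe.ACCEPTED` typing can proceed, still with read-back verification since accepted input may lose trailing characters.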