
Integrate voice input into the TUI and add turn cancellation #236

Merged: alexkroman merged 1 commit into main from claude/affectionate-babbage-itam2d on Jun 18, 2026
Conversation

@alexkroman

Collaborator

Summary

This PR integrates voice input/output into the code agent TUI, allowing users to speak their requests and hear summaries of replies read back. It also adds turn cancellation (Escape) and a safer quit mechanism (Ctrl-C double-press confirmation).

Key Changes

Voice Integration in TUI

  • Added _VoiceIO protocol to abstract voice session interface (listen/speak)
  • TUI now accepts optional voice parameter and routes spoken turns into the prompt
  • Implemented _capture_voice_turn() to listen for spoken input on a background thread
  • Implemented _voice_followup() to read back a spoken summary after each reply
  • Voice degrades gracefully to typed input if microphone is unavailable (sets _voice_typed flag)
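
The voice bullets above can be sketched as follows. This is a minimal illustration, not the PR's code: `VoiceIO`, `VoiceTurnCapture`, and the `submit` callback are hypothetical names, and the sketch assumes an unavailable microphone surfaces as `OSError`, which the PR does not state.

```python
import threading
from typing import Callable, Optional, Protocol


class VoiceIO(Protocol):
    """Hypothetical stand-in for the PR's _VoiceIO protocol (listen/speak)."""

    def listen(self) -> Optional[str]: ...

    def speak(self, text: str) -> None: ...


class VoiceTurnCapture:
    """Listens on a background thread and hands the transcript to `submit`."""

    def __init__(self, voice: VoiceIO, submit: Callable[[str], None]) -> None:
        self.voice = voice
        self.submit = submit
        # Mirrors the idea of the _voice_typed flag: True once we fall back to typing.
        self.voice_typed = False

    def _worker(self) -> None:
        try:
            text = self.voice.listen()
        except OSError:
            # Assumption: a missing microphone raises OSError.
            self.voice_typed = True
            return
        if text:
            self.submit(text)

    def capture(self) -> threading.Thread:
        # Daemon thread so a blocked listen() never keeps the process alive.
        thread = threading.Thread(target=self._worker, daemon=True)
        thread.start()
        return thread
```

In a real TUI the `submit` callback would have to marshal back onto the UI thread (the PR uses `call_from_thread()` for that); here it is a plain callable so the sketch stays self-contained.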

Turn Cancellation & Safer Quit

  • Added action_interrupt() (Escape key) to cancel a running agent turn without quitting
  • Added action_quit_or_interrupt() (Ctrl-C) that interrupts running turns or requires double-press to quit when idle
  • Implemented _turn_running() to check if prompt is disabled (turn in flight)
  • Implemented _cancel_turn() to set the session's cancel flag cooperatively
  • Added _quit_pending flag and _arm_quit_pending() to show "Press Ctrl-C again to quit" hint
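
The Escape and Ctrl-C behaviour described above amounts to a small state machine. The sketch below is an assumption-laden illustration: `QuitGuard` and its method names are hypothetical, and it compares timestamps against a 3-second window rather than using the timer the PR describes, which keeps the example free of any UI framework.

```python
import time
from typing import Callable, Optional


class QuitGuard:
    """Hypothetical model of the interrupt / double-press-quit logic."""

    def __init__(
        self,
        turn_running: Callable[[], bool],
        cancel_turn: Callable[[], None],
        window: float = 3.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.turn_running = turn_running
        self.cancel_turn = cancel_turn
        self.window = window
        self.clock = clock
        self._armed_at: Optional[float] = None  # when the quit hint was shown

    def on_escape(self) -> str:
        # Escape only ever interrupts; it never quits.
        if self.turn_running():
            self.cancel_turn()
            return "interrupted"
        return "ignored"

    def on_ctrl_c(self) -> str:
        # A running turn takes priority over quitting.
        if self.turn_running():
            self.cancel_turn()
            return "interrupted"
        now = self.clock()
        if self._armed_at is not None and now - self._armed_at <= self.window:
            return "quit"
        # First press while idle (or the hint expired): arm and show the hint.
        self._armed_at = now
        return "hint"
```

Injecting `clock` makes the expiry window testable without sleeping.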

Session Streaming & Cancellation

  • Added _SupportsStream protocol to detect agents that support incremental streaming
  • Refactored CodeSession.send() to call new _run() method that streams events incrementally
  • _run() checks _cancel flag between steps, allowing cooperative turn interruption
  • _resolve_interrupts() also respects the cancel flag to stop approval loops early
  • Added request_cancel() method to set the cancel flag from another thread
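
A rough sketch of the streaming and cancellation flow described above, under stated assumptions: `SupportsStream`, `Session`, and the `stream(payload)` / `invoke(payload)` signatures are simplified, hypothetical stand-ins for `_SupportsStream` and `CodeSession`, and the approval loop is omitted.

```python
import threading
from typing import Any, Callable, Iterator, Protocol, runtime_checkable


@runtime_checkable
class SupportsStream(Protocol):
    """Hypothetical analogue of _SupportsStream: agents exposing stream()."""

    def stream(self, payload: Any) -> Iterator[Any]: ...


class Session:
    def __init__(self, agent: Any, on_event: Callable[[Any], None]) -> None:
        self.agent = agent
        self.on_event = on_event
        self._cancel = threading.Event()

    def request_cancel(self) -> None:
        # Safe to call from another thread; the loop notices at the next step.
        self._cancel.set()

    def send(self, payload: Any) -> bool:
        """Run one turn. Returns True if it completed, False if cancelled."""
        self._cancel.clear()
        if isinstance(self.agent, SupportsStream):
            for event in self.agent.stream(payload):
                self.on_event(event)
                # Cooperative check at the step boundary; no thread is killed.
                if self._cancel.is_set():
                    return False
            return True
        # Agents that only implement invoke() emit a single event at the end.
        self.on_event(self.agent.invoke(payload))
        return True
```

Because the flag is only checked between steps, a step that is already executing always runs to completion before the turn stops.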

Voice Readback Improvements

  • Added spoken_summary() function to strip code (fenced and inline) from replies before TTS
  • Caps spoken text to 600 characters to keep readback brief
  • Returns generic fallback message when reply is all code
  • Updated voice REPL sink to use spoken_summary() instead of reading full text
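
One plausible shape for `spoken_summary()`, given the behaviour listed above. The regular expressions, the whitespace collapsing, and the fallback wording are assumptions; only the 600-character cap and the strip-code-then-fallback behaviour come from the PR description.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can sit inside a fence
FENCED_CODE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
MAX_SPOKEN_CHARS = 600
# Hypothetical wording; the PR only says a generic fallback is returned.
FALLBACK = "The reply is mostly code. See the screen for details."


def spoken_summary(reply: str, limit: int = MAX_SPOKEN_CHARS) -> str:
    """Return a short, code-free version of `reply` suitable for TTS."""
    text = FENCED_CODE.sub(" ", reply)  # drop fenced blocks first
    text = INLINE_CODE.sub(" ", text)   # then inline spans
    text = " ".join(text.split())       # collapse leftover whitespace
    if not text:
        return FALLBACK
    return text[:limit].rstrip()
```

Fenced blocks are removed before inline spans so that the backticks of a fence are never mistaken for inline code delimiters.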

UI Polish

  • Changed background color from #0b0e16 to pure black (#000000) for cleaner appearance
  • Updated key bindings: Escape for interrupt, Ctrl-C for interrupt/quit, Ctrl-Q for unconditional quit

Testing

  • Added comprehensive test suite in test_code_tui_voice.py covering voice integration scenarios
  • Added tests for turn cancellation and quit mechanics in test_code_tui.py
  • Added streaming agent test in test_code_agent.py to verify incremental event emission
  • Added spoken_summary() tests in test_code_voice.py
  • Updated command dispatch tests to verify voice session is passed to TUI

Implementation Details

  • Voice operations (listen/speak) run on daemon threads to avoid blocking the UI
  • call_from_thread() is used to safely update UI state from voice worker threads
  • The cancel flag is a threading.Event checked at step boundaries, enabling cooperative cancellation without killing threads
  • Streaming agents (real compiled graphs) emit events incrementally; test doubles that only implement invoke() emit once at the end
  • The quit hint timer automatically clears the _quit_pending flag after 3 seconds

https://claude.ai/code/session_01FUL1Y7QWgAUDTRdQtK2qCJ

- TUI: pure-black canvas (all surface fills #0b0e16 -> #000000).
- Stream the agent turn step by step (stream_mode="values") so tool calls,
  results, and reply text render live instead of all at the end; the
  approval/interrupt flow is preserved and request_cancel() can break the
  loop between steps.
- Escape interrupts a running turn; Ctrl-C interrupts a running turn or,
  when idle, quits only on a confirmed double-press (mirrors deepagents-code's
  action_interrupt / action_quit_or_interrupt).
- Voice now drives the TUI: a spoken turn is transcribed, entered into the
  prompt, and submitted; TTS reads back a code-stripped summary
  (spoken_summary) instead of the full reply. --no-tui keeps the voice REPL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01FUL1Y7QWgAUDTRdQtK2qCJ
@alexkroman alexkroman enabled auto-merge June 18, 2026 02:45
@alexkroman alexkroman added this pull request to the merge queue Jun 18, 2026
Merged via the queue into main with commit a3b72c8 Jun 18, 2026
19 checks passed
@alexkroman alexkroman deleted the claude/affectionate-babbage-itam2d branch June 18, 2026 02:53
2 participants