feat(pulse/telegram): bidirectional image support #1384
Open
klausagnoletti wants to merge 1 commit into
Inbound: new message:photo / message:document handler. The file is downloaded, validated, and its path passed to the SDK session to Read.

Security (single-user gated bridge; files go to a bypassPermissions session):
- Accept PNG/JPEG only, identified by MAGIC BYTES, never the declared mime.
- Reject WebP/GIF/SVG/voice/audio/video/stickers and all other documents, which retires the libwebp (CVE-2023-4863) and SVG-script decoder classes.
- 10MB byte cap + 40MP / 10000px pre-decode dimension cap (decompression bombs).
- UUID filename from the sniffed type; sandboxed incoming dir; cleanup after read.

Outbound: the DA emits [[IMG:/abs/path]] or [[IMG:https://url]]; the bridge sends it (photo or document) and strips the tag from the text, including mid-stream.

Refactor: the message:text handler body is extracted verbatim into a shared processPrompt() reused by both handlers; the text path is unchanged. allowed_users gating and caption injection-scanning preserved.
Pulse Telegram bridge: bidirectional image support
Summary
The Pulse Telegram module (PAI/PULSE/modules/telegram.ts) is text-only today: a single
bot.on("message:text") handler, no media in or out. This adds bidirectional image support
so you can send your DA a photo or PNG/JPEG from your phone and have it actually look at
the image, and so the DA can send images back. Implemented and running in production on
my instance.
Motivation
The most natural thing to do from a phone is send a screenshot or a photo. Today the
bridge silently drops any non-text message (the message:photo / message:document update
has no handler, so grammY ignores it and the user gets no reply). Outbound, the DA can
generate or fetch an image but has no way to deliver it.
Design
Inbound (photo or PNG/JPEG document)
A new bot.on(["message:photo", "message:document"]) handler downloads the file, then
hands the local path to the SDK session, which uses the Read tool (vision) to view it.
Captions flow through and are sanitised + injection-scanned exactly like text messages.
Because the saved path is handed to a bypassPermissions session, inbound bytes are
validated before they ever reach the decoder:
- PNG and JPEG only, identified by magic bytes, never by the declared mime_type (which is attacker-controllable).
- WebP, GIF, SVG, voice, audio, video, stickers, and all other non-image documents are rejected. Refusing to decode those formats is what retires the libwebp (CVE-2023-4863) and SVG-script attack classes: a decoder you never invoke can't hurt you.
- 10 MB byte cap plus a 40 MP / 10000 px pre-decode dimension cap. Documents are not re-encoded by Telegram, so a tiny file can claim a gigapixel canvas (decompression bomb); the dimension cap is read cheaply from the PNG IHDR / JPEG SOF header before any full decode or allocation.
- Saved to a sandboxed state/telegram/incoming/ dir with a randomUUID() filename and the extension derived from the sniffed type (never the remote path). Best-effort deleted after the session reads it.
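
As an illustration of the checks listed above, here is a minimal sketch of the sniff-then-measure order. It assumes the downloaded file is already in memory as a Uint8Array. All function and constant names (sniffKind, pngSize, jpegSize, validateInbound) are hypothetical and not taken from the patch; only the limits and the header fields come from the description.

```typescript
// Hypothetical names; limits are the ones stated in this PR description.
type ImageKind = "png" | "jpeg";
type Size = { width: number; height: number };
type Verdict =
  | { ok: true; kind: ImageKind; width: number; height: number }
  | { ok: false; reason: string };

const MAX_BYTES = 10 * 1024 * 1024; // 10 MB byte cap
const MAX_PIXELS = 40_000_000;      // 40 MP total
const MAX_SIDE = 10_000;            // 10000 px per side
const PNG_SIG = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

const u16 = (b: Uint8Array, o: number): number => (b[o] << 8) | b[o + 1];
const u32 = (b: Uint8Array, o: number): number =>
  ((b[o] << 24) | (b[o + 1] << 16) | (b[o + 2] << 8) | b[o + 3]) >>> 0;

// Identify the format from the leading bytes only; the declared mime is ignored.
function sniffKind(b: Uint8Array): ImageKind | null {
  if (b.length >= 8 && PNG_SIG.every((v, i) => b[i] === v)) return "png";
  if (b.length >= 3 && b[0] === 0xff && b[1] === 0xd8 && b[2] === 0xff) return "jpeg";
  return null;
}

// PNG: signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4).
function pngSize(b: Uint8Array): Size | null {
  if (b.length < 24) return null;
  const isIhdr = b[12] === 0x49 && b[13] === 0x48 && b[14] === 0x44 && b[15] === 0x52;
  return isIhdr ? { width: u32(b, 16), height: u32(b, 20) } : null;
}

// JPEG: walk the marker segments until a start-of-frame (SOF) marker appears.
function jpegSize(b: Uint8Array): Size | null {
  let i = 2; // skip the SOI marker
  while (i + 4 <= b.length) {
    if (b[i] !== 0xff) return null;           // lost marker sync
    const marker = b[i + 1];
    if (marker === 0xff) { i += 1; continue; } // fill byte
    if (marker === 0x01 || (marker >= 0xd0 && marker <= 0xd7)) { i += 2; continue; }
    if (marker === 0xd9 || marker === 0xda) return null; // EOI/SOS before any SOF
    const len = u16(b, i + 2);
    if (len < 2) return null;
    const isSof = marker >= 0xc0 && marker <= 0xcf &&
      marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc;
    if (isSof) {
      if (i + 9 > b.length) return null;
      // layout: marker (2), length (2), precision (1), height (2), width (2)
      return { height: u16(b, i + 5), width: u16(b, i + 7) };
    }
    i += 2 + len;
  }
  return null;
}

// Cheap checks first; nothing here decodes pixel data or allocates a canvas.
function validateInbound(b: Uint8Array): Verdict {
  if (b.length > MAX_BYTES) return { ok: false, reason: "file too large" };
  const kind = sniffKind(b);
  if (kind === null) return { ok: false, reason: "not PNG or JPEG" };
  const size = kind === "png" ? pngSize(b) : jpegSize(b);
  if (size === null) return { ok: false, reason: "no dimensions in header" };
  const { width, height } = size;
  if (width === 0 || height === 0 || width > MAX_SIDE || height > MAX_SIDE ||
      width * height > MAX_PIXELS) {
    return { ok: false, reason: "dimensions out of bounds" };
  }
  return { ok: true, kind, width, height };
}
```

Note that the per-side and total-pixel caps are both needed: an 8000 x 8000 image passes the 10000 px side check but is 64 MP, so it is still rejected. The returned kind is what a caller would use to pick the saved file's extension.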
Outbound (DA to user)
The DA includes a line [[IMG:/absolute/path]] (or [[IMG:https://url]]) anywhere in its
reply. The bridge extracts the refs, strips the tags from the visible text (including
during the live streaming edits), and sends each as a photo or document. Documented to the
model via two lines added to the existing TELEGRAM MODE OVERRIDE system-prompt block.
Refactor (no behaviour change to text)
To avoid duplicating ~140 lines of SDK/stream/reply logic across the text and image
handlers, the body of the existing message:text handler is extracted verbatim into a
shared processPrompt(ctx, { userLog, newMessage }). Both handlers call it. The text path
is unchanged: history-building, session-resume, billing key-strip, timeout, chunking, and
persistence all identical.
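
Returning to the outbound path described earlier, the [[IMG:...]] tag handling could look roughly like the sketch below. The helper names (extractImageRefs, stripForDisplay, tidy) are hypothetical, and hiding an unterminated tag during streaming is an assumption about how "including during the live streaming edits" might be achieved, not a description of the patch. Sending the extracted refs through the bot API is left out.

```typescript
// Hypothetical helpers; only the [[IMG:...]] tag syntax comes from the PR description.
const IMG_TAG = /\[\[IMG:([^\]\s]+)\]\]/g;

interface Extracted {
  text: string;   // reply text with tags removed
  refs: string[]; // absolute paths or https URLs, in order of appearance
}

// Clean up whitespace left behind once a tag on its own line is removed.
function tidy(s: string): string {
  return s.replace(/[ \t]+\n/g, "\n").replace(/\n{3,}/g, "\n\n").trim();
}

// Final pass: collect refs and strip every complete tag from the visible text.
function extractImageRefs(reply: string): Extracted {
  const refs: string[] = [];
  const text = reply.replace(IMG_TAG, (_match: string, ref: string) => {
    // Assumption: anything that is not an absolute path or https URL is dropped.
    if (ref.startsWith("/") || ref.startsWith("https://")) refs.push(ref);
    return "";
  });
  return { text: tidy(text), refs };
}

// Streaming pass: strip complete tags and also hide a tag that has started
// but whose closing brackets have not arrived yet.
function stripForDisplay(partial: string): string {
  const withoutComplete = partial.replace(IMG_TAG, "");
  return tidy(withoutComplete.replace(/\[\[IMG:[^\]]*\]?$/, ""));
}
```

For example, extractImageRefs("Here is the chart.\n[[IMG:/tmp/chart.png]]\nDone.") yields the ref /tmp/chart.png and a text with the tag line removed, while stripForDisplay("Rendering [[IMG:/tmp/ch") shows only "Rendering" until the tag completes.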
Security boundary
No change to the trust boundary. The allowed_users middleware still gates every update,
including media. The threat model is a compromised owner account or an owner forwarding a
malicious file, not the open internet.
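
To make the gating property concrete, here is a small framework-free sketch of a gate that runs before any handler regardless of update type. The IncomingUpdate shape and the isAllowed / gate names are hypothetical; the real allowed_users middleware and its config format are not shown in this description.

```typescript
// Hypothetical shapes: the real middleware and config are not part of this text.
interface IncomingUpdate {
  from?: { id: number };
  kind: "text" | "photo" | "document";
}

// An update passes only if it has a sender and that sender is on the allowlist.
function isAllowed(update: IncomingUpdate, allowedUsers: ReadonlySet<number>): boolean {
  return update.from !== undefined && allowedUsers.has(update.from.id);
}

// Wrap a handler so the check runs first for every update kind, media included.
function gate<T extends IncomingUpdate>(
  allowedUsers: ReadonlySet<number>,
  handler: (update: T) => void,
): (update: T) => void {
  return (update) => {
    if (isAllowed(update, allowedUsers)) handler(update);
    // Anything else is dropped before a file is ever downloaded.
  };
}
```

The point of placing the check in front of all handlers is that adding a new update type, such as photos here, cannot widen who is able to reach the session.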
Patch
One file changed (Releases/v5.0.0/.claude/PAI/PULSE/modules/telegram.ts), +214 / -47,
parses clean (bun build). The PNG/JPEG sniff and dimension logic is unit-tested against
real files and a synthetic decompression-bomb header. Tested and running in production on
my instance: photos, captioned photos, PNG/JPEG documents, and outbound generated images
all round-trip.
Residual risk (stated plainly)
Decoding any image still invokes the host image decoder, so libpng/libjpeg memory-safety
exposure remains on attacker-influenced (compromised-owner) input. It is bounded by the
magic-byte allowlist, the 10 MB / 40 MP / 10000 px pre-decode caps, the sandboxed
Read-only consumer, and the single-user access gate. The libwebp and SVG classes are fully
retired by never decoding those formats. Operators handling untrusted forwards should keep
host image libraries patched.
Out of scope (deliberately)
the bridge doesn't have; a focused follow-up PR is the right home for them.
pathToClaudeCodeExecutable to the SDK so it works on non-native installs) is NOT in this diff; filable separately if useful.