feat: add image generation and audio (speech/transcription) methods #16

Merged
njbrake merged 4 commits into main from feat/image-audio-methods
Jun 16, 2026
@njbrake njbrake commented Jun 16, 2026

Member

Summary

Adds public image_generation, speech, and transcription methods to both OtariClient (sync) and AsyncOtariClient (async), mirroring the existing completion / embedding / moderation / rerank shape. This lets callers (notably any-llm's otari provider) route image generation and audio through the public client surface instead of reaching into the generated otari._client internals.

  • image_generation routes through the generated ImagesApi (typed ImageGenerationRequest); the gateway returns an OpenAI-compatible JSON object, which the generated core models as opaque, so it is returned as a dict.
  • speech returns binary audio (audio/mpeg), which has no JSON response model, so it posts over the existing httpx client via a small _post helper and returns the raw bytes.
  • transcription uploads audio as multipart form data (the generated core mistypes file as a string) and returns the parsed JSON dict, or the raw text for text / srt / vtt formats.

The _post helper reuses the same error-mapping path as the streaming shim, so both auth modes (platform Bearer and self-hosted Otari-Key) keep working.
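
As a rough illustration of what sharing one error-mapping path across both auth modes could look like, here is a minimal sketch. It is not the SDK's code: the exception classes, the `auth_headers` and `raise_for_error` names, the status-to-exception table, and the assumption that the self-hosted key is sent bare in an `Otari-Key` header are all hypothetical.

```python
class OtariError(Exception):
    """Hypothetical base error; the SDK's real exception names may differ."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


class AuthenticationError(OtariError):
    """Hypothetical error for rejected credentials."""


class RateLimitError(OtariError):
    """Hypothetical error for throttled requests."""


# Assumed mapping from HTTP status to exception type.
_ERRORS_BY_STATUS = {
    401: AuthenticationError,
    403: AuthenticationError,
    429: RateLimitError,
}


def auth_headers(api_key: str, *, self_hosted: bool) -> dict[str, str]:
    # Assumption: platform mode uses "Authorization: Bearer <key>" and
    # self-hosted mode sends the key in an "Otari-Key" header.
    if self_hosted:
        return {"Otari-Key": api_key}
    return {"Authorization": f"Bearer {api_key}"}


def raise_for_error(status: int, body: str) -> None:
    # A single mapping function that every raw-transport call can share,
    # so binary and multipart endpoints fail the same way as JSON ones.
    if status < 400:
        return
    error_type = _ERRORS_BY_STATUS.get(status, OtariError)
    raise error_type(status, body)
```

Keeping the mapping in one function means a new raw-transport endpoint only has to call it once after the response arrives, regardless of which header carried the credentials.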

Endpoint manifest

  • Moved /v1/images/generations, /v1/audio/speech, /v1/audio/transcriptions from [excluded] to [covered].
  • Deferred the gateway's newly added /v1/files endpoints under [excluded] (# not yet wrapped) so the endpoint-coverage drift gate passes. Wrapping the Files API is tracked separately.
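
For readers unfamiliar with a coverage drift gate, the idea can be sketched as follows. The manifest layout (bracketed section headers, one path per line, `#` comments) and the function names are assumptions made for illustration; the real sdk-endpoints.txt format and the real gate may differ.

```python
def parse_manifest(text: str) -> dict[str, set[str]]:
    """Parse a manifest with [section] headers and one endpoint per line."""
    sections: dict[str, set[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, set())
        elif current is not None:
            sections[current].add(line)
    return sections


def undeclared_endpoints(
    gateway_paths: set[str], manifest: dict[str, set[str]]
) -> set[str]:
    """Return gateway paths that are neither covered nor explicitly excluded."""
    declared = manifest.get("covered", set()) | manifest.get("excluded", set())
    return gateway_paths - declared


EXAMPLE_MANIFEST = """
[covered]
/v1/images/generations
/v1/audio/speech
/v1/audio/transcriptions

[excluded]
/v1/files  # not yet wrapped
"""

# Hypothetical gateway spec: the gate passes when nothing is undeclared.
gateway = {
    "/v1/images/generations",
    "/v1/audio/speech",
    "/v1/audio/transcriptions",
    "/v1/files",
}
drift = undeclared_endpoints(gateway, parse_manifest(EXAMPLE_MANIFEST))
```

Under this model, a newly added gateway endpoint fails the gate until it is listed in one of the two sections, which is why the /v1/files paths had to be declared as excluded here.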

Test plan

  • uv run ruff check .
  • uv run mypy src/
  • uv run pytest tests/unit (122 passed): image generation returns the JSON object and maps errors; speech returns bytes and sends the auth header; transcription sends multipart and returns the parsed JSON / raw text; raw-transport error mapping.

Closes mozilla-ai/otari#138

🤖 Generated with Claude Code


Update: rebased on the regenerated core (gateway PR #178 merged); image_generation now returns the typed ImagesResponse. Ready to merge. Closes mozilla-ai/otari#138; ticks the Python box on #179.

njbrake and others added 2 commits June 16, 2026 15:03
Expose image_generation, speech, and transcription on both OtariClient and
AsyncOtariClient so callers (notably any-llm's otari provider) can route image
generation and audio through the public client surface instead of reaching into
the generated otari._client internals.

image_generation goes through the generated ImagesApi (typed request, opaque
JSON object response returned as a dict). speech and transcription do not fit
the generated JSON core (binary audio out, multipart upload in), so they post
over the existing httpx client via a small _post helper that reuses the same
error mapping as the streaming shim: speech returns raw bytes, transcription
returns the parsed JSON dict (or text for text/srt/vtt formats).

Move the three endpoints from [excluded] to [covered] in sdk-endpoints.txt.
Also defer the gateway's newly added /v1/files endpoints under [excluded] (not
yet wrapped by any SDK) so the endpoint-coverage drift gate passes; wrapping the
files API is tracked separately.

Closes mozilla-ai/otari#138

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the loosely typed Any return (a dict or a str depending on the
response format) with a TranscriptionResult dataclass exposing two fields:
json (set for json / verbose_json formats) and text (set for text / srt / vtt
formats). This gives the transcription method a single explicit return type and
keeps it uniform with the other otari SDKs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
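
The two-field return type described in this commit can be sketched roughly as below. Only the `TranscriptionResult` name and its `json` / `text` fields come from the commit message; the `parse_transcription` helper, the `TEXT_FORMATS` constant, the UTF-8 decoding, and the default of `"json"` are assumptions for illustration, not the SDK's implementation.

```python
import json as jsonlib
from dataclasses import dataclass
from typing import Any, Optional

# Formats whose response body is plain text rather than a JSON object.
TEXT_FORMATS = {"text", "srt", "vtt"}


@dataclass
class TranscriptionResult:
    # Set for json / verbose_json formats.
    json: Optional[dict[str, Any]] = None
    # Set for text / srt / vtt formats.
    text: Optional[str] = None


def parse_transcription(
    body: bytes, response_format: str = "json"
) -> TranscriptionResult:
    # Hypothetical helper: choose which field to populate from the
    # requested response format (assumed to default to "json").
    decoded = body.decode("utf-8")
    if response_format in TEXT_FORMATS:
        return TranscriptionResult(text=decoded)
    return TranscriptionResult(json=jsonlib.loads(decoded))
```

Callers then check one field or the other instead of testing the runtime type of an `Any` value, for example `parse_transcription(raw, "srt").text`.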
njbrake force-pushed the feat/image-audio-methods branch from cc09b1a to ed7800f on June 16, 2026 16:17
njbrake and others added 2 commits June 16, 2026 18:21
Now that the regenerated core types the /v1/images/generations response (gateway
enrich_spec injects any-llm's ImagesResponse), return the typed model from both
OtariClient.image_generation and AsyncOtariClient.image_generation instead of an
untyped dict, matching embedding / rerank / moderation. Updates tests and the
README example to use attribute access.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
njbrake force-pushed the feat/image-audio-methods branch from ccdc46f to 8ee185f on June 16, 2026 18:25
@njbrake njbrake marked this pull request as ready for review June 16, 2026 19:48
@njbrake njbrake merged commit c558a03 into main Jun 16, 2026
3 checks passed
@njbrake njbrake deleted the feat/image-audio-methods branch June 16, 2026 19:48
Development

Successfully merging this pull request may close these issues.

Expose async image generation and audio (speech/transcription) on AsyncOtariClient
