feat: add image generation and audio (speech/transcription) methods by njbrake · Pull Request #16 · mozilla-ai/otari-sdk-python

njbrake · 2026-06-16T15:50:10Z

Summary

Adds public image_generation, speech, and transcription methods to both OtariClient (sync) and AsyncOtariClient (async), mirroring the existing completion / embedding / moderation / rerank shape. This lets callers (notably any-llm's otari provider) route image generation and audio through the public client surface instead of reaching into the generated otari._client internals.

image_generation routes through the generated ImagesApi (typed ImageGenerationRequest); the gateway returns an OpenAI-compatible JSON object, which the generated core models as opaque, so it is returned as a dict.
speech returns binary audio (audio/mpeg), which has no JSON response model, so it posts over the existing httpx client via a small _post helper and returns the raw bytes.
transcription uploads audio as multipart form data (the generated core mistypes file as a string) and returns the parsed JSON dict, or the raw text for text / srt / vtt formats.

The _post helper reuses the same error-mapping path as the streaming shim, so both auth modes (platform Bearer and self-hosted Otari-Key) keep working.
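A minimal sketch of the two ideas in that sentence, picking an auth header per mode and funnelling every raw-transport response through one error-mapping function. Everything here is illustrative: the exception classes, the status table, the `auth_headers` and `raise_for_status` names, and the exact `Otari-Key` header name and value format are assumptions, not the SDK's actual internals.

```python
from __future__ import annotations


class GatewayError(Exception):
    """Hypothetical base error; the SDK's real exception names may differ."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


class AuthenticationError(GatewayError):
    """Hypothetical error for rejected credentials."""


class RateLimitError(GatewayError):
    """Hypothetical error for throttled requests."""


# Assumed status-to-exception table shared by every raw-transport call.
_ERRORS_BY_STATUS = {
    401: AuthenticationError,
    403: AuthenticationError,
    429: RateLimitError,
}


def auth_headers(api_key: str, *, platform: bool) -> dict[str, str]:
    """Return the header for the chosen auth mode (header names are assumed)."""
    if platform:
        return {"Authorization": f"Bearer {api_key}"}
    return {"Otari-Key": api_key}


def raise_for_status(status_code: int, body: str) -> None:
    """Raise a mapped error for any non-2xx status; do nothing on success."""
    if 200 <= status_code < 300:
        return
    error_cls = _ERRORS_BY_STATUS.get(status_code, GatewayError)
    raise error_cls(status_code, body)
```

Keeping the mapping in one function means a binary or multipart call can fail the same way a JSON or streaming call does, which is presumably why the helper shares that path rather than defining its own.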

Endpoint manifest

Moved /v1/images/generations, /v1/audio/speech, /v1/audio/transcriptions from [excluded] to [covered].
Deferred the gateway's newly added /v1/files endpoints under [excluded] (# not yet wrapped) so the endpoint-coverage drift gate passes. Wrapping the Files API is tracked separately.
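For readers unfamiliar with the drift gate, the check can be sketched roughly as follows. This assumes the manifest is a plain-text file with `[covered]` and `[excluded]` section headers, one endpoint path per line, and `#` comments; the real file format and the real gate script may differ. `parse_manifest`, `find_drift`, and the `/v1/example` path are hypothetical names used only for illustration.

```python
from __future__ import annotations


def parse_manifest(text: str) -> dict[str, set[str]]:
    """Group endpoint paths by the [section] header they appear under."""
    sections: dict[str, set[str]] = {}
    current: str | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, set())
        elif current is not None:
            sections[current].add(line)
    return sections


def find_drift(manifest_text: str, gateway_endpoints: list[str]) -> list[str]:
    """Return gateway endpoints that are neither covered nor excluded."""
    sections = parse_manifest(manifest_text)
    known = sections.get("covered", set()) | sections.get("excluded", set())
    return sorted(set(gateway_endpoints) - known)


MANIFEST = """
[covered]
/v1/images/generations
/v1/audio/speech
/v1/audio/transcriptions

[excluded]
/v1/files  # not yet wrapped
"""

# /v1/example stands in for an endpoint the manifest does not mention.
drift = find_drift(MANIFEST, ["/v1/audio/speech", "/v1/files", "/v1/example"])
```

Under these assumptions `drift` contains only `/v1/example`, so listing `/v1/files` under `[excluded]` is enough to keep the gate green until the Files API is wrapped.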

Test plan

uv run ruff check .
uv run mypy src/
uv run pytest tests/unit (122 passed): image generation returns the JSON object and maps errors; speech returns bytes and sends the auth header; transcription sends multipart and returns the parsed JSON / raw text; raw-transport error mapping.

Closes mozilla-ai/otari#138

🤖 Generated with Claude Code

Update: rebased on the regenerated core (gateway PR #178 merged); image_generation now returns the typed ImagesResponse. Ready to merge. Closes mozilla-ai/otari#138; ticks the Python box on #179.

Expose image_generation, speech, and transcription on both OtariClient and AsyncOtariClient so callers (notably any-llm's otari provider) can route image generation and audio through the public client surface instead of reaching into the generated otari._client internals.

image_generation goes through the generated ImagesApi (typed request, opaque JSON object response returned as a dict). speech and transcription do not fit the generated JSON core (binary audio out, multipart upload in), so they post over the existing httpx client via a small _post helper that reuses the same error mapping as the streaming shim: speech returns raw bytes, transcription returns the parsed JSON dict (or text for text/srt/vtt formats).

Move the three endpoints from [excluded] to [covered] in sdk-endpoints.txt. Also defer the gateway's newly added /v1/files endpoints under [excluded] (not yet wrapped by any SDK) so the endpoint-coverage drift gate passes; wrapping the files API is tracked separately.

Closes mozilla-ai/otari#138

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the loosely typed Any return (a dict or a str depending on the response format) with a TranscriptionResult dataclass exposing two fields: json (set for json / verbose_json formats) and text (set for text / srt / vtt formats). This gives the transcription method a single explicit return type and keeps it uniform with the other otari SDKs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
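A rough sketch of the shape this commit describes: a dataclass with a json field and a text field, exactly one of which is populated depending on the response format. The field names come from the commit message. The `build_result` helper, its `json` default, and the UTF-8 decoding are assumptions made for illustration and may not match the SDK's code.

```python
from __future__ import annotations

import json as jsonlib
from dataclasses import dataclass
from typing import Any

# Formats whose response body is plain text rather than a JSON object.
_TEXT_FORMATS = {"text", "srt", "vtt"}


@dataclass
class TranscriptionResult:
    """Single return type: `json` or `text` is set, never both."""

    json: dict[str, Any] | None = None
    text: str | None = None


def build_result(body: bytes, response_format: str = "json") -> TranscriptionResult:
    """Hypothetical helper: wrap a raw response body according to its format."""
    if response_format in _TEXT_FORMATS:
        return TranscriptionResult(text=body.decode("utf-8"))
    return TranscriptionResult(json=jsonlib.loads(body))


parsed = build_result(b'{"text": "hello"}', "verbose_json")
subtitles = build_result(b"WEBVTT\n", "vtt")
```

Callers then branch on which field is set instead of using `isinstance` checks on an `Any` return.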

Now that the regenerated core types the /v1/images/generations response (gateway enrich_spec injects any-llm's ImagesResponse), return the typed model from both OtariClient.image_generation and AsyncOtariClient.image_generation instead of an untyped dict, matching embedding / rerank / moderation. Updates tests and the README example to use attribute access.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
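To illustrate what the switch from dict access to attribute access means for callers, here is a small stand-in. `ImageSketch` and `ImagesResponseSketch` are hypothetical dataclasses, not the generated `ImagesResponse`; their fields (`created`, `data`, `url`, `b64_json`) follow the OpenAI-compatible payload shape, and the real model's fields may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ImageSketch:
    """Stand-in for one generated image entry."""

    url: str | None = None
    b64_json: str | None = None


@dataclass
class ImagesResponseSketch:
    """Stand-in for a typed images response."""

    created: int
    data: list[ImageSketch] = field(default_factory=list)


def from_payload(payload: dict[str, Any]) -> ImagesResponseSketch:
    """Convert an OpenAI-style JSON object into the typed stand-in."""
    images = [
        ImageSketch(url=item.get("url"), b64_json=item.get("b64_json"))
        for item in payload.get("data", [])
    ]
    return ImagesResponseSketch(created=payload["created"], data=images)


payload = {"created": 0, "data": [{"url": "https://example.invalid/image.png"}]}
typed = from_payload(payload)

# Before: key lookups on an untyped dict.
url_from_dict = payload["data"][0]["url"]
# After: attribute access that a type checker can verify.
url_from_model = typed.data[0].url
```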

njbrake and others added 2 commits June 16, 2026 15:03

njbrake force-pushed the feat/image-audio-methods branch from cc09b1a to ed7800f Compare June 16, 2026 16:17

njbrake and others added 2 commits June 16, 2026 18:21

Merge remote-tracking branch 'origin/main' into feat/image-audio-methods

6201018

njbrake force-pushed the feat/image-audio-methods branch from ccdc46f to 8ee185f Compare June 16, 2026 18:25

njbrake marked this pull request as ready for review June 16, 2026 19:48

njbrake merged commit c558a03 into main Jun 16, 2026
3 checks passed

njbrake deleted the feat/image-audio-methods branch June 16, 2026 19:48

github-actions Bot mentioned this pull request Jun 16, 2026

chore(main): release otari 0.2.0 #18

Merged

njbrake mentioned this pull request Jun 16, 2026

Expose async image generation and audio (speech/transcription) on AsyncOtariClient mozilla-ai/otari#138

Closed

njbrake merged 4 commits into main from feat/image-audio-methods
