feat: add image generation and audio (speech/transcription) methods #16
Merged
Expose image_generation, speech, and transcription on both OtariClient and AsyncOtariClient so callers (notably any-llm's otari provider) can route image generation and audio through the public client surface instead of reaching into the generated otari._client internals.

image_generation goes through the generated ImagesApi (typed request, opaque JSON object response returned as a dict).

speech and transcription do not fit the generated JSON core (binary audio out, multipart upload in), so they post over the existing httpx client via a small _post helper that reuses the same error mapping as the streaming shim: speech returns raw bytes, transcription returns the parsed JSON dict (or text for text/srt/vtt formats).

Move the three endpoints from [excluded] to [covered] in sdk-endpoints.txt. Also defer the gateway's newly added /v1/files endpoints under [excluded] (not yet wrapped by any SDK) so the endpoint-coverage drift gate passes; wrapping the files API is tracked separately.

Closes mozilla-ai/otari#138

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
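To illustrate the endpoint-coverage drift gate mentioned in this commit, here is a minimal sketch of how such a check might work. It assumes `sdk-endpoints.txt` uses `[covered]` / `[excluded]` section headers with one path per line and `#` comments; the function names, the `/v1/files` placeholder path, and the gateway path set are hypothetical and not taken from the repository.

```python
from __future__ import annotations


def parse_manifest(text: str) -> dict[str, set[str]]:
    """Parse '[section]' headers followed by one endpoint path per line."""
    sections: dict[str, set[str]] = {}
    current: str | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing '#' comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, set())
        elif current is not None:
            sections[current].add(line)
    return sections


def coverage_drift(manifest_text: str, gateway_paths: set[str]) -> set[str]:
    """Return gateway paths that the manifest neither covers nor excludes."""
    sections = parse_manifest(manifest_text)
    known = sections.get("covered", set()) | sections.get("excluded", set())
    return gateway_paths - known


# Assumed manifest layout after this change (illustrative only).
MANIFEST = """\
[covered]
/v1/images/generations
/v1/audio/speech
/v1/audio/transcriptions

[excluded]
/v1/files  # not yet wrapped
"""
```

Under these assumptions the gate passes when `coverage_drift` returns an empty set, which is why the new `/v1/files` endpoints have to be listed under `[excluded]` until an SDK wraps them.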
Replace the loosely typed Any return (a dict or a str depending on the response format) with a TranscriptionResult dataclass exposing two fields: json (set for json / verbose_json formats) and text (set for text / srt / vtt formats). This gives the transcription method a single explicit return type and keeps it uniform with the other otari SDKs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
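A rough sketch of the return type this commit describes. Only the class name `TranscriptionResult` and its two fields (`json`, `text`) come from the commit message; the `parse_transcription` helper, the `TEXT_FORMATS` constant, and the field defaults are assumptions made for illustration.

```python
from __future__ import annotations

import json as _json  # aliased so it does not clash with the `json` field
from dataclasses import dataclass
from typing import Any

# Formats whose response body is plain text rather than a JSON object.
TEXT_FORMATS = frozenset({"text", "srt", "vtt"})


@dataclass(frozen=True)
class TranscriptionResult:
    json: dict[str, Any] | None = None  # set for json / verbose_json
    text: str | None = None             # set for text / srt / vtt


def parse_transcription(body: bytes, response_format: str = "json") -> TranscriptionResult:
    """Hypothetical helper: wrap a raw response body in a TranscriptionResult."""
    if response_format in TEXT_FORMATS:
        return TranscriptionResult(text=body.decode("utf-8"))
    return TranscriptionResult(json=_json.loads(body))
```

With this shape a caller checks which field is set instead of testing the return value with `isinstance`.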
Now that the regenerated core types the /v1/images/generations response (gateway enrich_spec injects any-llm's ImagesResponse), return the typed model from both OtariClient.image_generation and AsyncOtariClient.image_generation instead of an untyped dict, matching embedding / rerank / moderation. Updates tests and the README example to use attribute access.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds public `image_generation`, `speech`, and `transcription` methods to both `OtariClient` (sync) and `AsyncOtariClient` (async), mirroring the existing `completion` / `embedding` / `moderation` / `rerank` shape. This lets callers (notably any-llm's otari provider) route image generation and audio through the public client surface instead of reaching into the generated `otari._client` internals.

- `image_generation` routes through the generated `ImagesApi` (typed `ImageGenerationRequest`); the gateway returns an OpenAI-compatible JSON object, which the generated core models as opaque, so it is returned as a dict.
- `speech` returns binary audio (`audio/mpeg`), which has no JSON response model, so it posts over the existing httpx client via a small `_post` helper and returns the raw `bytes`.
- `transcription` uploads audio as multipart form data (the generated core mistypes `file` as a string) and returns the parsed JSON dict, or the raw text for `text` / `srt` / `vtt` formats.

The `_post` helper reuses the same error-mapping path as the streaming shim, so both auth modes (platform Bearer and self-hosted `Otari-Key`) keep working.

Endpoint manifest
- Move `/v1/images/generations`, `/v1/audio/speech`, and `/v1/audio/transcriptions` from `[excluded]` to `[covered]`.
- Defer the gateway's newly added `/v1/files` endpoints under `[excluded]` (`# not yet wrapped`) so the endpoint-coverage drift gate passes. Wrapping the Files API is tracked separately.

Test plan
- `uv run ruff check .`
- `uv run mypy src/`
- `uv run pytest tests/unit` (122 passed): image generation returns the JSON object and maps errors; speech returns bytes and sends the auth header; transcription sends multipart and returns the parsed JSON / raw text; raw-transport error mapping.

Closes mozilla-ai/otari#138
🤖 Generated with Claude Code
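The shared error-mapping path that the `_post` helper relies on could be sketched roughly as follows. This is not the SDK's code: the exception names, the status-to-exception table, and the `speech_bytes` / `FakeResponse` helpers are all hypothetical. The response object is only assumed to expose `status_code`, `text`, and `content`, which `httpx.Response` does.

```python
from __future__ import annotations

from dataclasses import dataclass


class OtariError(Exception):
    """Hypothetical base error carrying the HTTP status code."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


class AuthenticationError(OtariError):
    """Hypothetical error for rejected credentials."""


class RateLimitError(OtariError):
    """Hypothetical error for throttled requests."""


# Assumed mapping; the real SDK may distinguish more (or different) statuses.
_ERRORS: dict[int, type[OtariError]] = {
    401: AuthenticationError,
    429: RateLimitError,
}


def raise_for_error(status_code: int, message: str) -> None:
    """Single place where raw HTTP failures become SDK exceptions."""
    if status_code < 400:
        return
    raise _ERRORS.get(status_code, OtariError)(status_code, message)


def speech_bytes(response) -> bytes:
    """Return the binary audio body, or raise a mapped error on failure."""
    raise_for_error(response.status_code, response.text)
    return response.content


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response, used only to exercise the sketch."""

    status_code: int
    content: bytes

    @property
    def text(self) -> str:
        return self.content.decode("utf-8", errors="replace")
```

Routing both the streaming shim and the raw audio calls through one function like `raise_for_error` keeps failures consistent across endpoints, independent of which auth header was sent.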
Update: rebased on the regenerated core (gateway PR #178 merged); `image_generation` now returns the typed `ImagesResponse`. Ready to merge. Closes mozilla-ai/otari#138; ticks the Python box on #179.