fix(kosong): clamp max_tokens to max_output_size for OpenAI-compatible providers#1208
slgao wants to merge 1 commit into
…e providers Third-party OpenAI-compatible providers (HuggingFace, Ollama, etc.) can have output limits below the generic CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING (131072). When max_output_size is explicitly configured, withMaxCompletionTokens now honours it as a hard upper bound instead of applying the generic ceiling, preventing 400 errors from providers whose actual limit is lower. Fixes MoonshotAI#1148.
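The clamping rule described above can be sketched as follows. This is a minimal illustration under assumptions, not the code from this PR: `resolveCeiling` and `clampMaxTokens` are hypothetical helper names, and only the constant's name and value (131072) are taken from the description.

```typescript
// Generic ceiling named in the PR description (131072 = 128k tokens).
const CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING = 131072;

// Hypothetical helper: pick the upper bound for max_tokens.
// An explicitly configured limit replaces the generic ceiling.
function resolveCeiling(explicitMaxTokens?: number): number {
  return explicitMaxTokens ?? CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING;
}

// Hypothetical helper: clamp a requested completion budget to that bound.
function clampMaxTokens(budget: number, explicitMaxTokens?: number): number {
  return Math.min(budget, resolveCeiling(explicitMaxTokens));
}

// A provider configured with a 65536-token limit is never asked for more.
const configured = clampMaxTokens(1_048_576, 65536);
// Without an explicit limit, the generic ceiling still applies.
const generic = clampMaxTokens(1_048_576);
```

Under this sketch an explicit limit above the generic ceiling (for example 384000) is also honoured as configured, which matches the updated test expectation in the description.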
🦋 Changeset detected. Latest commit: 52f9427. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52f942740f
    baseUrl: providerValue(provider.baseUrl, provider.env, 'OPENAI_BASE_URL'),
    apiKey: providerApiKey(provider),
    reasoningKey,
    ...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}),
Honor the max-token env opt-out
When KIMI_MODEL_MAX_COMPLETION_TOKENS=0 or KIMI_MODEL_MAX_TOKENS=0, resolveCompletionBudget returns undefined to disable completion-token clamping, but this line now bakes maxOutputSize into every OpenAI-compatible provider config. OpenAILegacyChatProvider.generate() serializes constructor maxTokens as max_tokens even when applyCompletionBudget is skipped, so any OpenAI-compatible model alias with maxOutputSize will still send a cap despite the documented env opt-out. Keep maxOutputSize in the budget path or avoid wiring it when the opt-out is active.
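One possible shape for the second suggestion (avoid wiring the value when the opt-out is active) is sketched below. The environment variable names come from the comment above; `isCompletionCapDisabled` and `openAiMaxTokensConfig` are hypothetical helpers, the environment is passed in as a plain object for illustration, and the real `resolveCompletionBudget` may parse these variables differently.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical check: treat the cap as disabled when either variable is set to 0.
function isCompletionCapDisabled(env: Env): boolean {
  return [env['KIMI_MODEL_MAX_COMPLETION_TOKENS'], env['KIMI_MODEL_MAX_TOKENS']].some(
    // Guard against empty strings, which Number() would also convert to 0.
    (raw) => raw !== undefined && raw.trim() !== '' && Number(raw) === 0,
  );
}

// Hypothetical replacement for the unconditional spread: only forward
// maxOutputSize as maxTokens when the env opt-out is not active.
function openAiMaxTokensConfig(
  maxOutputSize: number | undefined,
  env: Env,
): { maxTokens?: number } {
  if (maxOutputSize === undefined || isCompletionCapDisabled(env)) {
    return {};
  }
  return { maxTokens: maxOutputSize };
}

// With the opt-out set, no cap is baked into the provider config.
const optedOut = openAiMaxTokensConfig(65536, { KIMI_MODEL_MAX_TOKENS: '0' });
// Without it, the configured limit is forwarded as before.
const capped = openAiMaxTokensConfig(65536, {});
```

The result would be spread into the provider config in place of the unconditional expression, so the constructor never receives a `maxTokens` value when the user has disabled clamping.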
Related Issue
Resolves #1148
Problem
When using a third-party OpenAI-compatible provider (HuggingFace, Ollama, etc.) with a max_output_size lower than CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING (131072), the agent sends a max_tokens value that exceeds the provider's actual limit, causing a 400 error.
Root cause: two gaps in the OpenAI provider path:
- provider-manager.ts was not forwarding maxOutputSize as maxTokens to the OpenAI provider config (unlike the Anthropic case, which already passes defaultMaxTokens).
- withMaxCompletionTokens in openai-legacy.ts always applies the generic 128k ceiling, overriding an explicitly configured maxTokens when the budget cap is large.
What changed
packages/kosong/src/providers/openai-legacy.ts
- Added an _explicitMaxTokens: boolean field, set when options.maxTokens is provided at construction.
- In withMaxCompletionTokens, when _explicitMaxTokens is true, the configured value is used as the ceiling instead of CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING. This mirrors the existing _explicitMaxTokens pattern already used by the Anthropic provider.
packages/agent-core/src/session/provider-manager.ts
- Added ...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}) to the openai case in toKosongProviderConfig, mirroring the existing defaultMaxTokens spread in the anthropic case.
packages/kosong/test/openai-legacy.test.ts
- New test: constructs a provider with maxTokens: 65536 (simulating a HuggingFace endpoint), calls withMaxCompletionTokens(1_048_576, ...), and asserts that the resulting request body has max_tokens: 65536 instead of the old 131072.
packages/agent-core/test/agent/config-state.test.ts
- Updated the 'clamps the LLM completion cap to 128k for openai-compatible providers' test, which was written to document the old buggy behaviour (expect(requestMaxTokens).toBe(131072) for maxOutputSize: 384000). It now asserts 384000, with an updated comment.
Checklist
- gen-changesets skill, or this PR needs no changeset.
- gen-docs skill, or this PR needs no doc update.