Part of the Build Your Own Coding Agent tutorial. One issue = one chapter (chapters/08-production.md) plus its examples/08-production/ samples.
Goal (1 sentence): Harden the agent for production: error handling, retries/timeouts, rate-limit-aware concurrency, observability, cost discipline, and deployment.
After this chapter you can
- Catch and handle the full SDK error hierarchy (APIError, RateLimitError, APIConnectionError, APIConnectionTimeoutError, AuthenticationError, BadRequestError) at the right level and decide whether to retry, back off, or surface to the user.
- Instrument every API call with structured logs that pair request-id with usage cost fields (including cache_read_input_tokens), and control runaway spending with max_tokens and cache_control.
- Choose between a long-running process and a serverless deployment, configure Telegram webhooks for production traffic, and containerize the agent with a minimal Bun Dockerfile that loads secrets from the environment.
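The second outcome above (structured logs that pair request-id with usage cost fields) can be sketched as a small pure function. This is a minimal illustration, not the chapter's cost-tracker sample: the names buildUsageLog, toLogLine, and the Prices shape are hypothetical, and the prices in any call are placeholders you would replace with the current rates for your model.

```typescript
// Usage fields this sketch reads; the names follow the spec's Key APIs list.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
}

// Hypothetical per-million-token prices in USD (placeholders, not real pricing).
interface Prices {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
}

interface UsageLogRecord {
  request_id: string | null;
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number;
  cache_hit: boolean;
  cost_usd: number;
}

// Build one structured record for a finished API call.
// Assumption: cache writes are ignored here to keep the sketch short.
function buildUsageLog(
  requestId: string | null,
  model: string,
  usage: Usage,
  prices: Prices,
): UsageLogRecord {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cost =
    (usage.input_tokens * prices.inputPerMTok +
      usage.output_tokens * prices.outputPerMTok +
      cacheRead * prices.cacheReadPerMTok) /
    1_000_000;
  return {
    request_id: requestId,
    model,
    input_tokens: usage.input_tokens,
    output_tokens: usage.output_tokens,
    cache_read_input_tokens: cacheRead,
    cache_hit: cacheRead > 0,
    // Round to 6 decimal places so log lines stay compact.
    cost_usd: Math.round(cost * 1e6) / 1e6,
  };
}

// One JSON object per line is easy to grep and to ship to a log pipeline.
function toLogLine(record: UsageLogRecord): string {
  return JSON.stringify(record);
}
```

In a real handler you would pass the response's request id and usage object straight in, then write the resulting line to stdout; summing cost_usd per session gives a running total.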
What to cover (ONE paragraph, not a list)
The chapter opens by walking you through the SDK error hierarchy - Anthropic.APIError as the base class and the named subclasses RateLimitError (429), APIConnectionError, APIConnectionTimeoutError (a subclass of APIConnectionError), AuthenticationError, and BadRequestError - explaining when to retry versus surface. It then covers the SDK's built-in retry logic (408/409/429/5xx with exponential backoff, tuned via maxRetries; disabled with maxRetries: 0 for non-idempotent calls where a retry would duplicate stateful tool side effects), the timeout option on client.messages.create and client.messages.stream, set either as a per-request override or as a client-wide default (a fired timeout raises APIConnectionTimeoutError, including mid-stream), and rate-limit-aware concurrency via the anthropic-ratelimit-requests-remaining and anthropic-ratelimit-requests-reset response headers and a token-bucket or queue-based limiter that respects retry-after. Next comes observability: extracting request-id via response._request_id or client.messages.create(...).withResponse(), emitting structured log lines pairing request-id with usage.input_tokens, usage.output_tokens, and derived cost, and flagging cache hits via usage.cache_read_input_tokens. Cost discipline is also covered on the main line: choosing the right model per task (heavyweight reasoning vs. Haiku-tier triage), applying cache_control with type: "ephemeral" on stable system prompts and large tool schemas to shift repeated prefix tokens from usage.input_tokens to the cheaper usage.cache_read_input_tokens, and setting max_tokens conservatively to prevent runaway completions. The chapter closes by contrasting deployment shapes - a long-running Bun process with bot.startPolling() versus a serverless handler receiving Telegram webhook Update payloads over HTTPS (tradeoffs: cold-start latency, connection reuse, streaming compatibility) - and shows a minimal Dockerfile using the official Bun base image with a non-root user and ENV declarations, with all secrets (ANTHROPIC_API_KEY, TELEGRAM_BOT_TOKEN) loaded from environment variables, never hardcoded.
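The retry-versus-surface decision and the limiter's pacing step described above can be sketched without the SDK. This is a hedged illustration under stated assumptions: classifyStatus, nextDelayMs, HeaderMap, and RateLimitHeaderNames are hypothetical names, headers are modelled as a plain object with lowercase keys, the header names are passed in so you can check them against the current API docs, and the reset header is assumed to carry an RFC 3339 timestamp. With the SDK's built-in retries enabled you would not need the retry branch yourself; it matters only on the maxRetries: 0 path or inside a limiter that sits in front of your calls.

```typescript
type Action = "retry" | "surface";

// Statuses the spec lists as retryable: 408, 409, 429, and 5xx.
function classifyStatus(status: number): Action {
  if (status === 408 || status === 409 || status === 429 || status >= 500) {
    return "retry";
  }
  return "surface"; // e.g. 400 (bad request) or 401 (authentication)
}

// Hypothetical shapes: lowercase header names mapped to raw string values.
type HeaderMap = Record<string, string | undefined>;

interface RateLimitHeaderNames {
  remaining: string; // header reporting how many requests are left
  reset: string; // header reporting when the window resets (assumed RFC 3339)
}

// Parse a header value as a number; missing or empty values become NaN.
function parseNumber(value: string | undefined): number {
  if (value === undefined || value.trim() === "") return NaN;
  return Number(value);
}

// How long a limiter should pause before releasing the next request.
function nextDelayMs(
  headers: HeaderMap,
  names: RateLimitHeaderNames,
  nowMs: number,
): number {
  // retry-after (in seconds) takes priority when the server sends it.
  // Assumption: the HTTP-date form of retry-after is not handled here.
  const retryAfterSeconds = parseNumber(headers["retry-after"]);
  if (Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0) {
    return Math.ceil(retryAfterSeconds * 1000);
  }
  // Otherwise wait for the reset time only when the budget is exhausted.
  const remaining = parseNumber(headers[names.remaining]);
  const resetAtMs = Date.parse(headers[names.reset] ?? "");
  if (remaining === 0 && Number.isFinite(resetAtMs)) {
    return Math.max(0, resetAtMs - nowMs);
  }
  return 0; // budget left (or unknown): release immediately
}
```

A queue-based limiter would call nextDelayMs after each response and sleep for the returned time before dequeuing the next job; keeping the function pure makes that behaviour easy to test with fixed timestamps.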
Going deeper (optional asides - keep OFF the main line)
- Advanced cache strategies: composing multiple cache_control breakpoints for multi-turn conversations with large tool schemas.
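To make this aside concrete, here is a minimal sketch of composing three cache_control breakpoints in one request: one on the last tool definition, one on the system prompt, and one on the final block of the latest message. The helper withBreakpoints and the local types are hypothetical stand-ins for the SDK's request types, and the placement strategy is one reasonable choice rather than the only one. As far as the prompt-caching docs state, the API caps the number of breakpoints per request, so the sketch returns fresh copies instead of mutating the conversation, which keeps old markers from piling up across turns.

```typescript
interface CacheControl {
  type: "ephemeral";
}

// Simplified local stand-ins for the request shapes (assumptions, not SDK types).
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ToolDef {
  name: string;
  description: string;
  input_schema: object;
  cache_control?: CacheControl;
}

interface Message {
  role: "user" | "assistant";
  content: TextBlock[];
}

interface CachedRequestParts {
  system: TextBlock[];
  tools: ToolDef[];
  messages: Message[];
}

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Return copies of the request parts with three breakpoints applied.
function withBreakpoints(
  systemPrompt: string,
  tools: ToolDef[],
  messages: Message[],
): CachedRequestParts {
  // Breakpoint 1: marking the last tool covers the whole tool list.
  const cachedTools = tools.map((tool, i) =>
    i === tools.length - 1 ? { ...tool, cache_control: EPHEMERAL } : tool,
  );

  // Breakpoint 2: the stable system prompt.
  const system: TextBlock[] = [
    { type: "text", text: systemPrompt, cache_control: EPHEMERAL },
  ];

  // Breakpoint 3: the final block of the latest message, so the next turn
  // can reuse everything up to this point.
  const cachedMessages = messages.map((message, i) => {
    if (i !== messages.length - 1 || message.content.length === 0) {
      return message;
    }
    const lastIndex = message.content.length - 1;
    const content = message.content.map((block, j) =>
      j === lastIndex ? { ...block, cache_control: EPHEMERAL } : block,
    );
    return { ...message, content };
  });

  return { system, tools: cachedTools, messages: cachedMessages };
}
```

You would spread the returned parts into the request alongside model and max_tokens, then watch usage.cache_read_input_tokens on the following turn to confirm the prefix was actually reused.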
Out of scope (defer - do NOT preview)
Code samples - examples/08-production/
- error-handling.ts - try/catch ladder over the error subclasses; log request-id.
- retry-backoff.ts - manual backoff with maxRetries: 0.
- concurrency-limiter.ts - async queue paced by the rate-limit headers.
- cost-tracker.ts - accumulate usage (incl. cache_read_input_tokens) per session.
- webhook-server.ts - Bun HTTP server receiving Telegram Update POSTs (secret-validated).
- Dockerfile - multi-stage Bun build, non-root user, secrets via env.
Must-keep for a beginner (floor - never cut for brevity)
- The run command for the first sample.
- "Never hardcode your key; it comes from the environment" (once, in prose).
- Anything a beginner cannot infer from the code (e.g. Bun auto-loads .env, no loader needed).
- The one genuinely non-obvious gotcha: the SDK already retries 408/409/429/5xx, so avoid double-retrying; use maxRetries: 0 only for non-idempotent calls where a retry would duplicate stateful tool side effects.
Friendliness floor (never cut - terse is not friendly)
- The chapter addresses the reader as "you", never "the user" or "one".
- The intro AND at least one section open with a warm, second-person sentence.
Key APIs (flat list, reference only - NOT a coverage checklist)
Anthropic.APIError, RateLimitError, APIConnectionError, APIConnectionTimeoutError, AuthenticationError, BadRequestError, maxRetries, timeout, response._request_id, client.messages.create(...).withResponse(), usage.input_tokens, usage.output_tokens, usage.cache_read_input_tokens, cache_control, max_tokens, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-requests-reset, retry-after
Prerequisites
Chapters 1-7. The webhook sample needs an HTTPS endpoint; the Dockerfile sample needs Docker.
Definition of done
- chapters/08-production.md, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste wc -l AND grep -c '^## ' in the PR).
- bun run, imported via <<< @/examples/08-production/file.ts, <=35 lines, comment:code <=0.30.
- @anthropic-ai/sdk surface; ASCII punctuation only.
- README.md and the .vitepress/config.ts sidebar; bun x vitepress build passes.