Chapter 8: Production and Deployment

Part of the **Build Your Own Coding Agent** tutorial. One issue = one chapter (`chapters/08-production.md`) plus its `examples/08-production/` samples.



**Goal (1 sentence):** Harden the agent for production: error handling, retries/timeouts, rate-limit-aware concurrency, observability, cost discipline, and deployment.

### After this chapter you can
- Catch and handle the full SDK error hierarchy (`APIError`, `RateLimitError`, `APIConnectionError`, `APIConnectionTimeoutError`, `AuthenticationError`, `BadRequestError`) at the right level, and decide for each whether to retry, back off, or surface the error to the user.
- Instrument every API call with structured logs that pair `request-id` with `usage` cost fields (including `cache_read_input_tokens`), and control runaway spending with `max_tokens` and `cache_control`.
- Choose between a long-running process and a serverless deployment, configure Telegram webhooks for production traffic, and containerize the agent with a minimal Bun Dockerfile that loads secrets from the environment.
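The retry / back off / surface decision in the first outcome can be sketched as a small pure function. This is only an illustration: `decideAction` and the `Action` type are hypothetical names, and the sketch keys off the HTTP status rather than the SDK error classes so that it runs without the SDK installed. In a real sample you would more likely branch on `instanceof` checks against the error subclasses.

```typescript
// Hypothetical helper (not part of the SDK): maps the HTTP status of a failed
// call to what the agent should do next. `undefined` means no HTTP response
// arrived at all, e.g. a connection failure or a timeout.
type Action = "retry" | "backoff" | "surface";

function decideAction(status: number | undefined): Action {
  if (status === undefined) return "retry"; // network error or timeout
  if (status === 429) return "backoff"; // rate limited: wait, then try again
  if (status === 408 || status === 409 || status >= 500) return "retry";
  return "surface"; // 400, 401, ...: a retry will not change the outcome
}
```

Because the SDK already retries the retryable cases on its own, a function like this would only matter for errors that reach your code after those built-in retries are exhausted or disabled.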

### What to cover (ONE paragraph, not a list)
The chapter opens by walking you through the SDK error hierarchy - `Anthropic.APIError` as the base class and the named subclasses `RateLimitError` (429), `APIConnectionError`, `APIConnectionTimeoutError` (a subclass of `APIConnectionError`; this is the TypeScript SDK's name for what the Python SDK calls `APITimeoutError`), `AuthenticationError` (401), and `BadRequestError` (400) - explaining when to retry versus surface. It then covers the SDK's built-in retry logic (connection errors and 408/409/429/5xx responses are retried with exponential backoff, twice by default; tuned via `maxRetries`, and disabled with `maxRetries: 0` for non-idempotent calls where a retry would duplicate stateful tool side effects), the `timeout` option on `client.messages.create` and `client.messages.stream`, set either as a per-request override or as a client-wide default (a fired timeout throws `APIConnectionTimeoutError`, including mid-stream), and rate-limit-aware concurrency via the `anthropic-ratelimit-requests-remaining` and `anthropic-ratelimit-requests-reset` response headers and a token-bucket or queue-based limiter that respects `retry-after`. Next comes observability: extracting `request-id` via `response._request_id` or `client.messages.create(...).withResponse()`, emitting structured log lines that pair `request-id` with `usage.input_tokens`, `usage.output_tokens`, and derived cost, and flagging cache hits via `usage.cache_read_input_tokens`. Cost discipline is also covered on the main line: choosing the right model per task (heavyweight reasoning vs. Haiku-tier triage), applying `cache_control` with `type: "ephemeral"` on stable system prompts and large tool schemas to minimize `usage.input_tokens`, and setting `max_tokens` conservatively to prevent runaway completions.
The chapter closes by contrasting deployment shapes - a long-running Bun process with `bot.startPolling()` versus a serverless handler receiving Telegram webhook `Update` payloads over HTTPS (tradeoffs: cold-start latency, connection reuse, streaming compatibility) - and shows a minimal `Dockerfile` using the official Bun base image with a non-root user and `ENV` declarations, with all secrets (`ANTHROPIC_API_KEY`, `TELEGRAM_BOT_TOKEN`) loaded from environment variables, never hardcoded.
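The limiter described above comes down to one question after each response: how long should the next request wait? A minimal sketch follows, assuming the rate-limit headers have already been parsed into a plain object. `RateLimitInfo`, `nextDelayMs`, and `drainQueue` are hypothetical names, not SDK surface.

```typescript
// Hypothetical shape: values you have already read off the response headers.
interface RateLimitInfo {
  retryAfterSeconds?: number; // from `retry-after`, when the server sent it
  remainingRequests?: number; // requests left in the current window
  resetAtMs?: number; // epoch milliseconds at which the window resets
}

// How long to wait (in ms) before sending the next request.
function nextDelayMs(info: RateLimitInfo, nowMs: number): number {
  // An explicit `retry-after` always wins.
  if (info.retryAfterSeconds !== undefined) {
    return Math.max(0, info.retryAfterSeconds * 1000);
  }
  // Budget exhausted: wait until the window resets.
  if (info.remainingRequests === 0 && info.resetAtMs !== undefined) {
    return Math.max(0, info.resetAtMs - nowMs);
  }
  return 0;
}

// Runs tasks one at a time, pausing between them when the limits say so.
// `sleep` is injected so the caller (or a test) controls how waiting happens.
async function drainQueue<T>(
  tasks: Array<() => Promise<{ value: T; info: RateLimitInfo }>>,
  sleep: (ms: number) => Promise<void>,
  now: () => number = Date.now,
): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {
    const { value, info } = await task();
    results.push(value);
    const wait = nextDelayMs(info, now());
    if (wait > 0) await sleep(wait);
  }
  return results;
}
```

This version is strictly sequential to keep the idea visible; a token bucket would allow several requests in flight while applying the same wait computation.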

### Going deeper (optional asides - keep OFF the main line)
- Advanced cache strategies: composing multiple `cache_control` breakpoints for multi-turn conversations with large tool schemas.

### Out of scope (defer - do NOT preview)
- None

### Code samples - examples/08-production/
- [ ] `error-handling.ts` - try/catch ladder over the error subclasses; log `request-id`.
- [ ] `retry-backoff.ts` - manual backoff with `maxRetries: 0`.
- [ ] `concurrency-limiter.ts` - async queue draining rate-limit headers.
- [ ] `cost-tracker.ts` - accumulate `usage` (incl. `cache_read_input_tokens`) per session.
- [ ] `webhook-server.ts` - Bun HTTP server receiving Telegram `Update` POSTs (secret-validated).
- [ ] `Dockerfile` - multi-stage Bun build, non-root user, secrets via env.
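For the "secret-validated" part of `webhook-server.ts`, one possible shape is sketched below. It assumes the webhook was registered with Telegram's `secret_token` option, in which case Telegram sends that value back in the `X-Telegram-Bot-Api-Secret-Token` header on every update. `isTrustedUpdate` and `HeaderReader` are hypothetical names introduced for illustration.

```typescript
// Minimal view of a headers object; the standard `Headers` class satisfies it,
// and its `get` is case-insensitive.
interface HeaderReader {
  get(name: string): string | null;
}

// Returns true only when the request carries the secret you registered.
function isTrustedUpdate(headers: HeaderReader, expectedSecret: string): boolean {
  const got = headers.get("x-telegram-bot-api-secret-token");
  if (got === null || expectedSecret.length === 0) return false;
  if (got.length !== expectedSecret.length) return false;
  // Compare every character instead of returning at the first mismatch.
  let diff = 0;
  for (let i = 0; i < got.length; i++) {
    diff |= got.charCodeAt(i) ^ expectedSecret.charCodeAt(i);
  }
  return diff === 0;
}
```

In a handler you would call something like `isTrustedUpdate(req.headers, secret)` before parsing the `Update` body and respond with 401 otherwise, with `secret` read from an environment variable of your choosing rather than hardcoded.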

### Must-keep for a beginner (floor - never cut for brevity)
- The run command for the first sample.
- "Never hardcode your key; it comes from the environment" (once, in prose).
- Anything a beginner cannot infer from the code (e.g. Bun auto-loads `.env`, no loader needed).
- The one genuinely non-obvious gotcha: the SDK already retries 408/409/429/5xx, so avoid double-retrying; use `maxRetries: 0` only for non-idempotent calls where a retry would duplicate stateful tool side effects.
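To make the gotcha concrete: if you do set `maxRetries: 0` and take over retrying yourself, the loop might look like the sketch below. Everything here is hypothetical (`backoffDelayMs`, `retryManually`, the 500 ms base, the 8 s cap, and the jitter range are illustrative choices, not SDK behavior), and it should only wrap a client whose built-in retries are disabled, otherwise each attempt is retried twice over.

```typescript
// Exponential backoff with jitter: the delay doubles per attempt up to a cap,
// then is scaled by a random factor between 0.5 and 1.0.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 8000,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(exponential * (0.5 + random() * 0.5));
}

// Calls `fn` up to `maxAttempts` times, sleeping between failed attempts.
// `shouldRetry` decides whether a given error is worth another attempt.
async function retryManually<T>(
  fn: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !shouldRetry(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

For a non-idempotent tool call, `shouldRetry` would return `false` whenever the side effect may already have happened, which is the whole reason for disabling the automatic retries in the first place.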

### Friendliness floor (never cut - terse is not friendly)
- The chapter addresses the reader as "you", never "the user" or "one".
- The intro AND at least one section open with a warm, second-person sentence.

### Key APIs (flat list, reference only - NOT a coverage checklist)
`Anthropic.APIError`, `RateLimitError`, `APIConnectionError`, `APIConnectionTimeoutError`, `AuthenticationError`, `BadRequestError`, `maxRetries`, `timeout`, `response._request_id`, `.withResponse()`, `usage.input_tokens`, `usage.output_tokens`, `usage.cache_read_input_tokens`, `cache_control`, `max_tokens`, `anthropic-ratelimit-requests-remaining`, `anthropic-ratelimit-requests-reset`, `retry-after`
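
As a rough illustration of how the `usage` fields in this list combine into a per-session cost figure, here is a sketch of a tracker. `CostTracker`, `Prices`, and the per-million-token rates passed in are all hypothetical; real prices depend on the model and must come from current pricing, not from this example.

```typescript
// The subset of the response `usage` object this sketch reads.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
}

// Hypothetical price table, in USD per million tokens.
interface Prices {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
}

class CostTracker {
  private totals = { input: 0, output: 0, cacheRead: 0 };

  // Adds one response's usage to the session totals.
  add(usage: Usage): void {
    this.totals.input += usage.input_tokens;
    this.totals.output += usage.output_tokens;
    this.totals.cacheRead += usage.cache_read_input_tokens ?? 0;
  }

  // Session cost so far under the given price table.
  costUsd(prices: Prices): number {
    const { input, output, cacheRead } = this.totals;
    return (
      (input * prices.inputPerMTok +
        output * prices.outputPerMTok +
        cacheRead * prices.cacheReadPerMTok) /
      1_000_000
    );
  }

  // One structured log line pairing the request id with its usage.
  logLine(requestId: string | null, usage: Usage): string {
    const cacheRead = usage.cache_read_input_tokens ?? 0;
    return JSON.stringify({
      request_id: requestId,
      input_tokens: usage.input_tokens,
      output_tokens: usage.output_tokens,
      cache_read_input_tokens: cacheRead,
      cache_hit: cacheRead > 0,
    });
  }
}
```

Calling `tracker.add(response.usage)` after each request and logging `tracker.logLine(response._request_id, response.usage)` would give one searchable line per call plus a running total.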

### Prerequisites
Chapters 1-7. The webhook sample needs an HTTPS endpoint; the Dockerfile sample needs Docker.

### Definition of done
- [ ] Chapter at `chapters/08-production.md`, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste `wc -l` AND `grep -c '^## '` in the PR).
- [ ] Every sample runnable with `bun run`, imported via `<<< @/examples/08-production/file.ts`, <=35 lines, comment:code <=0.30.
- [ ] One-home rule held: no prose sentence restates an inline code comment.
- [ ] Friendliness floor held: reader addressed as "you"; intro + >=1 section open warm.
- [ ] Samples use only real `@anthropic-ai/sdk` surface; ASCII punctuation only.
- [ ] Optional material lives in Going-deeper asides, not main-line H2s.
- [ ] Linked from `README.md` and the `.vitepress/config.ts` sidebar; `bun x vitepress build` passes.
- [ ] Error/retry behavior verified against current SDK docs.
