Chapter 8: Production and Deployment #8

@yagop

Part of the Build Your Own Coding Agent tutorial. One issue = one chapter (chapters/08-production.md) plus its examples/08-production/ samples.

Goal (1 sentence): Harden the agent for production: error handling, retries/timeouts, rate-limit-aware concurrency, observability, cost discipline, and deployment.

After this chapter you can

  • Catch and handle the full SDK error hierarchy (APIError, RateLimitError, APIConnectionError, APIConnectionTimeoutError, AuthenticationError, BadRequestError) at the right level and decide whether to retry, back off, or surface the error to the user.
  • Instrument every API call with structured logs that pair request-id with usage cost fields (including cache_read_input_tokens), and control runaway spending with max_tokens and cache_control.
  • Choose between a long-running process and a serverless deployment, configure Telegram webhooks for production traffic, and containerize the agent with a minimal Bun Dockerfile that loads secrets from the environment.
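
As a rough illustration of the first outcome, the retry / back off / surface decision can be sketched as a small pure function. This is a minimal sketch, not the chapter's sample: `Failure`, `Action`, and `decide` are hypothetical names, and it keys only on the HTTP status, assuming connection-level failures carry none. The real sample would more likely branch with `instanceof` checks against the SDK's error classes, and would only see errors the SDK's own retries did not resolve.

```typescript
// Hypothetical shape: an API error exposes an HTTP `status`;
// connection errors and timeouts are assumed to have none.
type Failure = { status?: number };
type Action = "retry" | "backoff" | "surface";

// Decide what to do with a failure that reached your own code.
function decide(failure: Failure): Action {
  const { status } = failure;
  if (status === undefined) return "retry"; // connection error or timeout
  if (status === 429) return "backoff"; // rate limited: wait before retrying
  if (status === 408 || status === 409 || status >= 500) return "retry";
  return "surface"; // 400, 401, ...: retrying cannot fix the request
}

console.log(decide({ status: 429 }), decide({ status: 401 }), decide({}));
```

Keeping the decision separate from the `try/catch` makes it easy to unit-test without calling the API.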

What to cover (ONE paragraph, not a list)

The chapter opens by walking you through the SDK error hierarchy - Anthropic.APIError as the base class and the named subclasses RateLimitError (429), APIConnectionError, APIConnectionTimeoutError (a subclass of APIConnectionError), AuthenticationError (401), and BadRequestError (400) - explaining when to retry versus surface. It then covers the SDK's built-in retry logic (connection errors, 408, 409, 429, and 5xx are retried with exponential backoff; tuned via maxRetries; disabled with maxRetries: 0 for non-idempotent calls where a retry would duplicate stateful tool side effects), the timeout option on client.messages.create and client.messages.stream set as either a per-request override or a client-wide default (a fired timeout raises APIConnectionTimeoutError, including mid-stream), and rate-limit-aware concurrency via the anthropic-ratelimit-requests-remaining and anthropic-ratelimit-requests-reset response headers and a token-bucket or queue-based limiter that respects retry-after. Next comes observability: extracting request-id via the _request_id property on the returned message or via client.messages.create(...).withResponse(), emitting structured log lines pairing request-id with usage.input_tokens, usage.output_tokens, and derived cost, and flagging cache hits via usage.cache_read_input_tokens. Cost discipline is also covered on the main line: choosing the right model per task (heavyweight reasoning vs. Haiku-tier triage), applying cache_control with type: "ephemeral" on stable system prompts and large tool schemas to minimize usage.input_tokens, and setting max_tokens conservatively to prevent runaway completions. The chapter closes by contrasting deployment shapes - a long-running Bun process with bot.startPolling() versus a serverless handler receiving Telegram webhook Update payloads over HTTPS (tradeoffs: cold-start latency, connection reuse, streaming compatibility) - and shows a minimal Dockerfile using the official Bun base image with a non-root user and ENV declarations, with all secrets (ANTHROPIC_API_KEY, TELEGRAM_BOT_TOKEN) loaded from environment variables, never hardcoded.
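
The token-bucket limiter mentioned above could look roughly like the following. This is a sketch under stated assumptions: `TokenBucket`, `tryAcquire`, and `pause` are hypothetical names, the capacity and refill rate are values you would choose for your own rate-limit tier, and the clock is injectable only so the behavior can be checked deterministically.

```typescript
// Minimal token bucket; all names here are illustrative, not SDK surface.
class TokenBucket {
  private capacity: number;
  private refillPerSec: number;
  private now: () => number;
  private tokens: number;
  private last: number;
  private pausedUntil = 0;

  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Honor a retry-after value (in seconds) taken from a 429 response.
  pause(retryAfterSec: number): void {
    this.pausedUntil = Math.max(this.pausedUntil, this.now() + retryAfterSec * 1000);
  }

  // Returns 0 and consumes a token if a request may go now,
  // otherwise the number of milliseconds to wait before asking again.
  tryAcquire(): number {
    const t = this.now();
    if (t < this.pausedUntil) return this.pausedUntil - t;
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil(((1 - this.tokens) / this.refillPerSec) * 1000);
  }
}
```

A caller would loop on `tryAcquire()`, sleeping for the returned number of milliseconds until it gets `0`, and call `pause()` whenever a response carries `retry-after`.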

Going deeper (optional asides - keep OFF the main line)

  • Advanced cache strategies: composing multiple cache_control breakpoints for multi-turn conversations with large tool schemas.

Out of scope (defer - do NOT preview)

  • None

Code samples - examples/08-production/

  • error-handling.ts - try/catch ladder over the error subclasses; log request-id.
  • retry-backoff.ts - manual backoff with maxRetries: 0.
  • concurrency-limiter.ts - async queue paced by the rate-limit response headers.
  • cost-tracker.ts - accumulate usage (incl. cache_read_input_tokens) per session.
  • webhook-server.ts - Bun HTTP server receiving Telegram Update POSTs (secret-validated).
  • Dockerfile - multi-stage Bun build, non-root user, secrets via env.
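
For the secret-validated webhook, one possible shape is sketched below. It relies on the fact that Telegram sends an `X-Telegram-Bot-Api-Secret-Token` header when the webhook was registered with a `secret_token`. The function names `isTrustedUpdate` and `handle` are hypothetical, and the handler only logs the update rather than running the agent.

```typescript
// Header Telegram adds when setWebhook was called with a secret_token.
const SECRET_HEADER = "x-telegram-bot-api-secret-token";

// True only for a POST whose secret header matches the expected value.
function isTrustedUpdate(method: string, token: string | null, expected: string): boolean {
  return method === "POST" && expected.length > 0 && token === expected;
}

// Hypothetical handler; in Bun it could be wired in via Bun.serve({ fetch }).
async function handle(req: Request, expected: string): Promise<Response> {
  if (!isTrustedUpdate(req.method, req.headers.get(SECRET_HEADER), expected)) {
    return new Response("forbidden", { status: 403 });
  }
  const update = (await req.json()) as { update_id?: number };
  console.log("received update", update.update_id); // hand off to the agent here
  return new Response("ok");
}
```

The expected secret would come from the environment alongside the bot token. A constant-time comparison is a reasonable hardening step that this sketch leaves out.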

Must-keep for a beginner (floor - never cut for brevity)

  • The run command for the first sample.
  • "Never hardcode your key; it comes from the environment" (once, in prose).
  • Anything a beginner cannot infer from the code (e.g. Bun auto-loads .env, no loader needed).
  • The one genuinely non-obvious gotcha: the SDK already retries 408/409/429/5xx, so avoid double-retrying; use maxRetries: 0 only for non-idempotent calls where a retry would duplicate stateful tool side effects.
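
To make the gotcha concrete: once the SDK's retries are switched off with maxRetries: 0, any retrying is yours to own. The sketch below shows one way to do that with exponential backoff and full jitter. `backoffMs` and `withManualRetry` are hypothetical helpers, the base and cap values are arbitrary, and `isSafeToRetry` stands in for whatever check tells you the tool side effect did not happen.

```typescript
// Exponential backoff with full jitter; `rand` is injectable for testing.
function backoffMs(
  attempt: number,
  baseMs = 500,
  capMs = 8000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Hypothetical wrapper for a call that was made with maxRetries: 0.
async function withManualRetry<T>(
  call: () => Promise<T>,
  isSafeToRetry: (err: unknown) => boolean,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up when attempts run out or the caller says a retry is unsafe.
      if (attempt + 1 >= maxAttempts || !isSafeToRetry(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
    }
  }
}
```

Wrapping a call that still has the SDK's default retries enabled would multiply the attempts, which is the double-retry trap this bullet warns about.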

Friendliness floor (never cut - terse is not friendly)

  • The chapter addresses the reader as "you", never "the user" or "one".
  • The intro AND at least one section open with a warm, second-person sentence.

Key APIs (flat list, reference only - NOT a coverage checklist)

Anthropic.APIError, RateLimitError, APIConnectionError, APIConnectionTimeoutError, AuthenticationError, BadRequestError, maxRetries, timeout, response._request_id, .withResponse(), usage.input_tokens, usage.output_tokens, usage.cache_read_input_tokens, cache_control, max_tokens, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-requests-reset, retry-after
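
As a sketch of how the usage fields in this list might end up in one structured log line: `Usage` below mirrors only the three usage fields named above, `Prices` and `logLine` are hypothetical, and the prices are placeholders you would replace with the current rates for your model. Cache-write tokens are left out for brevity.

```typescript
// Subset of the usage object on a message; the cache field may be absent or null.
type Usage = {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
};

// PLACEHOLDER prices in USD per million tokens; not real pricing.
type Prices = { input: number; output: number; cacheRead: number };

// Build one JSON log line pairing the request id with usage and derived cost.
function logLine(requestId: string | null, usage: Usage, prices: Prices): string {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const costUsd =
    (usage.input_tokens * prices.input +
      usage.output_tokens * prices.output +
      cacheRead * prices.cacheRead) /
    1_000_000;
  return JSON.stringify({
    request_id: requestId,
    input_tokens: usage.input_tokens,
    output_tokens: usage.output_tokens,
    cache_read_input_tokens: cacheRead,
    cache_hit: cacheRead > 0,
    cost_usd: Number(costUsd.toFixed(6)),
  });
}
```

One line per API call keeps the log easy to grep by request id and easy to sum per session.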

Prerequisites

Chapters 1-7. The webhook sample needs an HTTPS endpoint; the Dockerfile sample needs Docker.

Definition of done

  • Chapter at chapters/08-production.md, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste wc -l AND grep -c '^## ' in the PR).
  • Every sample runnable with bun run, imported via <<< @/examples/08-production/file.ts, <=35 lines, comment:code <=0.30.
  • One-home rule held: no prose sentence restates an inline code comment.
  • Friendliness floor held: reader addressed as "you"; intro + >=1 section open warm.
  • Samples use only real @anthropic-ai/sdk surface; ASCII punctuation only.
  • Optional material lives in Going-deeper asides, not main-line H2s.
  • Linked from README.md and the .vitepress/config.ts sidebar; bun x vitepress build passes.
  • Error/retry behavior verified against current SDK docs.

Labels: documentation (Improvements or additions to documentation)
