Chapter 4: Context and Conversation Management #4

@yagop

Part of the Build Your Own Coding Agent tutorial. One issue = one chapter (chapters/04-context.md) plus its examples/04-context/ samples.

Goal (1 sentence): Build the stateful layer over the stateless Messages API - history, context-window limits, prompt caching, and per-user sessions.

After this chapter you can

  • Maintain a correctly alternating messages array across turns, count tokens before sending, and keep the conversation inside the model's context window.
  • Set and tune a system prompt for persona and constraints, using a string or a block array with cache_control.
  • Apply cache_control on stable prefixes to cut latency and cost, and verify cache hits via usage.cache_read_input_tokens.
  • Isolate per-user sessions in memory and persist them to disk so history survives restarts.
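A minimal sketch of the first outcome, keeping the messages array alternating across turns. The Turn type is a simplified, text-only stand-in for the SDK's message shape, and appendTurn is a hypothetical helper name, not part of @anthropic-ai/sdk.

```typescript
// Simplified stand-in for the SDK message shape (assumption: text-only content).
type Role = "user" | "assistant";
interface Turn {
  role: Role;
  content: string;
}

// Hypothetical helper: append a turn, refusing anything that would break alternation.
function appendTurn(history: Turn[], turn: Turn): Turn[] {
  const last = history[history.length - 1];
  // An empty history, or one ending in an assistant turn, must continue with a user turn.
  const expected: Role =
    last === undefined || last.role === "assistant" ? "user" : "assistant";
  if (turn.role !== expected) {
    throw new Error(`expected a ${expected} turn, got ${turn.role}`);
  }
  return [...history, turn];
}

let history: Turn[] = [];
history = appendTurn(history, { role: "user", content: "Hi" });
history = appendTurn(history, { role: "assistant", content: "Hello! How can I help?" });
history = appendTurn(history, { role: "user", content: "What did I just say?" });
console.log(`turns: ${history.length}`); // prints "turns: 3"
```

In a real sample the assistant turn would carry response.content from client.messages.create instead of a literal string.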

What to cover (ONE paragraph, not a list)

The Anthropic Messages API is stateless: every call to client.messages.create must supply the full conversation history as the messages array, with strictly alternating role: "user" and role: "assistant" turns, where each content field accepts a string or an array of content blocks (text, tool_use, tool_result). After each call you take response.content and push it back as an assistant turn before appending the next user message; a system string or array of text blocks sets persona and stable background context without occupying a turn. Before sending, call client.messages.countTokens with the same model, system, messages, and tools payload, then branch on the result: trim the oldest pairs (always in pairs to preserve alternation), keep a rolling window of the last N turns, or call create with a summarize instruction and replace the accumulated turns with a single injected user/assistant summary pair. Tie the threshold to your model via client.models.retrieve rather than hard-coding a number. To enable prompt caching, add cache_control: { type: "ephemeral" } to the final text block of the system array or to a large, stable user turn; the prefix must reach the model's minimum cacheable size (about 4096 tokens on Opus 4.x and Haiku 4.5, about 2048 on Sonnet 4.6) and must be byte-stable across requests, or caching silently does nothing. Finally, per-user sessions key an in-memory Map by Telegram chat.id and serialize it to a JSON file so history survives bot restarts.
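The pair-wise trimming step could look roughly like the sketch below. trimToBudget and roughCount are hypothetical names; roughCount is a crude 4-characters-per-token estimate used only so the sketch runs offline. In the real sample the counter would wrap client.messages.countTokens and return its input_tokens value.

```typescript
// Simplified stand-in for the SDK message shape (assumption: text-only content).
type Turn = { role: "user" | "assistant"; content: string };
// Any async function that reports how many tokens a history would cost.
type Counter = (messages: Turn[]) => Promise<number>;

// Drop the oldest user+assistant pair until the history fits the budget.
async function trimToBudget(
  messages: Turn[],
  budget: number,
  count: Counter,
): Promise<Turn[]> {
  let kept = messages;
  // Stop at two or fewer turns so there is always something left to send.
  while (kept.length > 2 && (await count(kept)) > budget) {
    kept = kept.slice(2); // removes one user turn and the assistant reply to it
  }
  return kept;
}

// Stand-in counter (assumption): about 4 characters per token, NOT a real tokenizer.
const roughCount: Counter = async (messages) =>
  messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
```

Because the history starts with a user turn and two turns are dropped at a time, whatever remains still starts with a user turn and still alternates.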

Going deeper (optional asides - keep OFF the main line)

  • None

Out of scope (defer - do NOT preview)

  • None

Code samples - examples/04-context/

  • multi-turn.ts - append assistant content back into messages; print turn count.
  • system-prompt.ts - system string vs block array; verify persona across turns.
  • token-counter.ts - countTokens before create; rolling-window trim at a model-specific threshold.
  • summarize-history.ts - summarize and replace oversized history.
  • prompt-cache.ts - cache_control on a large, byte-stable system block; print cache_read_input_tokens.
  • telegram-sessions.ts - Map by chat.id, persisted to sessions.json.
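For telegram-sessions.ts, one possible shape is sketched below: a Map keyed by the numeric chat.id, written to disk as an array of entries so the number keys survive the JSON round trip. loadSessions, saveSessions, and historyFor are hypothetical helper names, and Turn is a simplified text-only stand-in for the SDK message shape.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Simplified stand-in for the SDK message shape (assumption: text-only content).
type Turn = { role: "user" | "assistant"; content: string };
type Sessions = Map<number, Turn[]>;

// Read sessions from disk, or start empty if the file does not exist yet.
function loadSessions(path: string): Sessions {
  if (!existsSync(path)) return new Map();
  const entries = JSON.parse(readFileSync(path, "utf8")) as [number, Turn[]][];
  return new Map(entries);
}

// Store the Map as [chatId, history] entries; a plain object would turn the keys into strings.
function saveSessions(path: string, sessions: Sessions): void {
  writeFileSync(path, JSON.stringify(Array.from(sessions.entries()), null, 2));
}

// Return the history for one chat, creating it on first contact.
function historyFor(sessions: Sessions, chatId: number): Turn[] {
  let history = sessions.get(chatId);
  if (!history) {
    history = [];
    sessions.set(chatId, history);
  }
  return history;
}
```

A bot would call loadSessions("sessions.json") once at startup, push turns onto historyFor(sessions, chat.id) as messages arrive, and call saveSessions after each reply.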

Must-keep for a beginner (floor - never cut for brevity)

  • The run command for the first sample.
  • "Never hardcode your key; it comes from the environment" (once, in prose).
  • Bun auto-loads .env; no loader needed.
  • Cache minimum sizes: ~4096 tokens on Opus 4.x and Haiku 4.5, ~2048 on Sonnet 4.6 - a changing value (e.g. a timestamp) invalidates the cache silently.
  • Trimming must drop pairs (user + assistant together) to preserve alternation - never drop one side alone.
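The byte-stability point can be illustrated without calling the API. In the sketch below, buildSystem and buildUnstableSystem are hypothetical helpers and the persona text is a placeholder; the block shape (type: "text" plus cache_control: { type: "ephemeral" }) follows the system block array described above.

```typescript
// Shape of a system text block with an optional cache marker.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Stable: the same docs string always yields the same bytes, so the prefix can be cached.
function buildSystem(stableDocs: string): SystemBlock[] {
  return [
    { type: "text", text: "You are a concise coding assistant." }, // placeholder persona
    { type: "text", text: stableDocs, cache_control: { type: "ephemeral" } },
  ];
}

// Unstable: a timestamp inside the cached block changes the prefix on every request.
function buildUnstableSystem(stableDocs: string, now: Date): SystemBlock[] {
  return buildSystem(`Current time: ${now.toISOString()}\n${stableDocs}`);
}

const docs = "Project conventions go here.";
const same = JSON.stringify(buildSystem(docs)) === JSON.stringify(buildSystem(docs));
const differs =
  JSON.stringify(buildUnstableSystem(docs, new Date(0))) !==
  JSON.stringify(buildUnstableSystem(docs, new Date(1000)));
console.log(same, differs); // prints "true true"
```

If a request needs a changing value such as the current time, putting it in the latest user turn instead keeps the cached system prefix intact.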

Friendliness floor (never cut - terse is not friendly)

  • The chapter addresses the reader as "you", never "the user" or "one".
  • The intro AND at least one section open with a warm, second-person sentence.

Key APIs (flat list, reference only)

client.messages.create, client.messages.countTokens, client.models.retrieve, messages, system, cache_control: { type: "ephemeral" }, usage.cache_read_input_tokens, usage.cache_creation_input_tokens

Prerequisites

Chapters 1-3. The Telegram session sample reuses the Chapter 3 bot token.

Definition of done

  • Chapter at chapters/04-context.md, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste wc -l AND grep -c '^## ' in the PR).
  • Every sample runnable with bun run, imported via <<< @/examples/04-context/file.ts, <=35 lines, comment:code <=0.30.
  • One-home rule held: no prose sentence restates an inline code comment.
  • Friendliness floor held: reader addressed as "you"; intro + >=1 section open warm.
  • Samples use only real @anthropic-ai/sdk surface; ASCII punctuation only.
  • Optional material lives in Going-deeper asides, not main-line H2s.
  • Linked from README.md and the .vitepress/config.ts sidebar; bun x vitepress build passes.
  • Caching sample actually shows a non-zero cache_read_input_tokens.
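One way to check the last item is a small helper over the usage object of a response. This is a sketch: cacheStatus is a hypothetical name, and CacheUsage lists only the two usage fields named under Key APIs.

```typescript
// The two cache-related usage fields; either may be missing or null on a response.
type CacheUsage = {
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
};

// "hit": the prefix was read from cache. "write": it was cached by this request. "none": no caching happened.
function cacheStatus(usage: CacheUsage): "hit" | "write" | "none" {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "none";
}

console.log(cacheStatus({ cache_creation_input_tokens: 5000, cache_read_input_tokens: 0 })); // prints "write"
console.log(cacheStatus({ cache_creation_input_tokens: 0, cache_read_input_tokens: 5000 })); // prints "hit"
```

With a stable prefix, the first request would be expected to report "write" and an identical follow-up "hit"; "none" on both suggests the prefix is below the minimum size or is not byte-stable.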

Metadata

Labels: documentation
