Part of the Build Your Own Coding Agent tutorial. One issue = one chapter (chapters/04-context.md) plus its examples/04-context/ samples.
Goal (1 sentence): Build the stateful layer over the stateless Messages API - history, context-window limits, prompt caching, and per-user sessions.
After this chapter you can
- Maintain a correctly alternating messages array across turns, count tokens before sending, and keep the conversation inside the model's context window.
- Set and tune a system prompt for persona and constraints, using a string or a block array with cache_control.
- Apply cache_control on stable prefixes to cut latency and cost, and verify cache hits via usage.cache_read_input_tokens.
- Isolate per-user sessions in memory and persist them to disk so history survives restarts.
What to cover (ONE paragraph, not a list)
The Anthropic Messages API is stateless: every call to client.messages.create must supply the full conversation history as the messages array, with strictly alternating role: "user" and role: "assistant" turns where each content field accepts a string or an array of content blocks (text, tool_use, tool_result). After each call you extract response.content and push it back as an assistant turn before the next user message; a system string or array of text blocks sets persona and stable background context without occupying a turn. Before sending, call client.messages.countTokens with the same model, system, messages, and tools payload, then branch on the result - trimming oldest pairs (always in pairs to preserve alternation), running a rolling window of the last N turns, or calling create with a summarize instruction and replacing the accumulated turns with a single injected user/assistant summary pair - tying the threshold to your model via client.models.retrieve rather than hard-coding a number. Prompt caching adds cache_control: { type: "ephemeral" } to the final text block of the system array or large stable user turns; the prefix must exceed the model's minimum cacheable size (about 4096 tokens on Opus 4.x and Haiku 4.5, about 2048 on Sonnet 4.6) and must be byte-stable across requests or it silently no-ops. Finally, per-user sessions key an in-memory Map by Telegram chat.id and serialize to a JSON file so history survives bot restarts.
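The history and session flow described above can be sketched without touching the network. This is a minimal illustration, not the chapter's sample code: the `Turn` and `Sessions` types and the helper names (`historyFor`, `appendTurn`, `serializeSessions`, `deserializeSessions`, `saveSessions`, `loadSessions`) are assumptions made for this sketch, and content is simplified to plain strings even though the API also accepts arrays of content blocks.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Simplified message shape for this sketch: string content only.
type Role = "user" | "assistant";
interface Turn {
  role: Role;
  content: string;
}

// Hypothetical session store: one history array per Telegram chat id.
type Sessions = Map<number, Turn[]>;

// Return the history for a chat, creating an empty one on first contact.
function historyFor(sessions: Sessions, chatId: number): Turn[] {
  let history = sessions.get(chatId);
  if (!history) {
    history = [];
    sessions.set(chatId, history);
  }
  return history;
}

// Append a turn, rejecting anything that would break user/assistant alternation.
function appendTurn(history: Turn[], role: Role, content: string): void {
  const last = history[history.length - 1];
  const expected: Role = last?.role === "user" ? "assistant" : "user";
  if (role !== expected) {
    throw new Error(`expected a ${expected} turn, got ${role}`);
  }
  history.push({ role, content });
}

// Serialize every session as a JSON array of [chatId, turns] entries.
function serializeSessions(sessions: Sessions): string {
  return JSON.stringify(Array.from(sessions.entries()));
}

// Rebuild the Map from the JSON produced by serializeSessions.
function deserializeSessions(json: string): Sessions {
  return new Map(JSON.parse(json) as [number, Turn[]][]);
}

// Write all sessions to disk so history survives a restart.
function saveSessions(sessions: Sessions, path: string): void {
  writeFileSync(path, serializeSessions(sessions));
}

// Load sessions from disk, or start empty if the file does not exist yet.
function loadSessions(path: string): Sessions {
  if (!existsSync(path)) return new Map();
  return deserializeSessions(readFileSync(path, "utf8"));
}
```

In a bot handler you would call `historyFor(sessions, chat.id)`, append the incoming user text, send the whole array as `messages`, append the reply as an assistant turn, and then call `saveSessions`. Splitting serialization from file I/O keeps the round trip easy to check without a disk.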
Going deeper (optional asides - keep OFF the main line)
Out of scope (defer - do NOT preview)
Must-keep for a beginner (floor - never cut for brevity)
- The run command for the first sample.
- "Never hardcode your key; it comes from the environment" (once, in prose).
- Bun auto-loads .env; no loader needed.
- Cache minimum sizes: ~4096 tokens on Opus 4.x and Haiku 4.5, ~2048 on Sonnet 4.6 - a changing value (e.g. a timestamp) invalidates the cache silently.
- Trimming must drop pairs (user + assistant together) to preserve alternation - never drop one side alone.
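The pair-trimming rule above can be sketched as a small pure function. This is an illustration under stated assumptions: `trimToBudget` and `estimateTokens` are hypothetical names, and the estimator (roughly four characters per token) is only a stand-in so the sketch runs offline; in the chapter the count would come from client.messages.countTokens. The sketch assumes the history starts with a user turn and strictly alternates.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical stand-in for a real token count (about 4 characters per token).
function estimateTokens(history: Turn[]): number {
  const chars = history.reduce((sum, turn) => sum + turn.content.length, 0);
  return Math.ceil(chars / 4);
}

// Drop the oldest user+assistant pair until the history fits the budget.
// Removing two turns at a time keeps the first remaining turn a user turn,
// so alternation is preserved. The input array is not mutated.
function trimToBudget(
  history: Turn[],
  budget: number,
  count: (h: Turn[]) => number = estimateTokens,
): Turn[] {
  const trimmed = [...history];
  // Stop once only the latest turn (or latest pair) is left, even if over budget.
  while (trimmed.length > 2 && count(trimmed) > budget) {
    trimmed.splice(0, 2);
  }
  return trimmed;
}
```

Passing the counter as a parameter lets you swap the estimate for a real count later without changing the trimming logic.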
Friendliness floor (never cut - terse is not friendly)
- The chapter addresses the reader as "you", never "the user" or "one".
- The intro AND at least one section open with a warm, second-person sentence.
Key APIs (flat list, reference only)
client.messages.create, client.messages.countTokens, client.models.retrieve, messages, system, cache_control: { type: "ephemeral" }, usage.cache_read_input_tokens, usage.cache_creation_input_tokens
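As a rough sketch of how the caching fields in this list fit together: the local `SystemBlock` and `Usage` types below only mirror the field names listed here and are not imported from @anthropic-ai/sdk, and `cachedSystem` and `cacheStatus` are hypothetical helper names. The sketch puts the cache breakpoint on the final text block and classifies a response's usage numbers.

```typescript
// Local shapes for the sketch; they mirror the field names in this issue.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Usage {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Build a system block array with cache_control on the final text block only.
// The texts must be byte-stable across requests (no timestamps), or the
// prefix will not match a previously cached one.
function cachedSystem(texts: string[]): SystemBlock[] {
  return texts.map((text, index) => {
    const block: SystemBlock = { type: "text", text };
    if (index === texts.length - 1) {
      block.cache_control = { type: "ephemeral" };
    }
    return block;
  });
}

type CacheStatus = "hit" | "write" | "none";

// Classify a response: tokens read from cache, written to cache, or neither.
function cacheStatus(usage: Usage): CacheStatus {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "none";
}
```

A result of "none" on a request you expected to cache is the silent no-op case: the prefix is either below the model's minimum cacheable size or differs from the one sent before.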
Prerequisites
Chapters 1-3. The Telegram session sample reuses the Chapter 3 bot token.
Code samples - examples/04-context/
- multi-turn.ts - append assistant content back into messages; print turn count.
- system-prompt.ts - system string vs block array; verify persona across turns.
- token-counter.ts - countTokens before create; rolling-window trim at a model-specific threshold.
- summarize-history.ts - summarize and replace oversized history.
- prompt-cache.ts - cache_control on a large, byte-stable system block; print cache_read_input_tokens.
- telegram-sessions.ts - Map by chat.id, persisted to sessions.json.
Definition of done
- chapters/04-context.md, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste wc -l AND grep -c '^## ' in the PR).
- [...] bun run, imported via <<< @/examples/04-context/file.ts, <=35 lines, comment:code <=0.30.
- [...] @anthropic-ai/sdk surface; ASCII punctuation only.
- [...] README.md and the .vitepress/config.ts sidebar; bun x vitepress build passes.
- [...] cache_read_input_tokens.