Part of the Build Your Own Coding Agent tutorial. One issue = one chapter (chapters/09-advanced-topics.md) plus its examples/09-advanced-topics/ samples.
Goal (1 sentence): Close the gap between a working prototype and a production-grade coding agent by combining retrieval-augmented generation (which lets your agent answer questions grounded in a private codebase or knowledge base rather than cramming everything into a context window), disciplined prompt structure, and an honest take on model customization.
After this chapter you can
- Build a RAG pipeline that chunks documents, embeds them with Voyage AI, retrieves top-k chunks by cosine similarity, and injects them into client.messages.create as either plain user content or typed document blocks that unlock Claude's citation output.
- Apply prompt-engineering best practices - XML tags, role prompts, few-shot examples, chain-of-thought, and ordering content to match Claude's attention patterns - to reliably control Claude's output format and reasoning style.
- Use the Citations feature to ground answers in retrieved documents via typed document content blocks and citations: { enabled: true }.
- Understand the tradeoffs of cache_control prompt caching as a prompting-layer alternative to fine-tuning, and confirm cache hits via usage.cache_read_input_tokens.
- Decide when to fine-tune (Bedrock, Claude Haiku) versus when RAG plus prompt engineering suffices.
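The first outcome has the most moving parts, so here is a minimal sketch of its retrieval half. It is not the rag-pipeline.ts sample itself. The sketch assumes a fixed-size character chunker (the 800/100 size and overlap are illustrative), Voyage AI's REST endpoint at https://api.voyageai.com/v1/embeddings with its input, model, and input_type fields, and the model name voyage-3.5. Check those last two against Voyage AI's current docs before you rely on them. The helper names chunkText, embed, cosineSimilarity, and topK are hypothetical.

```typescript
// Hypothetical helpers for the retrieval half of a RAG pipeline.
// Chunk size, overlap, and the Voyage model name are illustrative assumptions.

interface StoredChunk {
  text: string;
  embedding: number[];
}

// Split text into fixed-size character windows; neighbours share `overlap` chars.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Call Voyage AI's embeddings endpoint (the Anthropic API has none).
// The key is passed in, e.g. from process.env.VOYAGE_API_KEY; never hardcode it.
async function embed(
  texts: string[],
  apiKey: string,
  inputType: "document" | "query",
): Promise<number[][]> {
  const res = await fetch("https://api.voyageai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    // Assumed model name; check Voyage AI's model list.
    body: JSON.stringify({ input: texts, model: "voyage-3.5", input_type: inputType }),
  });
  if (!res.ok) throw new Error(`Voyage AI returned ${res.status}: ${await res.text()}`);
  const body = (await res.json()) as { data: { embedding: number[]; index: number }[] };
  // Sort by index so the embeddings line up with the input order.
  return body.data.sort((a, b) => a.index - b.index).map((d) => d.embedding);
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query vector and keep the k best.
function topK(query: number[], store: StoredChunk[], k = 3): { text: string; score: number }[] {
  return store
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

You would embed the chunks once with inputType "document" and keep the resulting StoredChunk array in memory. Then embed each question with "query" and hand the texts that topK returns to the request you build next. The key is a parameter rather than being read inside embed, so you call it as embed(chunks, process.env.VOYAGE_API_KEY!, "document"). Bun fills process.env from .env for you.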
What to cover (ONE paragraph, not a list)
The chapter opens by wiring up a full RAG pipeline: chunking a document corpus, calling Voyage AI's embeddings API (the Anthropic API has no embeddings endpoint), storing vectors locally, computing cosine similarity to retrieve top-k chunks, and injecting them into client.messages.create. It then contrasts two injection modes - plain text in a user turn versus typed document content blocks with source: { type: "text", media_type: "text/plain", data: "..." } (or base64, URL, and file_id variants) and citations: { enabled: true } - showing that omitting citations: { enabled: true } suppresses citation output and an invalid source shape returns a 400. Prompt structure comes next: wrapping instructions in XML tags (<instructions>, <context>, <examples>), placing stable guidance in the system field, ordering content to match Claude's attention patterns (long documents near the top, the question at the end), supplying 2-4 few-shot user/assistant pairs, and eliciting chain-of-thought either via the top-level thinking parameter (no beta header needed for adaptive thinking) or an explicit scratchpad prompt. The chapter closes by covering cache_control: { type: "ephemeral" } on large byte-stable system prompts and retrieved document blocks - confirm hits via usage.cache_read_input_tokens - as a prompting-layer alternative to fine-tuning, and situating fine-tuning honestly: the first-party Anthropic API exposes no general fine-tuning endpoint; Amazon Bedrock offers it for select Claude models such as Claude Haiku; RAG plus prompt engineering covers most use cases and should be the default.
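The two injection modes and the prompt-structure advice meet in a single request body. This sketch only builds the objects you would pass to client.messages.create and never calls the API. The helpers put XML-tagged, cacheable instructions in system. Mode 1 pastes chunks into the user turn as plain text. Mode 2 wraps each chunk as a typed document block (with an optional title) and citations: { enabled: true }. Few-shot pairs are prepended before the real question. The field names come from the paragraph above. The helper names, the local stand-in types, the XML tag layout, and max_tokens: 1024 are assumptions to check against the current Messages API reference.

```typescript
// Hypothetical request builders for client.messages.create. They only return
// plain objects; the local types are stand-ins, not the SDK's own types.

type CacheControl = { type: "ephemeral" };
type TextBlock = { type: "text"; text: string; cache_control?: CacheControl };
type DocumentBlock = {
  type: "document";
  source: { type: "text"; media_type: "text/plain"; data: string };
  title?: string;
  citations?: { enabled: boolean };
};
type ChatMessage =
  | { role: "user"; content: string | (TextBlock | DocumentBlock)[] }
  | { role: "assistant"; content: string };

// Stable guidance lives in `system`; cache_control marks it as a cacheable prefix.
function systemPrompt(instructions: string): TextBlock[] {
  return [
    {
      type: "text",
      text: `<instructions>\n${instructions}\n</instructions>`,
      cache_control: { type: "ephemeral" },
    },
  ];
}

// Mode 1: retrieved chunks pasted into the user turn as plain text (no citations).
function plainUserTurn(chunks: string[], question: string): ChatMessage {
  const context = chunks.map((c, i) => `<chunk index="${i}">\n${c}\n</chunk>`).join("\n");
  // Long context first, the question last.
  return { role: "user", content: `<context>\n${context}\n</context>\n\n${question}` };
}

// Mode 2: one typed document block per chunk, then the question as a text block.
function documentUserTurn(chunks: string[], question: string): ChatMessage {
  const docs = chunks.map(
    (c, i): DocumentBlock => ({
      type: "document",
      source: { type: "text", media_type: "text/plain", data: c },
      title: `chunk-${i}`,
      citations: { enabled: true }, // omit this and no citations come back
    }),
  );
  return { role: "user", content: [...docs, { type: "text", text: question }] };
}

// Few-shot: worked user/assistant pairs placed before the real question.
function withFewShot(examples: [string, string][], question: ChatMessage): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const [input, output] of examples) {
    messages.push({ role: "user", content: input }, { role: "assistant", content: output });
  }
  messages.push(question);
  return messages;
}

function buildRequest(model: string, instructions: string, messages: ChatMessage[]) {
  return { model, max_tokens: 1024, system: systemPrompt(instructions), messages };
}
```

You would pass the result of buildRequest straight to client.messages.create. cache_control only takes effect above a model-specific minimum prompt length. A short instruction string like the one in a quick test therefore reports no cache activity at all, and no error is raised.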
Going deeper (optional asides - keep OFF the main line)
- Bedrock fine-tuning workflow mechanics (data format, job creation, cost model).
Out of scope (defer - do NOT preview)
- Vector-database hosting and production indexing strategies (later infrastructure chapter).
- Evaluating retrieval quality (recall@k, NDCG) - deserves its own treatment.
Code samples - examples/09-advanced-topics/
- rag-pipeline.ts - chunk + Voyage AI embeddings + top-k cosine retrieval, injected as user content.
- citations-demo.ts - document blocks with source + citations: { enabled: true }; print citations.
- prompt-engineering.ts - compare bare / XML+few-shot / chain-of-thought; log usage.
- cache-large-prompt.ts - cache_control on a large stable prompt; check cache_read_input_tokens.
- finetuning-tradeoffs.ts - RAG+prompt approach matching a fine-tuning use case; note when Bedrock fine-tuning is warranted.
Must-keep for a beginner (floor - never cut for brevity)
- The run command for rag-pipeline.ts (the first sample).
- "Never hardcode your key; it comes from the environment" (once, in prose) - this applies to both ANTHROPIC_API_KEY and the Voyage AI key.
- Bun auto-loads .env; no loader config needed.
- The non-obvious gotcha: omitting citations: { enabled: true } silently suppresses citation output; an invalid source shape returns a 400, not a validation warning.
Friendliness floor (never cut - terse is not friendly)
- The chapter addresses the reader as "you", never "the user" or "one".
- The intro AND at least one section open with a warm, second-person sentence.
Key APIs (flat list, reference only - NOT a coverage checklist)
client.messages.create, client.messages.countTokens, system, document blocks, source, citations, cache_control, usage.cache_read_input_tokens, usage.cache_creation_input_tokens, thinking, XML prompt structure, Voyage AI embeddings endpoint, Bedrock fine-tuning (Claude Haiku)
Prerequisites
Chapters 4 and 5. RAG samples need a VOYAGE_API_KEY in .env.
Definition of done
- chapters/09-advanced-topics.md, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste wc -l AND grep -c '^## ' in the PR).
- bun run, imported via <<< @/examples/09-advanced-topics/file.ts, <=35 lines, comment:code <=0.30.
- @anthropic-ai/sdk surface; ASCII punctuation only.
- README.md and the .vitepress/config.ts sidebar; bun x vitepress build passes.
- source schema and fine-tuning availability verified against current docs.
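Two of the samples print what comes back: citations-demo.ts prints its citations and cache-large-prompt.ts prints its cache counters. The citations gotcha is easy to miss precisely because nothing fails. Here is a minimal sketch of helpers that make both visible. It assumes the response shape documented for plain-text documents (text blocks carrying char_location citations) and the usage fields listed under Key APIs. The helper names and local types are hypothetical and cover only the fields used here.

```typescript
// Hypothetical helpers for inspecting a Messages API response.
// The local types cover only the fields used here, not the full SDK types.

type CharLocationCitation = {
  type: "char_location";
  cited_text: string;
  document_index: number;
  start_char_index: number;
  end_char_index: number;
};

type ResponseLike = {
  content: { type: string; text?: string; citations?: CharLocationCitation[] | null }[];
  usage: {
    input_tokens: number;
    cache_creation_input_tokens?: number | null;
    cache_read_input_tokens?: number | null;
  };
};

// Returns [] when no text block carries citations. That is also what you get
// when citations: { enabled: true } was left off the documents; nothing errors.
function collectCitations(res: ResponseLike): CharLocationCitation[] {
  const found: CharLocationCitation[] = [];
  for (const block of res.content) {
    if (block.type === "text" && block.citations) found.push(...block.citations);
  }
  return found;
}

// A cache hit shows up as cache_read_input_tokens > 0 on a repeated call.
function cacheReport(res: ResponseLike): string {
  const read = res.usage.cache_read_input_tokens ?? 0;
  const written = res.usage.cache_creation_input_tokens ?? 0;
  if (read > 0) return `cache hit: ${read} input tokens read from cache`;
  if (written > 0) return `cache write: ${written} input tokens stored for later calls`;
  return "no cache activity: check cache_control, byte-stable prefix, minimum length";
}
```

Print collectCitations(response) next to the answer text. Then call cacheReport on two back-to-back responses. The first should report a write and the second a hit, as long as the cached prefix has not changed by a single byte.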