Chapter 9: Advanced Topics: RAG, Prompt Engineering, and Fine-Tuning

Part of the **Build Your Own Coding Agent** tutorial. One issue = one chapter (`chapters/09-advanced-topics.md`) plus its `examples/09-advanced-topics/` samples.



**Goal (1 sentence):** Close the gap between a working prototype and a production-grade coding agent by combining retrieval-augmented generation (so your agent grounds its answers in a private codebase or knowledge base instead of cramming everything into the context window), disciplined prompt structure, and an honest take on model customization.

### After this chapter you can
- Build a RAG pipeline that chunks documents, embeds them with Voyage AI, retrieves top-k chunks by cosine similarity, and injects them into `client.messages.create` as either plain `user` content or typed `document` blocks that unlock Claude's citation output.
- Apply prompt-engineering best practices - XML tags, role prompts, few-shot examples, chain-of-thought, and ordering content to match Claude's attention patterns - to reliably control Claude's output format and reasoning style.
- Use the Citations feature to ground answers in retrieved documents via typed `document` content blocks and `citations: { enabled: true }`.
- Weigh the tradeoffs of `cache_control` prompt caching as a prompting-layer alternative to fine-tuning, and confirm cache hits via `usage.cache_read_input_tokens`.
- Decide when to fine-tune (Bedrock, Claude Haiku) versus when RAG plus prompt engineering suffices.
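
To make the prompt-structure outcome above concrete, here is a minimal sketch of how XML-tagged sections and few-shot pairs could be assembled before a call to `client.messages.create`. The helper names (`tag`, `buildPrompt`, `buildMessages`), the `FewShot` and `Message` types, and the choice to put context before instructions are assumptions for illustration, not the chapter's final sample.

```typescript
// Hypothetical helper types; the names are illustrative only.
interface FewShot {
  user: string;
  assistant: string;
}

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Wrap one section of the prompt in an XML tag so it is clearly delimited.
function tag(name: string, body: string): string {
  return `<${name}>\n${body.trim()}\n</${name}>`;
}

// Assumed ordering: bulky context first, instructions last.
function buildPrompt(context: string, instructions: string): string {
  return [tag("context", context), tag("instructions", instructions)].join("\n\n");
}

// Few-shot examples become alternating user/assistant turns ahead of the real request.
function buildMessages(examples: FewShot[], prompt: string): Message[] {
  const turns: Message[] = [];
  for (const ex of examples) {
    turns.push({ role: "user", content: ex.user });
    turns.push({ role: "assistant", content: ex.assistant });
  }
  turns.push({ role: "user", content: prompt });
  return turns;
}

const messages = buildMessages(
  [
    { user: "Rename foo to bar in utils.ts", assistant: "<plan>1 file, 1 symbol</plan>" },
    { user: "Delete unused imports in app.ts", assistant: "<plan>1 file, 2 imports</plan>" },
  ],
  buildPrompt("function add(a, b) { return a + b }", "Add TypeScript types to the function."),
);
console.log(messages.length); // 5
```

The resulting array can be passed as `messages`, with stable guidance kept in the `system` field rather than repeated in each turn.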

### What to cover (ONE paragraph, not a list)
The chapter opens by wiring up a full RAG pipeline: chunking a document corpus, calling Voyage AI's embeddings API (the Anthropic API has no embeddings endpoint), storing vectors locally, computing cosine similarity to retrieve top-k chunks, and injecting them into `client.messages.create`. It then contrasts two injection modes - plain text in a `user` turn versus typed `document` content blocks with `source: { type: "text", media_type: "text/plain", data: "..." }` (or base64, URL, and `file_id` variants) and `citations: { enabled: true }` - showing that omitting `citations: { enabled: true }` suppresses citation output and an invalid `source` shape returns a 400. Prompt structure comes next: wrapping instructions in XML tags (`<instructions>`, `<context>`, `<examples>`), placing stable guidance in the `system` field, ordering content to match Claude's attention patterns, supplying 2-4 few-shot `user`/`assistant` pairs, and eliciting chain-of-thought either via the top-level `thinking` parameter (no beta header needed for adaptive thinking) or an explicit scratchpad prompt. The chapter closes by covering `cache_control: { type: "ephemeral" }` on large byte-stable system prompts and retrieved document blocks - confirm hits via `usage.cache_read_input_tokens` - as a prompting-layer alternative to fine-tuning, and situating fine-tuning honestly: the first-party Anthropic API exposes no general fine-tuning endpoint; Amazon Bedrock offers it for select Claude models such as Claude Haiku; RAG plus prompt engineering covers most use cases and should be the default.
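
The retrieval half of that pipeline can be sketched without any network calls. The snippet below is a minimal illustration that assumes the vectors have already been fetched from the embeddings API and stored locally; `EmbeddedChunk`, `chunkText`, `cosineSimilarity`, and `topK` are hypothetical names, and the fixed-size character chunking with overlap is one simple strategy among many, not necessarily the one `rag-pipeline.ts` will use.

```typescript
// Hypothetical shape for a locally stored chunk; field names are illustrative.
interface EmbeddedChunk {
  text: string;
  vector: number[];
}

// Split text into fixed-size character windows that overlap by `overlap` characters.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("need size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last window reached the end
  }
  return chunks;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query vector and keep the k best.
function topK(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy 2-dimensional vectors stand in for real embeddings.
const store: EmbeddedChunk[] = [
  { text: "login handler", vector: [1, 0] },
  { text: "css reset", vector: [0, 1] },
  { text: "session middleware", vector: [1, 1] },
];
const hits = topK([1, 0.1], store, 2);
console.log(hits.map((h) => h.text).join(", ")); // login handler, session middleware
```

A linear scan like this is enough for a tutorial-sized corpus held in memory; the retrieved `text` values are what get injected into the request.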

### Going deeper (optional asides - keep OFF the main line)
- Bedrock fine-tuning workflow mechanics (data format, job creation, cost model).

### Out of scope (defer - do NOT preview)
- Vector-database hosting and production indexing strategies (later infrastructure chapter).
- Evaluating retrieval quality (recall@k, NDCG) - deserves its own treatment.

### Code samples - examples/09-advanced-topics/
- [ ] `rag-pipeline.ts` - chunk + Voyage AI embeddings + top-k cosine retrieval, injected as `user` content.
- [ ] `citations-demo.ts` - `document` blocks with `source` + `citations: { enabled: true }`; print citations.
- [ ] `prompt-engineering.ts` - compare bare / XML+few-shot / chain-of-thought; log `usage`.
- [ ] `cache-large-prompt.ts` - `cache_control` on a large stable prompt; check `cache_read_input_tokens`.
- [ ] `finetuning-tradeoffs.ts` - RAG+prompt approach matching a fine-tuning use case; note when Bedrock fine-tuning is warranted.
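
For the caching sample, the two pieces that matter are the `cache_control` marker on the stable prompt and the usage counters that confirm what happened. The sketch below assumes locally declared types that mirror only the fields named in this issue; `cachedSystem`, `cacheStatus`, and `CacheStatus` are hypothetical helpers, not SDK exports.

```typescript
// Local stand-ins for the request and usage shapes; only the fields used here are declared.
interface CachedTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface CacheUsage {
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

type CacheStatus = "hit" | "write" | "none";

// Build a `system` array whose single block is marked as cacheable.
function cachedSystem(stablePrompt: string): CachedTextBlock[] {
  return [{ type: "text", text: stablePrompt, cache_control: { type: "ephemeral" } }];
}

// Interpret the usage counters returned with a response.
function cacheStatus(usage: CacheUsage): CacheStatus {
  if ((usage.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((usage.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "none";
}

const system = cachedSystem("You are a coding agent. <long, byte-stable guidance here>");
console.log(system[0].cache_control); // the marker sent with the request
console.log(cacheStatus({ cache_creation_input_tokens: 2048, cache_read_input_tokens: 0 })); // write
console.log(cacheStatus({ cache_creation_input_tokens: 0, cache_read_input_tokens: 2048 })); // hit
```

In a typical run the first request reports a write and a repeat with a byte-identical prefix reports a hit; if both counters stay at zero, the prompt was likely changed between calls or is too short to be cached.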

### Must-keep for a beginner (floor - never cut for brevity)
- The run command for `rag-pipeline.ts` (the first sample).
- "Never hardcode your key; it comes from the environment" (once, in prose) - this applies to both `ANTHROPIC_API_KEY` and the Voyage AI key.
- Bun auto-loads `.env`; no loader config needed.
- The non-obvious gotcha: omitting `citations: { enabled: true }` silently suppresses citation output; an invalid `source` shape returns a 400, not a validation warning.
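
The citations gotcha is easier to see with the shapes written out. This is a minimal sketch under stated assumptions: the interfaces are local stand-ins that mirror the `source` shape quoted in this issue plus the `cited_text` and `document_index` fields of a citation entry, and `toDocumentBlock`, `buildCitedContent`, and `collectCitations` are hypothetical helpers rather than SDK functions.

```typescript
// Local stand-ins for the request-side blocks; only the plain-text source variant is modeled.
interface TextSource {
  type: "text";
  media_type: "text/plain";
  data: string;
}

interface DocumentBlock {
  type: "document";
  source: TextSource;
  title?: string;
  citations: { enabled: boolean };
}

interface TextBlock {
  type: "text";
  text: string;
}

// Assumed subset of a response text block; real citation entries carry more fields.
interface ResponseTextBlock {
  type: "text";
  text: string;
  citations?: Array<{ cited_text: string; document_index: number }> | null;
}

// Wrap one retrieved chunk as a typed document block with citations switched on.
function toDocumentBlock(title: string, data: string): DocumentBlock {
  return {
    type: "document",
    source: { type: "text", media_type: "text/plain", data },
    title,
    citations: { enabled: true }, // leave this out and no citations come back
  };
}

// Documents first, then the question, as the content of a single user turn.
function buildCitedContent(
  chunks: Array<{ title: string; text: string }>,
  question: string,
): Array<DocumentBlock | TextBlock> {
  const blocks: Array<DocumentBlock | TextBlock> = chunks.map((c) => toDocumentBlock(c.title, c.text));
  blocks.push({ type: "text", text: question });
  return blocks;
}

// Flatten whatever citations the response contains into printable lines.
function collectCitations(content: ResponseTextBlock[]): string[] {
  const lines: string[] = [];
  for (const block of content) {
    for (const c of block.citations ?? []) {
      lines.push(`[doc ${c.document_index}] ${c.cited_text}`);
    }
  }
  return lines;
}

const content = buildCitedContent(
  [{ title: "auth.md", text: "Tokens expire after 15 minutes." }],
  "How long do tokens live?",
);
console.log(content.length); // 2
```

Because a response without citations still looks like a normal answer, checking that `collectCitations` returns a non-empty array is a cheap way to catch the missing flag early.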

### Friendliness floor (never cut - terse is not friendly)
- The chapter addresses the reader as "you", never "the user" or "one".
- The intro AND at least one section open with a warm, second-person sentence.

### Key APIs (flat list, reference only - NOT a coverage checklist)
`client.messages.create`, `client.messages.countTokens`, `system`, `document` blocks, `source`, `citations`, `cache_control`, `usage.cache_read_input_tokens`, `usage.cache_creation_input_tokens`, `thinking`, XML prompt structure, Voyage AI embeddings endpoint, Bedrock fine-tuning (Claude Haiku)

### Prerequisites
Chapters 4 and 5. RAG samples need a `VOYAGE_API_KEY` in `.env`.

### Definition of done
- [ ] Chapter at `chapters/09-advanced-topics.md`, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste `wc -l` AND `grep -c '^## '` in the PR).
- [ ] Every sample runnable with `bun run`, imported via `<<< @/examples/09-advanced-topics/file.ts`, <=35 lines, comment:code <=0.30.
- [ ] One-home rule held: no prose sentence restates an inline code comment.
- [ ] Friendliness floor held: reader addressed as "you"; intro + >=1 section open warm.
- [ ] Samples use only real `@anthropic-ai/sdk` surface; ASCII punctuation only.
- [ ] Optional material lives in Going-deeper asides, not main-line H2s.
- [ ] Linked from `README.md` and the `.vitepress/config.ts` sidebar; `bun x vitepress build` passes.
- [ ] Citations `source` schema and fine-tuning availability verified against current docs.
