
Chapter 9: Advanced Topics: RAG, Prompt Engineering, and Fine-Tuning #9

@yagop

Description


Part of the Build Your Own Coding Agent tutorial. One issue = one chapter (chapters/09-advanced-topics.md) plus its examples/09-advanced-topics/ samples.

Goal (1 sentence): Close the gap between a working prototype and a production-grade coding agent by combining retrieval-augmented generation (which lets your agent answer questions grounded in a private codebase or knowledge base rather than cramming everything into a context window), disciplined prompt structure, and an honest take on model customization.
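A minimal sketch of what "disciplined prompt structure" can look like for a RAG request is shown below. It keeps stable guidance in `system`, turns few-shot examples into user/assistant pairs, wraps retrieved chunks in XML tags with the question last, and optionally adds the top-level `thinking` parameter. The helper names (`buildRagPrompt`, `withThinking`), the tag names, and the model alias are illustrative assumptions. The returned object is a plain request body, not an SDK type.

```typescript
// Minimal local shapes for a Messages API request body (our own names, not SDK types).
type Role = "user" | "assistant";
interface Message {
  role: Role;
  content: string;
}
interface FewShot {
  input: string;
  output: string;
}

// Stable guidance lives in `system`; retrieved chunks are wrapped in XML tags.
function buildRagPrompt(
  question: string,
  chunks: string[],
  examples: FewShot[],
  model = "claude-sonnet-4-5", // assumed model alias; substitute your own
) {
  const system =
    "<instructions>Answer using only the material inside <context>. " +
    "If the answer is not there, say so.</instructions>";

  // Few-shot examples become alternating user/assistant turns.
  const shots: Message[] = examples.flatMap((ex): Message[] => [
    { role: "user", content: ex.input },
    { role: "assistant", content: ex.output },
  ]);

  // Long material first, the question last.
  const context = chunks
    .map((text, i) => `<chunk index="${i + 1}">\n${text}\n</chunk>`)
    .join("\n");
  const finalTurn: Message = {
    role: "user",
    content: `<context>\n${context}\n</context>\n\n<question>${question}</question>`,
  };

  return { model, max_tokens: 1024, system, messages: [...shots, finalTurn] };
}

// Optional chain-of-thought via the top-level `thinking` parameter.
// budget_tokens must be at least 1024 and smaller than max_tokens.
function withThinking<T extends { max_tokens: number }>(req: T, budget = 2048) {
  if (budget < 1024) throw new Error("budget_tokens must be at least 1024");
  return {
    ...req,
    max_tokens: Math.max(req.max_tokens, budget + 1024),
    thinking: { type: "enabled" as const, budget_tokens: budget },
  };
}
```

You would pass either result to `client.messages.create(...)`. `withThinking` raises `max_tokens` when necessary, because a thinking budget at or above `max_tokens` is rejected.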

After this chapter you can

  • Build a RAG pipeline that chunks documents, embeds them with Voyage AI, retrieves top-k chunks by cosine similarity, and injects them into client.messages.create as either plain user content or typed document blocks that unlock Claude's citation output.
  • Apply prompt-engineering best practices - XML tags, role prompts, few-shot examples, chain-of-thought, and long-context ordering (documents first, question last) - to control Claude's output format and reasoning style reliably.
  • Use the Citations feature to ground answers in retrieved documents via typed document content blocks and citations: { enabled: true }.
  • Weigh the tradeoffs of prompt caching with cache_control as a prompting-layer alternative to fine-tuning, and confirm cache hits via usage.cache_read_input_tokens.
  • Decide when to fine-tune (Bedrock, Claude Haiku) versus when RAG plus prompt engineering suffices.
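The retrieval step in the first outcome comes down to ranking stored chunks by cosine similarity to the query embedding and keeping the top k. The sketch below is a brute-force, in-memory version, which suits a local corpus. The `Chunk` shape and the function names are illustrative assumptions. Voyage describes its embeddings as unit-normalized, and for unit vectors a plain dot product gives the same ranking. The full formula is kept here so the sketch works with any vectors.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A stored chunk and its embedding (hypothetical in-memory store shape).
interface Chunk {
  text: string;
  embedding: number[];
}

// Score every chunk against the query, sort best-first, keep k.
function topK(query: number[], chunks: Chunk[], k: number) {
  return chunks
    .map((c) => ({ text: c.text, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan is O(n) per query. That is fine for a few thousand chunks and is the kind of load a vector index would take over later.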

What to cover (ONE paragraph, not a list)

The chapter opens by wiring up a full RAG pipeline: chunking a document corpus, calling Voyage AI's embeddings API (the Anthropic API has no embeddings endpoint), storing vectors locally, computing cosine similarity to retrieve top-k chunks, and injecting them into client.messages.create. It then contrasts two injection modes - plain text in a user turn versus typed document content blocks with source: { type: "text", media_type: "text/plain", data: "..." } (or base64, URL, and file_id variants) and citations: { enabled: true } - showing that omitting citations: { enabled: true } suppresses citation output and that an invalid source shape returns a 400. Prompt structure comes next: wrapping instructions in XML tags (<instructions>, <context>, <examples>), placing stable guidance in the system field, ordering long-context prompts with the documents first and the question last, supplying 2-4 few-shot user/assistant pairs, and eliciting chain-of-thought either via the top-level thinking parameter (no beta header needed for adaptive thinking) or an explicit scratchpad prompt. The chapter closes with cache_control: { type: "ephemeral" } on large, byte-stable system prompts and retrieved document blocks (confirming hits via usage.cache_read_input_tokens) as a prompting-layer alternative to fine-tuning, then situates fine-tuning honestly: the first-party Anthropic API exposes no general fine-tuning endpoint; Amazon Bedrock offers it for select Claude models such as Claude Haiku; RAG plus prompt engineering covers most use cases and should be the default.
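The first two pipeline steps, chunking and embedding, can be sketched as below. Several details are assumptions rather than recommendations: fixed-size character windows with overlap (real corpora often split on functions or headings), the model name `voyage-3` (check Voyage's current model list), and the response fields read from `POST https://api.voyageai.com/v1/embeddings`. `embed` takes `input_type` so that documents and queries are embedded in the matching mode.

```typescript
// Split text into overlapping fixed-size character windows.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// The part of the Voyage embeddings response this sketch relies on.
interface VoyageResponse {
  data: { embedding: number[]; index: number }[];
}

// Embed a batch of strings. The key comes from the environment, never the source.
async function embed(
  texts: string[],
  inputType: "document" | "query",
  model = "voyage-3", // assumed model name
): Promise<number[][]> {
  const key = process.env.VOYAGE_API_KEY;
  if (!key) throw new Error("VOYAGE_API_KEY is not set");
  const res = await fetch("https://api.voyageai.com/v1/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${key}` },
    body: JSON.stringify({ input: texts, model, input_type: inputType }),
  });
  if (!res.ok) throw new Error(`Voyage error ${res.status}: ${await res.text()}`);
  const json = (await res.json()) as VoyageResponse;
  // Sort by index so the output order matches the input order.
  return json.data.sort((a, b) => a.index - b.index).map((d) => d.embedding);
}
```

Typical use: call `embed(chunkText(corpus), "document")` once and store the vectors. Then call `embed([question], "query")` per question and rank the stored chunks against that single query vector.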

Going deeper (optional asides - keep OFF the main line)

  • Bedrock fine-tuning workflow mechanics (data format, job creation, cost model).
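For the data-format part of that aside, Bedrock's Claude 3 Haiku fine-tuning takes JSONL: one object per line, with an optional `system` string and a `messages` array of alternating user/assistant turns. This should be verified against current Bedrock docs, as the issue itself asks. The sketch below serializes records in that shape. The validation rules (start with user, strict alternation, end with assistant) are conservative assumptions of this sketch, not quoted Bedrock limits.

```typescript
// One training example: optional system prompt plus alternating turns.
interface Turn {
  role: "user" | "assistant";
  content: string;
}
interface TrainingRecord {
  system?: string;
  messages: Turn[];
}

// Serialize records as JSONL after checking the turn structure.
function toJsonl(records: TrainingRecord[]): string {
  for (const r of records) {
    if (r.messages.length === 0 || r.messages.length % 2 !== 0) {
      throw new Error("each record needs complete user/assistant pairs");
    }
    r.messages.forEach((m, i) => {
      const expected = i % 2 === 0 ? "user" : "assistant";
      if (m.role !== expected) throw new Error(`message ${i} should be ${expected}`);
    });
  }
  // One JSON object per line; no wrapping array, trailing newline at the end.
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}
```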

Out of scope (defer - do NOT preview)

  • Vector-database hosting and production indexing strategies (later infrastructure chapter).
  • Evaluating retrieval quality (recall@k, NDCG) - deserves its own treatment.

Code samples - examples/09-advanced-topics/

  • rag-pipeline.ts - chunk + Voyage AI embeddings + top-k cosine retrieval, injected as user content.
  • citations-demo.ts - document blocks with source + citations: { enabled: true }; print citations.
  • prompt-engineering.ts - compare bare / XML+few-shot / chain-of-thought; log usage.
  • cache-large-prompt.ts - cache_control on a large stable prompt; check cache_read_input_tokens.
  • finetuning-tradeoffs.ts - RAG+prompt approach matching a fine-tuning use case; note when Bedrock fine-tuning is warranted.
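As a rough outline of citations-demo.ts, the sketch below builds a user turn from plain-text document blocks with `citations: { enabled: true }` and the question last. It also turns the `char_location` citations that come back on response text blocks into printable lines. The local interfaces copy only the fields this sketch reads and are not SDK types. The helper names are hypothetical.

```typescript
// A plain-text document block as the Citations feature expects it.
interface TextDocumentBlock {
  type: "document";
  source: { type: "text"; media_type: "text/plain"; data: string };
  title?: string;
  citations: { enabled: boolean };
}

function documentBlock(data: string, title?: string): TextDocumentBlock {
  return {
    type: "document",
    source: { type: "text", media_type: "text/plain", data },
    title,
    citations: { enabled: true },
  };
}

// Documents first, the question last, all in one user turn.
function citedQuestion(docs: { title: string; text: string }[], question: string) {
  return {
    role: "user" as const,
    content: [
      ...docs.map((d) => documentBlock(d.text, d.title)),
      { type: "text" as const, text: question },
    ],
  };
}

// Citation fields returned for plain-text documents (subset).
interface CharCitation {
  type: "char_location";
  cited_text: string;
  document_index: number;
  document_title: string | null;
  start_char_index: number;
  end_char_index: number;
}
interface ResponseBlock {
  type: string;
  text?: string;
  citations?: CharCitation[] | null;
}

// One printable line per citation found in the response content.
function formatCitations(content: ResponseBlock[]): string[] {
  const lines: string[] = [];
  for (const block of content) {
    if (block.type !== "text" || !block.citations) continue;
    for (const c of block.citations) {
      lines.push(`[doc ${c.document_index}] "${c.cited_text}" (${c.start_char_index}-${c.end_char_index})`);
    }
  }
  return lines;
}
```

In a real run you would send `messages: [citedQuestion(docs, question)]` and pass `response.content` to `formatCitations`.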

Must-keep for a beginner (floor - never cut for brevity)

  • The run command for rag-pipeline.ts (the first sample).
  • "Never hardcode your key; it comes from the environment" (once, in prose) - this applies to both ANTHROPIC_API_KEY and the Voyage AI key.
  • Bun auto-loads .env; no loader config needed.
  • The non-obvious gotcha: omitting citations: { enabled: true } silently suppresses citation output; an invalid source shape returns a 400, not a validation warning.
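The API gives no warning for a missing citations flag and only a 400 for a bad source, so a small local lint pass before sending can surface both problems early. The hypothetical `lintDocumentBlock` below checks only the source variants this issue names (text, base64 PDF, URL, file_id). The API may accept other variants, which this deliberately narrow sketch would flag as unknown.

```typescript
// Returns human-readable problems; an empty array means the block looks sendable.
// `block` is untrusted input, hence `any`.
function lintDocumentBlock(block: any): string[] {
  const problems: string[] = [];
  if (block?.type !== "document") problems.push('type must be "document"');
  const s = block?.source;
  switch (s?.type) {
    case "text":
      if (s.media_type !== "text/plain") problems.push('text source needs media_type "text/plain"');
      if (typeof s.data !== "string") problems.push("text source needs string data");
      break;
    case "base64":
      if (s.media_type !== "application/pdf") problems.push('base64 source needs media_type "application/pdf"');
      if (typeof s.data !== "string") problems.push("base64 source needs string data");
      break;
    case "url":
      if (typeof s.url !== "string") problems.push("url source needs a url string");
      break;
    case "file":
      if (typeof s.file_id !== "string") problems.push("file source needs a file_id string");
      break;
    default:
      problems.push(`unknown source type: ${String(s?.type)}`);
  }
  // Not an API error, but without this flag no citations come back.
  if (block?.citations?.enabled !== true) problems.push("citations not enabled: response will carry no citations");
  return problems;
}
```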

Friendliness floor (never cut - terse is not friendly)

  • The chapter addresses the reader as "you", never "the user" or "one".
  • The intro AND at least one section open with a warm, second-person sentence.

Key APIs (flat list, reference only - NOT a coverage checklist)

client.messages.create, client.messages.countTokens, system, document blocks, source, citations, cache_control, usage.cache_read_input_tokens, usage.cache_creation_input_tokens, thinking, XML prompt structure, Voyage AI embeddings endpoint, Bedrock fine-tuning (Claude Haiku)
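Among the APIs listed, cache_control and the two usage counters work together. You mark the end of a byte-stable prefix, then read `usage` to see whether a call wrote the cache or read from it. A minimal sketch, assuming the stable material lives in `system` text blocks (the helper names are hypothetical):

```typescript
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Mark only the last stable block; the cached prefix ends there.
function cachedSystem(stableParts: string[]): TextBlock[] {
  return stableParts.map((text, i) => ({
    type: "text" as const,
    text,
    ...(i === stableParts.length - 1 ? { cache_control: { type: "ephemeral" as const } } : {}),
  }));
}

// Usage fields this sketch reads (the cache counters can be absent or null).
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Classify a response; a cache read takes precedence over a write.
function cacheStatus(u: Usage): "hit" | "write" | "miss" {
  if ((u.cache_read_input_tokens ?? 0) > 0) return "hit";
  if ((u.cache_creation_input_tokens ?? 0) > 0) return "write";
  return "miss";
}
```

On a byte-identical second call you would expect "write" followed by "hit". A persistent "miss" usually means the prefix changed between calls or was below the model's minimum cacheable length, which is not cached at all.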

Prerequisites

Chapters 4 and 5. Every sample needs ANTHROPIC_API_KEY in .env; the RAG samples also need VOYAGE_API_KEY.
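Because both keys come from the environment, a sample can fail fast with a clear message before it makes any network call. A minimal sketch, assuming a hypothetical `requireEnv` helper. Under `bun run`, Bun has already loaded .env into `process.env`. The `env` parameter exists only so the function can be tested without touching real variables.

```typescript
// Check that every named variable is set; return them as plain strings.
function requireEnv(
  names: string[],
  env: Record<string, string | undefined> = process.env,
): Record<string, string> {
  const missing = names.filter((n) => !env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing ${missing.join(", ")}. Add them to .env; never hardcode keys.`);
  }
  return Object.fromEntries(names.map((n) => [n, env[n] as string]));
}
```

Typical use is a single line at the top of a sample: `requireEnv(["ANTHROPIC_API_KEY", "VOYAGE_API_KEY"])`. The Anthropic client itself reads ANTHROPIC_API_KEY from the environment by default.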

Definition of done

  • Chapter at chapters/09-advanced-topics.md, <=120 lines, <=4 main-line H2s plus an optional "What's next" closer (paste the output of wc -l and grep -c '^## ' in the PR).
  • Every sample runnable with bun run, imported via <<< @/examples/09-advanced-topics/file.ts, <=35 lines, comment:code <=0.30.
  • One-home rule held: no prose sentence restates an inline code comment.
  • Friendliness floor held: reader addressed as "you"; intro + >=1 section open warm.
  • Samples use only real @anthropic-ai/sdk surface; ASCII punctuation only.
  • Optional material lives in Going-deeper asides, not main-line H2s.
  • Linked from README.md and the .vitepress/config.ts sidebar; bun x vitepress build passes.
  • Citations source schema and fine-tuning availability verified against current docs.
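The "<=35 lines, comment:code <=0.30" limit can be checked mechanically. The issue does not define the ratio, so this hypothetical checker assumes comment-only lines divided by code lines, with blank lines counted in neither. For a file that ends in a newline, its line count matches wc -l.

```typescript
// Hypothetical checker for the per-sample limits in the definition of done.
function sampleStats(source: string) {
  const lines = source.replace(/\n$/, "").split("\n");
  let comments = 0;
  let code = 0;
  for (const raw of lines) {
    const t = raw.trim();
    if (t === "") continue; // blank lines count toward neither side
    if (t.startsWith("//") || t.startsWith("/*") || t.startsWith("*")) comments++;
    else code++;
  }
  const ratio = code === 0 ? 0 : comments / code;
  return { lines: lines.length, comments, code, ratio, ok: lines.length <= 35 && ratio <= 0.3 };
}
```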

Metadata



Labels

documentation: Improvements or additions to documentation
