docs(adr): ADR-0038 Build Verification Loop — the agent builds, verifies, and corrects itself#1710

Merged: os-zhuang merged 1 commit into main from adr-0038-build-verification-loop on Jun 11, 2026

@os-zhuang

Design center: never make correctness depend on a human looking. Humans won't review, and the magic moment auto-publishes before they could, so the agent that builds an app must be the same loop that verifies and corrects it.

Grounded in this week's live data: six agent-authored defects shipped to staging in one day — every one passed schema validation, every one was found by a human manually browsing (dangling dataset refs, measure-name mismatches, seeds not materializing, queries returning 0 on populated objects, silent seed failures, views rendering as "Unknown component type"). Schema-valid ≠ renders ≠ returns data ≠ matches intent — four separate verification planes, of which only the first exists today.
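
To make the gap between "schema-valid" and "renders" concrete, here is a minimal sketch of a cross-artifact lint that would flag two of the defect classes listed above (dangling dataset refs and measure-name mismatches). The artifact shapes, the `Issue` fields, and the `lint_graph` name are all assumptions for illustration; the ADR's actual contract may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    # Hypothetical issue record; field names are assumptions, not the ADR's schema.
    layer: str      # which verification layer raised it, e.g. "L1"
    code: str       # machine-readable defect class
    artifact: str   # name of the offending artifact
    message: str


def lint_graph(datasets: dict[str, dict], widgets: list[dict]) -> list[Issue]:
    """Check each widget's references against the dataset it points at."""
    issues: list[Issue] = []
    for w in widgets:
        ds = datasets.get(w["dataset"])
        if ds is None:
            # Each artifact can be schema-valid on its own and still point nowhere.
            issues.append(Issue("L1", "dangling_dataset_ref", w["name"],
                                f"dataset '{w['dataset']}' is not defined"))
            continue
        known = set(ds["measures"])
        for m in w.get("measures", []):
            if m not in known:
                issues.append(Issue("L1", "measure_name_mismatch", w["name"],
                                    f"measure '{m}' not in dataset '{w['dataset']}'"))
    return issues


# Toy artifacts: every entry is well-formed in isolation.
datasets = {"sales": {"measures": ["total_amount", "order_count"]}}
widgets = [
    {"name": "revenue_card", "dataset": "sales", "measures": ["total_amount"]},
    {"name": "orders_chart", "dataset": "sales", "measures": ["orders"]},
    {"name": "churn_table", "dataset": "churn", "measures": []},
]
issues = lint_graph(datasets, widgets)
for issue in issues:
    print(issue.code, issue.artifact)
```

Neither defect is visible when each artifact is validated on its own, which is why the check has to walk the references between artifacts.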

Five layers, one issues[] contract consumed by the agent, the chat health-card, and the eval store:

  • L1 draft-time cross-artifact graph lint (same-turn feedback into the tool envelope)
  • L2 pre-publish renderability contract (translation + registry as data; dataset compilation)
  • L3 post-publish runtime probes (row counts, one real query per widget — generalizing seedApplied)
  • L4 bounded self-correction + machine-conditional publish gate (replaces the human approval gate for AI builds; HITL stays for destructive actions only) + an LLM intent-review pass
  • L5 golden-prompt eval suite on the existing-but-unused ai_eval_cases/ai_eval_runs (every incident becomes a permanent regression case)
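
As a rough illustration of how L4 could consume the shared issues[] list, the sketch below runs a bounded verify-and-correct loop and only allows publishing when no blocking issue remains. The `severity` field, the `max_attempts` default, and the `verify`/`correct` callables are assumptions made for this example, not the ADR's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Issue:
    # Hypothetical shape; the ADR's real issues[] schema may differ.
    layer: str
    severity: str  # assumed convention: "error" blocks publish, "warning" does not
    message: str


def build_verify_correct(
    draft: dict,
    verify: Callable[[dict], list[Issue]],
    correct: Callable[[dict, list[Issue]], dict],
    max_attempts: int = 3,
) -> tuple[dict, list[Issue], bool]:
    """Return (final draft, remaining issues, whether publishing is allowed)."""
    issues = verify(draft)
    for _ in range(max_attempts):  # bounded: never loop forever on an unfixable draft
        blocking = [i for i in issues if i.severity == "error"]
        if not blocking:
            break
        draft = correct(draft, blocking)
        issues = verify(draft)
    # Machine-conditional gate: publish only if no blocking issue survives.
    publish = not any(i.severity == "error" for i in issues)
    return draft, issues, publish


def verify(draft: dict) -> list[Issue]:
    # Toy verifier: every widget must name a defined dataset.
    missing = [w for w in draft["widgets"] if w not in draft["datasets"]]
    return [Issue("L1", "error", f"widget dataset '{w}' is undefined") for w in missing]


def correct(draft: dict, issues: list[Issue]) -> dict:
    # Stand-in for the agent's repair step: define one missing dataset per attempt.
    missing = [w for w in draft["widgets"] if w not in draft["datasets"]]
    return {**draft, "datasets": draft["datasets"] + missing[:1]}


draft = {"widgets": ["sales", "churn"], "datasets": ["sales"]}
final, remaining, publish = build_verify_correct(draft, verify, correct)
print(publish, remaining)
```

If the attempt budget runs out with errors still present, the gate stays closed and the remaining issues are returned, so the same list can feed the chat health-card or the eval store.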

Phase 1 (~1 week, cloud) alone converts this week's discovery latency from human-hours to same-turn.

Status: Proposed — open for review; merging records the proposal.

🤖 Generated with Claude Code

…ies, corrects itself

Never make correctness depend on a human looking: six agent-authored defects
shipped to staging in one day, every one schema-valid, every one found by a
human manually browsing. Five layers — draft-time graph lint, pre-publish
renderability contract, post-publish runtime probes, bounded self-correction
with a machine-conditional publish gate, and a golden-prompt eval suite on
the existing ai_eval_* skeleton. HITL remains only for destructive actions;
quality review is the machine's job.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| spec | Ready | Preview, Comment | Jun 11, 2026 4:55am |

@github-actions (bot) added the size/m and documentation labels on Jun 11, 2026
@os-zhuang os-zhuang merged commit af72fb2 into main Jun 11, 2026
12 checks passed
@os-zhuang os-zhuang deleted the adr-0038-build-verification-loop branch June 11, 2026 04:56