docs(adr): ADR-0038 Build Verification Loop - the agent builds, verifies, and corrects itself #1710
Merged
…ies, corrects itself

Never make correctness depend on a human looking: six agent-authored defects shipped to staging in one day, every one schema-valid, every one found by a human manually browsing. Five layers: draft-time graph lint, pre-publish renderability contract, post-publish runtime probes, bounded self-correction with a machine-conditional publish gate, and a golden-prompt eval suite on the existing ai_eval_* skeleton. HITL remains only for destructive actions; quality review is the machine's job.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Design center: never make correctness depend on a human looking. Humans won't review, and the magic moment auto-publishes before they could — the agent that builds an app must be the same loop that verifies and corrects it.
Grounded in this week's live data: six agent-authored defects shipped to staging in one day — every one passed schema validation, every one was found by a human manually browsing (dangling dataset refs, measure-name mismatches, seeds not materializing, queries returning 0 on populated objects, silent seed failures, views rendering as "Unknown component type"). Schema-valid ≠ renders ≠ returns data ≠ matches intent — four separate verification planes, of which only the first exists today.
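As an illustration of why schema validation alone misses these defects, here is a minimal sketch of a draft-time graph lint. The app shape (`registered_components`, `datasets`, `views`), the `Issue` fields, and the issue codes are all hypothetical, chosen only to mirror three of the defect classes listed above; the ADR's actual `issues[]` schema is not shown in this PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """One entry in a hypothetical issues[] list (field names are assumptions)."""
    plane: str    # which verification plane failed, e.g. "render"
    code: str     # machine-readable defect class
    target: str   # id of the offending view
    message: str  # human-readable detail for a health card or log


def lint_graph(app: dict) -> list[Issue]:
    """Cross-reference checks that a per-object schema validator cannot see."""
    issues: list[Issue] = []
    # Map each dataset name to the set of measure names it exposes.
    datasets = {d["name"]: set(d.get("measures", [])) for d in app.get("datasets", [])}
    known_components = set(app.get("registered_components", []))

    for view in app.get("views", []):
        vid = view["id"]
        component = view.get("component")
        if component not in known_components:
            issues.append(Issue("render", "unknown_component", vid,
                                f"component {component!r} is not registered"))
        ds = view.get("dataset")
        if ds not in datasets:
            issues.append(Issue("render", "dangling_dataset_ref", vid,
                                f"dataset {ds!r} does not exist"))
            continue  # measures cannot be checked without a dataset
        for measure in view.get("measures", []):
            if measure not in datasets[ds]:
                issues.append(Issue("render", "measure_name_mismatch", vid,
                                    f"measure {measure!r} not in dataset {ds!r}"))
    return issues


# Every view below is individually well-formed, yet three of four are broken.
EXAMPLE_APP = {
    "registered_components": ["table", "chart"],
    "datasets": [{"name": "orders", "measures": ["total"]}],
    "views": [
        {"id": "v1", "component": "chart", "dataset": "orders", "measures": ["total"]},
        {"id": "v2", "component": "chart", "dataset": "ordrs", "measures": ["total"]},
        {"id": "v3", "component": "table", "dataset": "orders", "measures": ["revenue"]},
        {"id": "v4", "component": "kanban", "dataset": "orders", "measures": []},
    ],
}

if __name__ == "__main__":
    for issue in lint_graph(EXAMPLE_APP):
        print(issue.target, issue.code, "-", issue.message)
```

The checks here only cover references that can be resolved statically from the draft; seeds that fail to materialize or queries that return 0 rows would need the runtime planes instead.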
Five layers, one `issues[]` contract consumed by the agent, the chat health-card, and the eval store:

- … (`seedApplied`)
- `ai_eval_cases` / `ai_eval_runs` (every incident becomes a permanent regression case)

Phase 1 (~1 week, cloud) alone converts this week's discovery latency from human-hours to same-turn.
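The bounded self-correction layer with a machine-conditional publish gate could look roughly like the sketch below. Everything here is an assumption made for illustration: the `verify` and `correct` callables, the `severity` field on each issue, the default of three attempts, and the shape of the returned record are not taken from the ADR.

```python
from typing import Any, Callable

Issues = list[dict[str, Any]]


def build_verify_correct(
    draft: Any,
    verify: Callable[[Any], Issues],
    correct: Callable[[Any, Issues], Any],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Verify a draft, let the agent correct it, and publish only if clean.

    The gate is machine-conditional: publishing depends solely on whether
    verify() reports any issue with severity "error", never on human review.
    The loop is bounded so a draft that cannot be fixed fails closed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    issues: Issues = []
    for attempt in range(1, max_attempts + 1):
        issues = verify(draft)
        blocking = [i for i in issues if i.get("severity") == "error"]
        if not blocking:
            return {"published": True, "attempts": attempt,
                    "draft": draft, "issues": issues}
        if attempt < max_attempts:
            # Feed only the blocking issues back to the agent for a fix.
            draft = correct(draft, blocking)

    # Attempt budget exhausted: do not publish, surface the remaining issues.
    return {"published": False, "attempts": max_attempts,
            "draft": draft, "issues": issues}
```

Because the same issue list is returned whether or not the gate opens, one record could feed the agent's next turn, a health card, and an eval store without separate formats.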
Status: Proposed — open for review; merging records the proposal.
🤖 Generated with Claude Code