Your agent says it's done. Loop Engineer makes it prove it.
Executable operating contracts for AI agent loops.
Loop Engineer turns long-running AI work from a fragile chat transcript into a repo-native contract: success criteria, task queue, verification gates, repair policy, terminal states, run history, and machine-readable state.
It ships as:
- a portable Loop Contract Protocol,
- a Python validator and inspector,
- and a Claude Code reference skill suite.
The prime directive:
If a loop cannot define success, verification, or a terminal state, it stops as
`FailedSpecGap` instead of pretending the next completion is done.
Long-running agents fail in predictable ways:
- they forget what "done" meant after context compaction,
- they optimize to visible tests,
- they keep patching without measurable progress,
- they declare success without independent evidence,
- they have no typed way to say blocked, unsafe, unverifiable, or underspecified.
Loop Engineer makes those failure modes explicit contract states instead of vibes — a concrete, gate-backed reference implementation of loop engineering: contracts, typed termination, and health metrics for long-running agent loops.

    git clone https://github.com/SollanSystems/loop-engineer.git
    cd loop-engineer
    python3 -m loop doctor examples/coverage-repair
    python3 -m loop inspect examples/coverage-repair

`doctor` validates the contract objects (the real output also includes a
`paths` block listing every resolved contract file; omitted here for brevity):

    {
      "ok": true,
      "schemas_checked": [
        "loop-engineer/manifest@1",
        "loop-engineer/state@1",
        "loop-engineer/tasks@1",
        "loop-engineer/terminal@1"
      ],
      "issues": []
    }

`inspect` is a static contract linter: it scores the loop contract's structure
(what proof machinery is present and what is missing) without running the loop:

    {
      "target": "examples/coverage-repair",
      "score": 90,
      "terminal_states_covered": 7,
      "present": [
        "defines verifiable success criteria",
        "independent verification",
        "approval gates on side-effects",
        "false-completion defense (held-out / anti-cheat)",
        "all 7 terminal states reachable"
      ],
      "gaps": [
        "no plan-then-execute discipline for untrusted/web reads (prompt-injection surface)"
      ],
      "verdict": "strong"
    }

Both commands accept either a workspace root or its `.loop/` directory.
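
A CI script can consume the `doctor` report rather than reading it by eye. The following is a minimal sketch, assuming the command prints the JSON report to stdout as in the example above; the helper names `summarize_doctor` and `run_doctor` are illustrative and are not part of the shipped CLI.

```python
import json
import subprocess
import sys


def summarize_doctor(report_text: str) -> tuple[bool, list]:
    """Parse a doctor report and return (ok, issues).

    Only the "ok" and "issues" keys shown in the example output are read.
    """
    report = json.loads(report_text)
    return bool(report.get("ok", False)), list(report.get("issues", []))


def run_doctor(workspace: str) -> tuple[bool, list]:
    """Run `python -m loop doctor <workspace>` and summarize its output.

    Assumption: the report is written to stdout as a single JSON document.
    """
    proc = subprocess.run(
        [sys.executable, "-m", "loop", "doctor", workspace],
        capture_output=True,
        text=True,
        check=False,
    )
    return summarize_doctor(proc.stdout)
```

Calling `run_doctor("examples/coverage-repair")` from the repo root would return the pair `(ok, issues)`, which a pipeline could turn into an exit code.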
- Typed termination: every run exits through exactly one of 7 terminal states.
- Evidence before completion: a task is done only when it maps to a success criterion, a verifier passes, and evidence is recorded — not when the agent stops talking.
- Externalized state: loop status lives in files, not chat memory.
- Bounded repair: repair attempts are capped and measured.
- False-completion defense: held-out gates and anti-cheat scans designed to catch verifier gaming.
A loop contract is a repo-native directory (a small on-disk "repo-OS") that externalizes intent, queue state, runtime state, verification, approvals, and terminal outcome:

    <workspace>/
      AGENTS.md                 # short entrypoint: where to find the contract
      SPEC.md                   # what done means: success criteria + evidence rules
      WORKFLOW.md               # gates, budgets, repair cap, terminal states
      TASKS.json                # machine-readable task queue
      RUNLOG.md                 # append-only iteration history
      EVALS/                    # datasets, rubrics, regressions, traces
      scripts/
        verify-fast             # cheap deterministic gate
        verify-full             # full deterministic gate
        verify-safety           # safety / approval / injection checks
        judge-rubric            # advisory rubric judge
      .loop/
        manifest.yaml           # contract metadata
        state.json              # live FSM cursor
        terminal_state.json     # final exit record, written once
        artifacts/              # evidence bundles and intermediate outputs
        approvals/              # approval requests and resolutions
        checkpoints/            # recoverable snapshots
        memory/                 # run summaries and durable lessons

The contract split is deliberate:
- `SPEC.md` defines success.
- `WORKFLOW.md` defines how the loop is allowed to operate.
- `TASKS.json` defines the executable queue.
- `RUNLOG.md` records human-readable history.
- `.loop/state.json` is the machine source of truth while the loop runs.
- `.loop/terminal_state.json` records the final outcome (the resolver also accepts it at the workspace root).

A task is not done because an agent says it is done. A task is done only when it maps to a success criterion, its verifier passes, and evidence is recorded.
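
The three-part rule above can be expressed as a predicate. This is a sketch under stated assumptions: the `Task` class and its field names are hypothetical and do not reflect the real `TASKS.json` schema, which is documented in `reference/repo-os-contract.md`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """Hypothetical in-memory view of one task; not the TASKS.json schema."""

    task_id: str
    criterion_id: Optional[str] = None   # success criterion this task maps to
    verifier_passed: bool = False        # result of the task's verifier
    evidence: list[str] = field(default_factory=list)  # recorded evidence paths
    agent_claims_done: bool = False      # deliberately ignored by is_done


def is_done(task: Task, known_criteria: set[str]) -> bool:
    """Return True only when all three conditions of the rule hold."""
    maps_to_criterion = task.criterion_id in known_criteria
    has_evidence = len(task.evidence) > 0
    return maps_to_criterion and task.verifier_passed and has_evidence
```

Note that `agent_claims_done` never enters the decision: the agent's own claim carries no weight in this sketch.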
See reference/repo-os-contract.md for the canonical artifact schemas.
Loop Engineer does not allow a vague "completed." Every run exits through exactly one named state:
| State | When |
|---|---|
| `Succeeded` | Verification passes; all acceptance criteria are met. |
| `FailedUnverifiable` | Success or failure cannot be confirmed because verification is insufficient. |
| `FailedBlocked` | The loop cannot proceed because of a tool, permission, dependency, or external blocker. |
| `FailedBudget` | Time or cost budget is exhausted. |
| `FailedSafety` | Safety, policy, or approval risk is detected. |
| `FailedSpecGap` | The objective is underspecified; success criteria cannot be defined. |
| `AbortedByHuman` | The operator explicitly stops the run. |
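
One way to model typed termination in code is an enumeration plus a write-once exit record. The sketch below takes the seven state names from the table above; the `write_terminal_record` helper and the `"state"` and `"reason"` keys are assumptions made for illustration and are not the `loop-engineer/terminal@1` schema.

```python
import json
from enum import Enum
from pathlib import Path


class TerminalState(str, Enum):
    SUCCEEDED = "Succeeded"
    FAILED_UNVERIFIABLE = "FailedUnverifiable"
    FAILED_BLOCKED = "FailedBlocked"
    FAILED_BUDGET = "FailedBudget"
    FAILED_SAFETY = "FailedSafety"
    FAILED_SPEC_GAP = "FailedSpecGap"
    ABORTED_BY_HUMAN = "AbortedByHuman"


def write_terminal_record(loop_dir: Path, state: TerminalState, reason: str) -> Path:
    """Write the final exit record once; refuse to overwrite an existing one.

    The "state" and "reason" keys are illustrative, not the real schema.
    """
    path = loop_dir / "terminal_state.json"
    record = {"state": state.value, "reason": reason}
    # Mode "x" (exclusive creation) raises FileExistsError if the file exists.
    with open(path, "x", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return path
```

Exclusive-creation mode is used here so that a second attempt to record an outcome fails loudly instead of silently replacing the first.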
No Claude Code plugin is required to validate or inspect a loop contract. From the cloned repo root:

    python3 -m loop doctor /path/to/workspace
    python3 -m loop inspect /path/to/workspace

To run it against a loop in any other directory, install the core once (editable):

    pip install -e .   # optional faster manifest parsing: pip install -e ".[yaml]"
    python3 -m loop doctor /path/to/workspace

From the repo root, `python3 -m loop` resolves the bundled `loop/` package;
`pip install -e .` installs that package into your environment so the CLI works
from any directory. The core is pure-stdlib; PyYAML is an optional extra, not a
requirement.
The portable core lives in `loop/` and validates schema-bearing artifacts against
the schemas in `schemas/`:
- `loop-engineer/manifest@1`
- `loop-engineer/state@1`
- `loop-engineer/tasks@1`
- `loop-engineer/terminal@1`


    claude plugin marketplace add SollanSystems/loop-engineer
    claude plugin install loop-engineer@loop-engineer

Restart Claude Code to load all 9 skills. (Local dev: clone the repo and run
`claude plugin marketplace add "$PWD"` instead.)
Requirements: Python 3.10+ for the portable validator/inspector; Claude Code
for the plugin — no other dependencies. Optional integrations (e.g.
claude-code-orchestration for /verify-slice) are layered on when present and
never required; every skill runs on the bundled core alone.
The Claude Code plugin is the reference UI over the portable loop contract.
/loop-engineer
→ /loop-architect
→ /loop-contract
→ /loop-run
→ /loop-repair when a gate fails
→ /loop-flywheel after terminal state
| Skill | One line |
|---|---|
| loop-engineer | Router: broad intent → the right spoke map. |
| loop-architect | Classifies the scenario and selects the loop architecture + physical realization. |
| loop-contract | Scaffolds the repo-OS operating contract: SPEC.md, WORKFLOW.md, TASKS.json, RUNLOG.md, .loop/. |
| Skill | One line |
|---|---|
| loop-run | Runs the state machine iteration by iteration, approval-gated, running the contract's verify gate (optionally /verify-slice). |
| loop-repair | Runs a bounded patch-and-repair loop with a structured repair record. |
| loop-runtime-monitor | Watches an in-flight run from outside; flags stall, repair-churn, and budget overrun. |
| Skill | One line |
|---|---|
| loop-evals | Designs the 7-layer eval suite and makes false-completion-rate + repair-productivity first-class. |
| loop-flywheel | Turns traces and failures into new eval cases; manages memory compaction. |
| loop-inspector | Audits an existing loop directory read-only; emits a scored gap report. |
Not sure which spoke to use? Start with /loop-engineer; it routes the task.
- `loop/`: portable contract core and CLI, with `doctor` (aliases: `validate`, `verify`) and `inspect`.
- `schemas/`: JSON schemas for contract artifacts.
- `skills/`: Claude Code skill suite.
- `reference/`: protocol, architecture, eval, safety, and platform reference docs.
- `scripts/`: validators, runtime monitor, anti-cheat scanner, benchmark harness, rollout ledger.
- `examples/`: sample loop contracts, including `examples/coverage-repair`.

Loop Engineer deliberately composes with existing agent runtimes and workflow harnesses. It defines the loop contract above them; it does not try to replace their execution engines.
- `/goal`, `/loop`, and agent runtimes execute loops.
- LangGraph, AutoGen, ruflo, claude-code-flow, and similar tools orchestrate agents.
- Superpowers-style harnesses gate software-development phases.
Loop Engineer defines the operating contract above those engines: what success means, what evidence proves it, when repair is allowed, and how the loop must terminate.
What this suite owns:
- 7 typed terminal states: a contract primitive, so no run ends in a silent "completed."
- `false-completion-rate`: measurable with the bundled held-out gate and anti-cheat scan (computed from real runs; no baseline ships yet).
- `repair-productivity`: the fraction of repair attempts that measurably move verification forward.
- Repo-native loop state: survives compaction, crashes, and handoff.
- Deterministic-gate-before-rubric ordering: model judges are advisory, not the first line of proof.
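
The two loop-health metrics in the list above can be sketched as simple ratios. The definitions here are assumptions made for illustration: a repair attempt counts as productive when it lowers the number of failing verifier checks, and a false completion is a run that reported `Succeeded` but was rejected by the held-out gate. The record types are hypothetical and are not the suite's own repair-record or run-history formats.

```python
from dataclasses import dataclass


@dataclass
class RepairAttempt:
    failing_before: int  # failing verifier checks before the patch
    failing_after: int   # failing verifier checks after the patch


@dataclass
class RunOutcome:
    claimed_success: bool  # the loop reported Succeeded
    held_out_passed: bool  # the held-out gate agreed


def repair_productivity(attempts: list[RepairAttempt]) -> float:
    """Fraction of repair attempts that reduced the failing-check count."""
    if not attempts:
        return 0.0
    productive = sum(1 for a in attempts if a.failing_after < a.failing_before)
    return productive / len(attempts)


def false_completion_rate(runs: list[RunOutcome]) -> float:
    """Fraction of claimed successes that the held-out gate rejected."""
    claimed = [r for r in runs if r.claimed_success]
    if not claimed:
        return 0.0
    rejected = sum(1 for r in claimed if not r.held_out_passed)
    return rejected / len(claimed)
```

Returning `0.0` for empty input is a convenience of this sketch; a real report might prefer to mark the metric as undefined when there is nothing to measure.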
What is table-stakes rather than oversold: on-disk contracts, bounded repair caps, and deterministic verification gates are shared with mature harnesses. Loop Engineer's claim is the loop-as-design-object framing plus the typed termination and loop-health metrics on top.
Release readiness is gate-backed, not asserted by README prose.

    uv run --with pyyaml python3 -B scripts/validate_frontmatter.py
    uv run --with pyyaml python3 -B scripts/self_eval.py
    uv run --with pyyaml --with pytest python3 -B -m pytest -q -p no:cacheprovider scripts
    python3 -m py_compile loop/*.py scripts/*.py
    claude plugin validate --strict .claude-plugin/plugin.json

The structural self-eval checks skill presence, frontmatter, cross-links, terminal-state coverage, repair-record fields, eval metrics, templates, secret patterns, dispatch examples, the bring-your-own-verifier default, the MIT license, and README differentiation.
- Version: `0.3.4`
- Release tag: `loop-engineer--v0.3.4` (cut at publish)
- License: MIT
- Primary interface: Claude Code plugin
- Portable core: Python CLI + JSON schemas
- Current reference example: `examples/coverage-repair`
Deep content lives in reference/ and is loaded on demand by the skills:
- `reference/architecture-matrix.md`: architecture comparison + scenario→realization table.
- `reference/loop-patterns.md`: PreFlect, milestone, patch-and-repair, flywheel, manager-orchestrator, plan-then-execute.
- `reference/repo-os-contract.md`: repo-OS layout, artifact schemas, state machine.
- `reference/prompt-templates.md`: bootstrap, goal-launch, repair-loop, and short prompt templates.
- `reference/eval-suite.md`: 7-layer eval suite, first-class metrics, flywheel schedule.
- `reference/safety-and-approvals.md`: escalation ladder, approval lifecycle, anti-cheat.
- `reference/platform-map.md`: portable-core mapping across Claude, Codex, Hermes, and Google.


    git clone https://github.com/SollanSystems/loop-engineer.git
    cd loop-engineer
    python3 -m loop inspect examples/coverage-repair

Then scaffold a contract for your own loop with `/loop-contract`, or read
`reference/repo-os-contract.md` for the artifact schemas.
MIT — Sollan Systems