CI does not verify examples/eval-demo scores against committed score-*.txt snapshots

## Gap

`examples/eval-demo/score-baseline.txt` and `score-after-drift.txt` are pre-committed snapshots of `iec eval` output (9/9 and 5/9). Nothing in CI re-runs the command and diffs against them:

- `.github/workflows/ci.yml` runs `lint`, `test` (pytest), and `self-check` (`iec check`). None invoke `iec eval` against the demo.
- `tests/integration/test_eval.py` and `tests/test_eval.py` do not reference the `examples/eval-demo` fixtures or the `score-*.txt` files.

So if the `iec eval` output format, the checks, or the demo fixtures (`baseline/`, `after-drift/`, `eval/*/checks.yaml`) change, the committed `.txt` snapshots go stale silently and CI never flags it.

## Why it matters

The book chapter `content/quality/agent-evaluation.md` (Agent Evaluation and Regression) prints these exact scores and named failures as proof that the eval workflow works. A repo that teaches "tests are proof, not ritual" should not ship example output that CI never re-derives. I verified today that the committed numbers still match the binary:

```
iec eval --path baseline --eval-dir eval      # Score: 9/9 (100%)
iec eval --path after-drift --eval-dir eval   # Score: 5/9 (55%)
```

But that check was manual and one-off.

## Proposed fix

Add a CI step (or an integration test) that runs both commands from `examples/eval-demo` and diffs stdout against the committed `score-*.txt` files, failing on any difference. Either:

- a small job in `ci.yml` that runs the two commands and `diff`s captured output against the committed files, or
- a regeneration script plus `git diff --exit-code examples/eval-demo/score-*.txt` so the snapshots are guaranteed reproducible.

This keeps the demo (and the book chapter that cites it) honest as the CLI evolves.
