76 commits
980856b
CI(stress-branch): unique-per-run concurrency group for parallel disp…
bentoner Jun 16, 2026
5dec024
Plan: de-flake Test 17d got97 assertion (CI stress find)
bentoner Jun 16, 2026
9f76c29
Plan v2: address review round 1 (rc-set {0,1,97,98}; drop-free WAITIN…
bentoner Jun 16, 2026
58c3741
Test 17d: de-flake got97>=1 (CI stress find on windows-2025 unit)
bentoner Jun 16, 2026
b430d73
CI(stress-branch): add CPU/disk load wrapper to surface timing flakes
bentoner Jun 16, 2026
2e483de
AGENTS.md: record the CI flake-hunt mission + formal diagnosis loop
bentoner Jun 16, 2026
06c6d8e
Interop Test 5: de-flake via deterministic pwsh orphan (CPU-load find)
bentoner Jun 16, 2026
3270fbd
AGENTS.md: record Test 31(a) diagnosis + hunt status (16/50, halted o…
bentoner Jun 16, 2026
b7af810
AGENTS.md: no CI credit limit (public repo, unlimited CI) — keep going
bentoner Jun 16, 2026
51a1753
Test 31(a): de-flake the leaked-claim discovery-route race (CI load f…
bentoner Jun 16, 2026
810ee41
AGENTS.md: mark Test 31(a) fixed (51a1753); resume hunt; ignore *.sta…
bentoner Jun 16, 2026
19a28fd
Test 32b: cover F2 — steal rename WON but read-back verification FAILED
bentoner Jun 16, 2026
c762899
AGENTS.md: record F2 coverage addition (Test 32b, 19a28fd) and hunt r…
bentoner Jun 16, 2026
9438da0
docs: add failure-modes design map for scope decisions
bentoner Jun 17, 2026
402dc1e
docs(failure-modes): review round 1 — sharpen core guarantee; fix clo…
bentoner Jun 17, 2026
57b1418
docs(failure-modes): review round 2 — generalize the no-silent-loss b…
bentoner Jun 17, 2026
534a007
ben comments: docs/failure-modes.md
bentoner Jun 17, 2026
959cca9
Revert "ben comments: docs/failure-modes.md"
bentoner Jun 17, 2026
a5df9d9
c converged: docs/failure-modes.md
bentoner Jun 17, 2026
9048400
Plan proposal: guarantees spec + failure-modes follow-ups (await Ben …
bentoner Jun 17, 2026
2617449
Plan: lock decisions D-a..e; add Bucket 6 / Phase 1b (load-testing st…
bentoner Jun 17, 2026
0397aaa
docs: load-&-matrix testing strategy recommendation (Ben f, Phase 1b …
bentoner Jun 17, 2026
aeba95c
docs(load-testing): apply Codex factual review (cgroup probe-required…
bentoner Jun 17, 2026
8ba6341
Plan: §9 accepted; add Bucket 7 (steering coverage / Phase 1c) + Buck…
bentoner Jun 17, 2026
e1f31a5
Phase 1a + 1c: add guarantees contract + steering-coverage gap-list
bentoner Jun 17, 2026
b504f87
Phase 2 plan: implementation plan for Buckets 2/3/4/6/8
bentoner Jun 17, 2026
26c9c29
Phase 2 plan: fold review round (Claude + Codex); both verdict sound-…
bentoner Jun 17, 2026
3789be9
Bucket 8 item 1: TAP output + 1..N plan line + undercount sentinel
bentoner Jun 17, 2026
dbecc02
Bucket 3: documentation edits (envelope, single-clock, network-FS, up…
bentoner Jun 17, 2026
750be3c
Bucket 4: correctness/envelope test split (D-c; GCL_ENVELOPE_TIER)
bentoner Jun 17, 2026
cbc1eca
Bucket 2A wave 1: steering tests 37-40 (rename-refused, step-3.3, for…
bentoner Jun 17, 2026
dee1543
Bucket 2A waves 2-3: steering tests 41-47 (Tier-A coverage complete)
bentoner Jun 17, 2026
3f7bd23
Bucket 2B: fault-injection tests 48-50 (F4, F2/J1, F1)
bentoner Jun 17, 2026
ba443c7
Fix Test 37 (rename-refused) portability on macOS/BSD mv
bentoner Jun 17, 2026
f471857
Plan: REOPEN D-d (merge-to-main strategy) — cherry-pick vs tidy-rebas…
bentoner Jun 17, 2026
353a2dd
Plan: RESOLVE D-d — mild tidy-up, merge to main via GitHub PR (extent…
bentoner Jun 17, 2026
4ee5899
Bucket 8 item 2: GCL_TEST_ONLY single-test selector
bentoner Jun 17, 2026
b8e2951
Bucket 8 item 3: extract tests/_harness.sh (shared TAP/selector/helpers)
bentoner Jun 17, 2026
d2ac607
Plan changelog: Bucket 8 items 2+3 done (selector + _harness.sh extra…
bentoner Jun 17, 2026
6f20a5b
Bucket 6a: de-stress tests.yml + record no-branch-protection decision
bentoner Jun 17, 2026
43cb648
Bucket 6b: graduate tests/with-load.sh (calibrated ratio + load-manif…
bentoner Jun 17, 2026
36b0033
chore: gitignore test-output/ (runtime CI/test artifact dir)
bentoner Jun 17, 2026
6a33cbe
Bucket 6e: Axis-A waiter-count sweep (GCL_TEST_SWEEP), nightly/deep-only
bentoner Jun 17, 2026
792ab90
Bucket 6c: nightly.yml (load matrix + kcov + idempotent issue triage)
bentoner Jun 17, 2026
9cce97d
Bucket 6d: deep-sweep.yml (on-demand deep flake hunt)
bentoner Jun 17, 2026
309cf39
docs(failure-modes): mark F1/F2/F4/J1/E3 TESTED; F3 document-only
bentoner Jun 17, 2026
d6d643f
Plan: Windows unit-suite CI sharding subplan (Phase 2, under review)
bentoner Jun 17, 2026
849ed82
Plan: fold round-1 review of windows-shard subplan (3 reviewers)
bentoner Jun 17, 2026
e277de3
Plan: fold round-2 (confirm) review of windows-shard subplan + kcov note
bentoner Jun 17, 2026
5095645
Plan: windows-shard subplan CONVERGED (round-3 Codex clean, sound-to-…
bentoner Jun 17, 2026
a01a8e3
Test sharding: GCL_TEST_SHARD=i/n round-robin gate in the harness
bentoner Jun 17, 2026
2de66ff
CI: split the windows-unit leg into 2 round-robin shards (tests.yml)
bentoner Jun 17, 2026
5e881c8
Plan: record windows-shard CI result (33% faster; balance underdelive…
bentoner Jun 17, 2026
f45e5d2
Plan: fixed shard split = "Test 1" vs "not Test 1" (measured, balanced)
bentoner Jun 17, 2026
89de803
Revert "CI: split the windows-unit leg into 2 round-robin shards (tes…
bentoner Jun 17, 2026
143e280
Revert "Test sharding: GCL_TEST_SHARD=i/n round-robin gate in the har…
bentoner Jun 17, 2026
80268f4
Plan: extract the concurrency canary (Test 1) into its own file; supe…
bentoner Jun 17, 2026
57ade63
Plan: fold single Codex review of the canary-split plan (2 real catches)
bentoner Jun 17, 2026
5fe15c9
Extract the concurrency canary (Test 1) into its own suite file
bentoner Jun 18, 2026
b1eb0a8
CI: run the canary as its own parallel cell (all arches) + wire night…
bentoner Jun 18, 2026
76bed1a
Plan: record canary-split CI result (~50% faster, all green, simpler …
bentoner Jun 18, 2026
7a242ee
Make the canary suite executable (kcov invokes it directly)
bentoner Jun 18, 2026
0730925
nightly kcov: select merged report by lines-covered + pipefail the kc…
bentoner Jun 18, 2026
9d00d44
docs: reflect the canary as a 4th suite (file table, run cmds, re-att…
bentoner Jun 18, 2026
89e25d6
docs: don't overstate the canary suite (mutual exclusion only, not cr…
bentoner Jun 18, 2026
86302c5
Fix split-introduced staleness: unit suite + CI comments no longer cl…
bentoner Jun 18, 2026
a662ce7
comments: fix two stale/inaccurate canary-split comments
bentoner Jun 18, 2026
c10aca0
comments: sweep remaining canary-split staleness (kcov section header…
bentoner Jun 18, 2026
6a9053b
Phase-4 round-1: fix test + CI review findings
bentoner Jun 18, 2026
e8f192b
Phase-4 round-1: documentation accuracy + merge scope
bentoner Jun 18, 2026
beaf943
Phase-4 round-2: fix dangling load-testing ref + guarantees line-numb…
bentoner Jun 18, 2026
1b0bf7d
Phase-4 round-3: fix stale "recommend/doc-gap" framing in failure-mod…
bentoner Jun 18, 2026
2b6f8de
Phase-4 round-4: de-reference deleted .plans/ content + fix stale sec…
bentoner Jun 18, 2026
1351edb
Phase-4 round-5: socket/device witness, dangling §E ref, four suites,…
bentoner Jun 18, 2026
f7363e7
Phase-4 round-6: fix stale correctness-issue body in nightly-triage.sh
bentoner Jun 18, 2026
5e46b20
Merge-prep: remove branch-local agent-workflow artifacts (not needed …
bentoner Jun 18, 2026
Empty file removed .agent/.gitkeep
Empty file.
223 changes: 223 additions & 0 deletions .github/scripts/nightly-triage.sh
@@ -0,0 +1,223 @@
#!/usr/bin/env bash
# nightly-triage.sh — classify a nightly stress run's results and file/append a
# single labelled GitHub issue per (date, class), idempotently.
#
# Invoked by the `triage` job in .github/workflows/nightly.yml AFTER it has
# downloaded every matrix cell's `test-output/` artifact (each into a directory
# named `nightly-logs-<cell-id>/`, each carrying that cell's own
# `cell-conclusion.txt`). It reads only files on disk + `gh`; it makes no test
# decisions of its own beyond parsing the preserved logs.
#
# CLASSIFICATION:
# correctness — any `^FAIL:` line in a suite log (a genuine assertion failure).
# Files/appends a `nightly-correctness` issue. The one class that
# demands investigation. (A job that concluded `failure`/timed out
# WITHOUT a `^FAIL:` line is infra, not correctness — see below.)
# envelope — no FAIL anywhere, but at least one `WARN[env-relaxed]` line in a
# log of a cell that *succeeded*. Tracked (`nightly-envelope`); the
# three wall-clock envelope assertions stretched under load — by
# design under GCL_ENVELOPE_TIER=relax — so NO investigation action.
# infra — a cell's artifact is missing, the cell job neither succeeded nor
# cleanly failed-on-an-assertion (timeout / cancelled / checkout
# failure / errored before any suite ran), OR — the EMPTY-ROUND
# GUARD — *no* cell produced any log at all. Filed `nightly-infra`.
# Crucially, "0 FAIL across 0 logs" is NEVER read as green: with no
# evidence we classify infra, not success.
#
# Idempotency: one open issue per (run-date, class). We search open issues by a
# stable title prefix + label; if one exists we append a comment, else we create.
# Re-running triage for the same date therefore appends rather than spamming.
#
# All-green (every cell success, no FAIL, no env warn, every artifact present) ⇒
# NO issue of any kind is filed.
#
# Inputs (environment):
# ARTIFACTS_DIR dir holding the downloaded per-cell artifact directories
# (default: ./artifacts). Each cell dir is `nightly-logs-<id>/`.
# (Per-cell job conclusions are read from FILES, not env: each stress cell writes
# its own `result` — success|failure|cancelled|skipped — to
# `<cell-dir>/cell-conclusion.txt` under always(), and the script
# reads that file directly. Ground truth PER CELL, never a matrix
# aggregate.)
# EXPECTED_CELLS space-separated list of cell ids that were supposed to run
# (default: the six N1..N6 ids). Lets the empty-round / missing-
# artifact guard know what to expect.
# RUN_DATE UTC date stamp for the issue title (default: today, UTC).
# GITHUB_REPOSITORY / GH_TOKEN(GITHUB_TOKEN) the usual `gh` env.
# DRY_RUN=1 print the `gh` actions instead of running them (for local tests).
set -uo pipefail

ARTIFACTS_DIR="${ARTIFACTS_DIR:-./artifacts}"
EXPECTED_CELLS="${EXPECTED_CELLS:-N1 N2 N3 N4 N5 N6}"
RUN_DATE="${RUN_DATE:-$(date -u +%Y-%m-%d)}"
DRY_RUN="${DRY_RUN:-0}"

log() { printf '%s\n' "$*" >&2; }

# A cell's log directory and its suite logs (may be absent ⇒ infra).
cell_logdir() { printf '%s/nightly-logs-%s' "$ARTIFACTS_DIR" "$1"; }

# ── Read a cell's OWN recorded conclusion from its artifact (ground truth: each
# stress cell writes job.status to cell-conclusion.txt under always()). Absent
# file ⇒ `unknown` (handled like a missing artifact). ──────────────────────────
cell_conclusion() {
  local cell="$1" f val=""
  f="$(cell_logdir "$cell")/cell-conclusion.txt"
  if [ -f "$f" ]; then
    val="$(tr -d '[:space:]' < "$f" 2>/dev/null)"
  fi
  printf '%s' "${val:-unknown}"
}

# ── Classify each expected cell. Accumulate evidence lines per class. ───────────
correctness_evidence=""
envelope_evidence=""
infra_evidence=""

any_log_seen=0 # for the empty-round guard

for cell in $EXPECTED_CELLS; do
  dir="$(cell_logdir "$cell")"
  concl="$(cell_conclusion "$cell")"

  # Gather this cell's suite logs (unit/interop/integration *.log under the artifact).
  logs=()
  if [ -d "$dir" ]; then
    while IFS= read -r f; do logs+=("$f"); done \
      < <(find "$dir" -type f -name '*.log' 2>/dev/null)
  fi

  if [ "${#logs[@]}" -eq 0 ]; then
    # No artifact / no logs for an expected cell. Distinguish: a clean job that
    # somehow uploaded nothing is still suspect ⇒ infra (we cannot prove it green).
    infra_evidence+="- ${cell}: no logs found (artifact missing or empty; job conclusion='${concl}')"$'\n'
    log "[$cell] INFRA: no logs (conclusion=$concl)"
    continue
  fi
  any_log_seen=1

  # Scan the logs.
  cell_fail=0
  cell_envwarn=0
  fail_lines=""
  for f in "${logs[@]}"; do
    if grep -qE '^FAIL:' "$f" 2>/dev/null; then
      cell_fail=1
      # Keep up to 5 FAIL lines per log as evidence.
      fail_lines+="$(grep -nE '^FAIL:' "$f" 2>/dev/null | head -5 | sed "s#^# ${f##*/}: #")"$'\n'
    fi
    if grep -qE 'WARN\[env-relaxed\]' "$f" 2>/dev/null; then
      cell_envwarn=1
    fi
  done

  if [ "$cell_fail" -eq 1 ]; then
    # A real `^FAIL:` assertion line ⇒ correctness, regardless of job conclusion.
    correctness_evidence+="- ${cell}: job='${concl}', FAIL lines present:"$'\n'"${fail_lines}"
    log "[$cell] CORRECTNESS (cell_fail=$cell_fail conclusion=$concl)"
  elif [ "$concl" != "success" ]; then
    # Logs exist but the job did not cleanly succeed and there is no assertion FAIL:
    # failure-without-^FAIL / timeout / cancelled / errored late ⇒ infra, not
    # correctness and not green (a failure WITHOUT a FAIL line is a step
    # timeout/late error, which is infra).
    infra_evidence+="- ${cell}: logs present but job conclusion='${concl}' (failure/timeout/cancel without ^FAIL: line)"$'\n'
    log "[$cell] INFRA (conclusion=$concl, no FAIL)"
  elif [ "$cell_envwarn" -eq 1 ]; then
    envelope_evidence+="- ${cell}: succeeded with WARN[env-relaxed] (envelope assertion(s) stretched under load — expected)"$'\n'
    log "[$cell] ENVELOPE (success + env-relaxed warn)"
  else
    log "[$cell] OK (success, no FAIL, no env warn)"
  fi
done

# ── EMPTY-ROUND GUARD: if not a single expected cell produced any log, the run
# errored before any suite ran (checkout failure, total infra collapse). That is
# INFRA, never green — do not let "0 FAIL across 0 logs" pass as success. ──────
if [ "$any_log_seen" -eq 0 ]; then
empty_msg="EMPTY ROUND: none of the expected cells (${EXPECTED_CELLS}) produced any suite log. The workflow errored before any suite ran (checkout failure / total infra collapse) — this is NOT a passing nightly."
infra_evidence="${empty_msg}"$'\n'"${infra_evidence}"
log "EMPTY-ROUND GUARD fired: no logs from any cell."
fi

# ── File/append issues, idempotently, one per (date, class). ────────────────────
# Title prefix is stable per class+date so search-then-append is reliable.
file_issue() { # $1=class-label $2=title $3=body
  local label="$1" title="$2" body="$3" existing=""

  if [ "$DRY_RUN" = 1 ]; then
    log "DRY_RUN: would search open issues label=$label title~='$title'"
    log "DRY_RUN: title='$title'"
    log "DRY_RUN: body:"; printf '%s\n' "$body" >&2
    return 0
  fi

  # Search OPEN issues with this label whose title exactly matches (idempotency key).
  # `gh issue list --search` uses GitHub search; we additionally filter the JSON by
  # exact title to avoid a substring collision.
  existing="$(gh issue list --state open --label "$label" \
    --search "$title in:title" --json number,title \
    --jq ".[] | select(.title == \"$title\") | .number" 2>/dev/null | head -1)"

  if [ -n "$existing" ]; then
    log "Appending to existing issue #$existing ($label)"
    if gh issue comment "$existing" --body "$body" >/dev/null; then
      log "Appended comment to #$existing"
    else
      log "WARN: failed to append to #$existing"
    fi
  else
    log "Creating new issue ($label): $title"
    if gh issue create --title "$title" --label "$label" --body "$body" >/dev/null; then
      log "Created issue ($label)"
    else
      log "WARN: failed to create issue ($label)"
    fi
  fi
}

run_url="${GITHUB_SERVER_URL:-https://github.com}/${GITHUB_REPOSITORY:-}/actions/runs/${GITHUB_RUN_ID:-}"
filed=0

if [ -n "$correctness_evidence" ]; then
body="Nightly stress run on **${RUN_DATE}** has CORRECTNESS failures (a \`^FAIL:\` assertion line in a suite log). **Investigate.**

$correctness_evidence
Run: ${run_url}

(Auto-filed by nightly-triage.sh; idempotent per (date, class) — re-runs append.)"
file_issue "nightly-correctness" "Nightly correctness failure — ${RUN_DATE}" "$body"
filed=1
fi

if [ -n "$infra_evidence" ]; then
body="Nightly stress run on **${RUN_DATE}** had INFRA issues (missing artifact / timeout / cancel / a cell job that failed or errored WITHOUT any \`^FAIL:\` line). Not a product failure, but the run did not produce trustworthy results — re-dispatch or investigate the runner.

$infra_evidence
Run: ${run_url}

(Auto-filed by nightly-triage.sh; idempotent per (date, class).)"
file_issue "nightly-infra" "Nightly infra issue — ${RUN_DATE}" "$body"
filed=1
fi

# Envelope is filed ONLY when there is no correctness failure (a correctness issue
# subsumes it — under a red run the env warns are noise). Tracked, no action.
if [ -z "$correctness_evidence" ] && [ -n "$envelope_evidence" ]; then
body="Nightly stress run on **${RUN_DATE}**: no correctness failures, but envelope (wall-clock) assertions were relaxed under load (\`WARN[env-relaxed]\`). This is EXPECTED under GCL_ENVELOPE_TIER=relax — tracked, **no investigation needed** unless it becomes persistent at low load.

$envelope_evidence
Run: ${run_url}

(Auto-filed by nightly-triage.sh; idempotent per (date, class).)"
file_issue "nightly-envelope" "Nightly envelope warning — ${RUN_DATE}" "$body"
filed=1
fi

if [ "$filed" -eq 0 ]; then
  log "ALL GREEN: every expected cell succeeded, no FAIL, no env warn, all artifacts present. No issue filed."
fi

# Triage itself succeeds whenever it ran to completion: it must not turn the
# workflow red for finding failures (those are surfaced as issues). It fails only
# if it could not run at all, e.g. `set -u` aborting on an unset variable. Note
# that without `-e`, an ordinary failing command does not abort the script.
exit 0
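
Reviewer note: the per-cell decision order in the script above (no logs, then `^FAIL:`, then job conclusion, then `WARN[env-relaxed]`) can be checked locally without `gh` or a real nightly run. The following is a minimal sketch, not part of the PR: `classify_cell` and the fixture file names are hypothetical, and the non-matching log lines are made-up filler rather than the suites' real output format. It assumes only `find`, `grep`, and `mktemp`.

```shell
#!/usr/bin/env bash
# Sketch only: classify_cell is a hypothetical helper that restates the
# per-cell decision order of nightly-triage.sh as a standalone function.
set -u

# $1 = cell artifact dir, $2 = recorded job conclusion. Prints the class.
classify_cell() {
  local dir="$1" concl="$2" logs="" fail_files="" warn_files=""
  if [ -d "$dir" ]; then
    logs="$(find "$dir" -type f -name '*.log' 2>/dev/null)"
    # grep -l lists the logs that contain a match; empty output means none.
    fail_files="$(find "$dir" -type f -name '*.log' -exec grep -lE '^FAIL:' {} + 2>/dev/null)"
    warn_files="$(find "$dir" -type f -name '*.log' -exec grep -lF 'WARN[env-relaxed]' {} + 2>/dev/null)"
  fi
  if [ -z "$logs" ]; then
    echo infra          # no evidence is never read as green
  elif [ -n "$fail_files" ]; then
    echo correctness    # a line-anchored FAIL wins over the job conclusion
  elif [ "$concl" != "success" ]; then
    echo infra          # failed / timed out / cancelled without a FAIL line
  elif [ -n "$warn_files" ]; then
    echo envelope
  else
    echo ok
  fi
}

# Throwaway fixture: three cells with one log each; a fourth cell is absent.
fixture="$(mktemp -d)"
mkdir -p "$fixture/nightly-logs-N1" "$fixture/nightly-logs-N2" "$fixture/nightly-logs-N3"
printf 'filler line\nFAIL: example assertion\n' > "$fixture/nightly-logs-N1/unit.log"
printf 'filler line\nWARN[env-relaxed] example\n' > "$fixture/nightly-logs-N2/unit.log"
printf 'filler line\nnote: saw FAIL: mid-line\n' > "$fixture/nightly-logs-N3/unit.log"

classify_cell "$fixture/nightly-logs-N1" success   # prints: correctness
classify_cell "$fixture/nightly-logs-N2" success   # prints: envelope
classify_cell "$fixture/nightly-logs-N3" failure   # prints: infra
classify_cell "$fixture/nightly-logs-N3" success   # prints: ok
classify_cell "$fixture/nightly-logs-N4" success   # prints: infra (no artifact)
rm -rf "$fixture"
```

The N3 case shows why the `^` anchor matters: a log that merely mentions `FAIL:` mid-line is not treated as an assertion failure. For an end-to-end check of the real script, running it with `DRY_RUN=1` and `ARTIFACTS_DIR` pointed at a similar fixture prints the would-be issue title and body to stderr instead of calling `gh`.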