feat(skills): escalating-cascade judge + alignment-drift self-improvement (Phase 5)#25

Merged
QodeXcli merged 1 commit into main from feat/judge-cascade on Jun 25, 2026

@QodeXcli

Owner

Phase 5 — escalating-cascade judge with an alignment-drift self-improvement loop. Companion to Phase 4 (#24).

Why a cascade, not an ensemble

Running several heavy models per decision crawls on local hardware. Instead: a fast Tier-1 model scores every candidate on a 4-axis rubric, and we escalate to a heavy Tier-2 model only when Tier-1 is genuinely unsure — so clear-cut cases finish locally and the cloud is paid only when it matters.

Escalation gate (pure, tested)

shouldEscalate(scores) fires on either:

  • Twilight zone — average lands in the grey middle (5.5–7.5), or
  • High variance — σ > 2.5 across the rubric dimensions (e.g. safety 10 but efficiency 2 → the model is confused, not confident).
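A minimal sketch of the two gates described above, not the code in judge-cascade.ts. The PR only names the safety and efficiency axes, so `correctness` and `clarity` are placeholder axis names; using a population standard deviation and treating the 5.5–7.5 bounds as inclusive are also assumptions.

```typescript
// Hypothetical rubric shape: only "safety" and "efficiency" appear in the PR text.
interface RubricScores {
  safety: number;
  efficiency: number;
  correctness: number; // assumed axis name
  clarity: number; // assumed axis name
}

const TWILIGHT_LOW = 5.5;
const TWILIGHT_HIGH = 7.5;
const MAX_STD_DEV = 2.5;

function toArray(s: RubricScores): number[] {
  return [s.safety, s.efficiency, s.correctness, s.clarity];
}

function rubricAverage(s: RubricScores): number {
  const v = toArray(s);
  return v.reduce((a, b) => a + b, 0) / v.length;
}

// Population standard deviation across the four axes (assumption).
function rubricStdDev(s: RubricScores): number {
  const v = toArray(s);
  const mean = rubricAverage(s);
  const variance =
    v.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) / v.length;
  return Math.sqrt(variance);
}

// Escalate when the average is in the grey middle OR the axes disagree strongly.
function shouldEscalate(s: RubricScores): boolean {
  const avg = rubricAverage(s);
  const inTwilight = avg >= TWILIGHT_LOW && avg <= TWILIGHT_HIGH;
  const highVariance = rubricStdDev(s) > MAX_STD_DEV;
  return inTwilight || highVariance;
}
```

Under these assumptions, scores of 10, 10, 10 and 2 average 8 (outside the twilight zone) but have σ ≈ 3.46, so the variance gate alone triggers escalation.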

parseRubricScores clamps each score to 1–10 and fails closed: if Tier-1's output cannot be parsed, the candidate escalates. rubricToVerdict passes a candidate only when its scores are clearly good and safe.
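One way the parse-and-verdict step could look, as a sketch under stated assumptions. The PR does not give Tier-1's output format or the pass thresholds, so the JSON shape, the `correctness` and `clarity` axis names, `PASS_AVERAGE`, and `MIN_SAFETY` are all hypothetical.

```typescript
interface RubricScores {
  safety: number;
  efficiency: number;
  correctness: number; // assumed axis name
  clarity: number; // assumed axis name
}

const AXES = ["safety", "efficiency", "correctness", "clarity"] as const;

function clampScore(n: number): number {
  return Math.min(10, Math.max(1, n));
}

// Returns null when the output is unusable; the caller treats null as
// "escalate" (fail closed) rather than guessing a score.
function parseRubricScores(raw: string): RubricScores | null {
  let data: unknown;
  try {
    data = JSON.parse(raw); // assumption: Tier-1 is prompted to reply in JSON
  } catch (e) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  const out: Partial<RubricScores> = {};
  for (const axis of AXES) {
    const v = rec[axis];
    if (typeof v !== "number" || !isFinite(v)) return null;
    out[axis] = clampScore(v);
  }
  return out as RubricScores;
}

type Verdict = "pass" | "fail";

const PASS_AVERAGE = 7.5; // assumed threshold: above the twilight zone
const MIN_SAFETY = 7; // assumed threshold: an unsafe candidate never passes

function rubricToVerdict(s: RubricScores): Verdict {
  const avg = (s.safety + s.efficiency + s.correctness + s.clarity) / 4;
  return avg > PASS_AVERAGE && s.safety >= MIN_SAFETY ? "pass" : "fail";
}
```

Returning `null` instead of throwing keeps the fail-closed decision in one place: the caller needs only a single check to route unparseable output to Tier-2.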

Self-improvement (Feedback Alignment Drift)

On every escalation we log |Tier2 − Tier1| to ~/.qodex/judge-drift.jsonl. isDriftRising compares the recent vs prior window; buildCalibrationBlock injects Tier-2's worst disagreements as few-shot examples into Tier-1's prompt so the local judge re-calibrates over time.
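The drift loop above can be sketched as follows. This is an illustration, not the shipped code: the `DriftRecord` fields, the default window size, the number of examples, and the wording of the calibration text are all assumptions.

```typescript
// Hypothetical record shape for one line of the drift log.
interface DriftRecord {
  candidate: string; // short label for what was judged (assumed field)
  tier1: number; // Tier-1 average rubric score
  tier2: number; // Tier-2 average rubric score
}

function gap(r: DriftRecord): number {
  return Math.abs(r.tier2 - r.tier1);
}

// Mean absolute disagreement between the two tiers.
function alignmentDrift(records: DriftRecord[]): number {
  if (records.length === 0) return 0;
  return records.reduce((acc, r) => acc + gap(r), 0) / records.length;
}

// Compares the most recent `window` records against the `window` before them.
// With too little history there is nothing to compare, so report "not rising".
function isDriftRising(records: DriftRecord[], window: number = 20): boolean {
  if (records.length < 2 * window) return false;
  const recent = records.slice(-window);
  const prior = records.slice(-2 * window, -window);
  return alignmentDrift(recent) > alignmentDrift(prior);
}

// Turns the largest disagreements into few-shot text for Tier-1's prompt.
function buildCalibrationBlock(
  records: DriftRecord[],
  maxExamples: number = 3
): string {
  const worst = records
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => gap(b) - gap(a))
    .slice(0, maxExamples);
  if (worst.length === 0) return "";
  const lines = worst.map(
    (r) =>
      `- ${r.candidate}: you scored ${r.tier1}, the reference judge scored ${r.tier2}`
  );
  return ["Calibration examples (past disagreements):"].concat(lines).join("\n");
}
```

Capping the number of examples keeps Tier-1's prompt short, which matters because the point of the fast tier is low latency on local hardware.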

Integration

curator.ts: Tier-1 (judgeRoute) scores with the rubric + calibration; escalates to Tier-2 (learning.judgeModelTier2) when unsure; the rubric verdict feeds the existing independence + human-protection promotion gate unchanged.
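The control flow of that integration might look roughly like the sketch below. It is not the code in curator.ts: the `Judge` type, `CascadeOutcome`, and `judgeCascade` are hypothetical names, the judges are injected as plain functions, and they are kept synchronous only so the branching is easy to read (real model calls would be asynchronous). Falling back to Tier-1's scores when Tier-2's output is unparseable is also an assumption.

```typescript
type Scores = number[]; // one entry per rubric axis
type Judge = (candidate: string) => Scores | null; // null = unparseable output

interface CascadeOutcome {
  scores: Scores | null; // null means nothing usable; the caller should not pass
  decidedBy: "tier1" | "tier2";
  drift: number | null; // |Tier2 - Tier1| on averages, when both are available
}

function mean(xs: Scores): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function judgeCascade(
  candidate: string,
  tier1: Judge,
  tier2: Judge | undefined, // undefined mirrors an unset learning.judgeModelTier2
  shouldEscalate: (s: Scores) => boolean
): CascadeOutcome {
  const first = tier1(candidate);
  // Fail closed: unparseable Tier-1 output counts as "unsure".
  const unsure = first === null || shouldEscalate(first);
  if (!unsure || tier2 === undefined) {
    return { scores: first, decidedBy: "tier1", drift: null };
  }
  const second = tier2(candidate);
  if (second === null) {
    // Assumption: keep Tier-1's result if the heavy judge returns nothing usable.
    return { scores: first, decidedBy: "tier1", drift: null };
  }
  const drift = first === null ? null : Math.abs(mean(second) - mean(first));
  return { scores: second, decidedBy: "tier2", drift: drift };
}
```

Passing the judges and the gate in as parameters keeps the cascade itself free of model and file-system dependencies, so it can be unit-tested with stub functions.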

Tests

11 (escalation gates: twilight / high-variance / confident-pass / confident-reject; verdict incl. unsafe→fail; parse clamp + fail-closed; drift mean / rising-window / calibration). ✅ typecheck · ✅ full suite (1200) · ✅ build.

…ment (Phase 5)

Per the proposed architecture: instead of an ensemble of heavy models per
decision (which crawls on local hardware), the judge CASCADES — a fast Tier-1
model scores every candidate on a 4-axis rubric, and we escalate to a heavy
Tier-2 model ONLY when Tier-1 is genuinely unsure. So clear-cut cases finish
locally; the cloud is paid only when it matters.

- src/skills/learning/judge-cascade.ts (PURE, unit-tested): RubricScores +
  rubricAverage/rubricStdDev; shouldEscalate — escalate on the TWILIGHT ZONE
  (avg 5.5–7.5) OR HIGH VARIANCE (σ > 2.5, dimensions disagree → model confused);
  rubricToVerdict; buildRubricPrompt/parseRubricScores (clamps 1–10, fails closed →
  escalate). Self-improvement: alignmentDrift (mean |Tier2−Tier1|), isDriftRising
  (recent vs prior window), buildCalibrationBlock (inject Tier-2's worst
  corrections as few-shot so Tier-1 re-calibrates).
- curator.ts integration: Tier-1 (judgeRoute) scores with the rubric + any
  calibration block; if it can't parse or shouldEscalate fires AND a Tier-2 model
  is configured (learning.judgeModelTier2), it re-scores with Tier-2 and logs the
  drift to ~/.qodex/judge-drift.jsonl. The rubric verdict feeds the existing
  independence + human-protection promotion gate unchanged.
- config: learning.judgeModelTier2 (the heavy judge; unset ⇒ no escalation).

Tests: 11 (escalation gates: twilight / high-variance / confident-pass /
confident-reject; rubric verdict incl. unsafe→fail; parse clamp + fail-closed;
drift mean, rising-window, calibration block). typecheck + full suite (1200) +
build green.
@QodeXcli merged commit 8a6e20c into main on Jun 25, 2026
2 checks passed
@QodeXcli deleted the feat/judge-cascade branch on June 25, 2026 15:03