feat(skills): escalating-cascade judge + alignment-drift self-improvement (Phase 5) #25
Merged
…ment (Phase 5)

Per the proposed architecture: instead of an ensemble of heavy models per decision (which crawls on local hardware), the judge CASCADES: a fast Tier-1 model scores every candidate on a 4-axis rubric, and we escalate to a heavy Tier-2 model ONLY when Tier-1 is genuinely unsure. So clear-cut cases finish locally; the cloud is paid only when it matters.

- src/skills/learning/judge-cascade.ts (PURE, unit-tested): RubricScores + rubricAverage/rubricStdDev; shouldEscalate: escalate on the TWILIGHT ZONE (avg 5.5–7.5) OR HIGH VARIANCE (σ > 2.5, dimensions disagree → model confused); rubricToVerdict; buildRubricPrompt/parseRubricScores (clamps 1–10, fails closed → escalate). Self-improvement: alignmentDrift (mean |Tier2−Tier1|), isDriftRising (recent vs prior window), buildCalibrationBlock (inject Tier-2's worst corrections as few-shot so Tier-1 re-calibrates).
- curator.ts integration: Tier-1 (judgeRoute) scores with the rubric + any calibration block; if it can't parse or shouldEscalate fires AND a Tier-2 model is configured (learning.judgeModelTier2), it re-scores with Tier-2 and logs the drift to ~/.qodex/judge-drift.jsonl. The rubric verdict feeds the existing independence + human-protection promotion gate unchanged.
- config: learning.judgeModelTier2 (the heavy judge; unset ⇒ no escalation).

Tests: 11 (escalation gates: twilight / high-variance / confident-pass / confident-reject; rubric verdict incl. unsafe→fail; parse clamp + fail-closed; drift mean, rising-window, calibration block). typecheck + full suite (1200) + build green.
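The escalation rule in the commit message can be illustrated with a minimal sketch. This is not the code from judge-cascade.ts: the four axis names are hypothetical (the source only says the rubric has four axes), the standard deviation is assumed to be the population form, and the twilight-zone bounds are assumed to be inclusive. Only the function names and the thresholds (5.5–7.5 and σ > 2.5) come from the description.

```typescript
// Hypothetical axis names; the source only states that the rubric has four axes.
interface RubricScores {
  correctness: number;
  usefulness: number;
  generality: number;
  safety: number;
}

const axisValues = (s: RubricScores): number[] => [
  s.correctness,
  s.usefulness,
  s.generality,
  s.safety,
];

function rubricAverage(s: RubricScores): number {
  const values = axisValues(s);
  return values.reduce((sum, x) => sum + x, 0) / values.length;
}

// Population standard deviation (an assumption; the source does not say which form is used).
function rubricStdDev(s: RubricScores): number {
  const values = axisValues(s);
  const mean = rubricAverage(s);
  const variance =
    values.reduce((acc, x) => acc + (x - mean) * (x - mean), 0) / values.length;
  return Math.sqrt(variance);
}

// Thresholds taken from the commit message; treating the bounds as inclusive is an assumption.
const TWILIGHT_MIN = 5.5;
const TWILIGHT_MAX = 7.5;
const MAX_STD_DEV = 2.5;

function shouldEscalate(s: RubricScores): boolean {
  const avg = rubricAverage(s);
  const inTwilightZone = avg >= TWILIGHT_MIN && avg <= TWILIGHT_MAX;
  const highVariance = rubricStdDev(s) > MAX_STD_DEV; // axes disagree with each other
  return inTwilightZone || highVariance;
}
```

Under these assumptions, a uniformly high or uniformly low score stays on Tier-1, while a middling average or one sharply dissenting axis triggers escalation.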
Phase 5 — escalating-cascade judge with an alignment-drift self-improvement loop. Companion to Phase 4 (#24).
Why a cascade, not an ensemble
Running several heavy models per decision crawls on local hardware. Instead: a fast Tier-1 model scores every candidate on a 4-axis rubric, and we escalate to a heavy Tier-2 model only when Tier-1 is genuinely unsure — so clear-cut cases finish locally and the cloud is paid only when it matters.
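As a rough illustration of the control flow described above, the sketch below runs Tier-1 first and consults Tier-2 only when Tier-1 is unsure or unparseable and a Tier-2 judge is configured. All names here (Judge, CascadeResult, judgeWithCascade, isUnsure) are hypothetical, and the judges are modelled as synchronous callbacks for brevity; real model calls would presumably be asynchronous. What happens when Tier-1 is unsure and no Tier-2 is configured is not spelled out in the description, so returning the Tier-1 result as-is is an assumption.

```typescript
// A judge returns scores of some type S, or null when its output could not be parsed.
// Hypothetical shape: real judges would call a model and likely be async.
type Judge<S> = (candidate: string) => S | null;

interface CascadeResult<S> {
  scores: S | null; // null means no usable scores; callers should not treat this as a pass
  tier: 1 | 2;      // which tier produced the final scores
  escalated: boolean;
}

function judgeWithCascade<S>(
  candidate: string,
  tier1: Judge<S>,
  isUnsure: (scores: S) => boolean, // stands in for the escalation gate
  tier2?: Judge<S>,                 // unset means no escalation
): CascadeResult<S> {
  const first = tier1(candidate);

  // Fail closed: an unparseable Tier-1 answer counts as "unsure".
  const needsSecondOpinion = first === null || isUnsure(first);

  if (needsSecondOpinion && tier2 !== undefined) {
    return { scores: tier2(candidate), tier: 2, escalated: true };
  }

  // Either Tier-1 was confident, or no heavy judge is configured (assumed fallback).
  return { scores: first, tier: 1, escalated: false };
}
```

The point of the design is that the heavy judge is never invoked for confident Tier-1 results, so its cost scales with the number of ambiguous candidates rather than with the total.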
Escalation gate (pure, tested)
shouldEscalate(scores) fires on either:
- Twilight zone: the rubric average falls in 5.5–7.5.
- High variance: σ > 2.5 across the rubric dimensions (the dimensions disagree, so the model is confused).

parseRubricScores clamps 1–10 and fails closed (unparseable Tier-1 ⇒ escalate). rubricToVerdict passes only when clearly good and safe.
Self-improvement (Feedback Alignment Drift)
On every escalation we log |Tier2 − Tier1| to ~/.qodex/judge-drift.jsonl. isDriftRising compares the recent vs prior window; buildCalibrationBlock injects Tier-2's worst disagreements as few-shot examples into Tier-1's prompt so the local judge re-calibrates over time.
Integration
curator.ts: Tier-1 (judgeRoute) scores with the rubric + calibration; escalates to Tier-2 (learning.judgeModelTier2) when unsure; the rubric verdict feeds the existing independence + human-protection promotion gate unchanged.
Tests
11 (escalation gates: twilight / high-variance / confident-pass / confident-reject; verdict incl. unsafe→fail; parse clamp + fail-closed; drift mean / rising-window / calibration). ✅ typecheck · ✅ full suite (1200) · ✅ build.
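For readers who want a concrete picture of the self-improvement loop, here is a minimal sketch of the three drift helpers named in the description. It is not the PR's implementation: the DriftRecord fields, the window size, the number of calibration examples, and the wording of the calibration block are all assumptions, as is the choice to measure drift on rubric averages. The source does not show the schema of judge-drift.jsonl.

```typescript
// One logged escalation. Field names are hypothetical; the JSONL schema is not shown in the source.
interface DriftRecord {
  candidate: string;
  tier1Avg: number; // Tier-1 rubric average for this candidate
  tier2Avg: number; // Tier-2 rubric average for the same candidate
}

const gap = (r: DriftRecord): number => Math.abs(r.tier2Avg - r.tier1Avg);

// Mean |Tier2 − Tier1| over the given records; 0 for an empty list (assumed convention).
function alignmentDrift(records: DriftRecord[]): number {
  if (records.length === 0) return 0;
  return records.reduce((sum, r) => sum + gap(r), 0) / records.length;
}

// Compares the most recent `window` records with the `window` records before them.
// The default window size is an assumption.
function isDriftRising(records: DriftRecord[], window = 5): boolean {
  if (records.length < 2 * window) return false; // not enough history to compare
  const recent = records.slice(-window);
  const prior = records.slice(-2 * window, -window);
  return alignmentDrift(recent) > alignmentDrift(prior);
}

// Formats the largest disagreements as few-shot text for the Tier-1 prompt.
// The wording and the default of three examples are assumptions.
function buildCalibrationBlock(records: DriftRecord[], maxExamples = 3): string {
  const worst = [...records].sort((a, b) => gap(b) - gap(a)).slice(0, maxExamples);
  if (worst.length === 0) return "";
  const lines = worst.map(
    (r) => `- "${r.candidate}": you scored ${r.tier1Avg}, the reference judge scored ${r.tier2Avg}`,
  );
  return ["Calibration examples (past corrections):", ...lines].join("\n");
}
```

Sorting a copy of the records keeps the log order intact, which matters because isDriftRising relies on the records being in chronological order.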