feat(skills): champion-relative reward + skill rollback + UCB README by QodeXcli · Pull Request #28 · QodeXcli/QodeX

QodeXcli · 2026-06-26T01:28:03Z

Follow-ups to the UCB1 work. Note: item #1 (rewardWeights in config) already shipped in #27 — verified learning.versioning.rewardWeights is present. This PR does the other three.

#2 — Composite reward normalized RELATIVE TO THE CHAMPION

Efficiency is now measured against the champion (the stable version is the baseline a challenger must beat), not the max across arms. championRef() = the champion's per-exec tokens/ms; a version at champion cost scores the 0.5 efficiency baseline, free → 1.0, 2× cost → 0.0. ucbScores + decideChampion use the champion reference. Result: a challenger is rewarded specifically for being cheaper/faster than the champion it's challenging.
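
As a rough illustration of the champion-relative normalization described above, here is a minimal sketch. The PR only states three points (champion cost → 0.5, free → 1.0, 2× cost → 0.0); the linear interpolation between them, the `Cost` shape, the equal weighting of tokens and latency, and the names `relativeEfficiency` / `efficiency` are all assumptions made for this example, not the actual championRef() code.

```typescript
// Hypothetical shape: per-execution cost of one skill version.
interface Cost {
  tokens: number;
  ms: number;
}

// Clamp a number into the [0, 1] range.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Efficiency of one cost dimension relative to the champion's value.
// Assumed linear through the three stated points:
//   0 -> 1.0, championValue -> 0.5, 2 * championValue -> 0.0.
function relativeEfficiency(value: number, championValue: number): number {
  // Assumption: with no usable baseline, fall back to the neutral 0.5 score.
  if (championValue <= 0) return 0.5;
  return clamp01(1 - value / (2 * championValue));
}

// Assumption: tokens and latency contribute equally to efficiency.
function efficiency(version: Cost, champion: Cost): number {
  const tokenScore = relativeEfficiency(version.tokens, champion.tokens);
  const timeScore = relativeEfficiency(version.ms, champion.ms);
  return (tokenScore + timeScore) / 2;
}

const champion: Cost = { tokens: 1000, ms: 400 };
const atChampionCost = efficiency(champion, champion);
const free = efficiency({ tokens: 0, ms: 0 }, champion);
const twiceTheCost = efficiency({ tokens: 2000, ms: 800 }, champion);

console.log(atChampionCost, free, twiceTheCost); // 0.5 1 0
```

Under these assumptions, anything costlier than 2× the champion is clamped to 0.0 rather than going negative, so a badly over-budget challenger cannot drag the composite reward below its floor.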

#3 — `qodex skill rollback <name> <version>`

rollbackToVersion() + CLI: snap a versioned skill's champion back to any earlier version (un-retires it, drops the challenger) — the manual safety lever alongside the bandit's auto-convergence.
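
The rollback behavior described above can be sketched as follows. The manifest fields (`champion`, `challenger`, `versions`, `retired`) and the function name `rollbackToVersionSketch` are hypothetical; the PR does not show the real manifest shape or the signature of rollbackToVersion(). Only the behavior (target becomes champion, is un-retired, challenger is dropped, missing version returns false) comes from the description.

```typescript
// Hypothetical manifest shape; the real flat-manifest fields are not shown in this PR.
interface VersionEntry {
  version: string;
  retired: boolean;
}

interface SkillManifest {
  champion: string;
  challenger: string | null;
  versions: VersionEntry[];
}

// Sketch of the described rollback: returns false when the version is unknown,
// otherwise un-retires the target, makes it champion, and drops the challenger.
function rollbackToVersionSketch(manifest: SkillManifest, version: string): boolean {
  const target = manifest.versions.find((v) => v.version === version);
  if (!target) return false;

  target.retired = false;
  manifest.champion = version;
  manifest.challenger = null;
  return true;
}

const manifest: SkillManifest = {
  champion: "v2",
  challenger: "v3",
  versions: [
    { version: "v1", retired: true },
    { version: "v2", retired: false },
    { version: "v3", retired: false },
  ],
};

const rolledBack = rollbackToVersionSketch(manifest, "v1");
const missing = rollbackToVersionSketch(manifest, "v9");
console.log(rolledBack, missing, manifest.champion, manifest.challenger);
```

In this sketch a failed rollback leaves the manifest untouched, which is the property that makes it safe to use as a manual lever.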

#4 — README UCB1 docs

A new "Skill versioning & A/B testing (UCB1)" section: flat-manifest storage, the bandit, composite/champion-relative reward, the full config block, and a real skill versions / skill rollback example.
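
For readers unfamiliar with the bandit that section documents, the textbook UCB1 rule is: mean reward plus an exploration bonus of sqrt(2 ln N / n). The sketch below shows that textbook rule only; it is not the repo's ucbScores, and `ArmStats` and `ucb1Score` are hypothetical names chosen for this example.

```typescript
// Hypothetical per-arm statistics: one arm per skill version.
interface ArmStats {
  pulls: number;       // how many times this version was executed
  totalReward: number; // sum of rewards in [0, 1] observed so far
}

// Textbook UCB1: mean reward plus exploration bonus sqrt(2 ln N / n).
// An arm that has never been pulled gets Infinity so it is tried first.
function ucb1Score(arm: ArmStats, totalPulls: number): number {
  if (arm.pulls === 0) return Infinity;
  const mean = arm.totalReward / arm.pulls;
  const bonus = Math.sqrt((2 * Math.log(totalPulls)) / arm.pulls);
  return mean + bonus;
}

const championArm: ArmStats = { pulls: 40, totalReward: 28 };
const challengerArm: ArmStats = { pulls: 5, totalReward: 3 };
const totalPulls = championArm.pulls + challengerArm.pulls;

const championScore = ucb1Score(championArm, totalPulls);
const challengerScore = ucb1Score(challengerArm, totalPulls);

// The less-explored challenger gets the larger bonus and is picked next.
console.log(challengerScore > championScore);
```

The bonus shrinks as an arm accumulates pulls, so the selection converges on whichever version earns the higher mean reward.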

Live-verified

Rollback → that version becomes champion + challenger cleared (and a missing version returns false). Champion-relative reward gives the champion a 0.5 efficiency baseline and ranks a 2× costlier challenger below it.

Tests

Updated to the champion-ref shape + 1 new (over-budget challenger penalized). ✅ typecheck · ✅ full suite (1223) · ✅ build.

… + UCB README

Follow-ups to the UCB1 work (rewardWeights config already shipped in #27):

- Composite-reward efficiency is now normalized RELATIVE TO THE CHAMPION (the stable version is the baseline). championRef() takes the champion's per-exec tokens/ms; a version at champion cost scores the 0.5 efficiency baseline, free → 1.0, twice the cost → 0.0. So a challenger is rewarded for being cheaper/faster than the champion it's trying to unseat (not just relative to the other arm). ucbScores + decideChampion both use the champion reference.
- `qodex skill rollback <name> <version>` + rollbackToVersion(): snap a versioned skill's champion back to any earlier version (un-retires it, drops the challenger). The manual safety lever alongside the bandit's auto-convergence.
- README: a "Skill versioning & A/B testing (UCB1)" section covering flat manifest storage, the bandit, composite/champion-relative reward, the full config block, and a real `skill versions` / `skill rollback` example.

Live-verified: rollback to a version makes it champion + clears the challenger; champion-relative reward gives the champion a 0.5 efficiency baseline and ranks a 2× costlier challenger below it.

Tests updated to the champion-ref shape + 1 new (over-budget challenger penalized). typecheck + full suite (1223) + build green.

QodeXcli merged commit 2520925 into main Jun 26, 2026
2 checks passed

QodeXcli deleted the feat/ucb-reward-rollback branch June 26, 2026 01:28


QodeXcli merged 1 commit into main from feat/ucb-reward-rollback
