Harden QR checker and benchmark cases#151

Open
msaroufim wants to merge 1 commit into main from qr-reward-hardening

@msaroufim

Member

Summary

  • Add heterogeneous (mixed) QR inputs that interleave well-conditioned and ill-conditioned matrices within a single batch.
  • Add mixed and ill-conditioned benchmark cases so robustness affects ranking, not just correctness gating.
  • Check the LAPACK-style factor residual per matrix, so one large or easy matrix cannot hide a failed matrix in the same batch.
  • Explicitly reject NaN/Inf in the materialized Q, R, Q.T @ A, Q.T @ Q, and Q @ R before any residual comparison, so a non-finite value cannot slip through a threshold check.
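
For reviewers, here is a minimal sketch of the two ideas above: a mixed batch and a per-matrix check with an up-front finiteness gate. It is not the code in `eval.py`. The helper names (`make_mixed_batch`, `check_qr_per_matrix`), the threshold `tol`, the use of NumPy, and the exact residual scaling (Frobenius norms divided by `m * eps`, in the spirit of LAPACK's test routines) are all assumptions made for illustration.

```python
import numpy as np


def make_mixed_batch(batch, n, cond_bad=1e12, seed=0):
    """Hypothetical helper: interleave well- and ill-conditioned n x n matrices."""
    rng = np.random.default_rng(seed)
    mats = []
    for i in range(batch):
        # Random orthogonal factors taken from QR of Gaussian matrices.
        u, _ = np.linalg.qr(rng.standard_normal((n, n)))
        v, _ = np.linalg.qr(rng.standard_normal((n, n)))
        # Even slots get condition number 10, odd slots get cond_bad.
        cond = 10.0 if i % 2 == 0 else cond_bad
        s = np.logspace(0.0, -np.log10(cond), n)  # singular values 1 .. 1/cond
        mats.append((u * s) @ v.T)  # u @ diag(s) @ v.T
    return np.stack(mats)


def check_qr_per_matrix(A, Q, R, tol=100.0):
    """Return one pass/fail flag per matrix in the batch.

    Assumed scaled residuals (illustrative, not copied from the PR):
      factor:         ||R - Q.T @ A||_F / (m * ||A||_F * eps)
      orthogonality:  ||I - Q.T @ Q||_F / (m * eps)
      reconstruction: ||A - Q @ R||_F   / (m * ||A||_F * eps)
    """
    info = np.finfo(A.dtype)
    results = []
    for a, q, r in zip(A, Q, R):
        m = a.shape[0]
        qta, qtq, qr = q.T @ a, q.T @ q, q @ r
        # Gate on finiteness first: any comparison with NaN is False, so a
        # NaN residual would never trip a "residual > tol" failure test.
        if not all(np.isfinite(x).all() for x in (q, r, qta, qtq, qr)):
            results.append(False)
            continue
        scale = m * max(np.linalg.norm(a), info.tiny) * info.eps
        factor = np.linalg.norm(r - qta) / scale
        ortho = np.linalg.norm(np.eye(q.shape[1]) - qtq) / (m * info.eps)
        recon = np.linalg.norm(a - qr) / scale
        # Each matrix is judged on its own residuals, never a batch aggregate.
        results.append(bool(max(factor, ortho, recon) <= tol))
    return results


# Usage: factor each matrix, then break only one of them.
A = make_mixed_batch(batch=4, n=8)
factors = [np.linalg.qr(a) for a in A]
Q = np.stack([f[0] for f in factors])
R = np.stack([f[1] for f in factors])
R_broken = R.copy()
R_broken[1] = 0.0  # simulate a failed factorization in slot 1
print(check_qr_per_matrix(A, Q, R))
print(check_qr_per_matrix(A, Q, R_broken))
```

Returning one flag per matrix, rather than a single norm over the whole batch, is what keeps a well-behaved neighbor from masking the broken slot.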

Validation

  • python3 -m py_compile problems/linalg/qr_py/reference.py problems/linalg/qr_py/eval.py problems/linalg/qr_py/submission.py problems/linalg/qr_py/task.py
  • git diff --check

Notes

  • I did not run any production admin problem update command.
  • I found 6 existing QR submissions in the production DB with public passing B200 scores and failed secret B200 runs; this PR is meant to make that class of hidden-case failure visible in the problem definition.
