fix: access-key token expiry parsed as relative, so tokens never refresh (CIP-3233); release 2.2.4 #408
…-auth (CIP-3233)

stack-auth's AccessKeyRefresher computed expires_at as `now + auth_resp.expiry`, but CTS /api/authorise returns `expiry` as an absolute Unix epoch (the JWT `exp` claim), not a relative duration. The sum landed decades in the future, so AutoRefresh never considered the token expired and never refreshed it. ZeroKMS enforced the real ~15-minute `exp`, so encrypt/decrypt failed ~15 minutes after startup until the pod restarted.

Use the value as-is: `expires_at: auth_resp.expiry`.

Also corrects the access-key test fixtures, which mocked `expiry` as a small relative value (e.g. 3600) and thereby hid the bug; they now model an absolute epoch (now + N) like the real CTS. Adds a regression test asserting that an absolute `expiry` yields expires_in ~= the intended TTL (it fails under the pre-fix `now + expiry` arithmetic).

This is the actual root cause of the customer's 15-minute failures; the 2.2.3 CancelGuard backport (CIP-3159) is unrelated hardening and did not help. Confirmed against a live production token: response.expiry == JWT exp (absolute), exp - iat == 900.
Patch release carrying the access-key token-expiry fix (CIP-3233): bump workspace version 2.2.3 -> 2.2.4 and promote the Unreleased CHANGELOG entry to [2.2.4].
…tack-auth patch (CIP-3233)

Moves Proxy off the `vendor/stack-auth` `[patch.crates-io]` workaround and onto the current released cipherstash-client group, built against the fixed stack-auth.

Background: 2.2.4 (PR #408) shipped the CIP-3233 access-key token-refresh fix via a vendored stack-auth patched on top of the 0.34.1-alpha.4 source. cipherstash-client 0.38.0 links stack-auth 0.38.0, which carries the same fix from crates.io, so the vendored copy and patch are no longer needed.

Changes:
- cipherstash-client / cipherstash-config / cts-common: 0.34.1-alpha.4 -> 0.38.0 (carries the API migration from PR #406's 0.37.0 upgrade; 0.37 -> 0.38 needed no further source changes)
- Remove `[patch.crates-io] stack-auth = { path = "vendor/stack-auth" }`, the `exclude = ["vendor/stack-auth"]` workspace entry, and the vendor/stack-auth tree
- stack-auth now resolves from crates.io (0.38.0); single version of the cipherstash-client group in the lock (zerokms-protocol 0.12.19)

Verified: `cargo check --workspace`, `cargo clippy --workspace --all-targets`, and `cargo test --workspace --lib` (111 proxy unit tests) all pass. Integration tests need a live DB/ZeroKMS and were not run here.
Summary
Fixes the actual root cause of the customer's "ZeroKMS auth fails ~15 minutes after startup" issue, which the 2.2.3 CancelGuard backport (CIP-3159) did not resolve. Ships as 2.2.4.
Linear: CIP-3233.
NOTE: this fix is for the vendored `stack-auth`; the permanent fix is in https://github.com/cipherstash/cipherstash-suite/pull/2036
Root cause
stack-auth's access-key refresher computed the token's local expiry as `now + auth_resp.expiry`. But CTS `/api/authorise` returns `expiry` as an absolute Unix epoch (it is literally the JWT `exp` claim), not a relative duration. The sum lands decades in the future, so `AutoRefresh` never considers the token expired and never refreshes it. ZeroKMS enforces the JWT's real ~15-minute `exp`, so every encrypt/decrypt returns 401 ~15 minutes after a process starts and stays broken until restart.
Confirmed against a live production token: `response.expiry` equals the JWT `exp` (absolute), and `exp - iat == 900`.
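To make the failure mode concrete, here is a minimal Rust sketch of the two computations. The function names (`expires_at_pre_fix`, `expires_at_fixed`, `is_expired`) are hypothetical and are not taken from stack-auth; only the arithmetic follows the description above.

```rust
/// Pre-fix behaviour as described above: treats the absolute `expiry`
/// as if it were a relative duration and adds it to the current time.
/// (Hypothetical helper, not the actual stack-auth code.)
fn expires_at_pre_fix(now: u64, expiry: u64) -> u64 {
    now + expiry
}

/// Fixed behaviour: `expiry` is already an absolute Unix epoch,
/// so the current time is not involved at all.
fn expires_at_fixed(_now: u64, expiry: u64) -> u64 {
    expiry
}

/// A simple expiry check of the kind a refresher might rely on.
/// (Assumed shape; the real AutoRefresh logic may differ.)
fn is_expired(now: u64, expires_at: u64) -> bool {
    now >= expires_at
}
```

Called sixteen minutes after issuance for a 900-second token, `is_expired` would still return `false` for the pre-fix value and `true` for the fixed one, which matches the "never refreshes" symptom.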
Pre-fix, stack-auth computed `expires_at = 1781743221 + 1781744121`, which is around the year 2083.
The fix
Use the value as-is: `expires_at: auth_resp.expiry`.
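As a rough sketch of the corrected mapping, the snippet below normalises both token sources to an absolute `expires_at`. The types `AuthResponse`, `OAuthResponse`, and `CachedToken` are illustrative assumptions, not the real stack-auth types; the only details taken from this PR are that the access-key `expiry` is absolute and the OAuth `expires_in` is relative.

```rust
/// Hypothetical shape of the CTS `/api/authorise` response; only the
/// `expiry` field (an absolute Unix epoch, per this PR) is modelled.
struct AuthResponse {
    expiry: u64,
}

/// Hypothetical shape of an OAuth token response with a relative lifetime.
struct OAuthResponse {
    expires_in: u64,
}

/// Locally cached token metadata; `expires_at` is always an absolute epoch.
struct CachedToken {
    expires_at: u64,
}

impl CachedToken {
    /// Access-key path: use the absolute value as-is.
    fn from_access_key(auth_resp: &AuthResponse) -> Self {
        CachedToken { expires_at: auth_resp.expiry }
    }

    /// OAuth path: a relative duration is added to the current time.
    fn from_oauth(now: u64, resp: &OAuthResponse) -> Self {
        CachedToken { expires_at: now + resp.expires_in }
    }

    /// Seconds remaining until expiry, saturating at zero once expired.
    fn expires_in(&self, now: u64) -> u64 {
        self.expires_at.saturating_sub(now)
    }
}
```

Keeping a single absolute `expires_at` in the cached value means the unit conversion happens once, at the boundary where each response format is known.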
(The OAuth path is untouched; it correctly uses the relative `expires_in`.)
Tests
- The access-key test fixtures mocked `expiry` as a small relative value (3600), which is exactly what hid the bug. They now model an absolute epoch (now + N) like the real CTS.
- Added a regression test (`access_key_expiry_is_absolute_epoch_not_relative`): it mocks an absolute `expiry` and asserts the resulting `expires_in()` ≈ the intended TTL. Verified it fails under the pre-fix `now + expiry` arithmetic (`expires_in()` ≈ 1.7e9) and passes with the fix.
- `cargo check -p cipherstash-proxy` clean. Release version/changelog drift guard satisfied locally.
Notes
The same bug exists in upstream `stack-auth` (HEAD/0.37.0) and affects any long-running access-key consumer. Tracked in CIP-3233 with a cross-repo bump checklist; an upstream fix PR follows. Once Proxy moves to a `cipherstash-client` built against fixed stack-auth, drop the `vendor/stack-auth` + `[patch.crates-io]` workaround.
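The Tests section says the old relative fixture hid the bug. The sketch below illustrates why, under the assumption that the check compares the remaining lifetime against the intended TTL. All helper names are hypothetical and this is not the actual `access_key_expiry_is_absolute_epoch_not_relative` test.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current time as seconds since the Unix epoch.
fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs()
}

/// Old-style fixture: `expiry` mocked as a small relative value.
fn relative_fixture(ttl_secs: u64) -> u64 {
    ttl_secs
}

/// New-style fixture: `expiry` mocked as an absolute epoch, like the real CTS.
fn absolute_fixture(now: u64, ttl_secs: u64) -> u64 {
    now + ttl_secs
}

/// Remaining lifetime under the pre-fix logic (`expires_at = now + expiry`).
fn expires_in_pre_fix(now: u64, expiry: u64) -> u64 {
    (now + expiry).saturating_sub(now)
}

/// Remaining lifetime under the fixed logic (`expires_at = expiry`).
fn expires_in_fixed(now: u64, expiry: u64) -> u64 {
    expiry.saturating_sub(now)
}

/// The regression condition: remaining lifetime is close to the intended TTL.
fn close_to_ttl(expires_in: u64, ttl_secs: u64, tolerance: u64) -> bool {
    expires_in.abs_diff(ttl_secs) <= tolerance
}
```

With the relative fixture, the pre-fix arithmetic yields exactly the TTL, so a test built on it passes even though the code is wrong. With the absolute fixture, the pre-fix result is on the order of the current epoch, and only the fixed logic satisfies `close_to_ttl`.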