[codex] Harden multiversion iterator validation#3656
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3656 +/- ##
==========================================
- Coverage 59.15% 58.20% -0.96%
==========================================
Files 2262 2178 -84
Lines 187031 177468 -9563
==========================================
- Hits 110644 103294 -7350
+ Misses 66438 65011 -1427
+ Partials 9949 9163 -786
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
PR SummaryMedium Risk Overview validationIterator now skips keys with no current value (writeset, read cache, or MV entry) via Iterator validation runs replay in a goroutine with panic recovery (logged, treated as invalid), splits logic into Adds regression tests for stale keys after writeset rewrite (blocking parent iterator), panic recovery, and multiple estimates completing without hanging. Reviewed by Cursor Bugbot for commit 04a43a7. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
A conservative, well-tested fix that prevents a nil-pointer dereference in validation-iterator value resolution and contains iterator panics as validation failures. Logic is correct and safe (skipping vanished keys only ever pushes toward re-execution, never a false pass); only minor non-blocking simplification notes.
Findings: 0 blocking | 5 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- The per-iterator validation goroutine no longer serves its original purpose. It previously ran concurrently so the main goroutine could listen for aborts on a channel; now aborts are polled inline via iteratorValidationAborted and the main goroutine just blocks on
<-validChannel. Sincerecover()works within the same goroutine, panic containment does not require the goroutine either. Consider running validateIteratorReplay inline with a deferred recover to avoid a goroutine spawn per iteration tracker on this hot validation path (validateIterator is called once per tracker in checkIteratorAtIndex). - Codex and Cursor second-opinion passes both produced no material findings (codex-review.md notes it could not run the package tests due to blocked network access for the Go 1.25.6 toolchain; cursor-review.md was empty).
- Confirm the new tests run under CI's race detector — they spin up goroutines coordinated via channels and a 5s timeout; the blocking/panic mock stores share state behind mutexes which looks correct, but worth ensuring
go test -racepasses for this package in CI. - 2 suggestion(s)/nit(s) flagged inline on specific lines.
| return false | ||
| } | ||
| // return whether we found the exact number of expected keys | ||
| return foundKeys >= len(expectedKeys) |
There was a problem hiding this comment.
[nit] iteratorValidationAborted(abortChan) is polled five times per loop iteration (lines 329, 335, 343, 361) plus once after the loop. This is harmless since it's a non-blocking select, but the repetition is heavy — a single check at the top of each iteration plus the post-loop check would be equivalent given nothing inside the loop body produces an abort (aborts originate from validationIterator.Value() invoked indirectly via mergeIterator.Valid()/Key()). Minor cleanup only.
| } | ||
|
|
||
| // try to get value from the writeset, otherwise try to get from multiversion store, otherwise try to get from parent iterator | ||
| func (vi *validationIterator) Value() []byte { |
There was a problem hiding this comment.
[nit] Value() deliberately reads vi.Iterator.Key() (the raw underlying key) rather than calling the overridden Key() which would run skipRemovedKeys. This is correct in practice because the merge iterator always positions via Valid()/Key() before calling Value(), and when it isn't positioned, GetLatestBeforeIndex returning nil now yields nil safely. A one-line comment noting this invariant (Value assumes the iterator is already positioned on a present key) would help future readers, since the asymmetry between Value() and the other overrides is subtle.
Summary
Root Cause
validationIterator.Value()assumed every captured key still had a multiversion value. When a lower-index transaction was re-executed and stopped writing one of those keys, the stale key could remain in the validation iterator's key list whileGetLatestBeforeIndexreturned nil, leading to a nil pointer dereference.Validation
go test ./sei-cosmos/store/multiversion ./sei-cosmos/tasks ./giga/deps/tasks -count=1git diff --check