feat(seidb): dump-flatkv computes per-bucket/total LtHash by blindchaser · Pull Request #3665 · sei-protocol/sei-chain

blindchaser · 2026-06-29T22:55:30Z

Summary

Extends the seidb dump-flatkv command to compute and verify the lattice hash (LtHash) of scanned FlatKV state and to throttle its disk reads so it can safely run against a live, block-producing node. The core scan path (dumpFlatKVFromStore) gains two parameters — withLtHash and readLimitMiBps — and two new abstractions: a per-bucket streaming bucketLtHasher and a byte-throughput read limiter.

sei-db/tools/cmd/seidb/operations/dump_flatkv.go:
- Adds --lthash (default true) and --read-limit-mb (default 64) flags, plus extended command documentation covering live-node and Kubernetes usage. --read-limit-mb is validated to be >= 0 (0 means unlimited).
- newReadLimiter builds a golang.org/x/time/rate limiter from a MiB/s ceiling. The burst is one second of budget but never less than minReadBurstBytes (4 MiB), guaranteeing a single large row (e.g. contract bytecode) can never exceed the burst and deadlock WaitN. Returns nil (unlimited) when the ceiling is <= 0. The scan calls WaitN(len(key)+len(val)) per row so throughput is measured on actual key+value bytes read.
- bucketLtHasher incrementally accumulates a bucket's LtHash from a stream of raw physical key / serialized value pairs. Keys and values are cloned on add because the iterator may reuse the backing slices on Next(). Pairs are buffered up to lthashBatchCap (8192) then folded into the running accumulator via MixIn; batching is safe because the LtHash group is associative.
- LtHash is always computed for all four buckets regardless of the --bucket write filter, so the printed TOTAL equals the node's committed global LtHash even when only one bucket is written to disk. Each bucket feeds exactly the bytes the FlatKV store hashes into its per-DB working LtHash, so per-bucket checksums match per-DB committed LtHashes and their MixIn matches the global root.
- verifyFlatKVLtHash cross-checks the re-scanned total against CommittedRootHash() from snapshot metadata, printing a PASS/FAIL block and returning an error (non-zero exit) on mismatch. A store with no committed LtHash (zero-hash, e.g. a snapshot predating LtHash metadata) is treated as "nothing to verify" and skipped rather than failing spuriously.

Test plan

sei-db/tools/cmd/seidb/operations/dump_flatkv_test.go:
- Updated existing TestDumpFlatKVFromStoreAllBuckets and TestDumpFlatKVFromStoreSingleBucket calls for the new withLtHash/readLimitMiBps signature (LtHash on, throttling off).
- Added TestBucketLtHasherMatchesSingleShot, which uses more than two full batches (lthashBatchCap*2 + 17 pairs) to exercise the incremental MixIn path. It asserts each batched per-bucket checksum equals a single-shot ComputeLtHash over the same pairs (associativity), and that the MixIn of the per-bucket accumulators equals one ComputeLtHash over the union — the exact per-bucket + total relationship the command reports.

…tle, metadata verify

cursor · 2026-06-29T22:56:17Z

PR Summary

Low Risk
Changes are confined to the offline seidb dump tool and diagnostics; they do not alter FlatKV commit or consensus paths, though a failed LtHash verify will fail the command by design.

Overview
Extends seidb dump-flatkv so a full FlatKV scan can compute per-bucket and total LtHash, compare the total to CommittedRootHash from snapshot metadata (PASS/FAIL, non-zero exit on mismatch), and throttle read throughput when run against a live node.

New flags --lthash (default on) and --read-limit-mb (default 64 MiB/s; 0 = unlimited). The scan always walks the full keyspace; --bucket only limits which hex dump files are written, while LtHash still covers all four buckets so the total stays comparable to the node root. A streaming bucketLtHasher batches pairs into ComputeLtHash / MixIn, with key/value cloning for iterator safety; newReadLimiter uses golang.org/x/time/rate with burst clamping so large rows do not break WaitN. Verification is skipped when metadata has no committed LtHash (zero hash).

Tests update dumpFlatKVFromStore call sites for the new parameters and add TestBucketLtHasherMatchesSingleShot for batched vs single-shot LtHash equivalence.

^{Reviewed by Cursor Bugbot for commit 8ba6f87. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-06-29T22:57:08Z

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`✅ passed`	Jun 29, 2026, 10:57 PM

codecov · 2026-06-29T22:57:51Z

Codecov Report

❌ Patch coverage is 63.63636% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.21%. Comparing base (bb69d38) to head (8ba6f87).

Files with missing lines	Patch %	Lines
sei-db/tools/cmd/seidb/operations/dump_flatkv.go	63.63%	27 Missing and 5 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3665      +/-   ##
==========================================
- Coverage   59.17%   58.21%   -0.96%     
==========================================
  Files        2262     2178      -84     
  Lines      187053   177514    -9539     
==========================================
- Hits       110682   103338    -7344     
+ Misses      66433    65024    -1409     
+ Partials     9938     9152     -786

Flag	Coverage Δ
sei-chain-pr	`26.60% <63.63%> (?)`
sei-db	`70.41% <ø> (ø)`
sei-db-state-db	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
sei-db/tools/cmd/seidb/operations/dump_flatkv.go	`51.79% <63.63%> (+10.35%)`	⬆️

... and 84 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Want higher recall? High effort reviews run extra passes and find more bugs. A team admin can switch effort levels in the Cursor dashboard.

^{Reviewed by Cursor Bugbot for commit 8ba6f87. Configure here.}

cursor · 2026-06-29T22:58:03Z

+			}
+			if err := limiter.WaitN(ctx, n); err != nil {
+				return fmt.Errorf("read rate limiter: %w", err)
+			}


Read throttle undercharges large rows

Medium Severity

When a scanned row’s len(key)+len(val) exceeds the limiter burst, the scan calls WaitN once with n clamped to the burst instead of consuming tokens for the full row size. Large values (for example contract bytecode) can therefore be read with far less throttling than --read-limit-mb implies, weakening the live-node disk protection this flag is meant to provide.

^{Reviewed by Cursor Bugbot for commit 8ba6f87. Configure here.}

seidroid

The PR cleanly extends seidb dump-flatkv with per-bucket/total LtHash computation, committed-root verification, and a read-throughput limiter; the logic (slice cloning, batch associativity, burst clamping, nil-map reads) is correct and well-tested. No blockers — only a couple of minor robustness/UX notes.

Findings: 0 blocking | 4 non-blocking | 1 posted inline

Blockers

None at the file/PR level.

Non-blocking

Cursor's second-opinion review (./cursor-review.md) is empty — that pass produced no output.
Behavior change: --lthash now defaults to true and --read-limit-mb defaults to 64, so existing dump-flatkv invocations now do extra LtHash work and are throttled to 64 MiB/s by default. This is documented in the command help, but consider calling it out in release notes/changelog so offline users aren't surprised by the slower default.
Minor redundancy: printFlatKVLtHash already computes the total LtHash but discards it, and verifyFlatKVLtHash re-sums the per-bucket accumulators from scratch. Cheap (just MixIn of four vectors), but the total could be computed once and reused.
1 suggestion(s)/nit(s) flagged inline on specific lines.

seidroid · 2026-06-29T23:00:16Z

+	// zero LtHash. Treat that as "nothing to verify against" rather than a
+	// spurious failure (e.g. a snapshot that predates LtHash metadata).
+	zero := lthash.New().Checksum()
+	if bytes.Equal(committedTotal, zero[:]) {


[suggestion] This conflates "no committed LtHash metadata" with "committed metadata present but zero/corrupt." The zero-LtHash checksum is also what an empty store and a store whose committed metadata was incorrectly zeroed both produce, so a non-empty store with zeroed/corrupt committed metadata is reported as skipped rather than failing verification — exactly the case verification exists to catch. Consider detecting metadata presence explicitly (e.g. an IsZero()/has-metadata signal from the store) instead of inferring absence from the checksum value. (Also raised by Codex.)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ba6f87ef0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-29T23:00:30Z

+	// zero LtHash. Treat that as "nothing to verify against" rather than a
+	// spurious failure (e.g. a snapshot that predates LtHash metadata).
+	zero := lthash.New().Checksum()
+	if bytes.Equal(committedTotal, zero[:]) {


Skip verification when the source snapshot lacks LtHash

When the selected snapshot predates LtHash metadata but the cloned changelog replays later blocks, LoadVersion starts the LtHash accumulators at zero and catchup persists a non-zero hash for only the replayed WAL deltas. This check uses the post-open CommittedRootHash() and only skips the all-zero case, so it no longer recognizes that the baseline snapshot had no verifiable LtHash and compares a full scan of snapshot+WAL data against that partial hash, causing the default dump-flatkv --lthash path to fail on upgraded live stores until a new snapshot with LtHash metadata exists. Please track whether the selected snapshot had LtHash metadata before replay and skip verification in that case.

Useful? React with 👍 / 👎.

claude

LGTM — additive changes to the offline seidb dump-flatkv tool with no impact on commit/consensus paths.

Extended reasoning...

Overview

The PR extends an offline diagnostic CLI (seidb dump-flatkv) with two additions: (1) per-bucket and total LtHash computation, verified against snapshot metadata, and (2) byte-throughput read throttling so the scan can safely run against a live node. Changes are isolated to sei-db/tools/cmd/seidb/operations/dump_flatkv.go and its test file — no production commit/consensus code is touched.

Security risks

None substantive. The tool is read-only against a hard-linked snapshot clone. No new authentication, crypto, or RPC surfaces are introduced. The LtHash usage feeds raw iterator bytes into the same lthash package the FlatKV store itself uses for committed root hashing — no new cryptographic logic. A failed verification fails the command by design (intended exit-non-zero contract).

Level of scrutiny

Low. This is offline tooling whose worst-case failure is a wrong dump or a misleading PASS/FAIL on a CLI command — there is no on-chain or persisted-state effect. The default --read-limit-mb 64 is conservative and --lthash=true adds work that is purely diagnostic.

Other factors

The author handled the tricky edges explicitly: keys/values are cloned before retention (iterator slice-reuse), the limiter burst is floored at 4 MiB so an oversized row can't deadlock WaitN, and a zero committed-LtHash is treated as 'nothing to verify' rather than spuriously failing. A new test (TestBucketLtHasherMatchesSingleShot) exercises the batched MixIn path against a single-shot ComputeLtHash over the union, which is the right invariant to assert. Bug hunter found nothing and the cursor[bot] also flagged it low risk.

feat(seidb): dump-flatkv computes per-bucket/total LtHash, read throt…

8ba6f87

…tle, metadata verify

cursor Bot reviewed Jun 29, 2026

View reviewed changes

seidroid Bot approved these changes Jun 29, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 29, 2026

View reviewed changes

claude Bot reviewed Jun 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(seidb): dump-flatkv computes per-bucket/total LtHash#3665

feat(seidb): dump-flatkv computes per-bucket/total LtHash#3665
blindchaser wants to merge 1 commit into
mainfrom
yiren/flatkv-dump

blindchaser commented Jun 29, 2026

Uh oh!

cursor Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

codecov Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 29, 2026

Uh oh!

seidroid Bot left a comment

Uh oh!

seidroid Bot Jun 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 29, 2026

Uh oh!

claude Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

blindchaser commented Jun 29, 2026

Summary

Test plan

Uh oh!

cursor Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

codecov Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 29, 2026

Choose a reason for hiding this comment

Read throttle undercharges large rows

Uh oh!

seidroid Bot left a comment

Choose a reason for hiding this comment

Blockers

Non-blocking

Uh oh!

seidroid Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cursor Bot commented Jun 29, 2026 •

edited

Loading

codecov Bot commented Jun 29, 2026 •

edited

Loading