Skip to content

Prevent stale fs-store writes after lock cleanup#4678

Open
tnull wants to merge 1 commit into
lightningdevkit:mainfrom
tnull:2026-06-fs-store-version-lock-cleanup
Open

Prevent stale fs-store writes after lock cleanup#4678
tnull wants to merge 1 commit into
lightningdevkit:mainfrom
tnull:2026-06-fs-store-version-lock-cleanup

Conversation

@tnull

@tnull tnull commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Reserve write versions while holding the per-path lock map mutex so cleanup cannot remove the version state between version allocation and lock reference acquisition.

Add a regression test for the ordering invariant.

Co-Authored-By: HAL 9000

This finding was discovered by Project Loupe

Reserve write versions while holding the per-path lock map mutex so cleanup cannot remove the version state between version allocation and lock reference acquisition.

Add a regression test for the ordering invariant.

Co-Authored-By: HAL 9000

This finding was discovered by Project Loupe
@ldk-reviews-bot

ldk-reviews-bot commented Jun 10, 2026

Copy link
Copy Markdown

👋 Thanks for assigning @benthecarman as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-claude-review-bot

Copy link
Copy Markdown
Collaborator

No issues found.

I reviewed the only changed file (lightning-persister/src/fs_store/common.rs) in full, including the surrounding versioning/locking machinery (get_new_version_and_lock_ref, execute_locked_write, clean_locks, write_impl).

The change correctly closes the race: by allocating the version and cloning the per-path lock ref within a single critical section under the locks mutex, clean_locks can no longer remove the map entry between version allocation and lock-ref acquisition. Since a writer holds its Arc ref for the duration of its write (keeping the entry's strong_count > 2), any overlapping writer allocating a version under the same mutex is guaranteed to observe and share the same RwLock, preserving the monotonic last_written_version invariant. An older version can only ever obtain a fresh (reset-to-0) entry once the previous writer has fully completed and cleaned up — in which case there is no conflict.

I also checked for regressions introduced by the change:

  • No lock-order inversion (the locks mutex is never acquired while an inner RwLock guard is held).
  • No new deadlock (the mutex now wraps only the fetch_add and the or_default() clone; get_inner_lock_ref is no longer called while the lock is held).
  • The added test compiles (private fields/methods accessible within the same module; super::* plus explicit imports don't conflict) and validates the post-condition robustly even under different thread scheduling.

The timing/sleep-based test is mildly fragile and verifies the mutex-gating proxy rather than the exact clean-up race, but it is correct and not a defect.

@benthecarman benthecarman left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants