Concurrency hardening for BfTree parallel workloads#1158
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens diskann-bftree for concurrent workloads by synchronizing neighbor-list mutations (preventing lost updates), correcting dual-store (full + quant) lifecycle behavior, and adding concurrent stress tests to validate the new behavior.
Changes:
- Add striped per-vertex
RwLocksynchronization toNeighborProvider(read locks for reads, write locks for mutations) and introduce concurrent stress tests. - Adjust hard-delete behavior to also delete quantized vectors when present via a small abstraction over the quant store.
- Reorder dual-store writes in
set_elementto write full-precision first, then quantized.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| diskann-bftree/src/neighbors.rs | Adds striped locking around neighbor-list read/modify/write and adds concurrent stress tests. |
| diskann-bftree/src/provider.rs | Adds quant-store cleanup on delete via DeleteQuant and reorders full/quant write sequencing in set_element. |
| diskann-bftree/src/quant.rs | Adds quant-store delete support used by hard-delete cleanup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Jordan, what is the perf impact of this change? Is ingestion speed the same as before? thanks |
aea01c1 to
04a1dd8
Compare
Add striped RwLock to NeighborProvider to eliminate the TOCTOU race in append_vector's read-modify-write cycle. Under 8-thread contention on the same vertex, the unprotected path loses 11-51% of edges. Striped locks (16384 stripes, Fibonacci multiply-shift hash): - Constant ~128 KB memory regardless of dataset size (vs ~8 GB at 1B vectors for per-vertex locks) - RwLock allows concurrent readers during search; only writers serialize - Read lock on get_neighbors, write lock on set/append/delete - Internal get_neighbors_unlocked avoids deadlock in append_vector Dual-store write ordering (SetElement for QuantVectorProvider): - Write full-precision (authoritative) before quant - If quant write fails, worst case is a vector missing from quantized search but still reachable in full-precision (benign) Hard-delete cleanup (Delete trait): - Added DeleteQuant trait; QuantVectorProvider deletes its entry, NoStore is a no-op - Single generic Delete impl with Q: DeleteQuant bound - Neighbor adjacency cleanup handled by upper DiskANNIndex layer Tests: - 3 concurrent stress tests proving the TOCTOU fix - Verified 10/10 failures without locks, 10/10 passes with locks Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
04a1dd8 to
c94cdcc
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1158 +/- ##
==========================================
+ Coverage 89.46% 89.48% +0.01%
==========================================
Files 487 488 +1
Lines 92102 92378 +276
==========================================
+ Hits 82401 82664 +263
- Misses 9701 9714 +13
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Move StripedLocks from NeighborProvider to BfTreeProvider so that delete and set_element operations are synchronized under the same lock table as neighbor mutations. This eliminates a potential race between concurrent delete and set_element on the same vertex ID. - Extract StripedLocks into its own module (locks.rs) - BfTreeProvider owns Arc<StripedLocks>, shared with NeighborAccessor - Dynamic stripe count based on available_parallelism (4x cores, min 64) - delete() and set_element() now acquire write locks per vertex Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@copilot give it another review. |
I re-reviewed and found one remaining concurrency issue: beam expansion paths were reading neighbors without taking the striped read lock. I fixed both call sites in commit Validation run:
|
562c341 to
535d925
Compare
|
@harsha-simhadri after chatting a bit with Mark and and changing the approach to locking in the provider. I ran a few more wiki100k streaming benchmarks on my devbox instance it looks like the perf overhead is negligible. |
Problem
NeighborProvider::append_vector performs a read-modify-write cycle (read existing neighbors → append → write back) without synchronization. Under concurrent mutation, the last writer wins and silently drops the other's edges. Stress testing shows 11-51% edge loss under 8-thread contention on a single vertex.
Additionally, the dual-store SetElement wrote quant before full-precision, meaning if the full-precision write failed after quant succeeded, it would leave a quantized ghost with no backing data. And Delete did not clean up the quant store entry.
Solution
Striped RwLock on NeighborProvider
Why not per-vertex locks?
BfTree's value proposition is spilling to disk for datasets exceeding RAM. At 1B vectors, per-vertex RwLock<()> would consume 8 GB of in-memory locks — defeating the purpose. Striped locks keep memory constant at the cost of rare false contention (0.006% collision rate at 16K stripes).
Dual-store write ordering
Quant store cleanup on delete
Testing