Skip to content

fix(bftree): validate record sizes at construction #1166

Open
JordanMaples wants to merge 3 commits into
mainfrom
jordanmaples/bftree_validations
Open

fix(bftree): validate record sizes at construction #1166
JordanMaples wants to merge 3 commits into
mainfrom
jordanmaples/bftree_validations

Conversation

@JordanMaples

Copy link
Copy Markdown
Contributor

Problem

BfTree::insert returns a LeafInsertResult indicating success or failure, but all three sub-providers (vectors, neighbors, quant) were discarding it unconditionally. When a record exceeds cb_max_record_size, the insert silently fails and later surfaces as a misleading "vector not found" error during search.

This is the root cause of #1159 (#1159): the default cb_max_record_size (1952 bytes) is too small for 1536D f32 vectors (8-byte key + 6144-byte value = 6152 bytes), but the failure was swallowed at insert time.

Solution

Early validation at provider construction:

BfTreeProviderParameters::validate::() checks that cb_max_record_size is large enough for both the vector and neighbor stores before any BfTree is created. For the quant store, validation is performed inside the Poly impl of CreateQuantProvider::create, where quantizer.bytes() is known — no separate call needed.

Defense in depth at every insert site:

All insert() calls in vectors.rs, neighbors.rs, and quant.rs now match on LeafInsertResult::InvalidKV and propagate a clear ANNError. This catches cases where validation is bypassed (e.g., loading from a snapshot with a changed config).

Error message example:

vector_provider_config: cb_max_record_size (1952) is too small;
a record requires 6152 bytes (8-byte key + 6144-byte value);
increase cb_max_record_size to at least 6152

Fixes #1159. Supersedes #1164

Testing

  • test_validate_rejects_undersized_vector_config — 1536D f32 + default config
  • test_validate_rejects_undersized_neighbor_config — max_degree=500 exceeding default
  • test_validate_accepts_valid_config — properly sized config passes

JordanMaples and others added 2 commits June 15, 2026 09:03
… failures

BfTree::insert returns LeafInsertResult but all three sub-providers
(vectors, neighbors, quant) were discarding it, silently dropping
inserts when the record exceeds cb_max_record_size. This caused
misleading 'vector not found' errors at search time.

Fixes:
- Add BfTreeProviderParameters::validate() that checks vector and
  neighbor config record sizes upfront at provider construction
- Add validate_quant() for post-quantizer-construction validation
- Match on LeafInsertResult::InvalidKV in all insert paths (vectors.rs,
  neighbors.rs, quant.rs) and propagate as ANNError

The validation catches the root cause reported in issue #1159: default
cb_max_record_size (1952) is too small for 1536D f32 vectors (6152
bytes needed), but the error was swallowed and surfaced later as a
confusing 'not found' during search.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move quant validation into the Poly<dyn Quantizer> impl of
CreateQuantProvider where quantizer.bytes() is known. This ensures
the quant store config is validated at construction without requiring
a separate validate_quant() call that callers might forget.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents silent bf-tree insert failures caused by undersized cb_max_record_size settings by validating record sizes at BfTreeProvider construction time and by propagating LeafInsertResult::InvalidKV errors at all bf-tree insert sites, addressing the misleading “vector not found” failures described in #1159.

Changes:

  • Add BfTreeProviderParameters::validate::<T>() to preflight vector + neighbor record-size requirements against cb_max_record_size before constructing a provider.
  • Propagate bf-tree insert failures in vectors.rs, neighbors.rs, and quant.rs by matching on LeafInsertResult and returning a clear ANNError.
  • Add record-size validation for quantized vectors inside CreateQuantProvider for Poly<dyn Quantizer> and add regression tests for validation behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
diskann-bftree/src/vectors.rs Handle LeafInsertResult::InvalidKV from vector-store inserts and surface as ANNError.
diskann-bftree/src/neighbors.rs Handle LeafInsertResult::InvalidKV from neighbor-store inserts and surface as ANNError.
diskann-bftree/src/quant.rs Handle LeafInsertResult::InvalidKV from quant-store inserts and surface as ANNError.
diskann-bftree/src/provider.rs Add early cb_max_record_size validation for vector/neighbor configs, validate quant record size during quant provider creation, and add validation tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter

codecov-commenter commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 71.42857% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.43%. Comparing base (1bdf1a1) to head (980f1c4).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
diskann-bftree/src/quant.rs 38.88% 11 Missing ⚠️
diskann-bftree/src/provider.rs 82.14% 5 Missing ⚠️
diskann-bftree/src/neighbors.rs 55.55% 4 Missing ⚠️
diskann-bftree/src/vectors.rs 66.66% 4 Missing ⚠️

❌ Your patch status has failed because the patch coverage (71.42%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1166      +/-   ##
==========================================
- Coverage   89.46%   89.43%   -0.04%     
==========================================
  Files         487      487              
  Lines       92102    92196      +94     
==========================================
+ Hits        82401    82457      +56     
- Misses       9701     9739      +38     
Flag Coverage Δ
miri 89.43% <71.42%> (-0.04%) ⬇️
unittests 89.09% <71.42%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-bftree/src/lib.rs 63.63% <100.00%> (+16.26%) ⬆️
diskann-bftree/src/neighbors.rs 92.73% <55.55%> (-1.22%) ⬇️
diskann-bftree/src/vectors.rs 90.76% <66.66%> (-1.63%) ⬇️
diskann-bftree/src/provider.rs 90.80% <82.14%> (-0.16%) ⬇️
diskann-bftree/src/quant.rs 86.12% <38.88%> (-3.40%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hildebrandmw hildebrandmw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried that validating like this isn't particularly robust long-term. For example, hooking the validation into CreateQuantProvider means someone can still call QuantVectorProvider::new_with_config without validation. Would it be better to validate directly in the provider constructors since then there will be one source of truth?

If there's a concern that validating in a constructor is too late (in the case of the top level provider, we may want to validate before creating anything to avoid unnecessary work), there can be custom config states for each sub-provider that does the prevalidation.

Just some thoughts - if you're happy with this direction, I'm happy to approve.

Addresses Mark's review feedback: validation now lives in each
provider's constructor (VectorProvider, NeighborProvider,
QuantVectorProvider) so it cannot be bypassed by calling
new_with_config directly.

- Add shared validate_record_size() helper in lib.rs
- Each constructor validates before creating the BfTree
- Remove BfTreeProviderParameters::validate() and duplicate
  check in CreateQuantProvider::create()
- Keep LeafInsertResult::InvalidKV checks at insert sites as
  defense-in-depth

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JordanMaples

Copy link
Copy Markdown
Contributor Author

I'm a bit worried that validating like this isn't particularly robust long-term. For example, hooking the validation into CreateQuantProvider means someone can still call QuantVectorProvider::new_with_config without validation. Would it be better to validate directly in the provider constructors since then there will be one source of truth?

If there's a concern that validating in a constructor is too late (in the case of the top level provider, we may want to validate before creating anything to avoid unnecessary work), there can be custom config states for each sub-provider that does the prevalidation.

Just some thoughts - if you're happy with this direction, I'm happy to approve.

I think ultimately validation in each of the constructors will be the most maintainable. I've updated the PR with constructor validations.

JordanMaples added a commit that referenced this pull request Jun 18, 2026
- Update bf-tree dependency from 0.4.9 to 0.5
- Migrate save/load to cpr_snapshot/new_from_cpr_snapshot
- Add use_snapshot field to BfTreeProvider, BfTreeProviderParameters,
  SavedParams, and BfTreeStoreConfig
- Guard save_bftree against unconfigured snapshot support
- Move record size validation into provider constructors (from PR #1166)
- Remove dead BfTreeParams::to_config and copy_snapshot_if_needed
- Add save_path support and round-trip tests for both full-precision
  and spherical (quantized) bftree indexes
- Remove #[ignore] on spherical round-trip test (0.5 fixes the panic)
- Require use_snapshot in benchmark JSON input configs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

error building index [bftree-full-precision]

4 participants