Skip to content

Introduce diskann-record crate to support serialization + deserialization of DiskANN indexes#1188

Open
suhasjs wants to merge 3 commits into
mainfrom
users/suhasja/saveload-core
Open

Introduce diskann-record crate to support serialization + deserialization of DiskANN indexes#1188
suhasjs wants to merge 3 commits into
mainfrom
users/suhasja/saveload-core

Conversation

@suhasjs

@suhasjs suhasjs commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

This PR is part 1/2 of #1079 and only introduces the diskann-record crate. See #1079 for sample output when the save::Save and load::Load traits from this crate are implemented for a simple in-memory index.

  • Does this PR have a descriptive title that could go in our release notes?
  • Does this PR add any new dependencies?
  • Does this PR modify any existing APIs?
  • Is the change to the API backwards compatible?
  • Should this result in any changes to our documentation, either updating existing docs or adding new ones?

Reference Issues/PRs

Part 1/2 of #1079. Also see #737.

What does this implement/fix? Briefly explain your changes.

Introduces two new traits (diskann_record::save::Save and diskann_record::load::Load)and a new create diskann-record to support file-based serialization + deserialization.

Any other comments?

…oadable] traits [THIS PR] + impls for structs [TODO]
@suhasjs suhasjs self-assigned this Jun 18, 2026
@suhasjs suhasjs requested review from a team and Copilot June 18, 2026 17:52
@suhasjs suhasjs added enhancement New feature or request rust Pull requests that update rust code labels Jun 18, 2026
@suhasjs suhasjs moved this to Done in DiskANN backlog Jun 18, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Not ready to approve

Sidecar artifact path handling allows unsafe/incorrect path shapes (including potential directory escape on save) and needs validation hardening before merging.

Pull request overview

Introduces a new diskann-record crate that defines a versioned JSON-manifest + sidecar-artifact framework for persisting DiskANN-related structures, including a Save/Load trait surface, wire-level value model, and basic round-trip tests. This is positioned as the foundational crate for the follow-up PR that will implement these traits for real index types.

Changes:

  • Added save + load modules with Save/Load and Saveable/Loadable traits, plus save_fields! / load_fields! macros.
  • Implemented wire types (Value, Record, Handle), schema Version, and a lossless Number container for manifest numeric values.
  • Integrated the new crate into the workspace (members + workspace dependency) and added initial unit tests validating round-trips and handle escape rejection.
File summaries
File Description
diskann-record/src/lib.rs Crate-level API/docs, reserved-key policy, 64-bit platform assertion, and end-to-end tests.
diskann-record/src/version.rs Defines Version and its string serialization/deserialization form.
diskann-record/src/number.rs Adds Number wire type and safe narrowing conversions.
diskann-record/src/save/mod.rs Save-side traits, entry point, macros, and primitive Saveable impls.
diskann-record/src/save/context.rs Save-side context and sidecar writer + manifest finalization logic.
diskann-record/src/save/error.rs Save-side error wrapper.
diskann-record/src/save/value.rs Wire-level Value/Record/Handle representations and serde behavior.
diskann-record/src/load/mod.rs Load-side traits, entry point, macros, and primitive Loadable impls.
diskann-record/src/load/context.rs Load-side context/object/array APIs and sidecar reader.
diskann-record/src/load/error.rs Load-side error type and recoverable-vs-critical classification.
diskann-record/Cargo.toml New crate manifest and dependencies.
Cargo.toml Adds diskann-record to the workspace and workspace dependencies.
Cargo.lock Records the new workspace package entry.

Copilot's findings

  • Files reviewed: 12/13 changed files
  • Comments generated: 4

Note

Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread diskann-record/src/save/context.rs
Comment thread diskann-record/src/save/context.rs Outdated
Comment thread diskann-record/src/load/context.rs
Comment thread diskann-record/src/save/mod.rs
@codecov-commenter

codecov-commenter commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 68.19106% with 313 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.24%. Comparing base (3aa44ac) to head (f064610).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
diskann-record/src/load/context.rs 69.56% 56 Missing ⚠️
diskann-record/src/save/context.rs 53.63% 51 Missing ⚠️
diskann-record/src/save/value.rs 70.34% 51 Missing ⚠️
diskann-record/src/load/error.rs 42.46% 42 Missing ⚠️
diskann-record/src/lib.rs 86.61% 32 Missing ⚠️
diskann-record/src/save/error.rs 0.00% 24 Missing ⚠️
diskann-record/src/number.rs 57.14% 18 Missing ⚠️
diskann-record/src/save/mod.rs 66.66% 18 Missing ⚠️
diskann-record/src/version.rs 60.00% 14 Missing ⚠️
diskann-record/src/load/mod.rs 86.27% 7 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1188      +/-   ##
==========================================
- Coverage   89.46%   89.24%   -0.22%     
==========================================
  Files         487      497      +10     
  Lines       92170    93154     +984     
==========================================
+ Hits        82460    83136     +676     
- Misses       9710    10018     +308     
Flag Coverage Δ
miri 89.24% <68.19%> (-0.22%) ⬇️
unittests 88.90% <68.19%> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-record/src/load/mod.rs 86.27% <86.27%> (ø)
diskann-record/src/version.rs 60.00% <60.00%> (ø)
diskann-record/src/number.rs 57.14% <57.14%> (ø)
diskann-record/src/save/mod.rs 66.66% <66.66%> (ø)
diskann-record/src/save/error.rs 0.00% <0.00%> (ø)
diskann-record/src/lib.rs 86.61% <86.61%> (ø)
diskann-record/src/load/error.rs 42.46% <42.46%> (ø)
diskann-record/src/save/context.rs 53.63% <53.63%> (ø)
diskann-record/src/save/value.rs 70.34% <70.34%> (ø)
diskann-record/src/load/context.rs 69.56% <69.56%> (ø)

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hildebrandmw hildebrandmw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Suhas - I have one big architectural comment about allowing pluggable backend contexts that I think will address many of the concerns about how heavy this is as a dependency and support for VFS that I think is probably worth doing. Happy to help out if needed.

dir: PathBuf,
metadata: PathBuf,
files: Mutex<HashSet<String>>,
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of the biggest things we can do in this PR is put the ContextInner types behind a trait object:

// Save path
pub trait SaveContext {
    type Output;
    fn write(&self, key: &str) -> Result<Write<'_>>;
    fn finish(self, value: Value<'_>) -> Result<Self::Output>;
}

trait GetWrite {
    fn write(&self, key: &str) -> Result<Write<'_>>;
}

impl<T> GetWrite for T
where
    T: SaveContext,
{
    fn write(&self, key: &str) -> Result<Write<'_>> {
        <T as SaveContext>::write(key)
    }
}

pub struct Context<'a> {
    inner: &'a dyn GetWrite,
}

impl<'a> Context<'a> {
    pub(super) fn new(inner: &'a dyn GetWrite) -> Self {
        Self { inner }
    }
}

pub fn save<T, C>(x: &T, context: C) -> Result<C::Output>
where
    T: Saveable,
    C: SaveContext,
{
    let value = x.save(Context::new(&context))?;
    context.finish(value)
)

and the load path is

pub trait LoadContext {
    fn value(&self) -> Result<&Value<'_>>>;
    fn read(&self, key: &str) -> Result<Reader<'_>>;
}

pub struct Context<'a> {
    inner: &'a dyn LoadContext,
    value: &'a Value<'a>,
}

impl<'a> Context<'a> {
    pub(super) fn new(inner: &'a dyn LoadContext, value: &'a Value<'a>) -> Self {
         Self {
              inner,
              value
         }
    }
}

pub fn load<'a, T, C>(context: &'a C) -> Result<T>
where
    T: Loadable<'a>,
    C: LoadContext,
{
    let value = context.value()?;
    T::load(Context::new(context, &value))
}

This gets us a few things:

  1. The crate with no features can have no actual implementations of the {Save/Load}Context traits. This means no serde/serde_json as required dependencies be default, making diskann-record a lighter-weight dependency for lower-level crates (will need to make the serialize implementation for Value feature-gated).
  2. It provides a great hook for plugging in a virtual-filesystem for testing.
  3. Additional testing can use a purely in-memory implementation completely adjacent to a VFS backed system for even less overhead (will need to add a Value::into_owned implementation).

/// borrowing parameter `'a` lets [`Value::String`], [`Value::Bytes`], and nested
/// records reuse memory owned by the caller without copying.
#[derive(Debug)]
pub enum Value<'a> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth moving value up one level since it's shared between saving and loading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request rust Pull requests that update rust code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants