Introduce diskann-record crate to support serialization + deserialization of DiskANN indexes#1188
Introduce diskann-record crate to support serialization + deserialization of DiskANN indexes#1188suhasjs wants to merge 3 commits into
diskann-record crate to support serialization + deserialization of DiskANN indexes#1188Conversation
…oadable] traits [THIS PR] + impls for structs [TODO]
There was a problem hiding this comment.
⚠️ Not ready to approve
Sidecar artifact path handling allows unsafe/incorrect path shapes (including potential directory escape on save) and needs validation hardening before merging.
Pull request overview
Introduces a new diskann-record crate that defines a versioned JSON-manifest + sidecar-artifact framework for persisting DiskANN-related structures, including a Save/Load trait surface, wire-level value model, and basic round-trip tests. This is positioned as the foundational crate for the follow-up PR that will implement these traits for real index types.
Changes:
- Added
save+loadmodules withSave/LoadandSaveable/Loadabletraits, plussave_fields!/load_fields!macros. - Implemented wire types (
Value,Record,Handle), schemaVersion, and a losslessNumbercontainer for manifest numeric values. - Integrated the new crate into the workspace (members + workspace dependency) and added initial unit tests validating round-trips and handle escape rejection.
File summaries
| File | Description |
|---|---|
| diskann-record/src/lib.rs | Crate-level API/docs, reserved-key policy, 64-bit platform assertion, and end-to-end tests. |
| diskann-record/src/version.rs | Defines Version and its string serialization/deserialization form. |
| diskann-record/src/number.rs | Adds Number wire type and safe narrowing conversions. |
| diskann-record/src/save/mod.rs | Save-side traits, entry point, macros, and primitive Saveable impls. |
| diskann-record/src/save/context.rs | Save-side context and sidecar writer + manifest finalization logic. |
| diskann-record/src/save/error.rs | Save-side error wrapper. |
| diskann-record/src/save/value.rs | Wire-level Value/Record/Handle representations and serde behavior. |
| diskann-record/src/load/mod.rs | Load-side traits, entry point, macros, and primitive Loadable impls. |
| diskann-record/src/load/context.rs | Load-side context/object/array APIs and sidecar reader. |
| diskann-record/src/load/error.rs | Load-side error type and recoverable-vs-critical classification. |
| diskann-record/Cargo.toml | New crate manifest and dependencies. |
| Cargo.toml | Adds diskann-record to the workspace and workspace dependencies. |
| Cargo.lock | Records the new workspace package entry. |
Copilot's findings
- Files reviewed: 12/13 changed files
- Comments generated: 4
Note
Your feedback helps us improve the quality of this feature.
Please use 👍 or 👎 to tell us whether this assessment is correct.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1188 +/- ##
==========================================
- Coverage 89.46% 89.24% -0.22%
==========================================
Files 487 497 +10
Lines 92170 93154 +984
==========================================
+ Hits 82460 83136 +676
- Misses 9710 10018 +308
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Suhas - I have one big architectural comment about allowing pluggable backend contexts that I think will address many of the concerns about how heavy this is as a dependency and support for VFS that I think is probably worth doing. Happy to help out if needed.
| dir: PathBuf, | ||
| metadata: PathBuf, | ||
| files: Mutex<HashSet<String>>, | ||
| } |
There was a problem hiding this comment.
I think one of the biggest things we can do in this PR is put the ContextInner types behind a trait object:
// Save path
pub trait SaveContext {
type Output;
fn write(&self, key: &str) -> Result<Write<'_>>;
fn finish(self, value: Value<'_>) -> Result<Self::Output>;
}
trait GetWrite {
fn write(&self, key: &str) -> Result<Write<'_>>;
}
impl<T> GetWrite for T
where
T: SaveContext,
{
fn write(&self, key: &str) -> Result<Write<'_>> {
<T as SaveContext>::write(key)
}
}
pub struct Context<'a> {
inner: &'a dyn GetWrite,
}
impl<'a> Context<'a> {
pub(super) fn new(inner: &'a dyn GetWrite) -> Self {
Self { inner }
}
}
pub fn save<T, C>(x: &T, context: C) -> Result<C::Output>
where
T: Saveable,
C: SaveContext,
{
let value = x.save(Context::new(&context))?;
context.finish(value)
)and the load path is
pub trait LoadContext {
fn value(&self) -> Result<&Value<'_>>>;
fn read(&self, key: &str) -> Result<Reader<'_>>;
}
pub struct Context<'a> {
inner: &'a dyn LoadContext,
value: &'a Value<'a>,
}
impl<'a> Context<'a> {
pub(super) fn new(inner: &'a dyn LoadContext, value: &'a Value<'a>) -> Self {
Self {
inner,
value
}
}
}
pub fn load<'a, T, C>(context: &'a C) -> Result<T>
where
T: Loadable<'a>,
C: LoadContext,
{
let value = context.value()?;
T::load(Context::new(context, &value))
}This gets us a few things:
- The crate with no features can have no actual implementations of the
{Save/Load}Contexttraits. This means no serde/serde_json as required dependencies be default, makingdiskann-recorda lighter-weight dependency for lower-level crates (will need to make the serialize implementation for Value feature-gated). - It provides a great hook for plugging in a virtual-filesystem for testing.
- Additional testing can use a purely in-memory implementation completely adjacent to a VFS backed system for even less overhead (will need to add a
Value::into_ownedimplementation).
| /// borrowing parameter `'a` lets [`Value::String`], [`Value::Bytes`], and nested | ||
| /// records reuse memory owned by the caller without copying. | ||
| #[derive(Debug)] | ||
| pub enum Value<'a> { |
There was a problem hiding this comment.
Probably worth moving value up one level since it's shared between saving and loading.
This PR is part 1/2 of #1079 and only introduces the
diskann-recordcrate. See #1079 for sample output when thesave::Saveandload::Loadtraits from this crate are implemented for a simple in-memory index.Reference Issues/PRs
Part 1/2 of #1079. Also see #737.
What does this implement/fix? Briefly explain your changes.
Introduces two new traits (
diskann_record::save::Saveanddiskann_record::load::Load)and a new creatediskann-recordto support file-based serialization + deserialization.Any other comments?