Benchmarking json schemas #1154
Pull request overview
This PR adds JSON Schema generation (via schemars) and a tree-style terminal renderer so diskann-benchmark users can discover benchmark input fields/variants using --schema and --field, rather than reading source.
Changes:
- Add `schemars::JsonSchema` coverage across benchmark input/tolerance DTOs and generate per-input JSON Schemas from `Input::Raw`.
- Introduce `diskann-benchmark-runner::schema` to render schemas (and drill into sub-fields) as human-readable CLI documentation.
- Add schema/serialization drift tests and wire new CLI flags + README documentation.
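One way a drift test like the ones listed above can work is to route every variant through a `match` with no wildcard arm, so that adding a variant stops the build until the hand-written schema is updated. The following is a minimal sketch of that idea only: the `Codec` enum, its variant names, and `SCHEMA_VARIANTS` are hypothetical stand-ins, not the PR's actual types or schema.

```rust
/// Hypothetical stand-in for an enum whose JSON Schema is written by hand.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec {
    Full,
    Product { chunks: usize },
    Scalar { bits: usize },
}

/// Variant names the hand-written schema is assumed to advertise.
const SCHEMA_VARIANTS: &[&str] = &["full", "product", "scalar"];

/// Maps each variant to its schema name.
/// There is deliberately no `_` arm: a new variant fails to compile here.
fn schema_name(codec: Codec) -> &'static str {
    match codec {
        Codec::Full => "full",
        Codec::Product { .. } => "product",
        Codec::Scalar { .. } => "scalar",
    }
}

/// Returns true when every sample variant is listed in the schema.
fn schema_covers(samples: &[Codec]) -> bool {
    samples
        .iter()
        .all(|codec| SCHEMA_VARIANTS.contains(&schema_name(*codec)))
}
```

The compile-time check catches new variants; the runtime check catches a variant that was renamed in one place but not the other.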
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| diskann-disk/src/build/configuration/quantization_types.rs | Adds a test intended to guard drift across QuantizationType variants/serialization. |
| diskann-disk/Cargo.toml | Adds serde_json as a dev-dependency for new tests. |
| diskann-benchmark/src/utils/mod.rs | Derives JsonSchema for SimilarityMeasure used in inputs. |
| diskann-benchmark/src/inputs/multi_vector.rs | Derives JsonSchema for multi-vector input types. |
| diskann-benchmark/src/inputs/graph_index.rs | Derives JsonSchema broadly; adds manual JsonSchema for StartPointStrategyRef + drift test; annotates schema override for the remote-serde field. |
| diskann-benchmark/src/inputs/filters.rs | Derives JsonSchema for filter-related inputs. |
| diskann-benchmark/src/inputs/exhaustive.rs | Derives JsonSchema for exhaustive-benchmark inputs. |
| diskann-benchmark/src/inputs/disk.rs | Adds schema proxy for QuantizationType (to avoid schemars dependency in diskann-disk) and derives JsonSchema for disk-index inputs. |
| diskann-benchmark/src/inputs/bftree.rs | Derives JsonSchema for bf_tree inputs. |
| diskann-benchmark/src/backend/multi_vector/driver.rs | Derives JsonSchema for multi-vector tolerance input. |
| diskann-benchmark/src/backend/disk_index/benchmarks.rs | Derives JsonSchema for disk-index tolerance input. |
| diskann-benchmark/README.md | Documents --schema and --field usage. |
| diskann-benchmark/Cargo.toml | Adds schemars dependency for input schema generation. |
| diskann-benchmark-simd/src/lib.rs | Derives JsonSchema for SIMD input/tolerance types. |
| diskann-benchmark-simd/Cargo.toml | Adds schemars dependency. |
| diskann-benchmark-runner/src/utils/num.rs | Implements JsonSchema for NonNegativeFinite. |
| diskann-benchmark-runner/src/utils/datatype.rs | Derives JsonSchema for DataType. |
| diskann-benchmark-runner/src/test/typed.rs | Updates test inputs/tolerances to derive JsonSchema. |
| diskann-benchmark-runner/src/test/dim.rs | Updates test inputs/tolerances to derive JsonSchema. |
| diskann-benchmark-runner/src/schema.rs | Adds schema renderer + path resolver + unit tests. |
| diskann-benchmark-runner/src/lib.rs | Exposes the new schema module. |
| diskann-benchmark-runner/src/input.rs | Requires Input::Raw: JsonSchema and adds Registered::schema() plumbing. |
| diskann-benchmark-runner/src/files.rs | Derives JsonSchema for InputFile. |
| diskann-benchmark-runner/src/app.rs | Adds --schema/--field CLI flags and wiring to render schema docs + example. |
| diskann-benchmark-runner/Cargo.toml | Adds colored + schemars dependencies. |
| Cargo.toml | Adds schemars to workspace dependencies. |
| Cargo.lock | Locks new schemars/colored (and transitive) dependencies. |
/// Ensures the manual `JsonSchema` impl stays in sync with actual variants.
/// If a variant is added to `QuantizationType`, this match will fail to compile.
#[test]
fn schema_covers_all_quantization_variants() {

}
s
}
Some("number") => "number".to_string(),

let generator =
schemars::generate::SchemaSettings::default().into_generator();
let schema = generator.into_root_schema_for::<T::Raw>();
serde_json::to_value(schema).unwrap_or_default()
Codecov Report
❌ Your patch status has failed because the patch coverage (65.70%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #1154 +/- ##
==========================================
- Coverage 89.47% 89.30% -0.17%
==========================================
Files 486 487 +1
Lines 92161 92810 +649
==========================================
+ Hits 82458 82883 +425
- Misses 9703 9927 +224
Adds schemars-based JSON Schema generation and a custom tree-style terminal renderer for benchmark input types. Users can run `inputs <name> --schema` to see field documentation with types, optionality, enum variants, and descriptions, followed by the example JSON.

Implementation:
- Add schemars 1.2 to workspace; derive JsonSchema on all input types
- Custom renderer in diskann-benchmark-runner/src/schema.rs with:
  - Colored output (field names bold, types cyan, variants yellow)
  - Multi-line description alignment
  - Handles internally/externally-tagged enums, newtypes with $ref
  - MAX_DEPTH guard against recursive schemas
- Manual JsonSchema impls for custom-serde types:
  - NonNegativeFinite (number with minimum)
  - StartPointStrategyRef (externally-tagged enum, with drift test)
  - QuantizationTypeSchema proxy (keeps schemars out of diskann-disk)
- JsonSchema bound added to Input::Raw trait

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
83b159e to f764227
To make the available options for our JSON inputs clearer, Mark and I discussed walking the ASTs of the JSON body using schemars and rendering a breakdown of the possible options. This adds JSON Schema documentation for benchmark inputs using the `--schema` and `--field` options.
Summary
Adds schemars-based JSON Schema generation and a custom tree-style terminal renderer so users can discover benchmark input fields without reading source code.
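To illustrate what a tree-style renderer with a depth guard might look like, here is a minimal sketch. It is not the PR's renderer: the `Node` type is a hypothetical, simplified stand-in for a JSON Schema value, the output format is invented for illustration, and the `MAX_DEPTH` value of 3 is an assumption (the real constant's value is not shown here).

```rust
/// Hypothetical, simplified schema node; the real renderer walks JSON Schema values.
enum Node {
    /// A leaf type such as "integer" or "string".
    Leaf(&'static str),
    /// An object with named fields, kept in declaration order.
    Object(Vec<(&'static str, Node)>),
}

/// Depth limit assumed for this sketch.
const MAX_DEPTH: usize = 3;

/// Appends one line per field, indenting two spaces per nesting level.
/// Nested objects at the depth limit are named but not expanded.
fn render(node: &Node, depth: usize, out: &mut String) {
    if let Node::Object(fields) = node {
        for (name, child) in fields {
            let indent = "  ".repeat(depth);
            match child {
                Node::Leaf(ty) => out.push_str(&format!("{indent}{name}: {ty}\n")),
                Node::Object(_) if depth + 1 >= MAX_DEPTH => {
                    out.push_str(&format!("{indent}{name}: object (max depth reached)\n"));
                }
                Node::Object(_) => {
                    out.push_str(&format!("{indent}{name}: object\n"));
                    render(child, depth + 1, out);
                }
            }
        }
    }
}

/// Renders a whole tree into a string; a bare leaf at the root renders nothing.
fn render_root(node: &Node) -> String {
    let mut out = String::new();
    render(node, 0, &mut out);
    out
}
```

Cutting off at a fixed depth keeps the output finite even if a schema refers back to itself, at the cost of truncating unusually deep but non-recursive inputs.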
Usage
Full schema for an input type
cargo run --release -p diskann-benchmark -- inputs graph-index-build --schema
Drill into a specific field
cargo run --release -p diskann-benchmark -- inputs graph-index-build --field source.start_point_strategy
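A dotted path such as `source.start_point_strategy` can be resolved by splitting on `.` and descending one object level per segment. The sketch below shows that idea under stated assumptions: `Field` is a hypothetical simplified schema tree, and `resolve` and its error messages are illustrative names, not the PR's path resolver.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified schema tree used only for this sketch.
enum Field {
    /// A leaf described by a type label, e.g. "enum" or "integer".
    Leaf(String),
    /// An object whose fields are addressable by name.
    Object(BTreeMap<String, Field>),
}

/// Walks a dotted path and returns the field it names.
/// On failure, the error names the first segment that could not be resolved.
fn resolve<'a>(root: &'a Field, path: &str) -> Result<&'a Field, String> {
    let mut current = root;
    for segment in path.split('.') {
        match current {
            Field::Object(fields) => {
                current = fields
                    .get(segment)
                    .ok_or_else(|| format!("unknown field `{segment}`"))?;
            }
            Field::Leaf(_) => {
                return Err(format!("`{segment}` requested on a non-object field"));
            }
        }
    }
    Ok(current)
}
```

Reporting the failing segment rather than the whole path makes a typo in a long path easier to spot.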
What's included
Sample output
Full schema
cargo run -p diskann-benchmark --all-features -- inputs --schema graph-index-build-bftree-spherical-quantization
Single Field
cargo run -p diskann-benchmark --all-features -- inputs --schema graph-index-build-bftree-spherical-quantization --field build.start_point_strategy