SOLR-18267: Add flat vector index with no HNSW by adamjq · Pull Request #4492 · apache/solr

adamjq · 2026-06-01T19:28:41Z

https://issues.apache.org/jira/browse/SOLR-18267

Description

There are certain use cases, such as highly selective filters on large datasets, where it can be more efficient to perform a brute-force KNN search as a post-filter, instead of during ANN search.

Solr currently supports this use case with the vectorSimilarity function and an fq, but DenseVectorField still requires an HNSW graph to be built during indexing, even if the graph is never used during search. The goal of this feature is to avoid paying the cost of HNSW graph construction and rebuilding during ingestion when ANN search isn't used.
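
To illustrate the access pattern described above, here is a minimal, self-contained sketch of exact KNN applied only to documents that pass a filter. It is not Solr or Lucene code: the class `ExactKnnSketch`, its methods, and the odd-id filter are hypothetical names chosen for illustration, and cosine similarity is assumed as the scoring function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative only: exact (brute-force) top-k over the documents that pass a filter. */
final class ExactKnnSketch {

  /** Cosine similarity between two non-zero vectors of equal length. */
  static double cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  /** Returns the ids of the k filtered documents most similar to the query, best first. */
  static List<Integer> topK(float[][] vectors, float[] query, IntPredicate filter, int k) {
    List<Integer> candidates = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      if (filter.test(doc)) { // the filter runs first, so only matching vectors are scored
        candidates.add(doc);
      }
    }
    // Sort by descending similarity; no graph structure is consulted at any point.
    candidates.sort(Comparator.comparingDouble((Integer doc) -> -cosine(vectors[doc], query)));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}, {0.7f, 0.7f}};
    float[] query = {1f, 0f};
    // Hypothetical filter: only odd document ids match.
    System.out.println(topK(vectors, query, doc -> doc % 2 == 1, 1));
  }
}
```

The work done here grows with the number of documents that pass the filter rather than with the size of the index, which is why a highly selective filter can make the brute-force path attractive.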

Solution

This PR introduces a new knnAlgorithm=flat option to DenseVectorField that uses Lucene99FlatVectorsFormat. This stores vectors in the index (.vec/.vemf files) without building the HNSW graph (.vex/.vem files).

Lucene99FlatVectorsFormat is not registered in Lucene's SPI, so this PR includes a wrapper class SolrFlatVectorFormat that delegates to Lucene99FlatVectorsFormat as a workaround. There are examples in other Lucene-based engines using a similar pattern to provide a flat vector format for exact KNN search that wraps Lucene99FlatVectorsFormat.

Limitations

This PR currently doesn't support:

knnAlgorithm=flat for quantized variants
search across flat dense vector fields using the knn, knn_text_to_vector and vectorSimilarity query parsers. Only the vectorSimilarity function query is initially supported.

Both features could be shipped as follow-ups.

AI Disclosure: Claude was used to assist with this PR. All code has been reviewed and tested by me.

Tests

Unit tests for Dense Vector Fields and quantized variants.

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Reference Guide
I have added a changelog entry for my change

alessandrobenedetti

It's a nice addition. It would be great to get this in and then address the limitations!

adamjq · 2026-06-03T15:42:56Z

+ * @lucene.spi {@value #NAME}
+ * @since 10.1
+ */
+public final class SolrFlatVectorFormat extends KnnVectorsFormat {


I went back-and-forth about the naming of this class, because it is used as the name of segment files. For example: _0_SolrFlatVectorFormat_0.vec and _0_SolrFlatVectorFormat_0.vemf. Since the name is baked into the index, versioning it might make it easier to evolve in the future.

An alternative would be to add a version in the name like Solr101FlatVectorFormat (indicating it was introduced in Solr 10.1), or a similar approach.

@alessandrobenedetti do you have a strong opinion about either approach?

Hmm, I never liked versioning in the class name, to be honest. Let's see if anybody has suggestions and take it from there!

I would name it something that indicates the Lucene version and codec. There doesn't seem to be a consistent convention here: some codecs embed versions in their file names and some don't. When they do, it is typically the Lucene version that is embedded. I don't see a huge downside to having the codec/Lucene version displayed more prominently when inspecting index files. For reference, the vector codecs do seem to have more descriptive file names, so I am leaning towards that naming pattern for consistency:

*_Lucene99HnswVectorsFormat_0.vec
*_Lucene99HnswVectorsFormat_0.vem

I see now why we need this wrapper. But is it really a "Solr Flat Vector" format? I feel it is a bit of a stretch to call it that, as the implementation is entirely Lucene and the wrapper only works around exposing it via SPI. I suppose you could change the Lucene implementation under the hood without changing this "Solr format", but then you lose the benefit of the naming, which is to know immediately, just by looking at the index files, that you have, say, Lucene flat vector files from two different versions.

Edit: I don't think you can actually hot-swap another Lucene implementation here; otherwise you wouldn't know how to read the old index files if you ever upgrade, right? This makes me think versioning is the way to go. Otherwise the unversioned SolrFlatVectorFormat will forever be tied to the Lucene99 format, any later iteration will be versioned, and that will be confusing to look at. So we should version it from the start.
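
As a rough illustration of the point about reading format names off the index files, the sketch below extracts the format name from file names shaped like the examples in this thread. The pattern `_<segment>_<FormatName>_<suffix>.<ext>` is inferred from those examples only, and the class `VectorFileNames`, its methods, and the `_1_Solr101FlatVectorFormat_0.vec` file name are hypothetical, not part of Solr or Lucene.

```java
import java.util.List;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: reads vector format names out of segment file names. */
final class VectorFileNames {

  // Assumed shape, inferred from the file names quoted in this thread:
  //   _<segment>_<FormatName>_<suffix>.<ext>
  // Only the vector file extensions mentioned in the PR description are accepted.
  private static final Pattern VECTOR_FILE =
      Pattern.compile("_([0-9a-z]+)_([A-Za-z0-9]+)_(\\d+)\\.(vec|vemf|vex|vem)");

  /** Returns the format name embedded in a file name, or empty if the name does not match. */
  static Optional<String> formatName(String fileName) {
    Matcher m = VECTOR_FILE.matcher(fileName);
    return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
  }

  /** Collects the distinct format names found in a directory listing, sorted alphabetically. */
  static SortedSet<String> formatsIn(List<String> fileNames) {
    SortedSet<String> formats = new TreeSet<>();
    for (String name : fileNames) {
      formatName(name).ifPresent(formats::add);
    }
    return formats;
  }

  public static void main(String[] args) {
    // Hypothetical listing mixing an unversioned and a versioned format name.
    List<String> listing = List.of(
        "_0_SolrFlatVectorFormat_0.vec",
        "_0_SolrFlatVectorFormat_0.vemf",
        "_1_Solr101FlatVectorFormat_0.vec",
        "_1.si");
    System.out.println(formatsIn(listing));
  }
}
```

With a versioned name, a listing like this makes it visible when segments written by different format versions coexist in one index, which is the benefit argued for above.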

I agree that having a version from the start is a good idea, given the potential for this to evolve in the future. The question is whether it should be the Solr version (e.g. Solr101FlatVectorFormat for Solr 10.1) or the Lucene version (e.g. SolrLucene99FlatVectorsFormat).

The class is currently a very thin wrapper that delegates to the Lucene version. However, I could see the need to extend its functionality to support the KNN query parser search logic. It looks like Elastic had a similar discussion in this thread, so there's precedent there.

I've updated the PR to use Solr101FlatVectorFormat but am open to changing it.

adamjq added 2 commits June 1, 2026 14:40

SOLR-18267: Add flat vector index with no HNSW

c58cd3b

Add changelog entry

80f894b

github-actions Bot added the documentation, tests, cat:search, and cat:schema labels Jun 1, 2026

adamjq marked this pull request as ready for review June 1, 2026 20:09

alessandrobenedetti approved these changes Jun 3, 2026

Comment thread solr/core/src/java/org/apache/solr/schema/DenseVectorField.java

Comment thread solr/core/src/test-files/solr/collection1/conf/bad-schema-densevector-flat-binaryQuantized.xml

Comment thread solr/core/src/test-files/solr/collection1/conf/bad-schema-densevector-flat-scalarQuantized.xml

adamjq added 3 commits June 3, 2026 10:17

Address PR comments

f46d944

Improve comment about Lucene99FlatVectorsFormat SPI limitation

bf63de6

Format with gradlew tidy

be37b1c

adamjq commented Jun 3, 2026

adamjq added 3 commits June 3, 2026 16:22

Update test and docs

5e285b2

Add fq case to vectorSimilarity test case

ac2afb7

Rename wrapper class to Solr101FlatVectorFormat to include version name

d1cd307

adamjq wants to merge 8 commits into apache:main from adamjq:SOLR-18267-add-flat-vector-index

alessandrobenedetti Jun 5, 2026

kotman12 Jun 5, 2026

kotman12 Jun 5, 2026 (edited)

adamjq Jun 5, 2026

3 participants
