Predict IVF-PQ FP16 overflow and auto-switch to FP32#2246
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new ChangesFP16 Overflow Detection and CAGRA Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Line 39: Remove the debug printf statement at line 39 in the
ivf_pq_fp16_overflow.cuh file. This printf executes within a loop that iterates
over every dimension of every sampled row, causing severe performance
degradation and output flooding. Simply delete the entire printf line as it
serves no purpose in production code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 086831a1-8cdd-42f2-aaaf-8ac33b75c173
📒 Files selected for processing (2)
cpp/src/neighbors/detail/cagra/cagra_build.cuhcpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh (1)
70-88:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHIGH: Missing validation could cause kernel launch with zero grid blocks.
If
n_rows == 0(empty dataset) or ifsample_fractionis extremely small with a moderate dataset size (e.g.,n_rows=5000, sample_fraction=0.0001),n_samplebecomes 0, resulting ingrid_size = 0. Launching a kernel with zero blocks is invalid.Consider adding a minimum sample size guard:
Suggested fix
int64_t n_sample = static_cast<int64_t>(sample_fraction * static_cast<double>(n_rows)); if (n_rows <= 1000) { n_sample = n_rows; // for small datasets, just use them all and skip the sampling overhead } else if (n_rows > 100000) { // cap the sample size to 100k for speed and keep memory use within the limited workspace resource n_sample = 100000; } + + // Handle empty dataset or degenerate sample size + if (n_sample <= 0) { return 0.0f; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh` around lines 70 - 88, Add a minimum sample size guard to prevent grid_size from being zero when launching the kernel. After the existing size constraint checks for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition to ensure n_sample is at least 1, since if n_sample is 0, the subsequent calculation of grid_size will be 0, resulting in an invalid kernel launch with zero blocks. This guards against both empty datasets (n_rows == 0) and extremely small sample fractions.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Around line 70-88: Add a minimum sample size guard to prevent grid_size from
being zero when launching the kernel. After the existing size constraint checks
for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition
to ensure n_sample is at least 1, since if n_sample is 0, the subsequent
calculation of grid_size will be 0, resulting in an invalid kernel launch with
zero blocks. This guards against both empty datasets (n_rows == 0) and extremely
small sample fractions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a794a6f7-8a2b-4f29-ac12-49b5d25d10ab
📒 Files selected for processing (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
mfoerste4
left a comment
There was a problem hiding this comment.
Thanks @huuanhhuyn for the PR. Please utilize raft for the norm computation if possible.
mfoerste4
left a comment
There was a problem hiding this comment.
LGTM, thanks @huuanhhuyn for the PR!
tfeher
left a comment
There was a problem hiding this comment.
Thanks Huy for the PR! It looks good, but I have one concern regarding the "defensive margin". Let's see what others think about this questien.
|
/ok to test 7f41b35 |
achirkin
left a comment
There was a problem hiding this comment.
Hi @huuanhhuyn, thanks for tackling this long-standing problem!
I have some doubts about the approach though. Have you considered using the IVF-PQ search or index itself to decide if FP16 is permissible? Examples:
- Call IVF-PQ search on a small subset of queries and detect if any of the results are infinite/invalid; if so, adjust the search params to increase one or both internal dtypes. This is agnostic of the selected distance type.
- Calculate the distance maximum like you do here, but use the cluster centers of the IVF-PQ index rather than the source data. This gives a small number of points that cover the dataset well.
- A combination of 1+2: call IVF-PQ search using its own cluster centers as queries.
- Calculate per-component min/max of the cluster centers to get a bounding box; use the diagonal of the bounding box as a proxy for the spread of points in the dataset.
| float estimate_max_squared_norm( | ||
| raft::resources const& handle, | ||
| raft::mdspan<const DataT, raft::matrix_extent<int64_t>, raft::row_major, Accessor> dataset) |
There was a problem hiding this comment.
What about the other distance types? Would it make sense to generalize it to pass the whatever distance type is requested by the constructed index?
There was a problem hiding this comment.
This catches the overflow on IVF-PQ search only. build_knn_graph by IVF_PQ only allows 3 distance types cagra_build.cuh#L1635 (L2Expanded, CosineExpanded, InnerProduct).
HNSW excludes further the use of Cosine distance hnsw.hpp#L123
I have refactored the code for explicitness with a switch around line 104 of the same file.
Tested: overflow detected with the remaining distance types - L2Expanded and InnerProduct
Sampling is not necessary anyway.
7f41b35 to
3ddd98a
Compare
Large-magnitude unnormalized datasets (e.g. SIFT_1M) contains vectors whose norm overflowed the FP16 internal search types of IVF-PQ.
This PR detects it by sampling a fraction from the dataset, compute its L2 squared norm and estimate the max squared distance from the samples. When the distance reaches FP16 overflow range, the internal types fall back to FP32.
We computed max norm on the top 20k samples of the dataset