Specialize non-reducing kernel: drop reduction-only iszero hoist by lkdvos · Pull Request #69 · QuantumKitHub/Strided.jl

lkdvos · 2026-06-24T01:05:09Z

What

The dimension-1 inner core in _mapreduce_kernel_expr was generated with an
iszero(stride_1_1) branch:

if iszero(stride_1_1)          # hoist A1[I1] out of the loop
    a = A1[I1]; @simd ...; A1[I1] = a
else
    @simd ... A1[I1] = op(A1[I1], f(...)) ...
end

This hoist only earns its keep for reductions, where the destination has
stride 0 along the inner loop dim (the same element is accumulated into
repeatedly). For the non-reducing path — op === nothing, i.e.
map!/permutedims!/copy!/fill!/conj!, all of which reach the kernel via
_mapreduce_fuse!(f, nothing, nothing, ...) — every destination element is
written exactly once, so stride_1_1 is always nonzero and the hoist branch is
dead. Yet it was still emitted, generating the inner @simd body twice and
leaving a runtime branch in the hot map!/permute loop.
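
As a rough illustration of the two loop shapes discussed above, here is a minimal sketch in plain Julia. The function names, the argument lists, and the `I + i * stride` indexing are hypothetical stand-ins chosen for this example; they are not the code that `_mapreduce_kernel_expr` actually emits.

```julia
# Hypothetical stand-ins for the two inner-core shapes; not Strided.jl's generated code.

# Reducing shape: the destination stride along the inner dim is zero, so dst[I1]
# is loaded once, accumulated in a local, and stored once after the loop.
function inner_hoisted!(op, f, dst, I1, src, I2, s2, n)
    a = dst[I1]
    @simd for i in 0:(n - 1)
        a = op(a, f(src[I2 + i * s2]))
    end
    dst[I1] = a
    return dst
end

# Non-reducing shape: every destination element is written exactly once,
# so there is nothing to hoist and a single direct loop suffices.
function inner_direct!(f, dst, I1, s1, src, I2, s2, n)
    @simd for i in 0:(n - 1)
        dst[I1 + i * s1] = f(src[I2 + i * s2])
    end
    return dst
end

src = collect(1:6)
reduced = inner_hoisted!(+, x -> 2x, [10], 1, src, 1, 1, 6)            # 10 + 2 * (1 + ... + 6)
mapped = inner_direct!(x -> 2x, zeros(Int, 6), 1, 1, src, 1, 1, 6)     # elementwise 2x
```

Integer inputs are used so that the reduction result does not depend on how `@simd` reassociates the accumulation.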

This PR keys the inner-core construction on op: the non-reducing case emits a
single direct @simd loop (one body, no branch); the reducing case is unchanged.

It is correct even in the (non-occurring) zero-stride case for op === nothing:
the plain loop simply overwrites A1[I1] each iteration (last write wins),
which matches the hoisted form.
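
The last-write-wins argument can be checked on a toy case. The sketch below uses hypothetical helper names and plain `for` loops (no `@simd`), so it only illustrates the sequential semantics of the two forms, not the generated kernel itself.

```julia
# Hypothetical helpers illustrating the zero-stride overwrite case (op === nothing).

# Direct form: with s1 == 0 every iteration writes the same slot dst[I1].
function overwrite_direct!(f, dst, I1, s1, src, n)
    for i in 0:(n - 1)
        dst[I1 + i * s1] = f(src[i + 1])
    end
    return dst
end

# Hoisted form: keep the value in a local and store it once after the loop.
function overwrite_hoisted!(f, dst, I1, src, n)
    a = dst[I1]
    for i in 0:(n - 1)
        a = f(src[i + 1])
    end
    dst[I1] = a
    return dst
end

src = [3, 1, 4, 1, 5]
d_direct = overwrite_direct!(x -> 10x, [0], 1, 0, src, length(src))
d_hoisted = overwrite_hoisted!(x -> 10x, [0], 1, src, length(src))
```

Both calls leave `f` applied to the last source element in the single destination slot.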

Why it's safe

This is a pure generation-time specialization of the staged Expr — it does
not introduce any function-call boundary inside the loop nest, so it avoids the
whole-nest-optimization pitfalls that make the bandwidth-bound permute path
fragile.
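
To make "generation-time specialization" concrete, here is a small sketch of the idea under stated assumptions: `inner_core_expr`, `count_simd`, `map_core!` and `reduce_core!` are hypothetical names invented for this example, and the `reducing` flag stands in for `op !== nothing`. The real generator in `src/mapreduce.jl` is more involved; the point is only that the choice happens while the `Expr` is built, so the emitted loop contains no extra call or branch.

```julia
# Hypothetical generator: builds the inner-core expression for one loop dimension.
function inner_core_expr(reducing::Bool)
    direct = quote
        @simd for i in 0:(n - 1)
            dst[I1 + i * s1] = f(src[I2 + i * s2])
        end
    end
    reducing || return direct          # non-reducing: one body, no branch

    update = quote
        @simd for i in 0:(n - 1)
            dst[I1 + i * s1] = op(dst[I1 + i * s1], f(src[I2 + i * s2]))
        end
    end
    hoisted = quote
        a = dst[I1]
        @simd for i in 0:(n - 1)
            a = op(a, f(src[I2 + i * s2]))
        end
        dst[I1] = a
    end
    return Expr(:if, :(iszero(s1)), hoisted, update)   # reducing: both bodies plus branch
end

# Count `@simd` macro calls in an expression tree.
function count_simd(ex)
    ex isa Expr || return 0
    here = ex.head === :macrocall && ex.args[1] === Symbol("@simd")
    return here + sum(count_simd, ex.args; init = 0)
end

# Splice the generated expressions directly into function bodies.
@eval function map_core!(f, dst, I1, s1, src, I2, s2, n)
    $(inner_core_expr(false))
    return dst
end

@eval function reduce_core!(op, f, dst, I1, s1, src, I2, s2, n)
    $(inner_core_expr(true))
    return dst
end

n_bodies_map = count_simd(inner_core_expr(false))
n_bodies_reduce = count_simd(inner_core_expr(true))
```

In this sketch the non-reducing expression contains one `@simd` loop and the reducing one contains two, which mirrors the "half the inner-body codegen" observation in the description.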

Benchmarks

Measured with benchmark/runtime_bench.jl and benchmark/compile_bench.jl,
baseline vs branch run back-to-back on the same machine, using the unchanged
reduce_* cases as a drift control (this workstation drifts a few % between
runs).

| | changed (permute/add) | control (reductions, unchanged) | read as |
|---|---|---|---|
| Runtime (single-thread) | −0.9% mean | −2.6% mean (drift) | neutral, no permute regression |
| Compile | −5.2% (add −8.4%, permute −2.7%) | −3.1% (drift) | ~2% real reduction for non-reducing |

So: runtime-neutral, a small compile-time win for map!/permute specializations,
and one fewer dead branch + half the inner-body codegen on the non-reducing path.
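
One way to read the drift control is to divide the control's change out of the changed cases. The sketch below does that with the mean figures quoted above; it assumes the drift acts as a common multiplicative factor on both groups, and `drift_corrected` is a hypothetical helper, not part of the benchmark scripts.

```julia
# Relative change of the changed cases after dividing out the control's drift.
# Assumes drift scales both groups by the same factor.
drift_corrected(changed, control) = (1 + changed) / (1 + control) - 1

runtime = drift_corrected(-0.009, -0.026)   # permute/add runtime vs. reductions
compile = drift_corrected(-0.052, -0.031)   # permute/add compile time vs. reductions

println("runtime: ", round(100 * runtime; digits = 1), " %")
println("compile: ", round(100 * compile; digits = 1), " %")
```

Under that assumption the compile-time figure comes out at roughly −2%, in line with the "~2% real reduction" reading, while the runtime figure stays within the few-percent run-to-run drift mentioned for this workstation.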

Full test suite passes (including GPU/JLArray/CuArray paths).

🤖 Generated with Claude Code

The dim-1 inner core in `_mapreduce_kernel_expr` carried an `iszero(stride_1_1)` branch that hoists `A1[I1]` out of the `@simd` loop. That hoist only matters for reductions, where the destination stride along the inner dim is zero. For the non-reducing path (`op === nothing`: map!/permute/copy!/fill!/...) every destination element is written exactly once, so the branch is dead and the loop body was needlessly generated twice.

Emit the plain `@simd` loop (one body, no runtime branch) for `op === nothing`, keeping the hoist for reductions. Runtime-neutral (no permute regression) and trims a bit of non-reducing compile time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lkdvos · 2026-06-24T01:09:30Z

TLDR: there was a branch that is never taken for non-reductions. Eliminating it speeds up compilation and might slightly improve runtimes, since the generated code gets a bit smaller.

I did some benchmarks that show slight improvements, nothing major but every small bit helps!

codecov · 2026-06-24T01:21:36Z

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/mapreduce.jl	90.90%	2 Missing ⚠️

Files with missing lines	Coverage Δ
src/mapreduce.jl	`80.76% <90.90%> (+0.15%)`	⬆️

lkdvos requested a review from Jutho June 24, 2026 01:09

lkdvos wants to merge 1 commit into main from ld-nonreduce-kernel
