Cray CCE: drop simd from the GPU offload directive#1606
Open
sbryngelson wants to merge 1 commit into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the Cray CCE-specific OpenMP GPU offload loop directive to drop the simd clause, aligning it with the existing AMD directive and avoiding potentially over-constraining GPU codegen on CCE.
Changes:
- Removed
simdfrom the CCEomp_start_directiveGPU offload loop directive string. - Removed
simdfrom the matching CCEomp_end_directivestring.
… and performance-neutral)
9ab1089 to
d4b9320
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1606 +/- ##
=======================================
Coverage 60.51% 60.51%
=======================================
Files 83 83
Lines 19905 19905
Branches 2950 2950
=======================================
Hits 12046 12046
Misses 5866 5866
Partials 1993 1993 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
On the Cray CCE path the GPU-offload loop directive carries a
simdclause. This drops it (and thematching
end):CCE-only — gated by
MFC_COMPILER == CCE_COMPILER_IDinsrc/common/include/omp_macros.fpp.The NVIDIA branch (
target teams loop) and the AMD branch (target teams distribute parallel do,already without
simd) are untouched. This brings the CCE directive in line with AMD's.Rationale: a cleanup, not a perf change
simdis a CPU-vectorization construct; on a SIMT GPU it provides no lane-vectorization and iseffectively vestigial. Removing it is:
Correctness-neutral. The 22 GPU Riemann-kernel regression tests (HLL / hypoelastic / MHD /
RMHD / IBM / examples) all pass, identical to with
simd.Performance-neutral. 7-case benchmark sweep on Frontier (CCE 19.0.0, MI250X), grind in
ns/gp/eq/rhs (mean of 2 reps), comparing two builds that differ only by this clause:
The result is mixed and within measurement noise — rep-to-rep scatter was ~4–5%, larger than
every per-case delta (e.g.
lfis nominally slower withoutsimd). Net is ≈ neutral(geometric mean ~0.6% toward no-
simd, not significant). No performance claim is made — theCI bench lane provides the authoritative numbers. This change is justified solely as removing a
clause that does nothing useful on a SIMT target.
Risk
Minimal: two-line, CCE-gated string change; no behavior change on any other compiler; correctness
verified on the GPU regression suite.