Skip to content

Cray CCE: drop simd from the GPU offload directive#1606

Open
sbryngelson wants to merge 1 commit into
MFlowCode:masterfrom
sbryngelson:cce-drop-simd
Open

Cray CCE: drop simd from the GPU offload directive#1606
sbryngelson wants to merge 1 commit into
MFlowCode:masterfrom
sbryngelson:cce-drop-simd

Conversation

@sbryngelson

@sbryngelson sbryngelson commented Jun 16, 2026

Copy link
Copy Markdown
Member

Summary

On the Cray CCE path the GPU-offload loop directive carries a simd clause. This drops it (and the
matching end):

- !$omp target teams distribute parallel do simd defaultmap(firstprivate:scalar)
+ !$omp target teams distribute parallel do defaultmap(firstprivate:scalar)

CCE-only — gated by MFC_COMPILER == CCE_COMPILER_ID in src/common/include/omp_macros.fpp.
The NVIDIA branch (target teams loop) and the AMD branch (target teams distribute parallel do,
already without simd) are untouched. This brings the CCE directive in line with AMD's.

Rationale: a cleanup, not a perf change

simd is a CPU-vectorization construct; on a SIMT GPU it provides no lane-vectorization and is
effectively vestigial. Removing it is:

  • Correctness-neutral. The 22 GPU Riemann-kernel regression tests (HLL / hypoelastic / MHD /
    RMHD / IBM / examples) all pass, identical to with simd.

  • Performance-neutral. 7-case benchmark sweep on Frontier (CCE 19.0.0, MI250X), grind in
    ns/gp/eq/rhs (mean of 2 reps), comparing two builds that differ only by this clause:

    case with simd no simd ratio
    5eq_rk3_weno3_hll 5.118 5.099 1.00
    5eq_rk3_weno3_hllc 4.860 4.783 0.98
    5eq_rk3_weno3_lf 4.802 4.930 1.03
    hypo_hll 3.090 3.040 0.98
    ibm 12.159 12.160 1.00
    igr 3.439 3.357 0.98
    viscous_weno5_sgb_acoustic 7.947 7.914 1.00

    The result is mixed and within measurement noise — rep-to-rep scatter was ~4–5%, larger than
    every per-case delta (e.g. lf is nominally slower without simd). Net is ≈ neutral
    (geometric mean ~0.6% toward no-simd, not significant). No performance claim is made — the
    CI bench lane provides the authoritative numbers. This change is justified solely as removing a
    clause that does nothing useful on a SIMT target.

Risk

Minimal: two-line, CCE-gated string change; no behavior change on any other compiler; correctness
verified on the GPU regression suite.

Copilot AI review requested due to automatic review settings June 16, 2026 15:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the Cray CCE-specific OpenMP GPU offload loop directive to drop the simd clause, aligning it with the existing AMD directive and avoiding potentially over-constraining GPU codegen on CCE.

Changes:

  • Removed simd from the CCE omp_start_directive GPU offload loop directive string.
  • Removed simd from the matching CCE omp_end_directive string.

@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.51%. Comparing base (b4be438) to head (d4b9320).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1606   +/-   ##
=======================================
  Coverage   60.51%   60.51%           
=======================================
  Files          83       83           
  Lines       19905    19905           
  Branches     2950     2950           
=======================================
  Hits        12046    12046           
  Misses       5866     5866           
  Partials     1993     1993           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants