1454-projection-optimization by danieljvickers · Pull Request #1549 · MFlowCode/MFC

danieljvickers · 2026-06-09T20:41:47Z

Description

In many-particle cases, the limiting factor in the IBM compute is by far the time it takes to project the immersed boundaries onto the grid. This is fundamentally rooted in how we parallelize the work. The current code parallelizes over x, y, and z, but iterates sequentially through the IB patches. In cases where there are many small IBs, we launch thousands of GPU kernels each time step, but each kernel only works on hundreds of grid cells. Adding an option to parallelize over the thousands of particles should expose significantly more parallelism and thus speed up the projection.
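As a rough illustration of the trade-off described above, here is a small Python model (not MFC code) of the two loop orderings. The function names and the one-kernel-per-patch cost model are assumptions made for this sketch; the example sizes of 8,000 patches and about 300 cells per patch are the figures quoted in the benchmark comment in this thread.

```python
from dataclasses import dataclass


@dataclass
class LaunchProfile:
    kernel_launches: int     # kernels launched per projection call
    threads_per_launch: int  # independent work items exposed by each kernel


def grid_cell_parallelism(num_patches: int, cells_per_patch: int) -> LaunchProfile:
    # Assumed model: serial loop over patches, one kernel per patch,
    # each kernel parallel over that patch's grid cells.
    return LaunchProfile(kernel_launches=num_patches,
                         threads_per_launch=cells_per_patch)


def ib_parallelism(num_patches: int, cells_per_patch: int) -> LaunchProfile:
    # Assumed model: a single kernel parallel over patches; each thread
    # walks the cells of its own patch serially.
    return LaunchProfile(kernel_launches=1, threads_per_launch=num_patches)


old = grid_cell_parallelism(num_patches=8000, cells_per_patch=300)
new = ib_parallelism(num_patches=8000, cells_per_patch=300)
print(old)
print(new)
```

Under this model the total number of cell checks is unchanged; what changes is that the launch count drops from one per patch to one, and each launch exposes far more independent work.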

In order to maintain the functionality of both parallelism methods, I need a separate set of geometry bounding checks. Since we perform a check in icpp patches and now 2 in IB patches, this is a significant amount of redundant code that must be maintained. To be somewhat forward-looking, I opted to merge all geometry checking into a single module that can be called from both forms of IB parallelism and the icpp pre_processing code. This should clean up the code nicely and significantly reduce code maintenance going forward. Since we can now change cylinder orientation with angles, I also deprecate the unneeded cylinder length checks.
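The idea of a single geometry-check module shared by several callers can be sketched as follows. This is a minimal Python model, not the contents of the new module: the `Circle` and `Rectangle` types, `is_inside`, and the two callers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    cx: float
    cy: float
    radius: float


@dataclass(frozen=True)
class Rectangle:
    cx: float
    cy: float
    length_x: float
    length_y: float


def is_inside(patch, x: float, y: float) -> bool:
    """Single point-in-patch check shared by every caller."""
    if isinstance(patch, Circle):
        return (x - patch.cx) ** 2 + (y - patch.cy) ** 2 <= patch.radius ** 2
    if isinstance(patch, Rectangle):
        return (abs(x - patch.cx) <= 0.5 * patch.length_x
                and abs(y - patch.cy) <= 0.5 * patch.length_y)
    raise TypeError(f"unsupported patch type: {type(patch).__name__}")


def mark_cells(patches, cell_centers):
    """Hypothetical caller 1: marker projection, later patches overwrite earlier ones."""
    markers = [0] * len(cell_centers)
    for patch_id, patch in enumerate(patches, start=1):
        for idx, (x, y) in enumerate(cell_centers):
            if is_inside(patch, x, y):
                markers[idx] = patch_id
    return markers


def count_cells(patch, cell_centers) -> int:
    """Hypothetical caller 2: a different pass reusing the same predicate."""
    return sum(is_inside(patch, x, y) for x, y in cell_centers)
```

Because both callers go through `is_inside`, a fix to one geometry's bounds check applies everywhere at once, which is the maintenance benefit the description argues for.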

The end result is the creation of a new module in common, the deletion of duplicate code, and a new parallelism path for IB patches when utilizing GPU compute.

Closes #1454, #1532, #1543

Type of change (delete unused ones)

New feature
Refactor

Testing

All tests pass on GNU compiler

Checklist

Check these like this [x] to indicate which of the below applies.

I added or updated tests for new behavior
I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)

GPU results match CPU results
Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not retriggered automatically. To request a review, comment on the PR:

@claude full review — Claude full review (also triggers on PR open/reopen/ready)
Or add label claude-full-review — Claude full review via label

github-actions · 2026-06-09T20:44:30Z

Claude Code Review

Head SHA: aa507f7

Files changed: 15
src/simulation/m_ib_patches.fpp
src/common/m_patch_geometries.fpp
src/common/m_model.fpp
src/pre_process/m_icpp_patches.fpp
src/simulation/m_ibm.fpp
src/simulation/m_time_steppers.fpp
src/simulation/m_checker.fpp
src/simulation/m_global_parameters.fpp
src/pre_process/m_global_parameters.fpp
src/post_process/m_global_parameters.fpp

Findings

1. Data race in `s_apply_ib_patches_ib_parallelism` — non-deterministic IB marker writes when patches overlap

File: src/simulation/m_ib_patches.fpp — s_apply_ib_patches_ib_parallelism

Both the 2D and 3D variants GPU-parallelize over patch_id (outermost loop), then serially iterate over i, j, k inside each GPU thread. If two patches contain the same cell, two distinct GPU threads will write to the same ib_markers%sf(i, j, k) location concurrently without synchronization — a data race that produces non-deterministic results.

$:GPU_PARALLEL_LOOP(private='[i, il, ir, j, jl, jr, k, kl, kr, ...]', copyin='[xp, yp, zp]')
do patch_id = 1, num_ibs          ! parallelized — each thread owns a patch_id
    ...
    do k = kl, kr
        do j = jl, jr
            do i = il, ir
                ...
                if (...) ib_markers%sf(i, j, k) = encoded_patch_id  ! races with other threads
            end do
        end do
    end do
end do

The old s_apply_ib_patches_grid_cell_parallelism serialized the patch loop (outer serial, inner GPU), so later patches deterministically overwrote earlier ones. The new IB-parallelism path breaks that invariant. The checker in m_checker.fpp only validates that ib is enabled — it does not require non-overlapping patches. Any case where patches touch (including particle-cloud spheres that may be placed adjacently) will silently produce wrong IB markers.
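The ordering problem in this finding can be reproduced with a small serial model. The Python sketch below is an assumption-laden illustration, not the PR's code: it permutes the order in which whole patches write, and compares plain overwrites against a max-ID rule standing in for an atomic max. The thread does not say whether the merged code uses such a rule, and it assumes the stored marker value grows with patch index, which may not hold for the real `encoded_patch_id`.

```python
import itertools


def last_writer_wins(write_order, patch_cells, num_cells):
    # Plain overwrite: the result depends on which patch happens to write last.
    markers = [0] * num_cells
    for patch_id in write_order:
        for cell in patch_cells[patch_id]:
            markers[cell] = patch_id
    return markers


def max_id_wins(write_order, patch_cells, num_cells):
    # Order-independent rule: keep the largest patch ID seen so far.
    # On a GPU this would correspond to an atomic max on the marker.
    markers = [0] * num_cells
    for patch_id in write_order:
        for cell in patch_cells[patch_id]:
            markers[cell] = max(markers[cell], patch_id)
    return markers


# Three patches on a five-cell line; patches 1/2 share cell 2, patches 2/3 share cell 3.
patch_cells = {1: [0, 1, 2], 2: [2, 3], 3: [3, 4]}
num_cells = 5

serial = last_writer_wins([1, 2, 3], patch_cells, num_cells)
orders = list(itertools.permutations([1, 2, 3]))
racy = {tuple(last_writer_wins(o, patch_cells, num_cells)) for o in orders}
stable = {tuple(max_id_wins(o, patch_cells, num_cells)) for o in orders}

print(serial, len(racy), len(stable))
```

In this model the plain overwrite produces several distinct marker fields depending on write order, while the max-ID rule always reproduces the serial result in which later patches overwrite earlier ones.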

2. Dropped `k <= Np_local` upper-bound guard in `f_is_inside_airfoil` — latent out-of-bounds while-loop

File: src/common/m_patch_geometries.fpp — f_is_inside_airfoil

The deleted s_ib_airfoil guarded its upper-surface while-loop with k <= Np_local:

! old s_ib_airfoil — guarded:
do while (ib_airfoil_grids(airfoil_id)%upper(k)%x < xy_local(1) .and. k <= Np_local)

The replacement function omits the guard entirely:

! new f_is_inside_airfoil — unguarded:
do while (ib_airfoil_grids(airfoil_id)%upper(k)%x < x)
    k = k + 1
end do

The early-return check x <= ib_airfoil(airfoil_id)%c provides protection in exact arithmetic, but if a grid cell centre lands at c only by floating-point accumulation (rounding slightly above the stored chord endpoint), the loop will walk off the end of the array. Because f_is_inside_airfoil is tagged GPU_ROUTINE(parallelism='[seq]') and called inside GPU_PARALLEL_LOOP, an out-of-bounds access here silently corrupts GPU memory.
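A bounds-guarded version of the search loop can be sketched in Python as below; `first_index_at_or_past` is a hypothetical name and the list stands in for the x-coordinates of the upper-surface points. One detail worth keeping in mind when porting this back: Fortran does not guarantee short-circuit evaluation of `.and.`, so the guard order that is safe in Python is not automatically safe in the original loop.

```python
import bisect


def first_index_at_or_past(xs, x):
    """Return the first index k with xs[k] >= x, or len(xs) if there is none.

    The bound is tested before the element is read, so a query slightly
    past the last stored point cannot index beyond the array.
    """
    k = 0
    while k < len(xs) and xs[k] < x:
        k += 1
    return k


upper_x = [0.0, 0.5, 1.0]
print(first_index_at_or_past(upper_x, 0.3))
print(first_index_at_or_past(upper_x, 1.0 + 1e-12))  # query just past the last point
```

For sorted input, the standard-library call `bisect.bisect_left(xs, x)` returns the same index without an explicit loop.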

danieljvickers · 2026-06-10T16:30:05Z

All AI review comments are now irrelevant because of significant changes that have occurred between now and that review.

I doubt that this will pass the test suite on the first try, but on the off chance that it does, we should not merge it yet. Performance data demonstrating the optimization should be presented before this PR is merged. Otherwise, it is unnecessary refactoring of the code.

codecov · 2026-06-10T22:04:10Z

Codecov Report

❌ Patch coverage is 59.55056% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.65%. Comparing base (825adb2) to head (d1764ed).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/simulation/m_ib_patches.fpp | 63.08% | 58 Missing and 21 partials ⚠️ |
| src/common/m_patch_geometries.fpp | 52.77% | 9 Missing and 8 partials ⚠️ |
| src/simulation/m_ibm.fpp | 0.00% | 6 Missing ⚠️ |
| src/pre_process/m_icpp_patches.fpp | 50.00% | 2 Missing and 2 partials ⚠️ |
| src/common/m_model.fpp | 0.00% | 1 Missing ⚠️ |
| src/simulation/m_time_steppers.fpp | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1549      +/-   ##
==========================================
- Coverage   60.95%   60.65%   -0.30%     
==========================================
  Files          82       83       +1     
  Lines       19926    19856      -70     
  Branches     2924     2945      +21     
==========================================
- Hits        12145    12043     -102     
- Misses       5805     5820      +15     
- Partials     1976     1993      +17
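The percentages in this report can be cross-checked from the raw counts. The short Python check below uses only numbers quoted in the report, except for the patch size of 267 lines, which is inferred here from the 108 uncovered lines and the 59.55056% figure and does not appear in the report itself.

```python
# Project coverage = hits / lines, for base and head.
base_hits, base_lines = 12145, 19926
head_hits, head_lines = 12043, 19856
base_cov = round(100 * base_hits / base_lines, 2)
head_cov = round(100 * head_hits / head_lines, 2)

# Per-file "Missing" and "partials" counts from the table sum to the headline figure.
missing = [58, 9, 6, 2, 1, 0]
partials = [21, 8, 0, 2, 0, 1]
uncovered = sum(missing) + sum(partials)

# Inferred (not stated in the report): a 267-line patch with 108 uncovered lines.
patch_lines = 267
patch_cov = 100 * (patch_lines - uncovered) / patch_lines

print(base_cov, head_cov, uncovered, patch_cov)
```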

danieljvickers · 2026-06-12T03:23:40Z

Successful runtime comparison on Frontier. This plot shows run times for the s_apply_ib_patches subroutines.

In this test, I ran with 8k particles being projected onto the grid. Master is the current master branch (before this is merged) as a baseline of performance. The other two plots show the two types of parallelism now available from this branch.

The grid parallelism is the current master branch functionality, and this demonstrates that its performance is equivalent. The IB parallelism is the new parallelism introduced by this feature. In this particular case of 8k spheres with only about 300 grid cells per sphere, there was a 150x performance improvement over the previous parallelism. Since this subroutine is called 3x per time step in the test, this is a 327 ms time reduction PER TIME STEP.
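The quoted figures imply approximate per-call timings, which can be backed out as follows. This is an estimate under two assumptions not stated in the comment: that the 150x speedup applies uniformly to each of the three calls, and that 327 ms is the total saving across those calls. The inputs are rounded, so the results are only indicative.

```python
speedup = 150.0            # quoted improvement of IB parallelism over grid parallelism
calls_per_step = 3         # quoted number of calls per time step
saved_per_step_ms = 327.0  # quoted time reduction per time step

# Saving attributed to a single call.
saved_per_call_ms = saved_per_step_ms / calls_per_step

# If old = speedup * new, then saved = old * (1 - 1/speedup).
old_call_ms = saved_per_call_ms / (1.0 - 1.0 / speedup)
new_call_ms = old_call_ms / speedup

print(f"saved per call: {saved_per_call_ms:.1f} ms")
print(f"old per call:   {old_call_ms:.2f} ms")
print(f"new per call:   {new_call_ms:.2f} ms")
```

Under these assumptions each call drops from roughly 110 ms to under 1 ms.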

This plot should demonstrate that this feature is useful and should be merged. I will now move on to resolving any remaining issues with the test suite and hopefully getting this on master.

Resolves conflicts with the m_global_parameters_common refactor:

- post_process m_global_parameters: keep master's relocation of shear/proc vars to m_global_parameters_common; keep PR's ib_airfoil/ib_airfoil_grids stubs needed by the new common m_patch_geometries module
- simulation m_global_parameters: keep master's GPU_DECLARE (ib, num_ibs, ib_coefficient_of_friction now declared in m_global_parameters_common); relocate the PR's stl_models GPU_DECLARE to the TYPED_DECLS gpu flag in toolchain/mfc/params/definitions.py (auto-generated for sim)

sbryngelson · 2026-06-12T05:22:12Z

I've pushed a merge of master into this branch (ac1681a, a true merge commit — no history rewritten) to resolve the conflicts from the recently-merged global-parameters/registry refactors (#1550–#1556). What changed and why:

src/post_process/m_global_parameters.fpp — master moved shear_*, proc_coords, start_idx into the new shared m_global_parameters_common; kept your ib_airfoil/ib_airfoil_grids stubs that the new common m_patch_geometries module needs to compile for post_process.

src/simulation/m_global_parameters.fpp — the GPU_DECLARE lines for ib, num_ibs, and ib_coefficient_of_friction now live in m_global_parameters_common, so the manual list reduces to ib_airfoil_grids. Your addition of stl_models to the GPU declare list was relocated to the new parameter pipeline: derived-type declarations (and their sim-side GPU_DECLARE) are auto-generated from the TYPED_DECLS table in toolchain/mfc/params/definitions.py — I flipped the stl_models gpu flag there to True, which emits the same declare in the generated decls.

Heads-up for future commits on this branch: Fortran parameter declarations, namelist bindings, and MPI broadcasts are now generated from toolchain/mfc/params/definitions.py at build time (see docs/documentation/contributing.md). Your many_ib_patch_parallelism registration came through cleanly and its broadcast is auto-generated — no action needed.

Verification on my end: ./mfc.sh format, ./mfc.sh precheck, the toolchain pytest suite (342 passed), and a full CPU build of all three targets all pass on the merged tree. Nothing remains on your side; CI should run on the updated head.

danieljvickers · 2026-06-12T11:01:21Z

I am very glad you have merged this for me because I screwed up the previous one significantly, lol.

danieljvickers · 2026-06-13T23:04:58Z

@sbryngelson I assume you are going to want to wait to merge this until after we resolve the master branch AMD issue with the 2D viscous shock tube case, so I am going to table it here. Otherwise, it looks like we pass the full test suite and it is ready* to merge.

sbryngelson · 2026-06-13T23:06:27Z

> @sbryngelson I assume you are going to want to wait to merge this until after we resolve the master branch AMD issue with the 2D viscous shock tube case, so I am going to table it here. Otherwise, it looks like we pass the full test suite and it is ready* to merge.

@danieljvickers noted!

…tate; adopt master's IB-patch parallelism refactor (MFlowCode#1603/MFlowCode#1549) post_process m_global_parameters: kept MFlowCode#1290 nidx/neighbor_ranks + master's ib_airfoil decls. m_ib_patches: took master's two-mode s_apply_ib_patches dispatcher and re-applied MFlowCode#1290's x_domain->glb_bounds substitution (10 sites, both parallelism modes) so IB periodic wrapping uses the global extent under MPI decomposition. Verified: 3-target CPU build, precheck clean.

Initial modifications to allow us to use a unified boudns check on pa…

f85f983

…tch geometries

danieljvickers added 9 commits June 9, 2026 21:44

Removed all of the old outdated subroutines and merged them into a mo…

4c94454

…nolithic subroutine for grid-cell parallelism

part of the way through adding a secondary loop

6f76374

Unified some of the patch geometry functions. Finished out the first …

ec363df

…draft of the grid parallelism subroutine. Deprecated cylinder value to only favor length_x being used

Passes circle tests

72529c9

All tests pass, except for STL mdoels

ce800e4

All tests pass on GNU compilers

f5dc0d7

Added patch geometries to docs and fixed spelling

3cd4661

Missed some spelling in my git add

9b0b59d

Integrated with icpp patches and update documents to note the depreca…

aeaa21e

…tion of length_y and length_z for cylinders. All tests pass on GNU compilers for full test suite.

danieljvickers marked this pull request as ready for review June 10, 2026 16:19

danieljvickers requested a review from sbryngelson as a code owner June 10, 2026 16:19

Merged with master

4f88e03

danieljvickers added 3 commits June 10, 2026 14:28

Need to hard-code index 0 in ib markers array in 2D to prevent out-of…

8c2ffb2

…-bounds read

Updated a GPU paralellism macro that contained a removed variable

a2a3e8e

Formatting

f168b40

danieljvickers and others added 11 commits June 10, 2026 23:03

Passes test suite on NVHPC GPU OpenACC Parallelism

dac7c47

Missing length from a private statement

8680a89

Accidentally moved length to a copyin, not private

16ca6d1

Format

471c924

Fixed airfoils on cray GPU

2741604

Merge branch 'master' into 1454-projection-optimization

c985fc0

Merge branch 'master' into 1454-projection-optimization

d99331b

Resolved issues from merge conflict

c31b231

Merge branch 'master' into 1454-projection-optimization

4380b0a

Some loop optimizations and debug multi-particle periodicity

04a5bf0

Forgot the radius in the private statement of 2D multi-ib parallelism

f06fefc

danieljvickers added 5 commits June 13, 2026 11:02

reworked some of the moment compute to be more stable

7030ffd

Merge branch 'master' into 1454-projection-optimization

9136b81

Merge branch '1454-projection-optimization' of github.com:danieljvick…

91f7df6

…ers/MFC into 1454-projection-optimization

Protecting the moment of inertia check as a stop gap here for a compi…

d0f6d25

…ler issue

Cleanup and appear to have gotten stability in the m_ib_patches subro…

fcf303a

…utine

sbryngelson marked this pull request as draft June 15, 2026 03:38

danieljvickers mentioned this pull request Jun 15, 2026

fix: correct axis normalization in s_compute_moment_of_inertia #1597

Closed

4 tasks

Merge branch 'master' into 1454-projection-optimization

aa507f7

danieljvickers marked this pull request as ready for review June 15, 2026 14:38

Merge branch 'master' into 1454-projection-optimization

d1764ed

sbryngelson merged commit b4be438 into MFlowCode:master Jun 16, 2026
82 checks passed

danieljvickers deleted the 1454-projection-optimization branch June 16, 2026 14:22

sbryngelson merged 33 commits into MFlowCode:master from danieljvickers:1454-projection-optimization
