Skip to content

fix(rotated-bc): Schur preconditioning parity + 3D rotation nullspace — Zhong #248 blow-out (44 its -> 1)#306

Open
lmoresi wants to merge 2 commits into
developmentfrom
bugfix/rotated-freeslip-schur-pc
Open

fix(rotated-bc): Schur preconditioning parity + 3D rotation nullspace — Zhong #248 blow-out (44 its -> 1)#306
lmoresi wants to merge 2 commits into
developmentfrom
bugfix/rotated-freeslip-schur-pc

Conversation

@lmoresi

@lmoresi lmoresi commented Jul 3, 2026

Copy link
Copy Markdown
Member

Problem

Rotated strong free-slip was dramatically slower than the alternatives on curved/deformed boundaries — a blow-out in the Schur-complement iterations. On the Zhong et al. (2008) spherical response benchmark (#248 configuration, serial cellSize 1/8, tol 1e-7):

method outer its reason Ut err Ub err rotation content max|L|
Nitsche γ=10 1 2 0.48% 0.63% 1.6e-4
block-constrained 5 2 0.11% 0.37% 1.6e-4
rotated GAMG (before) 44 2 0.31% 0.14% 6.7e-4
rotated FMG (before) diverged (-11)
rotated GAMG (after) 1 2 0.23% 0.15% 1.6e-4
rotated FMG (after) converges (4 its at debug size; full-size run to follow) 2

np=8 (official 1/8 parallel case): after = 1 outer it, values identical to serial. Deformed-box (bulged top, per-node computed normals): 1 outer it. Deformed free-surface models paid the blow-out per Newton iteration per timestep.

Root causes (all in the self-contained rotated fieldsplit)

  1. The native 1/μ pressure-mass Schur preconditioner was dropped. Only snes.getJacobian()[0] was read; the Pmat carrying the DS JacobianPreconditioner block _pp_G0 (what the standard path uses via schur_precondition=a11) was never assembled. The solve substituted selfp + Jacobi — weak on curved/stretched cells and poisoned along the boundary strip by the unit constraint diagonals.
  2. 3D rigid rotations were never handled. _rotated_nullspace was 2D-only; a spherical shell with free-slip on both boundaries leaves THREE rotation modes on the operator — near-singular Krylov grind plus visible rotation pollution in the solution (the elevated angular momentum above; the correctness half of gthyagi's Zhong 2008 isoviscous response: constrained vs Nitsche free-slip behavior #248 observations).
  3. No constant-pressure nullspace on the inner Schur solve (the hand-built IS fieldsplit doesn't propagate it) — singular on enclosed domains.
  4. Custom-FMG velocity block on closed shells: the Galerkin coarse operator inherits the rotation nullspace, so the redundant/LU coarse solve hit a zero pivot (SUBPC_ERROR, outer reason -11 — FMG on a sphere never worked). The coarse solve is now SVD. The FMG route remains the preferred velocity block; GAMG (now tuned to native parity) is fallback-only.

Also in this PR

  • KSP/PC + PtAP + Schur-pmat reuse across Newton iterations in the nonlinear driver (was a full rebuild per iteration); info["ksp_its"] reports the linear iteration counts.
  • Constraint rows scaled by mean |diag(A_vv)| instead of 1.0 (exactness unchanged — solution rows are explicitly zeroed post-solve).
  • Analytic normals are unwrapped before lambdify, so mesh.CoordinateSystem.unit_e_0 works directly (previously a NameError); stray non-coordinate symbols get a clear error.
  • Environment (separate commit): numpy <2.5 cap (2.5.0 breaks locally-built petsc4py — verified with clean rebuilds; conda-forge binaries/CI unaffected) and ./uw now purges the shared petsc4py build/ cache before installing (stale-ABI objects were being silently reused across envs/worktrees).

Tests

  • New: 3D spherical-shell rotated free-slip in test_1018 (nullspace + iteration ceiling + machine-zero leak + 3-mode gauge removal) and a partition-independence golden in test_1064.
  • test_1018 14/14 serial; test_1064 7/7 at np2 and np4. All existing goldens unchanged (the fix changes the preconditioner, not the solution — L2 agree to ~1e-10).

Closes the performance half of #248's rotated free-slip story; benchmark scripts and logs in ~/+Simulations/rotated_schur_pc_study/.

Underworld development team with AI support from Claude Code

lmoresi added 2 commits July 3, 2026 20:47
…ullspace (#248)

The self-contained rotated free-slip fieldsplit silently downgraded the
native Stokes preconditioning, blowing out the outer Schur iterations on
curved/deformed boundaries (Zhong #248 spherical benchmark: 44 outer its
vs Nitsche 2 / block-constrained 5; deformed free-surface models pay this
per Newton iteration):

* The 1/mu pressure-mass Schur preconditioner was dropped: only
  snes.getJacobian()[0] was read, so the Pmat carrying the DS
  JacobianPreconditioner _pp_G0 block was never assembled, and the solve
  fell back to selfp + Jacobi. Now the Pmat is assembled alongside J and
  its p-p block is installed as a USER Schur preconditioner (exactly the
  native path's schur_precondition=a11; Q is identity on pressure).
* 3D rigid rotations were never treated: _rotated_nullspace was 2D-only,
  so a spherical shell with free-slip on both boundaries left THREE
  rotation modes on the operator (near-singular Krylov grind + rotation
  pollution in the answer). All e_k x r modes are now built, constraint-
  checked per mode, attached rotated, and projected out of the solution.
* The hand-built IS fieldsplit never propagated the constant-pressure
  nullspace to the inner Schur solve (singular on enclosed domains) —
  now attached to the Schur complement KSP.
* The GAMG fallback velocity block was bare preonly; it now mirrors the
  native tuning (fgmres 0.1*tol, agg nsmooths 2, additive, repartition).
  The custom-FMG route stays the preferred velocity block, unchanged in
  design, but its Galerkin coarse operator inherits the rotation
  nullspace on closed shells — the redundant/LU coarse solve hit a zero
  pivot (outer reason -11); the coarse solve is now SVD.
* Constraint rows are scaled by mean |diag(A_vv)| instead of 1.0 (unit
  diagonals amid eta/h^2 entries poisoned diagonal-based approximations
  along the entire rotated surface).
* The nonlinear driver now reuses the KSP/PC context, the PtAP product
  and the Schur pmat across Newton iterations (values refreshed in
  place) instead of a full rebuild per iteration; per-iteration linear
  its are reported in the info dict (ksp_its).
* Analytic normals are unwrapped before lambdify, so curvilinear forms
  (e.g. mesh.CoordinateSystem.unit_e_0) reduce to Cartesian instead of
  dying with a NameError; stray symbols get a clear error.

Zhong benchmark (serial 1/8, tol 1e-7): rotated GAMG 44 -> 1 outer it,
rotated FMG diverged -> converged, angular-momentum content back to
Nitsche levels. New 3D spherical-shell tests in test_1018 (serial) and
test_1064 (parallel golden, np2/np4); all existing goldens unchanged.

Underworld development team with AI support from Claude Code
…uild cache in uw

Two independent environment traps that both present as
"numpy.dtype size changed ... Expected 96 from C header, got 88":

* numpy 2.5.0 (pulled in by the >=2 range after #301) breaks LOCALLY
  BUILT petsc4py at import — verified with a clean rebuild against
  2.5.0 (fails) vs 2.4.6 (works). conda-forge petsc4py binaries are
  fine, so CI stays green while every AMR env breaks. Cap at <2.5
  until petsc4py/Cython support numpy 2.5.
* The in-tree petsc4py build/ cache (shared PETSc source tree) is
  reused by pip across envs and worktrees even when numpy or the
  PETSc arch changed, silently installing stale-ABI objects. ./uw now
  removes it before every petsc4py install.

Also of note (not in this commit): pypi gmsh 4.15.1 fails to mesh
SphericalShellInternalBoundary at cellSize <= 0.25 ("Identical points
in triangulation"); gmsh 4.13.1 works. Pin locally if you hit it.

Underworld development team with AI support from Claude Code

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes major performance and correctness issues in the self-contained rotated strong free-slip Stokes solve by restoring native Schur-preconditioning parity and properly handling 3D rigid-rotation nullspaces (notably for spherical shells / Zhong #248), and adds regression tests plus environment safeguards for locally-built petsc4py.

Changes:

  • Reintroduces the native 1/μ pressure-mass Schur preconditioner (Pmat p–p block) and propagates pressure/rotation nullspaces in the rotated fieldsplit-Schur path, including 3D rotation-mode handling and MG coarse-solve robustness.
  • Reuses rotated operators / KSP context across Newton iterations and improves constraint-row scaling to avoid boundary-strip conditioning pathologies.
  • Adds new 3D spherical-shell rotated free-slip tests (serial + partition-independence) and caps numpy <2.5 to avoid local petsc4py ABI breakage; updates uw helper to purge petsc4py build cache.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
uw Purges shared petsc4py in-tree build cache before local installs to avoid stale-ABI reuse.
tests/test_1018_rotated_freeslip.py Adds a 3D spherical-shell rotated free-slip regression covering nullspace handling, iteration bound, and leakage.
tests/parallel/test_1064_rotated_freeslip_parallel.py Adds partition-independence golden for 3D spherical-shell rotated free-slip behavior.
src/underworld3/utilities/rotated_bc.py Restores native Schur-preconditioning parity, adds 3D rotation nullspace support, and reuses solver context across Newton iterations.
pixi.toml Caps numpy to <2.5 due to verified local petsc4py import ABI failure on numpy 2.5.0.
pixi.lock Updates lockfile to reflect the numpy cap (pins numpy 2.4.6 artifacts).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +762 to +768
vopts = PETSc.Options()
vpfx = vel_pc.getOptionsPrefix() or ""
vopts.setValue(vpfx + "mg_coarse_pc_type", "svd")
vopts.delValue(vpfx + "mg_coarse_redundant_pc_type")
vel_pc.setFromOptions()
vel_pc.setUp()
vopts.delValue(vpfx + "mg_coarse_pc_type")
@lmoresi

lmoresi commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

Follow-up numbers (full-size FMG + repeat-solve timing)

Full-size FMG Zhong run (serial 1/8, non-nested coarse shell 0.19): now converges, reason 2 in 5 outer iterations (was PC failure, reason -11), solution identical to the GAMG run to ~5 digits. Caveat: wall clock was ~5.9 h, essentially all in the 3D barycentric prolongation build for the non-nested pair — the debug-size sphere (0.3/0.45) takes ~2 min total and GAMG at full size solves in ~53 s. So the SVD-coarse fix makes rotated FMG correct on closed 3D shells, but the non-nested 3D transfer build needs optimisation before it is practical at this size; use GAMG or a nested (refine()-based) hierarchy on 3D shells meanwhile. (2D annulus FMG is unaffected — fast and bit-identical goldens.)

Repeat-solve timing vs Nitsche (same process, solves 2–4 after JIT, benchmark size): Nitsche 45.1/46.2/43.3 s vs rotated 60.5/50.8/43.4 s — the first two rotated repeats were inflated by the concurrent FMG benchmark; the least-contended samples are 43.3 vs 43.4 s. Per-solve wall clock is effectively 1:1 at equal tolerance: assembly + MG setup dominates and is shared, and the rotated extras (Q build, PtAP, Pmat extraction, fieldsplit setup) are lost in that budget. Historical note: the OLD configuration also ran ~1:1 on this isoviscous benchmark — 44 cheap outer iterations (preonly bare-GAMG velocity, selfp+Jacobi) cost about the same as Nitsche's 1–2 expensive ones. The blow-out was a robustness problem: the weak-preconditioner iteration count grows with viscosity contrast / deformation / resolution (and multiplies per Newton step), while the restored native preconditioning stays flat.

Underworld development team with AI support from Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants