fix(rotated-bc): Schur preconditioning parity + 3D rotation nullspace — Zhong #248 blow-out (44 its -> 1)#306
fix(rotated-bc): Schur preconditioning parity + 3D rotation nullspace — Zhong #248 blow-out (44 its -> 1)#306lmoresi wants to merge 2 commits into
Conversation
…ullspace (#248) The self-contained rotated free-slip fieldsplit silently downgraded the native Stokes preconditioning, blowing out the outer Schur iterations on curved/deformed boundaries (Zhong #248 spherical benchmark: 44 outer its vs Nitsche 2 / block-constrained 5; deformed free-surface models pay this per Newton iteration): * The 1/mu pressure-mass Schur preconditioner was dropped: only snes.getJacobian()[0] was read, so the Pmat carrying the DS JacobianPreconditioner _pp_G0 block was never assembled, and the solve fell back to selfp + Jacobi. Now the Pmat is assembled alongside J and its p-p block is installed as a USER Schur preconditioner (exactly the native path's schur_precondition=a11; Q is identity on pressure). * 3D rigid rotations were never treated: _rotated_nullspace was 2D-only, so a spherical shell with free-slip on both boundaries left THREE rotation modes on the operator (near-singular Krylov grind + rotation pollution in the answer). All e_k x r modes are now built, constraint- checked per mode, attached rotated, and projected out of the solution. * The hand-built IS fieldsplit never propagated the constant-pressure nullspace to the inner Schur solve (singular on enclosed domains) — now attached to the Schur complement KSP. * The GAMG fallback velocity block was bare preonly; it now mirrors the native tuning (fgmres 0.1*tol, agg nsmooths 2, additive, repartition). The custom-FMG route stays the preferred velocity block, unchanged in design, but its Galerkin coarse operator inherits the rotation nullspace on closed shells — the redundant/LU coarse solve hit a zero pivot (outer reason -11); the coarse solve is now SVD. * Constraint rows are scaled by mean |diag(A_vv)| instead of 1.0 (unit diagonals amid eta/h^2 entries poisoned diagonal-based approximations along the entire rotated surface). * The nonlinear driver now reuses the KSP/PC context, the PtAP product and the Schur pmat across Newton iterations (values refreshed in place) instead of a full rebuild per iteration; per-iteration linear its are reported in the info dict (ksp_its). * Analytic normals are unwrapped before lambdify, so curvilinear forms (e.g. mesh.CoordinateSystem.unit_e_0) reduce to Cartesian instead of dying with a NameError; stray symbols get a clear error. Zhong benchmark (serial 1/8, tol 1e-7): rotated GAMG 44 -> 1 outer it, rotated FMG diverged -> converged, angular-momentum content back to Nitsche levels. New 3D spherical-shell tests in test_1018 (serial) and test_1064 (parallel golden, np2/np4); all existing goldens unchanged. Underworld development team with AI support from Claude Code
…uild cache in uw Two independent environment traps that both present as "numpy.dtype size changed ... Expected 96 from C header, got 88": * numpy 2.5.0 (pulled in by the >=2 range after #301) breaks LOCALLY BUILT petsc4py at import — verified with a clean rebuild against 2.5.0 (fails) vs 2.4.6 (works). conda-forge petsc4py binaries are fine, so CI stays green while every AMR env breaks. Cap at <2.5 until petsc4py/Cython support numpy 2.5. * The in-tree petsc4py build/ cache (shared PETSc source tree) is reused by pip across envs and worktrees even when numpy or the PETSc arch changed, silently installing stale-ABI objects. ./uw now removes it before every petsc4py install. Also of note (not in this commit): pypi gmsh 4.15.1 fails to mesh SphericalShellInternalBoundary at cellSize <= 0.25 ("Identical points in triangulation"); gmsh 4.13.1 works. Pin locally if you hit it. Underworld development team with AI support from Claude Code
There was a problem hiding this comment.
Pull request overview
This PR fixes major performance and correctness issues in the self-contained rotated strong free-slip Stokes solve by restoring native Schur-preconditioning parity and properly handling 3D rigid-rotation nullspaces (notably for spherical shells / Zhong #248), and adds regression tests plus environment safeguards for locally-built petsc4py.
Changes:
- Reintroduces the native 1/μ pressure-mass Schur preconditioner (Pmat p–p block) and propagates pressure/rotation nullspaces in the rotated fieldsplit-Schur path, including 3D rotation-mode handling and MG coarse-solve robustness.
- Reuses rotated operators / KSP context across Newton iterations and improves constraint-row scaling to avoid boundary-strip conditioning pathologies.
- Adds new 3D spherical-shell rotated free-slip tests (serial + partition-independence) and caps numpy
<2.5to avoid local petsc4py ABI breakage; updatesuwhelper to purge petsc4py build cache.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| uw | Purges shared petsc4py in-tree build cache before local installs to avoid stale-ABI reuse. |
| tests/test_1018_rotated_freeslip.py | Adds a 3D spherical-shell rotated free-slip regression covering nullspace handling, iteration bound, and leakage. |
| tests/parallel/test_1064_rotated_freeslip_parallel.py | Adds partition-independence golden for 3D spherical-shell rotated free-slip behavior. |
| src/underworld3/utilities/rotated_bc.py | Restores native Schur-preconditioning parity, adds 3D rotation nullspace support, and reuses solver context across Newton iterations. |
| pixi.toml | Caps numpy to <2.5 due to verified local petsc4py import ABI failure on numpy 2.5.0. |
| pixi.lock | Updates lockfile to reflect the numpy cap (pins numpy 2.4.6 artifacts). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| vopts = PETSc.Options() | ||
| vpfx = vel_pc.getOptionsPrefix() or "" | ||
| vopts.setValue(vpfx + "mg_coarse_pc_type", "svd") | ||
| vopts.delValue(vpfx + "mg_coarse_redundant_pc_type") | ||
| vel_pc.setFromOptions() | ||
| vel_pc.setUp() | ||
| vopts.delValue(vpfx + "mg_coarse_pc_type") |
Follow-up numbers (full-size FMG + repeat-solve timing)Full-size FMG Zhong run (serial 1/8, non-nested coarse shell 0.19): now converges, reason 2 in 5 outer iterations (was PC failure, reason -11), solution identical to the GAMG run to ~5 digits. Caveat: wall clock was ~5.9 h, essentially all in the 3D barycentric prolongation build for the non-nested pair — the debug-size sphere (0.3/0.45) takes ~2 min total and GAMG at full size solves in ~53 s. So the SVD-coarse fix makes rotated FMG correct on closed 3D shells, but the non-nested 3D transfer build needs optimisation before it is practical at this size; use GAMG or a nested ( Repeat-solve timing vs Nitsche (same process, solves 2–4 after JIT, benchmark size): Nitsche 45.1/46.2/43.3 s vs rotated 60.5/50.8/43.4 s — the first two rotated repeats were inflated by the concurrent FMG benchmark; the least-contended samples are 43.3 vs 43.4 s. Per-solve wall clock is effectively 1:1 at equal tolerance: assembly + MG setup dominates and is shared, and the rotated extras (Q build, PtAP, Pmat extraction, fieldsplit setup) are lost in that budget. Historical note: the OLD configuration also ran ~1:1 on this isoviscous benchmark — 44 cheap outer iterations (preonly bare-GAMG velocity, selfp+Jacobi) cost about the same as Nitsche's 1–2 expensive ones. The blow-out was a robustness problem: the weak-preconditioner iteration count grows with viscosity contrast / deformation / resolution (and multiplies per Newton step), while the restored native preconditioning stays flat. Underworld development team with AI support from Claude Code |
Problem
Rotated strong free-slip was dramatically slower than the alternatives on curved/deformed boundaries — a blow-out in the Schur-complement iterations. On the Zhong et al. (2008) spherical response benchmark (#248 configuration, serial cellSize 1/8, tol 1e-7):
np=8 (official 1/8 parallel case): after = 1 outer it, values identical to serial. Deformed-box (bulged top, per-node computed normals): 1 outer it. Deformed free-surface models paid the blow-out per Newton iteration per timestep.
Root causes (all in the self-contained rotated fieldsplit)
snes.getJacobian()[0]was read; the Pmat carrying the DS JacobianPreconditioner block_pp_G0(what the standard path uses viaschur_precondition=a11) was never assembled. The solve substitutedselfp+ Jacobi — weak on curved/stretched cells and poisoned along the boundary strip by the unit constraint diagonals._rotated_nullspacewas 2D-only; a spherical shell with free-slip on both boundaries leaves THREE rotation modes on the operator — near-singular Krylov grind plus visible rotation pollution in the solution (the elevated angular momentum above; the correctness half of gthyagi's Zhong 2008 isoviscous response: constrained vs Nitsche free-slip behavior #248 observations).SUBPC_ERROR, outer reason -11 — FMG on a sphere never worked). The coarse solve is now SVD. The FMG route remains the preferred velocity block; GAMG (now tuned to native parity) is fallback-only.Also in this PR
info["ksp_its"]reports the linear iteration counts.mesh.CoordinateSystem.unit_e_0works directly (previously aNameError); stray non-coordinate symbols get a clear error.<2.5cap (2.5.0 breaks locally-built petsc4py — verified with clean rebuilds; conda-forge binaries/CI unaffected) and./uwnow purges the shared petsc4pybuild/cache before installing (stale-ABI objects were being silently reused across envs/worktrees).Tests
test_1018(nullspace + iteration ceiling + machine-zero leak + 3-mode gauge removal) and a partition-independence golden intest_1064.test_101814/14 serial;test_10647/7 at np2 and np4. All existing goldens unchanged (the fix changes the preconditioner, not the solution — L2 agree to ~1e-10).Closes the performance half of #248's rotated free-slip story; benchmark scripts and logs in
~/+Simulations/rotated_schur_pc_study/.Underworld development team with AI support from Claude Code