Add solver option and use SciPy solver by default#10
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a configurable linear-system solver for FastL2LiR and switches the default from NumPy’s generic solver to SciPy’s solver with assume_a="pos" to enable Cholesky-based solving for the normal equations.
Changes:
- Added a
solveroption toFastL2LiR('scipy'default,'numpy'alternative) and routed all linear solves through it. - Implemented a SciPy
assume_a="pos"solve with fallback toassume_a="sym"onLinAlgError. - Added tests to validate solver selection and to compare NumPy vs SciPy solver outputs.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/fastl2lir/fastl2lir.py |
Adds solver selection in __init__ and replaces direct np.linalg.solve calls with a chosen solver function. |
tests/test_fastl2lir.py |
Adds coverage for solver selection and a cross-solver equivalence check. |
pyproject.toml |
Adds SciPy as a runtime dependency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
515f98f to
e2fd04b
Compare
8aec090 to
439c73e
Compare
Benchmark:
|
| Environment | Python | NumPy | SciPy | BLAS | Architecture | Speedup (scipy / numpy) |
|---|---|---|---|---|---|---|
| RISC-V | 3.12.1 | 1.26.4 | 1.12.0 | OpenBLAS 0.3.26 (system) | RISCV64_GENERIC (no SIMD) | 3.06x |
| x86-64 | 3.11.15 | 1.26.4 | 1.11.4 | OpenBLAS 0.3.23 (bundled) | Prescott (SSE2) | 1.36x |
| x86-64 | 3.11.15 | 2.4.6 | 1.17.1 | scipy-openblas 0.3.31 (shared) | SkylakeX (AVX-512) | 1.03x |
The speedup is largest in environments without SIMD optimization (RISC-V), where the algorithmic advantage of Cholesky decomposition (~half the FLOPs of LU) translates directly to wall-clock time. On modern x86 with AVX-512, SIMD optimization narrows the gap significantly.
Note: FastL2LiR's feature selection path runs with 1 BLAS thread (threadpool_limits(limits=1)), so the 1-thread figures above reflect actual runtime behavior for that code path.
ganow
left a comment
There was a problem hiding this comment.
Thanks for adding the solver option and the speedup. The overall direction looks good, the target matrices are symmetric positive definite so assume_a="pos" is appropriate, and the pos to sym fallback is sound.
I am requesting changes mainly for one blocking issue. Storing the solver as an inline closure on self.__solve makes fitted models unpicklable, which is a backward-compatibility regression for the common workflow of fitting a model, pickling it, and running prediction in another process. Moving the solve functions to top-level scope, as noted in the inline comment, resolves this while preserving the early validation and the one-time branch selection.
The remaining inline comments are smaller. Removing the unused self.__solver, extending the solver tests to cover the feature-selection and dual-form paths as well as the pos to sym fallback, and adding a pickle round-trip test.
…ck tests - Extract _solve_scipy and _solve_numpy as module-level functions so that FastL2LiR instances are picklable and solver logic is statically analysable - Remove unused self.__solver instance variable - Add test_solver_pickle: verifies pickle round-trip preserves W and b - Add test_solver_scipy_fallback: verifies pos->sym fallback completes without error and satisfies the linear system https://claude.ai/code/session_016xWp5Yz2EQz2F4TnWuaymM
ganow
left a comment
There was a problem hiding this comment.
Thank you for carefully addressing all the previous review comments. The blocking pickling issue is resolved by moving the solver functions to module level, and the added tests (the pos-to-sym fallback, the pickle round-trip, and the solver-level equivalence check) land exactly where our discussion pointed. I confirmed locally that the full test suite passes.
That said, I have to apologize: while re-reviewing the updated code, I noticed a few points that I had missed in my initial review pass, including one that stems from the design I myself suggested. I am sorry for not catching these the first time and for extending the review cycle as a result.
None of the new comments are blocking. Two are small suggestions (the SciPy version floor and the attribute access in the new test), and the pickling one is an open question where I would value your opinion on how (or whether) to handle it, rather than a requested change.
| def _solve_scipy(a, b): | ||
| try: | ||
| return sp_linalg.solve(a, b, assume_a="pos", check_finite=False) | ||
| except sp_linalg.LinAlgError: | ||
| return sp_linalg.solve(a, b, assume_a="sym", check_finite=False) | ||
|
|
||
|
|
||
| def _solve_numpy(a, b): | ||
| return np.linalg.solve(a, b) |
There was a problem hiding this comment.
Sorry for bringing this up at this stage. While verifying the fix, I noticed a remaining subtlety in the design I myself suggested, so this is on me as much as anything. Moving the solver to module-level functions fixes pickling within a single environment, but pickle stores functions by reference (module path + qualified name), which leaves an asymmetric constraint across package versions:
- A model pickled with the old version and loaded with this code works for
predict(), but callingfit()again raisesAttributeErrorbecause__solveis missing from the unpickled__dict__. - A model pickled with this version cannot be unpickled at all under the old package, since
_solve_scipydoes not exist there.
One can reasonably argue that pickles should never cross package versions in the first place, so I don't consider this blocking. On the other hand, the fit-on-one-machine / predict-on-another workflow makes version skew plausible in practice. If we wanted to harden this, one option is __getstate__/__setstate__ that stores the solver name and re-resolves the function on load. What do you think? Would this be worth addressing here, in a follow-up, or just documenting?
There was a problem hiding this comment.
Thank you for pointing this out.
Since this issue was introduced by this PR, I thought it would be better to address it here rather than only document it or leave it for a follow-up.
In 7fc620c, the model stores the solver name as state, excludes the resolved solver function from the pickle state, and re-resolves it in __setstate__.
For legacy pickles without a stored solver name, it falls back to the previous NumPy behavior and emits a warning. The new pickle state also avoids storing references to the solver helper functions, so it should be more robust to version skew.
Follow-up: solver benchmark on a typical compute serverSince switching the default solver is a behavior change, I benchmarked the two solvers on hardware representative of where we run feature decoding. What I measuredI modeled the path FastL2LiR actually takes for feature decoding (feature-decoding / bdpy / brain-decoding-cookbook): feature selection is ON, so each output unit solves one Script: https://gist.github.com/ganow/4f89c91006689a7955fac34881b3af2d Run on a 32-core OpenBLAS server with 1 BLAS thread actually enforced (the real decoding setting), on two dependency stacks: # current resolve: numpy 2.x, scipy 1.15.3, OpenBLAS 0.3.31
uv run python bench_solver.py
# the stack many users get under bdpy[all] (numpy<1.24): numpy 1.23.5, scipy 1.10.1, OpenBLAS 0.3.18
uv run --with 'numpy<1.24' --with 'scipy<1.11' python bench_solver.pyResults (ratio = scipy / numpy; > 1 means scipy is slower)float32 (the dtype the pipelines actually use)
float64
Takeaways:
Why scipy only helps in float32 (and why that is mostly a NumPy quirk)The flip is not Cholesky vs LU. Source ( def _commonType(*arrays):
# in lite version, use higher precision (always double or cdouble)
...
return double, result_type # first element (compute type) is always double
# in solve():
t, result_t = _commonType(a, b) # t == double, result_t == single for float32 input
...
return wrap(r.astype(result_t, copy=False)) # compute in double, cast back to float32This is observable in the result accuracy, independent of platform and threading: import numpy as np
from scipy import linalg as sp
n = 1200; rng = np.random.default_rng(0)
X = rng.standard_normal((6000, n)).astype(np.float32)
A = np.ascontiguousarray(X.T @ X + 100*np.eye(n, dtype=np.float32))
b = X.T @ rng.standard_normal(6000).astype(np.float32)
A64, b64 = A.astype(np.float64), b.astype(np.float64)
x_ref = np.linalg.solve(A64, b64)
err = lambda x: np.linalg.norm(x.astype(np.float64) - x_ref) / np.linalg.norm(x_ref)
print("numpy fp32 rel.err", err(np.linalg.solve(A, b))) # ~2.4e-8 (below float32 eps ~1.2e-7) -> computed in double
print("scipy fp32 rel.err", err(sp.solve(A, b, assume_a='pos', check_finite=False))) # ~2.6e-7 (genuine fp32)numpy's float32 result is accurate to below float32 machine epsilon, which is only possible if the computation ran in double. This reproduces on both a laptop and the OpenBLAS server. I am deliberately not quoting absolute per-routine solve times here: they depend heavily on the BLAS build and thread settings and are not portable (on the server, numpy's multi-threaded LAPACK path was far slower than scipy's, which is a separate effect from the dtype upcast). The ratio tables above are the relevant comparison, since they were measured under So the float32 advantage for scipy is partly that SuggestionThe Cholesky path is a genuine win for large float32 problems, so the solver option is valuable. But which solver is faster depends on dtype, matrix size, BLAS build, and CPU, and the crossover moves around (near n_feat≈500 here, but that is hardware/stack specific), so I don't think a reliable automatic dispatcher is feasible. Given that, defaulting to scipy looks a bit opinionated: on the stack many users run, it is a tie at the typical size in float32 and a regression for small or float64 cases. Would it make sense to keep numpy as the default (preserving the existing behavior and avoiding any silent slowdown) and expose the SciPy solver as an opt-in through the |
|
Correction to the comment above: I removed a per-routine timing snippet that quoted absolute solve times measured on my laptop. Re-running that exact snippet on the OpenBLAS server gave very different absolute numbers (numpy's multi-threaded LAPACK path came out roughly 16x slower than scipy's there), so those laptop numbers were not portable and could mislead. What does not change:
If anything, the server result reinforces the point that absolute performance is highly sensitive to BLAS build and threading, which is part of why I don't think an automatic solver dispatcher would be reliable. |
|
@ganow Before deciding the default, I wanted to check one point about the cost-benefit assessment. I agree that the relative ratios are important, but the absolute impact may also matter. My understanding is that most users of the actual feature decoding pipelines use float32, so the impact on the main float32 use case may deserve more weight than the float64 case. For float32, SciPy is indeed slower for small So I wanted to add that the trade-off may not be simply “sometimes faster, sometimes slower.” For the main float32 use case, the cost when SciPy is worse and the benefit when SciPy is better may be quite asymmetric. Even taking this into account, I am slightly leaning toward reverting the default to NumPy to avoid changing the existing behavior. Still, I wanted to check whether this asymmetric absolute impact changes your view on what would be most beneficial for the main users of this model. |
- Add _get_solver_func() to centralize solver selection logic - Reintroduce self.__solver for serializable solver name - Add __getstate__/__setstate__ for cross-version pickle compatibility - Legacy pickles without solver info fall back to numpy with UserWarning - Raise scipy lower bound from >=1.2.3 to >=1.8 - Update test_solver_helpers_agree to use module-level helpers directly - Strengthen test_solver_pickle with predict comparison - Add test_solver_getstate and test_solver_legacy_setstate https://claude.ai/code/session_016xWp5Yz2EQz2F4TnWuaymM
This PR adds a solver option to
FastL2LiRand uses the SciPy-based solver by default.Previously,
FastL2LiRusednp.linalg.solveto solve the normal equations. This treats the coefficient matrix as a general dense matrix. The new default solver usesscipy.linalg.solvewithassume_a="pos", which allows SciPy to use a Cholesky-based solver.The speedup seems particularly noticeable in environments where NumPy/SciPy use the OpenBLAS backend. In my local benchmarks, the SciPy-based solver achieved up to about a 10x speedup over the NumPy-based solver.
Changes
solveroption.Checks
Following
CONTRIBUTING.md, I confirmed thatpytestpasses.I also ran ruff check . and ruff format ., and all checks pass.
Note
The
assume_a="pos"solver may fail when the coefficient matrix has a large condition number. In such cases, the implementation falls back toassume_a="sym".However, even with
assume_a="sym"or the previousnp.linalg.solve-based implementation, there is no guarantee that a good numerical solution is obtained in such cases.A large condition number can occur when
alpha=0or whenalphais very small. In particular, whenalpha=0, returning an OLS estimator would be another possible implementation, but this PR does not address it because it would be a behavior change.