Open up bdpy.mri.fmriprep tests: surface GM, split_task_label, real_data marker, test_metrics FP fix#116
Open up bdpy.mri.fmriprep tests: surface GM, split_task_label, real_data marker, test_metrics FP fix#116kencan7749 wants to merge 4 commits into
bdpy.mri.fmriprep tests: surface GM, split_task_label, real_data marker, test_metrics FP fix#116Conversation
|
Thank you both for your work on the fMRIPrep test workflow. I understand that #116 by @kencan7749 builds on the implementation introduced in #115 by @izpyon and adds further updates. Given this relationship, I think it may be practical to consolidate the review here, provided that the contribution from #115 remains properly credited and both authors are aligned with that direction. @kencan7749, would you be willing to take over the follow-up changes and discussion points originally raised for #115 in this PR? Regarding the real-data workflow from #115One point I would like to revisit is the real-data workflow originally introduced in #115. The current workflow downloads raw BIDS data from OpenNeuro via DataLad and then runs FreeSurfer / fMRIPrep locally. While this is reproducible, it may be too heavy for a test fixture workflow, and DataLad is not currently part of our standard lab workflow. Could you consider revising the real-data fixture workflow so that a small fMRIPrep-processed test fixture is hosted externally, for example on Figshare, and downloaded explicitly only when running the In that case, the repository would contain the test code and download / verification scripts, but not the real-data-derived binary files themselves. It would also be helpful to document the fixture version, checksum, source OpenNeuro dataset, fMRIPrep / FreeSurfer versions, Docker image tag, generation command, and expected directory structure in the README. Regarding the
|
a72b8d9 to
b0cd1b7
Compare
…l_data marker
Phase 1 of bdpy.mri.fmriprep test code opening:
- Extend build_expected_bdata_after_exclude() in test_fmriprep_utils_mock.py
with a data_mode parameter supporting all four surface modes (native and
three standard variants). For surface modes, the helper now loads GIFTI
files via nibabel and hstacks L/R hemispheres to mirror the production
BrainData.__load_surface path, and emits VertexData / VertexLeft /
VertexRight / vertex_index metadata in place of voxel coordinates.
- Add TestCreateBdataFmriprepMock.test_create_bdata_fmriprep_surface_native_gm:
value-level golden-master comparison for surface_native with
with_confounds=True and exclude={"session/run": [[1, 2], None]}, mirroring
the existing volume_native GM coverage. New fixture:
tests/data/mri/golden_master/mock/test_output_fmriprep_subject_surface.h5
- Add TestCreateBdataFmriprepMock.test_create_bdata_fmriprep_split_task_label_single_task:
exercises split_task_label=True. The mock dataset has a single task
("task-mock"), so this verifies the data_labels suffix format and that the
bdata content matches the non-split GM. Multi-task coverage is provided by
test_fmriprep_real.py.
- Introduce a "real_data" pytest marker (declared in pyproject.toml and
applied to TestCreateBdataFmriprepReal). CI can now run
`pytest -m "not real_data"` to exclude the real dataset test cleanly.
- Update tests/mri/fmriprep/scripts/mock/step_1_prepare_gm.sh to track the new
surface GM file, and refresh README.md / TEST_COVERAGE.md.
Verification: `pytest tests/ -m "not slow and not real_data" --tb=line -q`
=> 245 passed, 1 deselected, 0 failed (Phase 0 baseline was 243).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
b0cd1b7 to
387a009
Compare
|
Thank you for the suggestion. Based on the suggestion by @micchu
|
|
Regarding the testing real-data workflow, I'd like to share what I noticed during my investigation and get your opinion. My understanding is that this How should we prepare these fMRIPrep-preprocessed files? Are there candidate datasets, sessions, or runs that you have in mind? If data along the lines of the OpenNeuro datasets that @izpyon used previously would be best suited for this test, it would be a great help if @izpyon could upload the preprocessed files. In addition, I've confirmed that this module cannot be tested with fMRIPrep-derived files alone. Beyond the fMRIPrep-derived files, the following two files are also required:
Both will be standard files included in the raw BIDS data, so we need to be aware of this when uploading the data. Note that *_events.tsv is essential because fMRIPrep does not touch it. However, TR information should also be available in the fMRIPrep-processed files. Actually, the current code uses Lines 188 to 193 in 9d0b724 Lines 610 to 612 in 9d0b724 so I think *_bold.json might become unnecessary in the future. I'd appreciate your thoughts on (1) how to prepare these fMRIPrep-preprocessed files, and (2) how we should handle *_bold.json going forward. |
|
Dear @kencan7749 -san and @micchu -san, Sorry for my late reply. I also apologize for not having been able to complete the task properly at the time. As far as I remember, @micchu -san initially told me that, although the module would ultimately be used with data in the lab environment, we should use an open dataset for real-data testing in light of the openness of the test code. I also remember being advised to use a relatively recent dataset if possible. Based on these conditions, I selected the SoundRecon dataset. If it would be helpful for me to upload the preprocessed data I have, I would be happy to do so. Regarding the minimal dataset, I think using multiple sessions/runs makes sense, especially if we want to test the One minor concern is that, at least in the actual lab environment I have seen, I do not often encounter a BIDS directory containing multiple subjects. Therefore, if maintaining a multi-subject minimal dataset would introduce additional cost or complexity, it might be worth considering whether multi-subject support is strictly necessary for this particular real-data test. That said, multiple sessions/runs do seem useful for testing the expected behavior of the Thank you. |
|
Sorry for the delayed reply. Some of this overlaps with the discussion we had in the regular meeting, but I will also leave my comments here.
Yes, I was the one who asked you to use open data as much as possible. At that time, I had not fully considered that the workflow would require running fMRIPrep. I am very sorry about that. If the fMRIPrep-processed data you created can be used, I think it would be good to use it as the source data for this test fixture. If the working directory still remains on the server, could you please let us know its path? If it has already been deleted, or if sharing it would be difficult, we can prepare the data separately, so please do not hesitate to let us know. Also, if there is any issue with my understanding or assumptions, please feel free to point it out. |
|
Dear @micchu -san, @kencan7749 -san, I sincerely apologize for the inconvenience caused by not being able to complete this work on schedule. I have confirmed that the data generated when I ran the workflow are still available on the HDD server, although access may be slow. I will share the server path together with the relevant fs env path and other related settings via DM. Thank you very much. |
I am continuing the previous contributor's work to open up the test code for
bdpy.mri.fmriprep.This PR builds directly on the previous test-workflow PR, so read against
devthe combined diff introduces the full test workflow plus the additions listed below.What the previous contributor did (detal in in the PR)
tests/mri/fmriprep/:test_fmriprep_mock.py(mock dataset, ~34 tests including acreate_bdata_fmriprepgolden-master comparison)test_fmriprep_real.py(real-data golden-master comparison against OpenNeurods006319)test_fmriprep_utils_mock.py(MockBidsBuilderand expected-data helpers)test_fmriprep_utils.py(shared config andRealDatasetMixin)README.mdandTEST_COVERAGE.md.scripts/mock/step_1_prepare_gm.sh,step_2_run_test.sh).scripts/real/step_1_download.shthroughstep_5_run_test.sh) wiring FreeSurfer, Docker, and fMRIPrep 1.2.1, with_fs_env.sh.examplefor machine-specific overrides..gitignorerules that track only.h5golden masters and the relevant scripts.What I did in this PR
Fixed
tests/evals/test_metrics.pyfailure on the base env (Python 3.8). Replaced 5self.assertTrue(np.array_equal(...))intest_2dandtest_2d_nanwithnp.testing.assert_allclose(rtol=1e-12, atol=0). The pickled expected values drift by ~1 ULP (1.1e-16–2.2e-16) from the current numpy/BLAS output;np.allclosealready passes, so this is FP-rounding version drift rather than an algorithm change. The implementation (bdpy/evals/metrics.py) and the pickle fixtures are untouched.Added value-level golden-master coverage for
surface_native. New testTestCreateBdataFmriprepMock.test_create_bdata_fmriprep_surface_native_gmwithwith_confounds=Trueandexclude={"session/run": [[1, 2], None]}, comparingVertexData,VertexLeft/VertexRight,vertex_index, motion, confounds, and label submeta. The previous surface tests were shape-only, so regressions inBrainData.__load_surfacecould pass undetected. New fixture:tests/data/mri/golden_master/mock/test_output_fmriprep_subject_surface.h5(~2 MB, regeneratable viaTEST_FMRIPREP_CREATE_GOLDEN_MASTER=1).Added mock coverage for the
split_task_label=Truebranch. New testtest_create_bdata_fmriprep_split_task_label_single_taskexercises the branch on the single-task mock dataset, assertingdata_labels == ["sub-0840_task-mock"]and that the per-task BData matches the existing non-split GM. Previously this branch was covered only by the real-data test.Extended
build_expected_bdata_after_exclude()intests/mri/fmriprep/test_fmriprep_utils_mock.pywith a backward-compatibledata_modeparameter. For surface modes it loads GIFTI via nibabel andhstacks left/right hemispheres to mirrorBrainData.__load_surface.Introduced a
real_datapytest marker. Declared inpyproject.tomlunder[tool.pytest.ini_options]and applied toTestCreateBdataFmriprepReal. CI can now usepytest -m "not real_data"for an explicit exclusion; the dynamicunittest.SkipTestfallback inRealDatasetMixin.setUpClassis preserved.Refreshed documentation and scripts.
./test/mri/fmriprep/README.md: added the new fixture and documentedpytest -m "not real_data"in the Real-Data Tests section../test/mri/fmriprep/TEST_COVERAGE.md: updatedLast updated, reclassified the surface_native and split_task_label gaps as covered.scripts/mock/step_1_prepare_gm.sh: tracks the new surface fixture.Verification:
pytest tests/evals/test_metrics.py -v→ 4 passed (Python 3.8 base env)pytest tests/mri/fmriprep/ -m "not real_data" -v→ 36 passed, 1 deselectedpytest tests/ -m "not slow and not real_data" -q→ 245 passed, 1 deselected, 0 failedKnown bugs and weak points, acknowledged but not yet fixed
Implementation-side bugs in
bdpy/mri/fmriprep.py(deferred to a separate PR):BrainData.__dtype is 'surface'/is not 'surface'identity comparisons (L388, L398). Works today by Python short-string interning, but is semantically wrong; will surface as warnings or break on future Python versions. Should be==/!=.del fmriprep.data[sub]/del fmriprep.data[sub][ses]during iteration of theOrderedDictincreate_bdata_fmriprep(L270–L279). Can raiseRuntimeError: dictionary changed size during iteration. Masked today because the mock dataset has a single subject anddelhappens at the end of iteration (see the multi-subject gap below — they hide each other).Test coverage gaps:
split_task_label=True(multiple elements inbdata_list) is covered only by the real-data test. Mock-side coverage requires extendingMockBidsBuilderto emit multipletask-*labels.MockBidsBuilderemits a single subject, so subject-iteration accumulation (last_run,last_block) and partial-subjectexcludeare not value-tested. This is what masks theOrderedDict-mutation bug above.surface_standard,surface_standard_41k,surface_standard_10kare still shape-only; value-level GM would catch path-specific regressions in__load_surfacefor thefsaverage*family.return_list=Falseunwrap path (bdpy/mri/fmriprep.pyL352–L360) has no explicit assertion; all tests usereturn_list=True.label_mapperloading,LabelMapper.dump(), and the non-unique-valueRuntimeError('Invalid label-value mapping')path are not directly asserted.MockBidsBuildertargets 1.2).