Clarify BData contracts for static analysis by ganow · Pull Request #134 · KamitaniLab/bdpy

ganow · 2026-06-08T11:21:38Z

Summary

This PR makes implicit bdpy.bdata contracts explicit so the module is easier to lint, type-check, and refactor safely.

The changes clarify the data model, selector evaluation, metadata/value-label handling, serialization assumptions, and utility merge semantics that bdpy.bdata already relies on, while preserving the existing public API.

Tests

Added focused tests for the assumptions made explicit by the refactor:

column-wise BData.add behavior and metadata scoped with where
full-dataset boolean masks from select(..., return_index=True)
both applyfunc callback result forms
HDF5 header/vmap roundtrip
MetaData missing/description-only behavior
vstack defaults and unsupported merge modes

Verified locally:

uv run pytest tests/bdata/test_bdata.py tests/bdata/test_metadata.py tests/bdata/test_utils.py
uv run mypy bdpy/bdata

Notes

uv run ruff check bdpy/bdata still reports two pre-existing/follow-up findings: boolean comparison in BData.select and zip() without strict= in show_metadata.

…reSelector, and MetaData classes

Represent FeatureSelector RPN tokens as their actual string sequence and introduce a SelectionOperand alias for values pushed onto the select evaluator stack. Use casts at stack pop sites to document the existing runtime assumptions without changing selection behavior.

Disambiguate header and vmap loop variables in HDF5 saving so mypy can infer their separate key types. Assert that inspect.currentframe() returns a frame before walking the call stack, and cast the metadata selection mask where NumPy indexing loses the ndarray return type.

Add explicit annotations for BData utility functions, including the vstack metadata merge strategy. Replace mutable default arguments with None-backed defaults, preserve existing metadata and vmap behavior with targeted casts, and chain metadata comparison assertion failures for clearer errors.

Copilot

Pull request overview

This PR makes implicit bdpy.bdata behavior explicit to support static analysis (typing/linting) while keeping the existing API surface largely intact, and adds targeted tests to lock in those clarified contracts.

Changes:

Added/expanded type annotations and narrowed return types across BData, MetaData, FeatureSelector, and bdata.utils.
Clarified selection/apply semantics (e.g., select(..., return_index=True) returning a full-dataset boolean mask; applyfunc tuple result reindexing).
Added focused tests covering vstack defaults/merge behavior, MetaData description-only updates, HDF5 header/vmap roundtrips, and warning behavior for obsolete APIs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`bdpy/bdata/bdata.py`	Adds typing/overloads and documents behavior; introduces explicit contracts in selection/metadata/vmap paths.
`bdpy/bdata/metadata.py`	Adds type aliases/overloads and refines `MetaData.set/get` behavior for static analysis.
`bdpy/bdata/utils.py`	Tightens `vstack/concat_dataset/metadata_equal` signatures and improves error chaining/casts.
`bdpy/bdata/featureselector.py`	Adds type annotations for tokenization/parsing structures.
`tests/bdata/test_bdata.py`	Adds tests for obsolete warnings, `select(return_index=True)`, `applyfunc` behaviors, and HDF5 roundtrip.
`tests/bdata/test_metadata.py`	Adds tests for description-only updates and unknown `get` field behavior.
`tests/bdata/test_utils.py`	Adds tests for `vstack(successive=None)` and unknown `metadata_merge` validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ganow · 2026-06-10T01:58:49Z

        else:
            # Add new metadata
            self.__key.append(key)
            self.__description.append(description)



This failure mode actually predates this PR (the add-new-key branch is unchanged except for a rename). However, since this PR added the Optional[...] annotation that officially advertises None as a valid input, clarifying this path is in scope. Fixed in 27d1236: set() now fails fast with ValueError before appending the key/description, so no partially-mutated state is left behind. Added a test (test_set_novalue_new_key) covering both the exception and the no-mutation guarantee.

ganow · 2026-06-10T01:59:01Z

        md = self.metadata.get(key, 'value')
+        assert md is not None, f"Meta-data key '{key}' not found."

-        if where is not None:
-            # Mask the metadata array with columns specified with `where`
-            ind = self.metadata.get(where, 'value') == 1
-            md = md[ind]
+        if where is None:
+            return md


The assert was problematic, but for a different reason: it was itself a behavior change introduced by this PR. On dev, a missing key silently returns None (when where is not given). Since this PR is meant to clarify contracts without changing behavior, I reverted to the historical contract instead of raising: in 27d1236, get_metadata is typed Optional[np.ndarray] and returns None for a missing key when where is None. Where the missing key already crashed on dev (the where-masking path, and update()), it now raises an explicit ValueError (matching the codebase convention, e.g. add_vmap) instead of an incidental TypeError. Overloads guarantee np.ndarray (non-Optional) when where is given. Tests updated/added accordingly.

ganow · 2026-06-10T01:59:04Z

        f = inspect.currentframe()
+        assert f is not None, "Failed to get current frame for call stack information."


Fixed in 27d1236. Replaced the assert with an explicit RuntimeError, so the behavior no longer depends on the -O flag.

KenyaOtsuka

I left two comments about inconsistencies between how the APIs are used in the tests and how they are annotated. Other than those points, I think the changes look good.

note: with the Python/dependency versions currently allowed by pyproject.toml, namely Python 3.6 with NumPy 1.19, importing numpy.typing would likely cause a runtime import error. Please make sure that the supported Python version is updated to 3.8+ before the next release(#120) .

…dling - Unify metadata value typing: introduce MetaDataValue alias shared by MetaData.set (as Optional) and BData.add_metadata, which accepts array-like values, not only np.ndarray - Type __to_unicode/__to_bytes with overloads documenting that non-str values (e.g. numeric header values) pass through unchanged - Move numpy.typing import and dependent aliases under TYPE_CHECKING to avoid a runtime dependency on numpy >= 1.21 - MetaData.set: fail fast with ValueError when value is None for a new key, instead of IndexError after partial mutation - get_metadata: drop the assert introduced earlier in this PR and restore the historical None return for missing keys; raise explicit ValueError where the missing key previously crashed (where-masking and update()) - save(): replace assert on inspect.currentframe() with an explicit RuntimeError Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ganow · 2026-06-10T01:59:10Z

@KenyaOtsuka Thanks for the note. Dropping Python <= 3.7 before the next release (#120) is confirmed on our side. That said, NDArray requires numpy >= 1.21 while our floor for newer Pythons is 1.20, so #120 alone would not fully close this. In 27d1236 I moved the numpy.typing import and the aliases depending on it under if TYPE_CHECKING:, which removes the runtime dependency on numpy.typing entirely.

KenyaOtsuka

LGTM. Thank you!

ganow added 8 commits June 8, 2026 18:40

refactor(bdata): enhance type hints and improve method documentation

e31b845

refactor(bdata): remove unnecessary blank line in imports

320cdea

refactor(bdata): improve type hints and documentation in BData, Featu…

e22e739

…reSelector, and MetaData classes

fix(bdata): keep add as current API

4caf306

test(bdata): cover typed BData metadata assumptions

62a84f9

ganow requested a review from Copilot June 8, 2026 11:22

Copilot started reviewing on behalf of ganow June 8, 2026 11:23 View session

ganow added the enhancement label Jun 8, 2026

ganow requested a review from KenyaOtsuka June 8, 2026 11:23

Copilot AI reviewed Jun 8, 2026

View reviewed changes

KenyaOtsuka reviewed Jun 9, 2026

View reviewed changes

Comment thread bdpy/bdata/bdata.py Outdated

Comment thread bdpy/bdata/bdata.py Outdated

KenyaOtsuka approved these changes Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify BData contracts for static analysis#134

Clarify BData contracts for static analysis#134
ganow wants to merge 9 commits into
KamitaniLab:devfrom
ganow:refactor/manual-lint-fix

ganow commented Jun 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

ganow Jun 10, 2026

Uh oh!

ganow Jun 10, 2026

Uh oh!

ganow Jun 10, 2026

Uh oh!

KenyaOtsuka left a comment

Uh oh!

Uh oh!

Uh oh!

ganow commented Jun 10, 2026

Uh oh!

KenyaOtsuka left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		f = inspect.currentframe()
		assert f is not None, "Failed to get current frame for call stack information."

Conversation

ganow commented Jun 8, 2026

Summary

Tests

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

ganow Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

ganow Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

ganow Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

KenyaOtsuka left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ganow commented Jun 10, 2026

Uh oh!

KenyaOtsuka left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants