Clarify BData contracts for static analysis#134
Conversation
…reSelector, and MetaData classes
Represent FeatureSelector RPN tokens as their actual string sequence and introduce a SelectionOperand alias for values pushed onto the select evaluator stack. Use casts at stack pop sites to document the existing runtime assumptions without changing selection behavior.
Disambiguate header and vmap loop variables in HDF5 saving so mypy can infer their separate key types. Assert that inspect.currentframe() returns a frame before walking the call stack, and cast the metadata selection mask where NumPy indexing loses the ndarray return type.
Add explicit annotations for BData utility functions, including the vstack metadata merge strategy. Replace mutable default arguments with None-backed defaults, preserve existing metadata and vmap behavior with targeted casts, and chain metadata comparison assertion failures for clearer errors.
There was a problem hiding this comment.
Pull request overview
This PR makes implicit bdpy.bdata behavior explicit to support static analysis (typing/linting) while keeping the existing API surface largely intact, and adds targeted tests to lock in those clarified contracts.
Changes:
- Added/expanded type annotations and narrowed return types across
BData,MetaData,FeatureSelector, andbdata.utils. - Clarified selection/apply semantics (e.g.,
select(..., return_index=True)returning a full-dataset boolean mask;applyfunctuple result reindexing). - Added focused tests covering
vstackdefaults/merge behavior,MetaDatadescription-only updates, HDF5 header/vmap roundtrips, and warning behavior for obsolete APIs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
bdpy/bdata/bdata.py |
Adds typing/overloads and documents behavior; introduces explicit contracts in selection/metadata/vmap paths. |
bdpy/bdata/metadata.py |
Adds type aliases/overloads and refines MetaData.set/get behavior for static analysis. |
bdpy/bdata/utils.py |
Tightens vstack/concat_dataset/metadata_equal signatures and improves error chaining/casts. |
bdpy/bdata/featureselector.py |
Adds type annotations for tokenization/parsing structures. |
tests/bdata/test_bdata.py |
Adds tests for obsolete warnings, select(return_index=True), applyfunc behaviors, and HDF5 roundtrip. |
tests/bdata/test_metadata.py |
Adds tests for description-only updates and unknown get field behavior. |
tests/bdata/test_utils.py |
Adds tests for vstack(successive=None) and unknown metadata_merge validation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| else: | ||
| # Add new metadata | ||
| self.__key.append(key) | ||
| self.__description.append(description) | ||
|
|
There was a problem hiding this comment.
This failure mode actually predates this PR (the add-new-key branch is unchanged except for a rename). However, since this PR added the Optional[...] annotation that officially advertises None as a valid input, clarifying this path is in scope. Fixed in 27d1236: set() now fails fast with ValueError before appending the key/description, so no partially-mutated state is left behind. Added a test (test_set_novalue_new_key) covering both the exception and the no-mutation guarantee.
| md = self.metadata.get(key, 'value') | ||
| assert md is not None, f"Meta-data key '{key}' not found." | ||
|
|
||
| if where is not None: | ||
| # Mask the metadata array with columns specified with `where` | ||
| ind = self.metadata.get(where, 'value') == 1 | ||
| md = md[ind] | ||
| if where is None: | ||
| return md |
There was a problem hiding this comment.
The assert was problematic, but for a different reason: it was itself a behavior change introduced by this PR. On dev, a missing key silently returns None (when where is not given). Since this PR is meant to clarify contracts without changing behavior, I reverted to the historical contract instead of raising: in 27d1236, get_metadata is typed Optional[np.ndarray] and returns None for a missing key when where is None. Where the missing key already crashed on dev (the where-masking path, and update()), it now raises an explicit ValueError (matching the codebase convention, e.g. add_vmap) instead of an incidental TypeError. Overloads guarantee np.ndarray (non-Optional) when where is given. Tests updated/added accordingly.
| f = inspect.currentframe() | ||
| assert f is not None, "Failed to get current frame for call stack information." |
There was a problem hiding this comment.
Fixed in 27d1236. Replaced the assert with an explicit RuntimeError, so the behavior no longer depends on the -O flag.
KenyaOtsuka
left a comment
There was a problem hiding this comment.
I left two comments about inconsistencies between how the APIs are used in the tests and how they are annotated. Other than those points, I think the changes look good.
note: with the Python/dependency versions currently allowed by pyproject.toml, namely Python 3.6 with NumPy 1.19, importing numpy.typing would likely cause a runtime import error. Please make sure that the supported Python version is updated to 3.8+ before the next release(#120) .
…dling - Unify metadata value typing: introduce MetaDataValue alias shared by MetaData.set (as Optional) and BData.add_metadata, which accepts array-like values, not only np.ndarray - Type __to_unicode/__to_bytes with overloads documenting that non-str values (e.g. numeric header values) pass through unchanged - Move numpy.typing import and dependent aliases under TYPE_CHECKING to avoid a runtime dependency on numpy >= 1.21 - MetaData.set: fail fast with ValueError when value is None for a new key, instead of IndexError after partial mutation - get_metadata: drop the assert introduced earlier in this PR and restore the historical None return for missing keys; raise explicit ValueError where the missing key previously crashed (where-masking and update()) - save(): replace assert on inspect.currentframe() with an explicit RuntimeError Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@KenyaOtsuka Thanks for the note. Dropping Python <= 3.7 before the next release (#120) is confirmed on our side. That said, |
Summary
This PR makes implicit
bdpy.bdatacontracts explicit so the module is easier to lint, type-check, and refactor safely.The changes clarify the data model, selector evaluation, metadata/value-label handling, serialization assumptions, and utility merge semantics that
bdpy.bdataalready relies on, while preserving the existing public API.Tests
Added focused tests for the assumptions made explicit by the refactor:
BData.addbehavior and metadata scoped withwhereselect(..., return_index=True)applyfunccallback result formsMetaDatamissing/description-only behaviorvstackdefaults and unsupported merge modesVerified locally:
Notes
uv run ruff check bdpy/bdatastill reports two pre-existing/follow-up findings: boolean comparison inBData.selectandzip()withoutstrict=inshow_metadata.