Skip to content

Clarify BData contracts for static analysis#134

Open
ganow wants to merge 9 commits into
KamitaniLab:devfrom
ganow:refactor/manual-lint-fix
Open

Clarify BData contracts for static analysis#134
ganow wants to merge 9 commits into
KamitaniLab:devfrom
ganow:refactor/manual-lint-fix

Conversation

@ganow

@ganow ganow commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR makes implicit bdpy.bdata contracts explicit so the module is easier to lint, type-check, and refactor safely.

The changes clarify the data model, selector evaluation, metadata/value-label handling, serialization assumptions, and utility merge semantics that bdpy.bdata already relies on, while preserving the existing public API.

Tests

Added focused tests for the assumptions made explicit by the refactor:

  • column-wise BData.add behavior and metadata scoped with where
  • full-dataset boolean masks from select(..., return_index=True)
  • both applyfunc callback result forms
  • HDF5 header/vmap roundtrip
  • MetaData missing/description-only behavior
  • vstack defaults and unsupported merge modes

Verified locally:

uv run pytest tests/bdata/test_bdata.py tests/bdata/test_metadata.py tests/bdata/test_utils.py
uv run mypy bdpy/bdata

Notes

uv run ruff check bdpy/bdata still reports two pre-existing/follow-up findings: boolean comparison in BData.select and zip() without strict= in show_metadata.

ganow added 8 commits June 8, 2026 18:40
Represent FeatureSelector RPN tokens as their actual string sequence and introduce a SelectionOperand alias for values pushed onto the select evaluator stack. Use casts at stack pop sites to document the existing runtime assumptions without changing selection behavior.
Disambiguate header and vmap loop variables in HDF5 saving so mypy can infer their separate key types. Assert that inspect.currentframe() returns a frame before walking the call stack, and cast the metadata selection mask where NumPy indexing loses the ndarray return type.
Add explicit annotations for BData utility functions, including the vstack metadata merge strategy. Replace mutable default arguments with None-backed defaults, preserve existing metadata and vmap behavior with targeted casts, and chain metadata comparison assertion failures for clearer errors.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes implicit bdpy.bdata behavior explicit to support static analysis (typing/linting) while keeping the existing API surface largely intact, and adds targeted tests to lock in those clarified contracts.

Changes:

  • Added/expanded type annotations and narrowed return types across BData, MetaData, FeatureSelector, and bdata.utils.
  • Clarified selection/apply semantics (e.g., select(..., return_index=True) returning a full-dataset boolean mask; applyfunc tuple result reindexing).
  • Added focused tests covering vstack defaults/merge behavior, MetaData description-only updates, HDF5 header/vmap roundtrips, and warning behavior for obsolete APIs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
bdpy/bdata/bdata.py Adds typing/overloads and documents behavior; introduces explicit contracts in selection/metadata/vmap paths.
bdpy/bdata/metadata.py Adds type aliases/overloads and refines MetaData.set/get behavior for static analysis.
bdpy/bdata/utils.py Tightens vstack/concat_dataset/metadata_equal signatures and improves error chaining/casts.
bdpy/bdata/featureselector.py Adds type annotations for tokenization/parsing structures.
tests/bdata/test_bdata.py Adds tests for obsolete warnings, select(return_index=True), applyfunc behaviors, and HDF5 roundtrip.
tests/bdata/test_metadata.py Adds tests for description-only updates and unknown get field behavior.
tests/bdata/test_utils.py Adds tests for vstack(successive=None) and unknown metadata_merge validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread bdpy/bdata/metadata.py
Comment on lines 119 to 123
else:
# Add new metadata
self.__key.append(key)
self.__description.append(description)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This failure mode actually predates this PR (the add-new-key branch is unchanged except for a rename). However, since this PR added the Optional[...] annotation that officially advertises None as a valid input, clarifying this path is in scope. Fixed in 27d1236: set() now fails fast with ValueError before appending the key/description, so no partially-mutated state is left behind. Added a test (test_set_novalue_new_key) covering both the exception and the no-mutation guarantee.

Comment thread bdpy/bdata/bdata.py
Comment on lines 677 to +681
md = self.metadata.get(key, 'value')
assert md is not None, f"Meta-data key '{key}' not found."

if where is not None:
# Mask the metadata array with columns specified with `where`
ind = self.metadata.get(where, 'value') == 1
md = md[ind]
if where is None:
return md

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert was problematic, but for a different reason: it was itself a behavior change introduced by this PR. On dev, a missing key silently returns None (when where is not given). Since this PR is meant to clarify contracts without changing behavior, I reverted to the historical contract instead of raising: in 27d1236, get_metadata is typed Optional[np.ndarray] and returns None for a missing key when where is None. Where the missing key already crashed on dev (the where-masking path, and update()), it now raises an explicit ValueError (matching the codebase convention, e.g. add_vmap) instead of an incidental TypeError. Overloads guarantee np.ndarray (non-Optional) when where is given. Tests updated/added accordingly.

Comment thread bdpy/bdata/bdata.py Outdated
Comment on lines +803 to +804
f = inspect.currentframe()
assert f is not None, "Failed to get current frame for call stack information."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 27d1236. Replaced the assert with an explicit RuntimeError, so the behavior no longer depends on the -O flag.

@KenyaOtsuka KenyaOtsuka left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two comments about inconsistencies between how the APIs are used in the tests and how they are annotated. Other than those points, I think the changes look good.

note: with the Python/dependency versions currently allowed by pyproject.toml, namely Python 3.6 with NumPy 1.19, importing numpy.typing would likely cause a runtime import error. Please make sure that the supported Python version is updated to 3.8+ before the next release(#120) .

Comment thread bdpy/bdata/bdata.py Outdated
Comment thread bdpy/bdata/bdata.py Outdated
…dling

- Unify metadata value typing: introduce MetaDataValue alias shared by
  MetaData.set (as Optional) and BData.add_metadata, which accepts
  array-like values, not only np.ndarray
- Type __to_unicode/__to_bytes with overloads documenting that non-str
  values (e.g. numeric header values) pass through unchanged
- Move numpy.typing import and dependent aliases under TYPE_CHECKING to
  avoid a runtime dependency on numpy >= 1.21
- MetaData.set: fail fast with ValueError when value is None for a new
  key, instead of IndexError after partial mutation
- get_metadata: drop the assert introduced earlier in this PR and
  restore the historical None return for missing keys; raise explicit
  ValueError where the missing key previously crashed (where-masking
  and update())
- save(): replace assert on inspect.currentframe() with an explicit
  RuntimeError

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ganow

ganow commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

@KenyaOtsuka Thanks for the note. Dropping Python <= 3.7 before the next release (#120) is confirmed on our side. That said, NDArray requires numpy >= 1.21 while our floor for newer Pythons is 1.20, so #120 alone would not fully close this. In 27d1236 I moved the numpy.typing import and the aliases depending on it under if TYPE_CHECKING:, which removes the runtime dependency on numpy.typing entirely.

@KenyaOtsuka KenyaOtsuka left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants