BUG: to_hdf to a key deletes nested keys stored beneath it (GH-17267) #65781

Draft
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:bug-17267

@jbrockmendel
Member

closes #17267

Writing a pandas object to an HDF5 key with DataFrame.to_hdf / HDFStore.put silently deleted any keys nested beneath that key:

pd.Series(np.zeros(20)).to_hdf("t.h5", key="All/atest")   # -> ['/All/atest']
pd.Series(np.zeros(10)).to_hdf("t.h5", key="All")         # -> ['/All']   <- /All/atest gone

HDFStore._identify_group overwrote an existing key by calling remove_node(group, recursive=True), which removes the entire subtree, including child keys that are separate pandas objects. This is distinct from the documented recursive behavior of HDFStore.remove / del store[key] (those still work as documented); a plain put clobbering a nested key is an unintended data-loss path.

Since storers only ever write leaf nodes under their group (never child groups), any child group under a key is necessarily another key. The fix removes only the data nodes (and resets stale attributes) of the object stored at the target key when nested keys are present, leaving those keys intact. When there are no nested keys, the existing recursive-remove path is unchanged, so no orphan nodes are left behind on a normal overwrite.
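
To make the difference between the two overwrite strategies concrete, here is a minimal pure-Python sketch. It is a toy model, not the patch itself: `Group`, `overwrite_recursive`, and `overwrite_preserving_children` are hypothetical names invented for illustration, and the real change operates on PyTables nodes inside HDFStore._identify_group.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy stand-in for an HDF5 group (hypothetical; not the PyTables API)."""

    attrs: dict = field(default_factory=dict)     # metadata such as pandas_type
    leaves: dict = field(default_factory=dict)    # data nodes written by a storer
    children: dict = field(default_factory=dict)  # child groups, i.e. nested keys


def overwrite_recursive(parent: Group, name: str) -> Group:
    """Old behavior: drop the whole subtree and start from an empty group."""
    parent.children[name] = Group()
    return parent.children[name]


def overwrite_preserving_children(parent: Group, name: str) -> Group:
    """Sketch of the fix: clear only the target key's own data and attributes."""
    group = parent.children[name]
    if not group.children:
        # No nested keys, so the recursive path is safe and leaves no orphans.
        return overwrite_recursive(parent, name)
    group.leaves.clear()
    group.attrs.clear()
    return group


def build_store() -> Group:
    """Build a root holding the key 'All' with a nested key 'All/atest'."""
    root = Group()
    parent_key = Group(attrs={"pandas_type": "series"}, leaves={"values": [0.0] * 10})
    parent_key.children["atest"] = Group(
        attrs={"pandas_type": "series"}, leaves={"values": [0.0] * 20}
    )
    root.children["All"] = parent_key
    return root


old = build_store()
overwrite_recursive(old, "All")
old_children = sorted(old.children["All"].children)

new = build_store()
target = overwrite_preserving_children(new, "All")
new_children = sorted(new.children["All"].children)

print(old_children)  # nested key lost
print(new_children)  # nested key kept
```

In this model the selective path leaves the target group empty and ready for the new object's data, while the nested group is untouched. The model relies on the same assumption stated above, that every child group under a key is another key.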

Tested across fixed/table formats, repeated overwrites, and format/dtype changes of the parent key.

HDFStore.put (and DataFrame.to_hdf) overwrote an existing key by
recursively removing its group, which also deleted any keys nested
underneath it. Now only the data nodes of the object stored at the key
are removed when the group has child keys, leaving the nested keys
intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbrockmendel jbrockmendel added the Bug and IO HDF5 (read_hdf, HDFStore) labels Jun 2, 2026
@jbrockmendel jbrockmendel marked this pull request as draft June 2, 2026 18:26


Development

Successfully merging this pull request may close these issues.

'A/B' type of keys disappear after store 'A' key using to)_hdf command
