BUG: to_hdf to a key deletes nested keys stored beneath it (GH-17267) #65781
Draft
jbrockmendel wants to merge 1 commit into
HDFStore.put (and DataFrame.to_hdf) overwrote an existing key by recursively removing its group, which also deleted any keys nested underneath it. Now only the data nodes of the object stored at the key are removed when the group has child keys, leaving the nested keys intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
closes #17267
Writing a pandas object to an HDF5 key with `DataFrame.to_hdf`/`HDFStore.put` silently deleted any keys nested beneath that key: `HDFStore._identify_group` overwrote an existing key by calling `remove_node(group, recursive=True)`, which removes the entire subtree, including child keys, which are separate pandas objects. This is distinct from the documented recursive behavior of `HDFStore.remove`/`del store[key]` (those still work as documented); a plain `put` clobbering a nested key is an unintended data-loss path.

Since storers only ever write leaf nodes under their group (never child groups), any child group under a key is necessarily another key. The fix removes only the data nodes (and resets stale attributes) of the object stored at the target key when nested keys are present, leaving those keys intact. When there are no nested keys, the existing recursive-remove path is unchanged, so no orphan nodes are left behind on a normal overwrite.
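The difference between the two overwrite strategies can be illustrated with a toy model. This is a sketch only: `Group`, `overwrite_recursive`, and `overwrite_preserving_children` are hypothetical names invented for illustration and are not part of pandas or PyTables; the actual change lives in `HDFStore._identify_group` and may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy stand-in for an HDF5 group (hypothetical, not a PyTables class)."""
    leaves: dict = field(default_factory=dict)    # data nodes written by a storer
    children: dict = field(default_factory=dict)  # child groups, i.e. nested keys
    attrs: dict = field(default_factory=dict)     # storer metadata


def overwrite_recursive(parent: Group, name: str) -> Group:
    """Old behavior: replace the whole subtree, nested keys included."""
    parent.children[name] = Group()
    return parent.children[name]


def overwrite_preserving_children(parent: Group, name: str) -> Group:
    """Behavior described above: keep child groups when any exist."""
    group = parent.children.get(name)
    if group is None or not group.children:
        # No nested keys: recursive removal is safe and leaves no orphans.
        parent.children[name] = Group()
        return parent.children[name]
    # Nested keys present: drop only this key's own data nodes and stale attrs.
    group.leaves.clear()
    group.attrs.clear()
    return group


def build_store() -> Group:
    """Root holding key 'a' with a nested key 'a/b'."""
    root = Group()
    a = Group(leaves={"values": [1, 2]}, attrs={"format": "fixed"})
    a.children["b"] = Group(leaves={"values": [3]})
    root.children["a"] = a
    return root


old_store = build_store()
overwrite_recursive(old_store, "a")
new_store = build_store()
overwrite_preserving_children(new_store, "a")

print(sorted(old_store.children["a"].children))  # []
print(sorted(new_store.children["a"].children))  # ['b']
```

In the model, both strategies leave the target group empty of its own data nodes and attributes, ready for the new object to be written; only the second keeps the nested key `b`.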
Tested across fixed/table formats, repeated overwrites, and format/dtype changes of the parent key.