Optimize timeseries metadata read and fix table size accounting #840
Merged
Pull request overview
This PR improves TsFile metadata handling by adding an offset-based fast path for reading timeseries metadata (avoiding redundant device-index lookups when the caller already has the measurement-node range), and fixes table size accounting for the first table during metadata index construction.
Changes:
- Added `readTimeseriesMetadata` overloads in `TsFileSequenceReader` that accept an optional `long[]` measurement-node offset range to skip device-index searching.
- Synchronized large metadata reads that use `tsFileInput.position(...)` and fixed first-table initialization for table size accounting in `TsFileIOWriter`.
- Added tests covering the iterator-provided offset path and the "first table has only one device" table size accounting case.
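The nullable-offset overload pattern described in the first bullet can be sketched as follows. This is a minimal toy model, not the `TsFileSequenceReader` implementation: the class `OffsetFastPathSketch`, its in-memory "file", the `deviceIndex` map, and the end-exclusive `[start, end)` range convention are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy reader; names and layout are illustrative assumptions,
// not the TsFileSequenceReader API.
final class OffsetFastPathSketch {
  private final byte[] file;
  // Stand-in for the device metadata index: device -> [start, end) byte range.
  private final Map<String, long[]> deviceIndex = new HashMap<>();
  // Counts how often the slow path (index lookup) runs.
  int indexLookups = 0;

  OffsetFastPathSketch(byte[] file) {
    this.file = file;
  }

  void register(String device, long start, long end) {
    deviceIndex.put(device, new long[] {start, end});
  }

  // Existing-style overload: no offset known, so delegate with null.
  String readMetadata(String device) {
    return readMetadata(device, null);
  }

  // New-style overload: a non-null range skips the index lookup entirely.
  String readMetadata(String device, long[] nodeOffset) {
    long[] range = nodeOffset;
    if (range == null) {
      indexLookups++;
      range = deviceIndex.get(device);
      if (range == null) {
        throw new IllegalArgumentException("unknown device: " + device);
      }
    }
    byte[] slice = Arrays.copyOfRange(file, (int) range[0], (int) range[1]);
    return new String(slice, StandardCharsets.UTF_8);
  }

  // Builds a two-device toy file: "d1:s1,s2" at [0, 8) and "d2:s1" at [9, 14).
  static OffsetFastPathSketch demo() {
    byte[] bytes = "d1:s1,s2|d2:s1".getBytes(StandardCharsets.UTF_8);
    OffsetFastPathSketch reader = new OffsetFastPathSketch(bytes);
    reader.register("d1", 0, 8);
    reader.register("d2", 9, 14);
    return reader;
  }
}
```

In this sketch a caller that already holds the range (for example, from an iterator) passes it in and `indexLookups` stays unchanged, while the one-argument overload keeps the original lookup behavior.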
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| java/tsfile/src/main/java/org/apache/tsfile/read/TsFileSequenceReader.java | Adds offset-based metadata read overloads and synchronizes some large metadata reads. |
| java/tsfile/src/main/java/org/apache/tsfile/write/writer/TsFileIOWriter.java | Initializes first table name early so first-table metadata size is accounted during table switches. |
| java/tsfile/src/test/java/org/apache/tsfile/read/TsFileDeviceIteratorTest.java | Adds a test reading timeseries metadata with/without iterator-provided measurement-node offsets. |
| java/tsfile/src/test/java/org/apache/tsfile/write/TsFileWriteApiTest.java | Adds a table size accounting test for a first table with a single device. |
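To illustrate why initializing the first table name early matters for size accounting, here is a hedged sketch. The actual `TsFileIOWriter` logic is not shown on this page and may differ; `TableSizeSketch`, its `account` method, and the specific way the first device's bytes get skipped are hypothetical, chosen only to reproduce the symptom described (a first table with one device ending up with no accounted size).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-table metadata size accounting; this is an
// assumed illustration, not the TsFileIOWriter code.
final class TableSizeSketch {
  // tables[i] is the table of the i-th device; sizes[i] is its metadata size in bytes.
  static Map<String, Long> account(String[] tables, long[] sizes, boolean initFirstTableEarly) {
    Map<String, Long> result = new LinkedHashMap<>();
    // Fixed variant: the first table name is known before the loop starts.
    String currentTable = (initFirstTableEarly && tables.length > 0) ? tables[0] : null;
    long pending = 0;
    for (int i = 0; i < tables.length; i++) {
      if (currentTable == null) {
        // Buggy variant: the name is learned lazily and the first
        // device's bytes are never added to the pending total.
        currentTable = tables[i];
        continue;
      }
      if (!tables[i].equals(currentTable)) {
        // Table switch: flush the bytes accumulated for the previous table.
        result.merge(currentTable, pending, Long::sum);
        currentTable = tables[i];
        pending = 0;
      }
      pending += sizes[i];
    }
    if (currentTable != null) {
      result.merge(currentTable, pending, Long::sum);
    }
    return result;
  }
}
```

With one device in `table1` followed by two in `table2`, the lazy variant of this sketch records 0 bytes for `table1`, while the early-initialized variant records the full size.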
Comments suppressed due to low confidence (1)
java/tsfile/src/main/java/org/apache/tsfile/read/TsFileSequenceReader.java:1701
- Only the single-boolean overload of `generateMetadataIndexUsingTsFileInput` is synchronized, but the two-boolean overload (which directly moves `tsFileInput.position`) is still callable and is invoked directly elsewhere in this class (e.g., from `TimeseriesMetadataIterator`). To actually make large-metadata reads thread-safe, the underlying two-boolean method should also be synchronized (or all call sites should go through a synchronized wrapper).
private synchronized void generateMetadataIndexUsingTsFileInput(
IMetadataIndexEntry metadataIndex,
long start,
long end,
IDeviceID deviceId,
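
The suggestion in this comment, placing the lock on the method that actually moves the shared position, can be sketched with a toy class. Everything here is hypothetical (`PositionedReadSketch`, its `read` overloads, and the byte-array "input"); it only illustrates the locking pattern and is not the reader's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an input with a shared, mutable position.
final class PositionedReadSketch {
  private final byte[] data;
  private int position; // shared cursor, comparable to a file position

  PositionedReadSketch(byte[] data) {
    this.data = data;
  }

  // Wrapper overload: holds no lock of its own and is safe only because
  // it delegates to the synchronized core method below.
  byte[] read(int start, int length) {
    return read(start, length, false);
  }

  // Core overload: it moves the shared cursor, so the lock lives here.
  // Direct callers and wrapper callers are then serialized alike.
  synchronized byte[] read(int start, int length, boolean reversed) {
    position = start;
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[reversed ? length - 1 - i : i] = data[position++];
    }
    return out;
  }

  // Runs concurrent readers through both overloads and counts reads whose
  // bytes do not match the expected range. Requires threads <= 16.
  static int countCorruptReads(int threads, int iterations) {
    byte[] data = new byte[256];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    PositionedReadSketch input = new PositionedReadSketch(data);
    AtomicInteger corrupt = new AtomicInteger();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int start = t * 16;
      final boolean direct = (t % 2 == 0);
      Thread worker = new Thread(() -> {
        for (int n = 0; n < iterations; n++) {
          byte[] got = direct ? input.read(start, 16, false) : input.read(start, 16);
          for (int i = 0; i < got.length; i++) {
            if (got[i] != (byte) (start + i)) {
              corrupt.incrementAndGet();
              break;
            }
          }
        }
      });
      workers.add(worker);
      worker.start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return corrupt.get();
  }
}
```

If `synchronized` were instead placed only on the two-argument wrapper, threads calling the three-argument method directly could interleave cursor updates with wrapper callers, which is the gap the comment points out.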
Comment on lines +748 to 752

    if (deviceMetadataIndexNodeOffset != null) {
      buffer =
          readData(
              deviceMetadataIndexNodeOffset[0], deviceMetadataIndexNodeOffset[1], ioSizeConsumer);
      try {
Comment on lines +984 to 988

    if (deviceMetadataIndexNodeOffset != null) {
      buffer =
          readData(
              deviceMetadataIndexNodeOffset[0], deviceMetadataIndexNodeOffset[1], ioSizeRecorder);
      try {
Comment on lines +123 to +126

    TimeseriesMetadata metadataWithOffset =
        reader.readTimeseriesMetadata(
            currentDevice.getLeft(), deviceMetadataIndexNodeOffset, "s1", false, null);

      writer.writeTable(tablet2);
      tableSizeMap = writer.getIOWriter().getTableSizeMap();
    }
    Assert.assertTrue(tableSizeMap.get("table1") > 1024);
Comment on lines +118 to +120

    Pair<IDeviceID, Boolean> currentDevice = deviceIterator.next();
    long[] deviceMetadataIndexNodeOffset = deviceIterator.getCurrentDeviceMeasurementNodeOffset();
JackieTien97 approved these changes on Jun 16, 2026
This PR includes two related fixes around TsFile metadata handling.
First, it adds offset-based `readTimeseriesMetadata` overloads in `TsFileSequenceReader`. When callers already have the current device's measurement metadata node offset from `TsFileDeviceIterator`, the reader can pass the `long[]` offset directly and avoid searching the device metadata index again. Existing overloads continue to pass `null` and keep the original lookup behavior.

Second, it fixes table size accounting in `TsFileIOWriter`. The previous logic did not initialize the first table name before handling table switches, so metadata size for the first table could be missed, especially when the first table only had one device.

Changes
- Add `readTimeseriesMetadata` overloads that accept a nullable `long[] deviceMetadataIndexNodeOffset`.
- Keep the existing overloads, which now delegate with `null`.
- Fall back to the original device metadata index lookup when the offset is `null`.
- Synchronize large metadata reads that move the `tsFileInput` position.
- Initialize the first table name before handling table switches in `TsFileIOWriter`.
- Add a `TsFileDeviceIteratorTest` case that reads timeseries metadata with and without the iterator-provided offset.
- Add `calculateTableSize2` to cover table size accounting when the first table has only one device.

Validation
Not run locally.