Optimize timeseries metadata read and fix table size accounting #840

Merged: JackieTien97 merged 6 commits into apache:develop from shuwenwei:readTimeseriesMetadata on Jun 16, 2026
@shuwenwei shuwenwei commented Jun 12, 2026

This PR includes two related fixes around TsFile metadata handling.

First, it adds offset-based readTimeseriesMetadata overloads in TsFileSequenceReader. When callers already have the current device's measurement metadata node offset from TsFileDeviceIterator, the reader can pass the long[] offset directly and avoid searching the device metadata index again. Existing overloads continue to pass null and keep the original lookup behavior.
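
The overload pattern described above can be sketched roughly as follows. This is an illustration only, not the actual TsFileSequenceReader code: `MetadataReaderSketch`, `findDeviceNodeOffset`, and `readNode` are hypothetical names, and an in-memory map stands in for the on-disk device metadata index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a reader; not the real TsFileSequenceReader. */
class MetadataReaderSketch {
  // device -> {startOffset, endOffset} of its measurement metadata node (assumed layout)
  private final Map<String, long[]> deviceIndex = new HashMap<>();
  // counts device index searches so the fast path is observable
  int indexLookups = 0;

  MetadataReaderSketch() {
    deviceIndex.put("d1", new long[] {100L, 180L});
  }

  /** Existing-style overload: forwards null, so the original lookup still runs. */
  String readTimeseriesMetadata(String device, String measurement) {
    return readTimeseriesMetadata(device, null, measurement);
  }

  /** Offset-based overload: a non-null offset skips the device index search. */
  String readTimeseriesMetadata(String device, long[] nodeOffset, String measurement) {
    long[] range = (nodeOffset != null) ? nodeOffset : findDeviceNodeOffset(device);
    if (range == null) {
      return null; // unknown device
    }
    return readNode(range[0], range[1], measurement);
  }

  private long[] findDeviceNodeOffset(String device) {
    indexLookups++;
    return deviceIndex.get(device);
  }

  // Placeholder for reading and parsing the bytes in [start, end).
  private String readNode(long start, long end, String measurement) {
    return measurement + "@[" + start + "," + end + ")";
  }
}
```

A caller that already holds the offset pair from an iterator passes it through; every other caller keeps using the shorter overload and pays for one index search.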

Second, it fixes table size accounting in TsFileIOWriter. The previous logic did not initialize the first table name before handling table switches, so metadata size for the first table could be missed, especially when the first table only had one device.
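
To illustrate why the first table name has to be set before any table switch is handled, here is a minimal sketch of per-table size accumulation. It is not the TsFileIOWriter logic: `TableSizeSketch` and `DeviceMeta` are hypothetical, and the input is assumed to be a list of device entries grouped by table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableSizeSketch {
  /** One device's metadata: owning table and serialized size in bytes (hypothetical type). */
  static final class DeviceMeta {
    final String table;
    final long size;

    DeviceMeta(String table, long size) {
      this.table = table;
      this.size = size;
    }
  }

  /** Sums metadata sizes per table, flushing the running total on each table switch. */
  static Map<String, Long> tableSizes(List<DeviceMeta> devices) {
    Map<String, Long> sizes = new LinkedHashMap<>();
    if (devices.isEmpty()) {
      return sizes;
    }
    // Initialize from the first entry so the first table is tracked before any switch.
    String current = devices.get(0).table;
    long running = 0;
    for (DeviceMeta d : devices) {
      if (!d.table.equals(current)) {
        sizes.merge(current, running, Long::sum); // flush the table being left
        current = d.table;
        running = 0;
      }
      running += d.size;
    }
    sizes.merge(current, running, Long::sum); // flush the last table
    return sizes;
  }
}
```

With the current table seeded from the first entry, a first table that holds a single device is flushed on the very first switch instead of being dropped.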

Changes

  • Add readTimeseriesMetadata overloads that accept a nullable long[] deviceMetadataIndexNodeOffset.
  • Use the provided device measurement metadata node offset as a fast path when it is not null.
  • Keep existing overloads unchanged by forwarding them with null.
  • Synchronize large metadata reads that move the shared tsFileInput position.
  • Initialize the first table name during metadata index construction in TsFileIOWriter.
  • Add a TsFileDeviceIteratorTest case that reads timeseries metadata with and without the iterator-provided offset.
  • Add calculateTableSize2 to cover table size accounting when the first table has only one device.

Validation

Not run locally.

@shuwenwei shuwenwei changed the title from "Support reading timeseries metadata with device metadata offset" to "Optimize timeseries metadata read and fix table size accounting" on Jun 12, 2026

Copilot AI left a comment

Pull request overview

This PR improves TsFile metadata handling by adding an offset-based fast path for reading timeseries metadata (avoiding redundant device-index lookups when the caller already has the measurement-node range), and fixes table size accounting for the first table during metadata index construction.

Changes:

  • Added readTimeseriesMetadata overloads in TsFileSequenceReader that accept an optional long[] measurement-node offset range to skip device-index searching.
  • Synchronized large metadata reads that use tsFileInput.position(...) and fixed first-table initialization for table size accounting in TsFileIOWriter.
  • Added tests covering the iterator-provided offset path and the “first table has only one device” table size accounting case.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| java/tsfile/src/main/java/org/apache/tsfile/read/TsFileSequenceReader.java | Adds offset-based metadata read overloads and synchronizes some large metadata reads. |
| java/tsfile/src/main/java/org/apache/tsfile/write/writer/TsFileIOWriter.java | Initializes first table name early so first-table metadata size is accounted during table switches. |
| java/tsfile/src/test/java/org/apache/tsfile/read/TsFileDeviceIteratorTest.java | Adds a test reading timeseries metadata with/without iterator-provided measurement-node offsets. |
| java/tsfile/src/test/java/org/apache/tsfile/write/TsFileWriteApiTest.java | Adds a table size accounting test for a first table with a single device. |

Comments suppressed due to low confidence (1)

java/tsfile/src/main/java/org/apache/tsfile/read/TsFileSequenceReader.java:1701

  • Only the single-boolean overload of generateMetadataIndexUsingTsFileInput is synchronized, but the 2-boolean overload (which directly moves tsFileInput.position) is still callable and is invoked directly elsewhere in this class (e.g., from TimeseriesMetadataIterator). To actually make large-metadata reads thread-safe, the underlying 2-boolean method should also be synchronized (or all call sites should go through a synchronized wrapper).
  private synchronized void generateMetadataIndexUsingTsFileInput(
      IMetadataIndexEntry metadataIndex,
      long start,
      long end,
      IDeviceID deviceId,
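
The concern in this comment can be illustrated with a small sketch in which both overloads hold the same monitor. This is a hedged illustration, not the reader's code: `SharedInputSketch` and `readRange` are hypothetical, and a byte array with an `int` cursor stands in for the shared `tsFileInput` position.

```java
class SharedInputSketch {
  private final byte[] data;
  private int position; // shared cursor, analogous to a file input position

  SharedInputSketch(byte[] data) {
    this.data = data;
  }

  /** Convenience overload; delegates to the worker below. */
  synchronized byte[] readRange(int start, int end) {
    return readRange(start, end, true);
  }

  /**
   * Worker that moves the shared cursor. It is synchronized as well because callers
   * can invoke it directly; Java monitors are reentrant, so delegation does not deadlock.
   */
  synchronized byte[] readRange(int start, int end, boolean restorePosition) {
    int saved = position;
    position = start;
    byte[] out = new byte[end - start];
    System.arraycopy(data, position, out, 0, out.length);
    position = restorePosition ? saved : end;
    return out;
  }

  synchronized int position() {
    return position;
  }
}
```

Synchronizing only the convenience overload would leave direct callers of the worker free to move the cursor while another thread is mid-read.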

Comment on lines +748 to 752
    if (deviceMetadataIndexNodeOffset != null) {
      buffer =
          readData(
              deviceMetadataIndexNodeOffset[0], deviceMetadataIndexNodeOffset[1], ioSizeConsumer);
      try {
Comment on lines +984 to 988
    if (deviceMetadataIndexNodeOffset != null) {
      buffer =
          readData(
              deviceMetadataIndexNodeOffset[0], deviceMetadataIndexNodeOffset[1], ioSizeRecorder);
      try {
Comment on lines +123 to +126
      TimeseriesMetadata metadataWithOffset =
          reader.readTimeseriesMetadata(
              currentDevice.getLeft(), deviceMetadataIndexNodeOffset, "s1", false, null);

      writer.writeTable(tablet2);
      tableSizeMap = writer.getIOWriter().getTableSizeMap();
    }
    Assert.assertTrue(tableSizeMap.get("table1") > 1024);
Comment on lines +118 to +120
      Pair<IDeviceID, Boolean> currentDevice = deviceIterator.next();
      long[] deviceMetadataIndexNodeOffset = deviceIterator.getCurrentDeviceMeasurementNodeOffset();

@JackieTien97 JackieTien97 merged commit dfd3284 into apache:develop Jun 16, 2026
14 checks passed