
Add Spark connector for TsFile table model #845

Open

DaZuiZui wants to merge 9 commits into apache:develop from DaZuiZui:feat/spark-tsfile-table-connector

Motivation

Closes #843.

Changes

  • Add the `java/spark-tsfile` Maven module and register it in the Java build.
  • Implement a Spark 3.x DataSource V2 connector, registered under the short name `tsfile`, for batch reads and writes of the TsFile table model.
  • Reuse existing TsFile Java APIs for table metadata, query execution, table schema registration, tablet writes, and TsFile writing.
  • Support explicit table selection, table inference when only a single table is present, schema compatibility checks across multiple files, nulls in sparse FIELD columns, column pruning, and basic predicate pushdown on time and TAG columns.
  • Support DataFrame writes to table-model TsFile directories, producing one `part-*.tsfile` file per non-empty Spark task partition.
  • Harden append writes: output file names include the Spark query ID, commit refuses to overwrite existing files, and abort removes only temporary files.
  • Document the initial local-file-only scope and the planned follow-up work.
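The append-write hardening described above can be illustrated with a small, self-contained Java sketch. It is not the connector's code: the class and method names, the `_temporary/<queryId>/` staging layout, and the `part-<partition>-<queryId>.tsfile` name format are all hypothetical assumptions, and the plain-text payload stands in for a real TsFile writer. It shows tasks staging files only for non-empty partitions, a commit that publishes them without overwriting, and an abort that deletes only staged files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the append-write hardening: query-scoped file names,
 * no-overwrite commit, and an abort that touches only staged files.
 * Names, layout, and the text payload are illustrative, not the connector's code.
 */
final class AppendCommitSketch {
    private static final String STAGING = "_temporary"; // assumed staging directory

    /** Assumed name format: embeds the task partition id and the Spark query id. */
    static String fileName(int partitionId, String queryId) {
        return String.format("part-%05d-%s.tsfile", partitionId, queryId);
    }

    private static Path stagingDir(Path outputDir, String queryId) {
        return outputDir.resolve(STAGING).resolve(queryId);
    }

    /** Task side: stage a file only if the partition produced rows. */
    static Optional<Path> stage(Path outputDir, String queryId, int partitionId, List<String> rows)
            throws IOException {
        if (rows.isEmpty()) {
            return Optional.empty(); // empty partitions yield no part file
        }
        Path dir = Files.createDirectories(stagingDir(outputDir, queryId));
        Path staged = dir.resolve(fileName(partitionId, queryId));
        Files.write(staged, rows); // placeholder for a real TsFile writer
        return Optional.of(staged);
    }

    /** Driver side: publish staged files, refusing to overwrite existing ones. */
    static List<Path> commit(Path outputDir, String queryId, List<Path> stagedFiles)
            throws IOException {
        // Check every target before moving anything, so a detected collision publishes nothing.
        List<Path> targets = new ArrayList<>();
        for (Path staged : stagedFiles) {
            Path target = outputDir.resolve(staged.getFileName());
            if (Files.exists(target)) {
                throw new FileAlreadyExistsException(target.toString());
            }
            targets.add(target);
        }
        for (int i = 0; i < targets.size(); i++) {
            // Without REPLACE_EXISTING, Files.move fails if the target appeared in the meantime.
            Files.move(stagedFiles.get(i), targets.get(i));
        }
        deleteStaging(outputDir, queryId); // remove the now-empty staging directory
        return targets;
    }

    /** Abort: delete this query's staged files; committed part files are never touched. */
    static void abort(Path outputDir, String queryId) throws IOException {
        deleteStaging(outputDir, queryId);
    }

    private static void deleteStaging(Path outputDir, String queryId) throws IOException {
        Path staging = stagingDir(outputDir, queryId);
        if (!Files.exists(staging)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(staging)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Because each query ID yields distinct file names, two successive appends can coexist in one directory, while re-committing a name that already exists fails instead of silently replacing data. The exists-check followed by a non-replacing move is not atomic across concurrent writers, which a production committer would still need to consider.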

Tests

  • `JAVA_HOME=$(/usr/libexec/java_home -v 21) ./mvnw -P with-java -pl java/spark-tsfile -am test`
  • Targeted read/write smoke tests for the Spark connector, covering round trips, appending twice, SQL temporary views, external schema validation, and partitioned writes.

Notes

The initial connector intentionally supports local TsFile paths only. Non-local Hadoop filesystems, schema merging, broader predicate pushdown, streaming semantics, additional column categories and data types, and ResourceBundle-backed connector exception messages are left to follow-up changes.

Linked issue: [Feature] Add Spark TsFile connector support for the TsFile table model