
chore: refactor CI to have centralized SBT action #4643

Merged
comphead merged 13 commits into apache:main from comphead:chore on Jun 13, 2026

comphead commented Jun 12, 2026

Which issue does this PR close?

Closes #.

CI: split Spark SBT compile from test execution

Problem

sql_core-* matrix entries are randomly SIGKILLed on Spark 4.1.2 runners, logging only Killed with no JVM stack trace. That signature points to the kernel/container OOM killer, not to -Xmx exhaustion. Each matrix entry runs sbt -mem 3072 testOnly *, so SBT pays for the full Scala/Zinc compile heap (~2.5–3 GB peak) and then orchestrates the forked test JVM, all on top of Comet's native off-heap pool, on a 7 GB hosted runner. The memory budget is at the edge, and cumulative allocation patterns push it over non-deterministically.
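Because the failure shows up only as Killed with no stack trace, it helps to recognize what a signal death looks like from the outside. The following Python sketch (POSIX only; the helper name classify_exit is hypothetical and not part of the Comet CI) shows that a child killed with SIGKILL is reported by subprocess as return code -9, while shells report the same event as exit status 137 (128 + 9). A JVM that exhausts -Xmx instead throws OutOfMemoryError, which normally leaves a stack trace behind.

```python
import signal
import subprocess
import sys


def classify_exit(returncode: int) -> str:
    """Describe how a child process ended (hypothetical helper, POSIX only)."""
    sigkill = int(signal.SIGKILL)
    # subprocess reports "terminated by signal N" as -N; a shell reports the
    # same event as exit status 128 + N, i.e. 137 for SIGKILL.
    if returncode in (-sigkill, 128 + sigkill):
        return "killed by SIGKILL (no chance to print a stack trace)"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


# A child that SIGKILLs itself ends the same way as a process reaped by the
# OOM killer: SIGKILL cannot be caught, so no handler or shutdown hook runs.
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)
print(child.returncode, classify_exit(child.returncode))
```

On Linux this prints -9 followed by the SIGKILL description; the same job step in a shell-driven CI log would surface as exit code 137.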

Change

  • New single build job per Spark version: builds libcomet.so (cargo --profile ci) and runs sbt -mem 3072 catalyst/Test/compile sql/Test/compile hive/Test/compile once. Uploads two artifacts:
    • native-lib-linux (libcomet.so, ~50 MB)
    • jvm-compiled-spark-<full>-jdk<N> (apache-spark.tar.gz, sources + target/ + Zinc state, ~500 MB–1 GB, mtimes preserved via tar -czpf)
  • Matrix spark-sql-test entries declare needs: build, download and extract both artifacts, call setup-spark-builder with skip-spark-clone: true (which only re-runs the mvn install of Comet's JAR), and then run sbt -mem 1536 testOnly *. Zinc verifies that no source changed and skips the compile.
  • setup-spark-builder gains a skip-spark-clone input to gate the Spark checkout + diff apply when sources are pre-staged from the artifact.
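The test jobs can skip the compile only if the extracted tree looks unchanged, which is why the artifact keeps file mtimes. As a minimal illustration, this sketch uses Python's tarfile module as a stand-in for the tar CLI and checks that a gzip tar round trip restores a file's modification time. The directory layout and the file name Example.class are hypothetical; the sketch says nothing about how Zinc itself decides what to recompile.

```python
import os
import tarfile
import tempfile
from pathlib import Path

FIXED_MTIME = 1_700_000_000  # arbitrary whole-second timestamp for the demo


def roundtrip_mtime(work: Path) -> float:
    """Archive a tree, extract it elsewhere, and return the extracted file's mtime."""
    src = work / "apache-spark"
    (src / "target").mkdir(parents=True)
    compiled = src / "target" / "Example.class"  # hypothetical compiled output
    compiled.write_bytes(b"\xca\xfe\xba\xbe")
    os.utime(compiled, (FIXED_MTIME, FIXED_MTIME))

    # Similar in spirit to `tar -czf apache-spark.tar.gz apache-spark`.
    archive = work / "apache-spark.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="apache-spark")

    dest = work / "extracted"
    dest.mkdir()
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safe "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return os.stat(dest / "apache-spark" / "target" / "Example.class").st_mtime


with tempfile.TemporaryDirectory() as d:
    print(int(roundtrip_mtime(Path(d))))  # 1700000000
```

The extracted file carries the original timestamp rather than the extraction time, so tooling that compares stamps sees the same tree the build job produced.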

Effect

  • SBT heap per matrix runner: 3072 → 1536 MB, freeing ~1.5 GB of runner headroom, which closes the budget gap that was producing the OOM kills.
  • Spark compile runs once per Spark version instead of seven times (once per matrix entry).
  • Same runner count per Spark version: previously build-native + 7×(compile+test) = 8, now build (native + compile) + 7×test = 8. The job count is unchanged, but the heavy compile is amortized.
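The runner, compile, and heap numbers above can be cross-checked with a small model. This is a simplified sketch, assuming seven matrix entries per Spark version as stated in the PR. Job names such as spark-sql-test-0 are hypothetical, and each job is reduced to a single flag saying whether it compiles Spark.

```python
from __future__ import annotations

from dataclasses import dataclass

N_TEST_ENTRIES = 7          # matrix entries per Spark version, per the PR text
SBT_HEAP_BEFORE_MB = 3072   # sbt -mem per matrix entry before the split
SBT_HEAP_AFTER_MB = 1536    # sbt -mem per test-only matrix entry after the split


@dataclass(frozen=True)
class Job:
    name: str             # hypothetical job name
    compiles_spark: bool  # True if the job runs the Spark Test/compile in SBT


def jobs_before(n: int) -> list[Job]:
    # build-native only produces libcomet.so; every matrix entry compiles and tests.
    return [Job("build-native", False)] + [
        Job(f"spark-sql-test-{i}", True) for i in range(n)
    ]


def jobs_after(n: int) -> list[Job]:
    # One build job compiles once; matrix entries reuse the uploaded artifacts.
    return [Job("build", True)] + [
        Job(f"spark-sql-test-{i}", False) for i in range(n)
    ]


def summarize(jobs: list[Job]) -> tuple[int, int]:
    """Return (runner count, number of Spark compiles)."""
    return len(jobs), sum(j.compiles_spark for j in jobs)


print(summarize(jobs_before(N_TEST_ENTRIES)))  # (8, 7)
print(summarize(jobs_after(N_TEST_ENTRIES)))   # (8, 1)
freed = SBT_HEAP_BEFORE_MB - SBT_HEAP_AFTER_MB
print(freed, freed / 1024)                     # 1536 1.5
```

Both layouts use eight runners; what changes is that seven Spark compiles collapse into one, and each test runner's SBT heap shrinks by 1536 MB (1.5 GiB).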

comphead marked this pull request as draft on June 12, 2026 20:50
comphead changed the title from "chore: set default value for spark.comet.memoryOverhead in tests" to "chore: refactor CI to have centralized SBT action" on Jun 13, 2026
comphead marked this pull request as ready for review on June 13, 2026 15:52
comphead merged commit d926e21 into apache:main on Jun 13, 2026
33 checks passed