docs: reflect codegen dispatch fallback in expression compatibility guide #4649

Merged
andygrove merged 1 commit into apache:main from andygrove:update-compat-guide-codegen-dispatch
Jun 14, 2026
Conversation

@andygrove

Member

Which issue does this PR close?

Closes #.

Rationale for this change

The expression compatibility guide is auto-generated by GenerateDocs.scala. For every Incompatible expression it printed a sentence like:

> The following incompatibilities cause Second to fall back to Spark by default. Set spark.comet.expression.Second.allowIncompatible=true to enable Comet acceleration despite these differences.

This is no longer accurate for expressions that opt into the CodegenDispatchFallback trait. Those expressions do not fall back to Spark by default. They stay in Comet's native pipeline via the JVM codegen dispatcher (running Spark's own generated code) and match Spark exactly. For these expressions, allowIncompatible=true switches to the faster native implementation that carries the listed differences, rather than enabling Comet acceleration that was otherwise disabled.
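
The three outcomes described above can be sketched as a small decision function. This is only an illustrative model of the behavior stated in this description: `DispatchSketch`, `ExecutionPath`, and `choosePath` are hypothetical names, not Comet's actual API. Only the config key pattern is taken from the generated sentence quoted earlier.

```scala
// Hypothetical model of the behavior described in this PR; these names are
// illustrative and do not exist in Comet.
object DispatchSketch {
  sealed trait ExecutionPath
  case object SparkFallback extends ExecutionPath      // expression is executed by Spark
  case object JvmCodegenDispatch extends ExecutionPath // Spark's generated code, inside Comet's pipeline
  case object NativeComet extends ExecutionPath        // faster native path with the listed differences

  // Key pattern copied from the generated sentence, e.g.
  // spark.comet.expression.Second.allowIncompatible
  def allowIncompatibleKey(exprName: String): String =
    s"spark.comet.expression.$exprName.allowIncompatible"

  // Path taken by an expression that is marked Incompatible.
  def choosePath(
      usesCodegenDispatchFallback: Boolean,
      allowIncompatible: Boolean): ExecutionPath =
    if (allowIncompatible) NativeComet
    else if (usesCodegenDispatchFallback) JvmCodegenDispatch
    else SparkFallback
}
```

Under this model, an expression that opts into CodegenDispatchFallback never reaches `SparkFallback`, which is why the old "fall back to Spark by default" sentence was wrong for it.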

What changes are included in this PR?

  • GenerateDocs.scala now emits different prose for expressions enrolled in codegen-dispatch fallback. Those expressions document that Comet accelerates them by default via JVM codegen dispatch (Spark-compatible), and that allowIncompatible=true opts into the faster native path with the listed differences. Non-dispatch expressions (such as Cast, SortArray, CollectSet) keep the original "fall back to Spark by default" wording, which remains accurate for them.
  • Refactored the grown CategoryNotes tuple into an ExprNotes case class. Aggregate serdes use a separate builder since CometAggregateExpressionSerde is not a subtype of CometExpressionSerde and never participates in codegen dispatch.
  • Updated the general intro sentence on the expression compatibility index pages (expressions/index.md and the four per-version spark-{3.4,3.5,4.0,4.1}/index.md pages), which carried the same inaccuracy.
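
A minimal sketch of how the two prose variants might be selected is shown here. The field names on `ExprNotes`, the `intro` and `forAggregate` helpers, and the exact wording of the codegen-dispatch sentence are assumptions made for illustration; the real definitions live in GenerateDocs.scala and may differ. Only the non-dispatch wording is copied from the sentence quoted in the rationale.

```scala
object DocsProseSketch {
  // Hypothetical shape: the real ExprNotes case class may carry other fields.
  final case class ExprNotes(
      name: String,
      incompatibilities: Seq[String],
      codegenDispatch: Boolean)

  // Chooses the intro sentence printed above an expression's incompatibility list.
  def intro(notes: ExprNotes): String = {
    val key = s"spark.comet.expression.${notes.name}.allowIncompatible"
    if (notes.codegenDispatch) {
      // Paraphrase of the new wording, not the generated text itself.
      s"Comet accelerates ${notes.name} by default via JVM codegen dispatch, which matches Spark. " +
        s"Set $key=true to opt into the faster native implementation, which has the following differences."
    } else {
      // Original wording, still accurate for non-dispatch expressions.
      s"The following incompatibilities cause ${notes.name} to fall back to Spark by default. " +
        s"Set $key=true to enable Comet acceleration despite these differences."
    }
  }

  // Aggregate serdes never participate in codegen dispatch, so a separate
  // builder can fix the flag to false.
  def forAggregate(name: String, incompatibilities: Seq[String]): ExprNotes =
    ExprNotes(name, incompatibilities, codegenDispatch = false)
}
```

Keeping the dispatch flag as a named field, rather than another tuple position, is the kind of readability gain a case class gives over a growing tuple.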

How are these changes tested?

This is a documentation generation change. I ran GenerateDocs against a temporary copy of the templates for the Spark 3.5 profile and confirmed both prose variants render correctly: codegen-dispatch expressions (for example Second) get the new wording, and non-dispatch expressions (for example SortArray, CollectSet) retain the original wording. The Spark module compiles and the changed files pass spotless and prettier.

@andygrove andygrove marked this pull request as ready for review June 13, 2026 13:47
@andygrove andygrove added this to the 0.17.0 milestone Jun 13, 2026

@comphead comphead left a comment

Contributor

Thanks @andygrove
1 nit: looks like we have some duplications for expression compat guide, WDYT about combining them to bigger sections like Spark 3, Spark 4?

@andygrove

Member Author

> Thanks @andygrove 1 nit: looks like we have some duplications for expression compat guide, WDYT about combining them to bigger sections like Spark 3, Spark 4?

The challenge is that there are differences between Spark 4.0 and 4.1.

@andygrove andygrove merged commit fdc965a into apache:main Jun 14, 2026
16 checks passed
@andygrove andygrove deleted the update-compat-guide-codegen-dispatch branch June 14, 2026 15:17