fix: correct stale unix_timestamp NTZ and date_format codegen-default doc text (#4502)#4645
Open
andygrove wants to merge 2 commits into
Open
fix: correct stale unix_timestamp NTZ and date_format codegen-default doc text (#4502)#4645andygrove wants to merge 2 commits into
andygrove wants to merge 2 commits into
Conversation
…default note CometUnixTimestamp.getUnsupportedReasons claimed TimestampNTZType is unsupported because Comet incorrectly applies timezone conversion, but the native unix_timestamp path has a dedicated TimestampNTZ branch that divides raw microseconds without any timezone conversion, matching Spark. Align the reason text with isSupportedInputType and getSupportLevel, both of which already treat TimestampNTZType as supported. CometDateFormat.getCompatibleNotes described the codegen dispatcher as disabled by default; spark.comet.exec.scalaUDF.codegen.enabled now defaults to true. Correct the note.
…bug panic The native date_diff subtracted Date32 day values with a plain i32 -, which panics in debug builds when the result overflows i32. Spark computes the day difference with JVM int subtraction, which wraps. Switch to wrapping_sub to match Spark and remove the debug-only panic path, and add regression tests covering basic and overflow inputs.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #4502 (date/time expression audit follow-ups). The remaining open work, gating non-default
StringTypeWithCollationinputs on Spark 4.0 datetime expressions (item 5), is carved out into #4646.Rationale for this change
The high-priority items in #4502 break down as follows after the codegen-dispatch migration and
spark.comet.exec.scalaUDF.codegen.enableddefaulting totrue:CometUnixTimestamp.getUnsupportedReasonsclaimedTimestampNTZTypeis unsupported "because Comet incorrectly applies timezone conversion to TimestampNTZ values". The nativeunix_timestamppath has a dedicatedTimestamp(Microsecond, None)branch that divides raw microseconds by1_000_000with no timezone conversion, matching Spark (ToTimestamp.evaldoes rawmicros / MICROS_PER_SECONDfor bothTimestampTypeandTimestampNTZType). The predicate (isSupportedInputType) andgetSupportLevelalready treatTimestampNTZTypeas supported; only the reason text was out of date.CometDateFormat.getCompatibleNotesdescribed the codegen dispatcher as "disabled (default)". The flag now defaults totrue, so the note was inaccurate.The native
date_diffsubtracted Date32 day values with a plaini32 -, which panics in debug builds when the result overflows. Spark computes the difference with JVM int subtraction, which wraps. This is practically unreachable for validDateTypeinputs but is an unmasked panic path.Items 2, 3, and 4 are obsolete after the codegen-dispatch migration and are intentionally not changed (details below).
What changes are included in this PR?
CometUnixTimestamp.getUnsupportedReasonsnow reads "OnlyDateType,TimestampType, andTimestampNTZTypeinputs are supported.", consistent withisSupportedInputTypeandgetSupportLevel.CometDateFormat.getCompatibleNotesnow states thatspark.comet.exec.scalaUDF.codegen.enabled=trueis the default.date_diffuseswrapping_subto match Spark's wrapping int subtraction and remove the debug-only panic path, with regression tests for basic and overflow inputs.Intentionally not changed (obsolete after the codegen-dispatch migration and the
scalaUDF.codegen.enableddefault flipping totrue):CometDateFormatgating intogetSupportLevel): marking non-UTC + whitelisted formatsIncompatiblewould now route them to full Spark fallback under the default config instead of through the dispatcher, which produces Spark-identical results. That is a regression, not a fix.Incompatible/ surface the dispatcher flag ingetCompatibleNotes):CometCodegenDispatchdeliberately reportsCompatible()and omits the flag note, which is accurate now that the dispatcher is on by default. When it is disabled,convertreturnsNonewith awithFallbackReasonthat shows up in EXPLAIN. The only stale artifact was theCometDateFormatnote fixed above.CometMakeTimestampANSI):CometMakeTimestampis nowCometCodegenDispatch[MakeTimestamp], so Spark's owndoGenCode(which honorsfailOnError) runs in the kernel and throws under ANSI mode; when the dispatcher is disabled it falls back to Spark. ANSI behavior is correct in both configurations.How are these changes tested?
getUnsupportedReasons/getCompatibleNotescorrections are EXPLAIN/compatibility-doc text with no behavior change. The nativeunix_timestampTimestampNTZ path the corrected text describes is covered by existing Rust unit tests innative/spark-expr/src/datetime_funcs/unix_timestamp.rs.mvnw compileis clean for thesparkmodule.date_diffchange addstest_date_diff_basicandtest_date_diff_wraps_on_overflowinnative/spark-expr/src/datetime_funcs/date_diff.rs, both passing.cargo buildandcargo clippy -D warningsare clean for thedatafusion-comet-spark-exprcrate.