Fix $formatInteger word/Roman/letter formatting for values beyond the…#6
Merged
Conversation
… int64 range ## Summary `$formatInteger(1e46, 'w')` (and other magnitudes above `INT64_MAX`) produced garbage or crashed instead of returning `"ten billion trillion trillion trillion"`. The builtin coerced its argument with `Utils::toLong`, narrowing the double to `int64_t`. For values beyond the int64 range this is **undefined behavior**: on x86-64 it yields `INT64_MIN` (negative), on ARM64 it saturates to `INT64_MAX`. Either way the value then indexed out of bounds in the word tables (`few[]`, `decades[]`, `magnitudes[]`) — a silent junk read on release libc++/libstdc++, and a hard assertion abort on MSVC's checked iterators. This case was previously hidden behind an `ignoreError: true` test override; it never actually computed the correct result on any platform. ## Fix Carry the value as a `double` all the way through the word/Roman/letter formatting path, matching the JSONata **JS reference** (which is the source of the test suite's expected value): - `formatInteger` builtin now reads the argument via `Utils::toDouble` instead of `Utils::toLong`. - `formatInteger`, `numberToWords`, and `lookup` signatures changed from `int64_t` to `double`. - `lookup` rewritten with double arithmetic (`std::floor` / `std::fmod` / `std::pow`); `formatInteger` floors the value up front (matching JS `Math.floor`) and renders the DECIMAL case via `std::ostringstream` with `std::fixed`/`setprecision(0)` to avoid scientific notation. - Removed the now-obsolete `function-formatInteger/formatInteger.json_43` override so the case is verified for real. ### Note on floating-point contraction After the rewrite, the result was correct except for a spurious trailing remainder. The cause was FP contraction: at `-O2`, clang fuses `num - mant * factor` into a single full-precision FMA, whereas JS rounds `mant * factor` to a double before subtracting. Computing the product in its own statement (`double product = mant * factor;`) forces the intermediate rounding and reproduces the JS result exactly. ## Compatibility note The Java reference (`jsonata-java`) uses `long` throughout and cannot represent these magnitudes, so this is an intentional divergence from the Java port toward the JS reference for out-of-int64 values. ## Testing - All 1655 generated tests pass, including the previously-overridden `formatInteger` case for `1e46`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
… int64 range
Summary
$formatInteger(1e46, 'w')(and other magnitudes aboveINT64_MAX) produced garbage or crashed instead of returning"ten billion trillion trillion trillion".The builtin coerced its argument with
Utils::toLong, narrowing the double toint64_t. For values beyond the int64 range this is undefined behavior: on x86-64 it yieldsINT64_MIN(negative), on ARM64 it saturates toINT64_MAX. Either way the value then indexed out of bounds in the word tables (few[],decades[],magnitudes[]) — a silent junk read on release libc++/libstdc++, and a hard assertion abort on MSVC's checked iterators.This case was previously hidden behind an
ignoreError: truetest override; it never actually computed the correct result on any platform.Fix
Carry the value as a
doubleall the way through the word/Roman/letter formatting path, matching the JSONata JS reference (which is the source of the test suite's expected value):formatIntegerbuiltin now reads the argument viaUtils::toDoubleinstead ofUtils::toLong.formatInteger,numberToWords, andlookupsignatures changed fromint64_ttodouble.lookuprewritten with double arithmetic (std::floor/std::fmod/std::pow);formatIntegerfloors the value up front (matching JSMath.floor) and renders the DECIMAL case viastd::ostringstreamwithstd::fixed/setprecision(0)to avoid scientific notation.function-formatInteger/formatInteger.json_43override so the case is verified for real.Note on floating-point contraction
After the rewrite, the result was correct except for a spurious trailing remainder. The cause was FP contraction: at
-O2, clang fusesnum - mant * factorinto a single full-precision FMA, whereas JS roundsmant * factorto a double before subtracting. Computing the product in its own statement (double product = mant * factor;) forces the intermediate rounding and reproduces the JS result exactly.Compatibility note
The Java reference (
jsonata-java) useslongthroughout and cannot represent these magnitudes, so this is an intentional divergence from the Java port toward the JS reference for out-of-int64 values.Testing
formatIntegercase for1e46.