Skip to content

Fix $formatInteger word/Roman/letter formatting for values beyond the…#6

Merged
rayokota merged 1 commit into
masterfrom
fix-format-int
Jun 15, 2026
Merged

Fix $formatInteger word/Roman/letter formatting for values beyond the…#6
rayokota merged 1 commit into
masterfrom
fix-format-int

Conversation

@rayokota

Copy link
Copy Markdown
Owner

… int64 range

Summary

$formatInteger(1e46, 'w') (and other magnitudes above INT64_MAX) produced garbage or crashed instead of returning
"ten billion trillion trillion trillion".

The builtin coerced its argument with Utils::toLong, narrowing the double to int64_t. For values beyond the int64 range this is undefined behavior: on x86-64 it yields INT64_MIN (negative), on ARM64 it saturates to INT64_MAX. Either way the value then indexed out of bounds in the word tables (few[], decades[], magnitudes[]) — a silent junk read on release libc++/libstdc++, and a hard assertion abort on MSVC's checked iterators.

This case was previously hidden behind an ignoreError: true test override; it never actually computed the correct result on any platform.

Fix

Carry the value as a double all the way through the word/Roman/letter formatting path, matching the JSONata JS reference (which is the source of the test suite's expected value):

  • formatInteger builtin now reads the argument via Utils::toDouble instead of Utils::toLong.
  • formatInteger, numberToWords, and lookup signatures changed from int64_t to double.
  • lookup rewritten with double arithmetic (std::floor / std::fmod / std::pow); formatInteger floors the value up front (matching JS Math.floor) and renders the DECIMAL case via std::ostringstream with std::fixed/setprecision(0) to avoid scientific notation.
  • Removed the now-obsolete function-formatInteger/formatInteger.json_43 override so the case is verified for real.

Note on floating-point contraction

After the rewrite, the result was correct except for a spurious trailing remainder. The cause was FP contraction: at -O2, clang fuses num - mant * factor into a single full-precision FMA, whereas JS rounds mant * factor to a double before subtracting. Computing the product in its own statement (double product = mant * factor;) forces the intermediate rounding and reproduces the JS result exactly.

Compatibility note

The Java reference (jsonata-java) uses long throughout and cannot represent these magnitudes, so this is an intentional divergence from the Java port toward the JS reference for out-of-int64 values.

Testing

  • All 1655 generated tests pass, including the previously-overridden formatInteger case for 1e46.

… int64 range

## Summary

`$formatInteger(1e46, 'w')` (and other magnitudes above `INT64_MAX`)
produced garbage or crashed instead of returning
`"ten billion trillion trillion trillion"`.

The builtin coerced its argument with `Utils::toLong`, narrowing the
double to `int64_t`. For values beyond the int64 range this is **undefined
behavior**: on x86-64 it yields `INT64_MIN` (negative), on ARM64 it
saturates to `INT64_MAX`. Either way the value then indexed out of bounds
in the word tables (`few[]`, `decades[]`, `magnitudes[]`) — a silent junk
read on release libc++/libstdc++, and a hard assertion abort on MSVC's
checked iterators.

This case was previously hidden behind an `ignoreError: true` test
override; it never actually computed the correct result on any platform.

## Fix

Carry the value as a `double` all the way through the word/Roman/letter
formatting path, matching the JSONata **JS reference** (which is the source
of the test suite's expected value):

- `formatInteger` builtin now reads the argument via `Utils::toDouble`
  instead of `Utils::toLong`.
- `formatInteger`, `numberToWords`, and `lookup` signatures changed from
  `int64_t` to `double`.
- `lookup` rewritten with double arithmetic (`std::floor` / `std::fmod` /
  `std::pow`); `formatInteger` floors the value up front (matching JS
  `Math.floor`) and renders the DECIMAL case via `std::ostringstream`
  with `std::fixed`/`setprecision(0)` to avoid scientific notation.
- Removed the now-obsolete `function-formatInteger/formatInteger.json_43`
  override so the case is verified for real.

### Note on floating-point contraction

After the rewrite, the result was correct except for a spurious trailing
remainder. The cause was FP contraction: at `-O2`, clang fuses
`num - mant * factor` into a single full-precision FMA, whereas JS rounds
`mant * factor` to a double before subtracting. Computing the product in
its own statement (`double product = mant * factor;`) forces the
intermediate rounding and reproduces the JS result exactly.

## Compatibility note

The Java reference (`jsonata-java`) uses `long` throughout and cannot
represent these magnitudes, so this is an intentional divergence from the
Java port toward the JS reference for out-of-int64 values.

## Testing

- All 1655 generated tests pass, including the previously-overridden
  `formatInteger` case for `1e46`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rayokota rayokota merged commit db0bd7d into master Jun 15, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant