<i18n dev> RFR: 8331485: Odd Results when Parsing Scientific Notation with Large Exponent [v2]

Sun May 5 21:00:53 UTC 2024

On Fri, 3 May 2024 18:29:23 GMT, Justin Lu <jlu at openjdk.org> wrote:

>> Please review this PR which corrects an edge case bug for java.text.DecimalFormat that causes incorrect parsing results for strings with very large exponent values.
>> 
>> When parsing values with large exponents, if the value of the exponent exceeds `Integer.MAX_VALUE`, the parsed value is equal to 0. If the value of the exponent exceeds `Long.MAX_VALUE`, the parsed value is equal to the mantissa. Both results are confusing and incorrect.
>> 
>> For example,
>> 
>> ```
>> NumberFormat fmt = NumberFormat.getInstance(Locale.US);
>> fmt.parse(".1E2147483648"); // returns 0.0
>> fmt.parse(".1E9223372036854775808"); // returns 0.1
>> // For comparison
>> Double.parseDouble(".1E2147483648"); // returns Infinity
>> Double.parseDouble(".1E9223372036854775808"); // returns Infinity
>> ```
>> 
>> After this change, both parse calls return `Double.POSITIVE_INFINITY`.
>
> Justin Lu has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - correct other test comment
>  - reflect review

Hello, I filed [JDK-8331485](https://bugs.openjdk.org/browse/JDK-8331485). Thank you for addressing this bug so quickly.

I have a thought/concern regarding the handling of exponents that exceed `Long.MAX_VALUE` in this PR:

> If the value of the exponent exceeds `Long.MAX_VALUE`, the parsed value is equal to the mantissa. Both results are confusing and incorrect.
> 
> For example,
> 
> ```
> NumberFormat fmt = NumberFormat.getInstance(Locale.US);
> fmt.parse(".1E2147483648"); // returns 0.0
> fmt.parse(".1E9223372036854775808"); // returns 0.1
> // For comparison
> Double.parseDouble(".1E2147483648"); // returns Infinity
> Double.parseDouble(".1E9223372036854775808"); // returns Infinity
> ```

The method [`parse(String, ParsePosition)`](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/text/DecimalFormat.html#parse(java.lang.String,java.text.ParsePosition)) uses the [`ParsePosition`](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/text/ParsePosition.html) object as both an input and an output parameter: it determines the position at which parsing starts and, after the call, reports the position up to which the input string has been consumed. (This can be very handy if you use different [`Format`](https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/text/Format.html)s to parse through a string.)

For example, given a method like this:

```
// String templates (STR."...") are a preview feature in JDK 22; compile and run with --enable-preview.
static void parseNumber(String s) {
    NumberFormat numberFormat = NumberFormat.getInstance(Locale.US);
    ParsePosition parsePosition = new ParsePosition(0);
    Number parseResult = numberFormat.parse(s, parsePosition);
    System.out.println(STR."numberFormat.parse(\"\{s}\") -> \{parseResult}; parsePosition: \{parsePosition}");
}
```

`parseNumber("0.123E1XYZ")` will parse the provided string from the beginning to position 7, ignoring the letters at the end of the string. The resulting `Double` value is therefore 1.23 and `parsePosition.getIndex()` returns 7.
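To make the observation above easy to reproduce, here is a minimal self-contained sketch. It uses `printf` instead of string templates so it runs without preview flags. The class name `ParsePositionDemo` and the helper `consumedUpTo` are made up for this illustration and are not part of the PR.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

class ParsePositionDemo {
    /**
     * Parses a number at the start of s and returns the index at which
     * parsing stopped, as reported by ParsePosition.getIndex().
     * (Hypothetical helper, for illustration only.)
     */
    static int consumedUpTo(String s) {
        NumberFormat fmt = NumberFormat.getInstance(Locale.US);
        ParsePosition pos = new ParsePosition(0);
        Number result = fmt.parse(s, pos);
        System.out.printf("parse(\"%s\") -> %s; index=%d%n", s, result, pos.getIndex());
        return pos.getIndex();
    }

    public static void main(String[] args) {
        // Trailing letters are left unconsumed; the text above reports index 7 here.
        consumedUpTo("0.123E1XYZ");
        // Exponent larger than Long.MAX_VALUE: the printed result and index
        // depend on the JDK version, so no expected output is given.
        consumedUpTo("0.123E9223372036854775808");
    }
}
```

Running this on different JDK builds should show whether the reported index and the numeric result agree for the oversized exponent.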

With an exponent that exceeds `Long.MAX_VALUE`, for instance `parseNumber("0.123E9223372036854775808")`, the current implementation of `DecimalFormat` in JDK 22 parses the provided string from the beginning to position 5 and ignores the exponent (because it is too long). The resulting `Double` is therefore 0.123 and `parsePosition.getIndex()` returns 5.

The solution implemented in this PR would produce a parsing result of `Double.POSITIVE_INFINITY`, but `parsePosition.getIndex()` would still return 5. To me this would be confusing behaviour: `parsePosition.getIndex()` indicates that the input string has only been consumed up to index 5, yet the numeric parsing result indicates that the exponent was taken into account.

I think there are two intuitive, non-confusing ways to handle exponents that exceed `Long.MAX_VALUE`. Either keep the current behaviour of `DecimalFormat` for this kind of exponent, or handle exponents the way this PR does but also advance the parse position past the exponent even if it exceeds `Long.MAX_VALUE`.

The latter would mean that `parseNumber("0.123E9223372036854775807123")` would produce a parsing result of `Double.POSITIVE_INFINITY` and `parsePosition.getIndex()` would return 28. This would probably be the nicest, most intuitive solution.
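To illustrate the second option, here is a rough standalone sketch of scanning exponent digits so that the end index always moves past every digit, while the value saturates instead of overflowing. This is not how `DecimalFormat` is implemented and not the code in this PR; `ExponentScanSketch`, `Scan` and `scanExponent` are hypothetical names, and the sketch only handles an optional `-` sign followed by digits.

```java
class ExponentScanSketch {
    /** Hypothetical result: saturated exponent and the index just past the last exponent digit. */
    record Scan(long exponent, int end) {}

    /**
     * Scans an optional '-' and decimal digits starting at 'start'.
     * All digits are consumed even if the value no longer fits into a long;
     * in that case the magnitude is clamped to Long.MAX_VALUE.
     * If no digit is found, end == start and exponent == 0.
     */
    static Scan scanExponent(String text, int start) {
        int i = start;
        boolean negative = false;
        if (i < text.length() && text.charAt(i) == '-') {
            negative = true;
            i++;
        }
        int firstDigit = i;
        long value = 0;
        boolean overflow = false;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            int d = Character.digit(text.charAt(i), 10);
            if (!overflow) {
                if (value > (Long.MAX_VALUE - d) / 10) {
                    overflow = true;          // value * 10 + d would exceed Long.MAX_VALUE
                } else {
                    value = value * 10 + d;
                }
            }
            i++;                              // keep consuming digits regardless of overflow
        }
        if (i == firstDigit) {
            return new Scan(0, start);        // no exponent digits present
        }
        long magnitude = overflow ? Long.MAX_VALUE : value;
        return new Scan(negative ? -magnitude : magnitude, i);
    }

    public static void main(String[] args) {
        String s = "0.123E9223372036854775807123";
        Scan scan = scanExponent(s, s.indexOf('E') + 1);
        System.out.println(scan);
    }
}
```

A caller could then map a saturated positive exponent to infinity and report `end` as the parse position, which would keep the numeric result and the position consistent with each other.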

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19075#issuecomment-2094946378