<i18n dev> RFR: 8363972: Loose matching of dash/minusSign in number parsing [v7]
Justin Lu
jlu at openjdk.org
Tue Aug 5 17:15:07 UTC 2025
On Mon, 4 Aug 2025 23:57:28 GMT, Naoto Sato <naoto at openjdk.org> wrote:
>> Enabling lenient minus sign matching when parsing numbers. In some locales, e.g. Finnish, the default minus sign is the Unicode "Minus Sign" (U+2212), which is not the "Hyphen Minus" (U+002D) that users type in from keyboard. Thus the parsing of user input numbers may fail. This change utilizes CLDR's `parseLenient` element for minus signs and loosely matches them with the hyphen-minus so that user input numbers can parse. As this is a behavioral change, a corresponding CSR has been drafted.
>
> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
>
> refrects review comments
Latest changes look good to me. I think ignoring supplementary/normalization is fine and it would have been excessive otherwise. I left some more minor comments.
src/java.base/share/classes/java/text/CompactNumberFormat.java line 219:
> 217: * </pre></blockquote>
> 218: *
> 219: * @implNote The implementation follows the LDML specification to enable loose
Suggestion:
* @implNote The JDK Reference Implementation follows the LDML specification to enable loose
src/java.base/share/classes/java/text/DecimalFormat.java line 421:
> 419: * returns a numerically greater value.
> 420: *
> 421: * @implNote The implementation follows the LDML specification to enable loose
Suggestion:
* @implNote The JDK Reference Implementation follows the LDML specification to enable loose
src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 719:
> 717:
> 718: /**
> 719: * {@return the lenient minus signs} Multiple lenient minus signs
Do we have an idea of when a given locale would not have access to the lenient minus signs provided by parseLenient element? If the vast majority of times it will, then it is fine. Otherwise IMO, `getLenientMinusSigns()` should probably call out the fact that it may not always be a concatenation of multiple minus _signs_, and it may be a single minus _sign_ (as assigned by `minusSignText`). Since that detail is only apparent by digging around the code that assigns `lenientMinusSigns`.
src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 720:
> 718: /**
> 719: * {@return the lenient minus signs} Multiple lenient minus signs
> 720: * are concatenated to form the returned string.
The surrounding package private methods use since tags.
Suggestion:
* are concatenated to form the returned string.
* @since 26
src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 852:
> 850:
> 851: // Lenient minus signs
> 852: lenientMinusSigns = numberElements.length < 14 ? minusSignText : numberElements[13];
BTW, if I remove this check and always assign to `numberElements[13]`, I do not observe any failures in the java_text/Format suite. It would be nice to have an idea of why this check is needed. (I understand it is following the same length checks of monetarySeparator and monetaryGroupingSeparator.)
-------------
PR Review: https://git.openjdk.org/jdk/pull/26580#pullrequestreview-3086137609
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2252827372
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254807654
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254847469
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254792967
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254846860
More information about the i18n-dev
mailing list