<i18n dev> RFR: 8363972: Loose matching of dash/minusSign in number parsing [v7]

Justin Lu jlu at openjdk.org
Tue Aug 5 17:15:07 UTC 2025


On Mon, 4 Aug 2025 23:57:28 GMT, Naoto Sato <naoto at openjdk.org> wrote:

>> Enabling lenient minus sign matching when parsing numbers. In some locales, e.g. Finnish, the default minus sign is the Unicode "Minus Sign" (U+2212), which is not the "Hyphen-Minus" (U+002D) that users type on a keyboard. Thus the parsing of user-input numbers may fail. This change utilizes CLDR's `parseLenient` element for minus signs and loosely matches them with the hyphen-minus so that user-input numbers can be parsed. As this is a behavioral change, a corresponding CSR has been drafted.
>
> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
> 
>   reflects review comments

The latest changes look good to me. I think ignoring supplementary characters and normalization is fine; handling them would have been excessive. I left a few more minor comments.

src/java.base/share/classes/java/text/CompactNumberFormat.java line 219:

> 217:  * </pre></blockquote>
> 218:  *
> 219:  * @implNote The implementation follows the LDML specification to enable loose

Suggestion:

 * @implNote The JDK Reference Implementation follows the LDML specification to enable loose

src/java.base/share/classes/java/text/DecimalFormat.java line 421:

> 419:  * returns a numerically greater value.
> 420:  *
> 421:  * @implNote The implementation follows the LDML specification to enable loose

Suggestion:

 * @implNote The JDK Reference Implementation follows the LDML specification to enable loose

src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 719:

> 717: 
> 718:     /**
> 719:      * {@return the lenient minus signs} Multiple lenient minus signs

Do we have an idea of when a given locale would not have access to the lenient minus signs provided by the `parseLenient` element? If it has access the vast majority of the time, then this is fine. Otherwise, IMO, `getLenientMinusSigns()` should call out that the returned string may not always be a concatenation of multiple minus _signs_; it may be a single minus _sign_ (as assigned by `minusSignText`). That detail is currently only apparent from digging through the code that assigns `lenientMinusSigns`.
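
To make the distinction concrete, here is a minimal sketch of a caller that treats the returned string as a set of candidate characters, so it behaves the same whether the string holds several signs or just one. Everything here is illustrative: `LenientMinusSketch`, `isLenientMinus`, `parseLoosely`, and the sample sign strings are hypothetical names and values, not the code in the PR, and the pre-normalization step is only a stand-in for whatever matching the implementation actually does.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

public class LenientMinusSketch {
    // Hypothetical stand-in for a concatenated result: hyphen-minus, U+2212,
    // U+FF0D and U+FE63. This is an illustrative subset, not the CLDR data.
    static final String MANY = "-\u2212\uFF0D\uFE63";
    // Hypothetical fallback case: only the format's own minus sign.
    static final String SINGLE = "\u2212";

    /** Treats lenientSigns as a set of chars; works for one sign or many. */
    static boolean isLenientMinus(String lenientSigns, char c) {
        return lenientSigns.indexOf(c) >= 0;
    }

    /**
     * Rewrites a leading lenient minus to the format's own minus sign and
     * then parses. Returns null if the whole text could not be consumed.
     */
    static Number parseLoosely(String text, String lenientSigns) {
        DecimalFormatSymbols dfs = new DecimalFormatSymbols(Locale.ROOT);
        dfs.setMinusSign('\u2212'); // mimic a locale whose minus sign is U+2212
        DecimalFormat df = new DecimalFormat("#,##0.###", dfs);
        if (!text.isEmpty() && isLenientMinus(lenientSigns, text.charAt(0))) {
            text = dfs.getMinusSign() + text.substring(1);
        }
        ParsePosition pos = new ParsePosition(0);
        Number n = df.parse(text, pos);
        return pos.getIndex() == text.length() ? n : null;
    }

    public static void main(String[] args) {
        // Keyboard hyphen-minus, accepted because it is in the MANY set.
        System.out.println(parseLoosely("-1234", MANY));
        // U+2212 input, accepted even when only a single sign is available.
        System.out.println(parseLoosely("\u22121234", SINGLE));
        // With only a single sign, the hyphen-minus is not a lenient match.
        System.out.println(isLenientMinus(SINGLE, '-'));
    }
}
```

The point of the sketch is only that a single-sign return value silently narrows what is accepted, which is why the doc comment might want to mention it.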

src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 720:

> 718:     /**
> 719:      * {@return the lenient minus signs} Multiple lenient minus signs
> 720:      * are concatenated to form the returned string.

The surrounding package-private methods use `@since` tags.

Suggestion:

     * are concatenated to form the returned string.
     * @since 26

src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 852:

> 850: 
> 851:         // Lenient minus signs
> 852:         lenientMinusSigns = numberElements.length < 14 ? minusSignText : numberElements[13];

BTW, if I remove this check and always assign `numberElements[13]`, I do not observe any failures in the java_text/Format suite. It would be nice to know why this check is needed. (I understand it follows the same length checks as monetarySeparator and monetaryGroupingSeparator.)
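
For reference, the guard in question reduces to the following pattern. This is a standalone sketch with a hypothetical class and method name and made-up array contents; only the index 13 and the fallback to `minusSignText` come from the quoted line. Whether any locale data source actually supplies an array shorter than 14 elements is exactly the open question, so the short array below is an assumption.

```java
public class LenientFallbackSketch {
    // Index taken from the quoted line; everything else is illustrative.
    static final int LENIENT_MINUS_INDEX = 13;

    /** Falls back to minusSignText when the array has no element at index 13. */
    static String pickLenientMinusSigns(String[] numberElements, String minusSignText) {
        return numberElements.length <= LENIENT_MINUS_INDEX
                ? minusSignText
                : numberElements[LENIENT_MINUS_INDEX];
    }

    public static void main(String[] args) {
        String[] shortArray = new String[13];   // assumed: no lenient entry
        String[] fullArray = new String[14];
        fullArray[13] = "-\u2212";              // made-up lenient sign data

        System.out.println(pickLenientMinusSigns(shortArray, "\u2212"));
        System.out.println(pickLenientMinusSigns(fullArray, "\u2212"));

        // Without the length check, the short array would fail outright.
        try {
            String unused = shortArray[LENIENT_MINUS_INDEX];
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded access throws: " + e.getMessage());
        }
    }
}
```

If every data source always provides at least 14 elements, the guard is dead code and the test suite passing without it is expected; if some source can return fewer, removing it would surface as an `ArrayIndexOutOfBoundsException` only for that source.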

-------------

PR Review: https://git.openjdk.org/jdk/pull/26580#pullrequestreview-3086137609
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2252827372
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254807654
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254847469
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254792967
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254846860

