<i18n dev> RFR: 8363972: Loose matching of dash/minusSign in number parsing [v2]
Justin Lu
jlu at openjdk.org
Fri Aug 1 00:14:54 UTC 2025
On Thu, 31 Jul 2025 22:04:56 GMT, Naoto Sato <naoto at openjdk.org> wrote:
>> Enabling lenient minus sign matching when parsing numbers. In some locales, e.g. Finnish, the default minus sign is the Unicode "Minus Sign" (U+2212), which is not the "Hyphen Minus" (U+002D) that users type in from keyboard. Thus the parsing of user input numbers may fail. This change utilizes CLDR's `parseLenient` element for minus signs and loosely matches them with the hyphen-minus so that user input numbers can parse. As this is a behavioral change, a corresponding CSR has been drafted.
>
> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
>
> Address review comments
src/java.base/share/classes/java/text/DecimalFormat.java line 3544:
> 3542: var t = text.substring(position, Math.min(tlen, position + alen))
> 3543: .replaceAll(lmsp, "-");
> 3544: return t.regionMatches(0, a, 0, alen);
I presume the original solution was regex for all cases, and you made the single char case for the fast path. If you still have the perf benchmark handy, I wonder how a char by char approach would compare instead of regex. It would also simplify the code since it would be combining the slow and fast path. Although the perf is variable based on the affix string length vs lms length (11).
int i = 0;
for (; position + i < Math.min(tlen, position + alen); i++) {
int tIndex = lms.indexOf(text.charAt(position + i));
int aIndex = lms.indexOf(affix.charAt(i));
// Non LMS. Match direct
if (tIndex < 0 && aIndex < 0) {
if (text.charAt(position + i) != affix.charAt(i)) {
return false;
}
} else {
// By here, at least one LMS. Ensure both LMS.
if (tIndex < 0 || aIndex < 0) {
return false;
}
}
}
// Return true if entire affix was matched
return i == alen;
test/jdk/java/text/Format/NumberFormat/LenientMinusSignTest.java line 120:
> 118: public void testLenientPrefix(String sign) throws ParseException {
> 119: var df = new DecimalFormat(PREFIX, DFS);
> 120: df.setStrict(false);
No need to set strict as false since `df` is lenient by default. If it was done for clarity, I think the method name already adequately indicates it is a lenient style test. Applies to CNF test as well.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2246593124
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2246491998
More information about the i18n-dev
mailing list