<i18n dev> RFR: 8363972: Loose matching of dash/minusSign in number parsing [v2]

Fri Aug 1 19:11:04 UTC 2025

On Thu, 31 Jul 2025 23:58:46 GMT, Justin Lu <jlu at openjdk.org> wrote:

>> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Address review comments
>
> src/java.base/share/classes/java/text/DecimalFormat.java line 3544:
> 
>> 3542:             var t = text.substring(position, Math.min(tlen, position + alen))
>> 3543:                 .replaceAll(lmsp, "-");
>> 3544:             return t.regionMatches(0, a, 0, alen);
> 
> I presume the original solution was regex for all cases, and you made the single char case for the fast path. If you still have the perf benchmark handy, I wonder how a char by char approach would compare instead of regex. It would also simplify the code since it would be combining the slow and fast path.
> 
> 
> int i = 0;
> for (; position + i < Math.min(tlen, position + alen); i++) {
>     int tIndex = lms.indexOf(text.charAt(position + i));
>     int aIndex = lms.indexOf(affix.charAt(i));
>     // Non LMS. Match direct
>     if (tIndex < 0 && aIndex < 0) {
>         if (text.charAt(position + i) != affix.charAt(i)) {
>             return false;
>         }
>     } else {
>         // By here, at least one LMS. Ensure both LMS.
>         if (tIndex < 0 || aIndex < 0) {
>             return false;
>         }
>     }
> }
> // Return true if entire affix was matched
> return i == alen;

Comparing string char-by-char is problematic, as it cannot handle supplementary and/or normalization, so it is not possible to combine those paths. Yes, regex is slow, but need to rely on it for those reasons. I believe 99.9% of the use cases fall into the affix length == 1, so I think it is OK.

> test/jdk/java/text/Format/NumberFormat/LenientMinusSignTest.java line 120:
> 
>> 118:         public void testLenientPrefix(String sign) throws ParseException {
>> 119:             var df = new DecimalFormat(PREFIX, DFS);
>> 120:             df.setStrict(false);
> 
> No need to set strict as false since `df` is lenient by default. If it was done for clarity, I think the method name already adequately indicates it is a lenient style test. Applies to CNF test as well.

This was intentional. Although redundant, I wanted to make sure it is in lenient mode.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2248692543
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2248692479