<i18n dev> RFR: 8363972: Loose matching of dash/minusSign in number parsing [v2]
Naoto Sato
naoto at openjdk.org
Fri Aug 1 19:11:04 UTC 2025
On Thu, 31 Jul 2025 23:58:46 GMT, Justin Lu <jlu at openjdk.org> wrote:
>> Naoto Sato has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Address review comments
>
> src/java.base/share/classes/java/text/DecimalFormat.java line 3544:
>
>> 3542: var t = text.substring(position, Math.min(tlen, position + alen))
>> 3543: .replaceAll(lmsp, "-");
>> 3544: return t.regionMatches(0, a, 0, alen);
>
> I presume the original solution was regex for all cases, and you made the single char case for the fast path. If you still have the perf benchmark handy, I wonder how a char by char approach would compare instead of regex. It would also simplify the code since it would be combining the slow and fast path.
>
>
> int i = 0;
> for (; position + i < Math.min(tlen, position + alen); i++) {
> int tIndex = lms.indexOf(text.charAt(position + i));
> int aIndex = lms.indexOf(affix.charAt(i));
> // Non LMS. Match direct
> if (tIndex < 0 && aIndex < 0) {
> if (text.charAt(position + i) != affix.charAt(i)) {
> return false;
> }
> } else {
> // By here, at least one LMS. Ensure both LMS.
> if (tIndex < 0 || aIndex < 0) {
> return false;
> }
> }
> }
> // Return true if entire affix was matched
> return i == alen;
Comparing string char-by-char is problematic, as it cannot handle supplementary and/or normalization, so it is not possible to combine those paths. Yes, regex is slow, but need to rely on it for those reasons. I believe 99.9% of the use cases fall into the affix length == 1, so I think it is OK.
> test/jdk/java/text/Format/NumberFormat/LenientMinusSignTest.java line 120:
>
>> 118: public void testLenientPrefix(String sign) throws ParseException {
>> 119: var df = new DecimalFormat(PREFIX, DFS);
>> 120: df.setStrict(false);
>
> No need to set strict as false since `df` is lenient by default. If it was done for clarity, I think the method name already adequately indicates it is a lenient style test. Applies to CNF test as well.
This was intentional. Although redundant, I wanted to make sure it is in lenient mode.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2248692543
PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2248692479
More information about the i18n-dev
mailing list