RFR: 8360459: UNICODE_CASE and character class with non-ASCII range does not match ASCII char [v5]
Xueming Shen
sherman at openjdk.org
Tue Jul 15 15:18:54 UTC 2025
On Mon, 14 Jul 2025 07:28:09 GMT, Xueming Shen <sherman at openjdk.org> wrote:
>> src/java.base/share/classes/jdk/internal/util/regex/CaseFolding.java.template line 99:
>>
>>> 97: */
>>> 98: public static int[] getClassRangeClosingCharacters(int start, int end) {
>>> 99: int[] expanded = new int[expanded_casefolding.size()];
>>
>> Can be `Math.min(expanded_casefolding.size(), end - start)` in case the table grows large, and update the `off < expanded.length` check below too.
>
> The table itself probably isn't going to grow significantly anytime soon, and we’ll likely have enough time to adjust if CaseFolding.txt does get substantially bigger.
>
> That said, I should probably consider reversing the lookup logic: instead of iterating through [start, end], we could iterate over the expansion table and check whether each of its code points falls within the input range, at least when the range is larger than the table. That makes the cost O(range size) vs. O(table size), and the table size is effectively constant.
Updated the lookup logic as discussed.
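
Roughly, the reversed lookup could look like the sketch below. This is an illustration only, not the code in the PR: the class name, the `EXPANDED` map (a stand-in for `expanded_casefolding`), its three sample entries, and the `closingCharacters` method are all hypothetical, and the real table layout in CaseFolding.java.template may differ.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.IntStream;

class RangeClosingSketch {
    // Hypothetical stand-in for expanded_casefolding:
    // code point -> code points that case-fold together with it.
    // Only three sample entries are listed here.
    static final Map<Integer, int[]> EXPANDED = new TreeMap<>();
    static {
        EXPANDED.put(0x00B5, new int[] {0x039C, 0x03BC}); // MICRO SIGN
        EXPANDED.put(0x017F, new int[] {'S', 's'});       // LATIN SMALL LETTER LONG S
        EXPANDED.put(0x212A, new int[] {'K', 'k'});       // KELVIN SIGN
    }

    // Collects the closing characters for every table entry inside [start, end].
    static int[] closingCharacters(int start, int end) {
        IntStream.Builder out = IntStream.builder();
        long rangeSize = (long) end - start + 1;
        if (rangeSize > EXPANDED.size()) {
            // Large range: walk the (small) table and keep entries inside the range.
            for (Map.Entry<Integer, int[]> e : EXPANDED.entrySet()) {
                int cp = e.getKey();
                if (cp >= start && cp <= end) {
                    for (int c : e.getValue()) {
                        out.add(c);
                    }
                }
            }
        } else {
            // Small range: probe the table once per code point in the range.
            for (int cp = start; cp <= end; cp++) {
                int[] closing = EXPANDED.get(cp);
                if (closing != null) {
                    for (int c : closing) {
                        out.add(c);
                    }
                }
            }
        }
        return out.build().toArray();
    }
}
```

With this shape, a class range such as [\u0080-\x{10FFFF}] costs one pass over the table rather than one probe per code point in the range, while a one-character range still costs a single map lookup.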
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26285#discussion_r2207809731