RFR: 8285255: refine StringLatin1.regionMatchesCI_UTF16 [v3]
Claes Redestad
redestad at openjdk.java.net
Mon Apr 25 15:13:23 UTC 2022
On Wed, 20 Apr 2022 21:08:19 GMT, XenoAmess <duke at openjdk.java.net> wrote:
>> some thoughts after watching 8285001: Simplify StringLatin1.regionMatches https://github.com/openjdk/jdk/pull/8292/
>>
>> if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>> continue;
>> }
>>
>> should be changed to
>>
>> if (((u1 == c1) ? CharacterDataLatin1.instance.toLowerCase(c1) : c1) == Character.toLowerCase(u2)) {
>> continue;
>> }
>>
>> as:
>>
>> 1. c1 is LATIN1, so CharacterDataLatin1.instance.toLowerCase seems faster.
>> 2. because c1 is LATIN1, if u1 != c1 then c1 is already lowercase and doesn't need a lowercase calculation.
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
>
> remove = check
Unfortunately, this leads to an error in a case-insensitive `regionMatches` between a Latin-1 string that contains either `\u00b5` or `\u00ff` (the two Latin-1 code points whose upper-case forms lie outside the Latin-1 range) and a UTF-16 string:
jshell> "\u00b5".regionMatches(true, 0, "\u0100", 0, 1)
| Exception java.lang.ArrayIndexOutOfBoundsException: Index 924 out of bounds for length 256
| at CharacterDataLatin1.getProperties (CharacterDataLatin1.java:74)
| at CharacterDataLatin1.toLowerCase (CharacterDataLatin1.java:140)
| at StringLatin1.regionMatchesCI_UTF16 (StringLatin1.java:420)
| at String.regionMatches (String.java:2238)
| at (#4:1)
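Note that index 924 in the trace is 0x39C, which is the upper-case form of `\u00b5`, so an upper-cased value apparently reached the 256-entry Latin-1 table. As a rough way to check which Latin-1 chars are affected, here is a small standalone sketch; the class and method names are made up for illustration and are not part of the JDK or of the PR:

```java
// Standalone sketch: lists Latin-1 chars whose upper-case form is not Latin-1.
// Latin1UpperCaseCheck, isLatin1 and upperCaseLeavesLatin1 are hypothetical
// names used only for this example.
public class Latin1UpperCaseCheck {

    // True if the char fits in a single Latin-1 byte (0x00..0xFF).
    static boolean isLatin1(char c) {
        return c <= 0xFF;
    }

    // True if c is Latin-1 but Character.toUpperCase(c) is not.
    static boolean upperCaseLeavesLatin1(char c) {
        return isLatin1(c) && !isLatin1(Character.toUpperCase(c));
    }

    public static void main(String[] args) {
        // Scan the whole Latin-1 range and report the chars that escape it.
        for (char c = 0; c <= 0xFF; c++) {
            if (upperCaseLeavesLatin1(c)) {
                System.out.printf("U+%04X -> U+%04X%n",
                        (int) c, (int) Character.toUpperCase(c));
            }
        }
    }
}
```

This should report only U+00B5 -> U+039C and U+00FF -> U+0178, so any fast path that feeds an upper-cased value into `CharacterDataLatin1` needs to guard against those two.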
-------------
PR: https://git.openjdk.java.net/jdk/pull/8308