RFR: 8285255: refine StringLatin1.regionMatchesCI_UTF16 [v3]

Claes Redestad redestad at openjdk.java.net
Mon Apr 25 15:13:23 UTC 2022


On Wed, 20 Apr 2022 21:08:19 GMT, XenoAmess <duke at openjdk.java.net> wrote:

>> Some thoughts after reviewing 8285001: Simplify StringLatin1.regionMatches (https://github.com/openjdk/jdk/pull/8292/)
>> 
>>             if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>>                 continue;
>>             }
>> 
>> should be changed to 
>> 
>>             if (((u1 == c1) ? CharacterDataLatin1.instance.toLowerCase(c1) : c1) == Character.toLowerCase(u2)) {
>>                 continue;
>>             }
>> 
>> because:
>> 
>> 1. c1 is Latin-1, so CharacterDataLatin1.instance.toLowerCase is likely faster.
>> 2. Because c1 is Latin-1, if u1 != c1 then c1 is already lowercase and needs no lowercase calculation.
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove = check
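Point 2 of the quoted proposal (for a Latin-1 `c1`, if its uppercase differs from `c1`, then `c1` is already lowercase) can be checked exhaustively over all 256 Latin-1 code points. A minimal sketch, assuming only the public `Character` API, since `CharacterDataLatin1` is JDK-internal and not accessible from user code; the class and method names are hypothetical. It checks lowercasing of `c1` itself, which is what point 2 is about, and makes no claim about lowercasing `u1`.

```java
// Hypothetical standalone check of the "c1 is already lowercase" claim,
// using java.lang.Character instead of the JDK-internal CharacterDataLatin1.
public class Latin1CaseClaim {

    // For every Latin-1 code point c: if toUpperCase(c) != c,
    // then toLowerCase(c) must equal c.
    static boolean claimHolds() {
        for (int c = 0; c < 256; c++) {
            int upper = Character.toUpperCase(c);
            if (upper != c && Character.toLowerCase(c) != c) {
                return false;
            }
        }
        return true;
    }

    // The proposed shortcut: only compute a lowercase mapping when
    // toUpperCase left c1 unchanged; otherwise return c1 as-is.
    static int lowerViaShortcut(char c1) {
        int u1 = Character.toUpperCase(c1);
        return (u1 == c1) ? Character.toLowerCase(c1) : c1;
    }

    public static void main(String[] args) {
        System.out.println("claim holds: " + claimHolds());
        boolean agrees = true;
        for (char c = 0; c < 256; c++) {
            agrees &= lowerViaShortcut(c) == Character.toLowerCase(c);
        }
        System.out.println("shortcut agrees with toLowerCase(c1): " + agrees);
    }
}
```

Because `c1` is always Latin-1 here, any Latin-1-only lookup applied to `c1` stays in range; the trouble only starts when such a lookup is applied to the uppercase value.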

Unfortunately, the latest revision leads to an error in case-insensitive `regionMatches` between a UTF-16 string and a Latin-1 string that contains either `\u00b5` or `\u00ff` (these two code points have uppercase mappings outside the Latin-1 range, U+039C and U+0178 respectively):


jshell> "\u00b5".regionMatches(true, 0, "\u0100", 0, 1)
|  Exception java.lang.ArrayIndexOutOfBoundsException: Index 924 out of bounds for length 256
|        at CharacterDataLatin1.getProperties (CharacterDataLatin1.java:74)
|        at CharacterDataLatin1.toLowerCase (CharacterDataLatin1.java:140)
|        at StringLatin1.regionMatchesCI_UTF16 (StringLatin1.java:420)
|        at String.regionMatches (String.java:2238)
|        at (#4:1)
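The failing index in the trace is 924, i.e. 0x039C, the uppercase of `\u00b5`, a value that no 256-entry Latin-1 table can hold. A minimal sketch of that failure mode, assuming a hypothetical stand-in table (`LATIN1_LOWER` is an illustration, not the actual `CharacterDataLatin1` implementation), which also lists the Latin-1 code points whose uppercase falls outside the range:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a 256-entry lowercase table mimicking a
// Latin-1-only lookup like the one in the stack trace above.
public class Latin1UpperCaseEscape {
    static final char[] LATIN1_LOWER = new char[256];

    static {
        for (int i = 0; i < 256; i++) {
            // Lowercase mappings of Latin-1 code points stay within Latin-1.
            LATIN1_LOWER[i] = (char) Character.toLowerCase(i);
        }
    }

    // Unsafe on purpose: assumes ch <= 0xFF and throws
    // ArrayIndexOutOfBoundsException otherwise.
    static int latin1OnlyToLower(int ch) {
        return LATIN1_LOWER[ch];
    }

    // Latin-1 code points whose uppercase mapping is not Latin-1.
    static List<Integer> upperOutsideLatin1() {
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c < 256; c++) {
            if (Character.toUpperCase(c) > 0xFF) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int c : upperOutsideLatin1()) {
            int upper = Character.toUpperCase(c);
            System.out.printf("U+%04X -> U+%04X (%d)%n", c, upper, upper);
        }
        try {
            // Feeding the uppercase of \u00b5 to the Latin-1-only lookup.
            latin1OnlyToLower(Character.toUpperCase(0xB5));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Running `main` should list only U+00B5 and U+00FF, the two code points named above. Applying a Latin-1-only lookup to `c1` itself, as in the original proposal, or guarding it with a `<= 0xFF` range check avoids the overflow.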

-------------

PR: https://git.openjdk.java.net/jdk/pull/8308

