<i18n dev> RFR: 8365675: Add String Unicode Case-Folding Support [v4]

Tue Oct 28 21:41:36 UTC 2025

On Mon, 27 Oct 2025 09:40:03 GMT, Xueming Shen <sherman at openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/StringLatin1.java line 217:
>> 
>>> 215:             if (c1 != c2) {
>>> 216:                 return c1 - c2;
>>> 217:             }
>> 
>> Compute difference only once.
>> Suggestion:
>> 
>>             if ((c1 - c2) != 0) {
>>                 return c1 - c2;
>>             }
>
> Do you mean go with
> 
>             if ((c1 - c2) != 0) {
>                 return c1 - c2;
>             } ?

Yes. The difference may be inconsequential, but this form computes the difference directly and returns it when the two values are not equal.
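
To make the "compute once" intent explicit, the difference can be held in a local variable. The following is only a minimal sketch, not the code in the PR: the class `DiffOnce` and the method `compare` are hypothetical names, and the loop assumes Latin-1 bytes compared as unsigned values.

```java
// Hypothetical illustration; not the StringLatin1 code under review.
final class DiffOnce {
    // Compares two Latin-1 byte arrays, treating each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int lim = Math.min(a.length, b.length);
        for (int i = 0; i < lim; i++) {
            // The subtraction appears once; the result is tested and returned.
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared positions are equal, so the shorter array sorts first.
        return a.length - b.length;
    }
}
```

In practice the JIT is likely to generate equivalent code for either form, so this is mostly a readability choice.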

>> src/java.base/share/classes/java/lang/StringUTF16.java line 605:
>> 
>>> 603:         int k1 = off, k2 = ooff, fk1 = 0, fk2 = 0;
>>> 604:         while ((k1 < last || folded1 != null && fk1 < folded1.length) &&
>>> 605:                (k2 < olast || folded2 != null && fk2 < folded2.length)) {
>> 
>> Use ArraysSupport.mismatch to quickly scan past identical sequences (the byte index will need to be converted to a char index).
>
> It seems we only use it for the case-sensitive comparison in the existing string comparison code. I just wonder whether we really want to apply it to the case-insensitive comparison as well?
> 
>     static int compareTo(byte[] value, byte[] other, int len1, int len2) {
>         int lim = Math.min(len1, len2);
>         int k = ArraysSupport.mismatch(value, other, lim);
>         return (k < 0) ? len1 - len2 : getChar(value, k) - getChar(other, k);
>     }

I expect most strings will consist predominantly of normal characters (ones that need no folding). Using mismatch lets the comparison rapidly skip over the identical leading portion and find the first bytes/characters that need to be checked for folding.
I would think it would help speed up the case-sensitive comparisons too.
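
The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses the public `java.util.Arrays.mismatch` instead of the internal `ArraysSupport.mismatch`, assumes big-endian UTF-16 bytes, and stands in a simple one-to-one `fold` helper (hypothetical) for real Unicode case folding, which can expand one character into several.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; names and the simplified fold() are assumptions.
final class MismatchSkip {
    // Reads the char at char index i from big-endian UTF-16 bytes.
    static char charAt(byte[] v, int i) {
        return (char) (((v[2 * i] & 0xff) << 8) | (v[2 * i + 1] & 0xff));
    }

    // Simplified one-to-one fold; full case folding may yield multiple chars.
    static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    static int compareIgnoreCase(byte[] a, byte[] b) {
        int len1 = a.length >> 1, len2 = b.length >> 1;
        int lim = Math.min(len1, len2);
        // Skip the identical prefix in one pass over the raw bytes.
        int k = Arrays.mismatch(a, 0, lim << 1, b, 0, lim << 1);
        if (k < 0) {
            return len1 - len2; // shared prefix is identical
        }
        // Convert the byte index to a char index, then compare with folding.
        for (int i = k >> 1; i < lim; i++) {
            char c1 = charAt(a, i), c2 = charAt(b, i);
            if (c1 != c2) {
                int diff = fold(c1) - fold(c2);
                if (diff != 0) {
                    return diff;
                }
            }
        }
        return len1 - len2;
    }

    static byte[] utf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }
}
```

Here `mismatch` is called only once, at the start. A fuller version might call it again after each folded match to skip the next identical run.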

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2471124837
PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2471118690