<i18n dev> RFR: 8365675: Add String Unicode Case-Folding Support [v4]
Roger Riggs
rriggs at openjdk.org
Tue Oct 28 21:41:36 UTC 2025
On Mon, 27 Oct 2025 09:40:03 GMT, Xueming Shen <sherman at openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/StringLatin1.java line 217:
>>
>>> 215: if (c1 != c2) {
>>> 216: return c1 - c2;
>>> 217: }
>>
>> Compute difference only once.
>> Suggestion:
>>
>> if ((c1 - c2) != 0) {
>>     return c1 - c2;
>> }
>
> You meant to go with
>
> if ((c1 - c2) != 0) {
>     return c1 - c2;
> }
>
> ?
Yes. It may be an inconsequential difference, but it computes the difference directly and returns it if it is non-zero.
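A minimal sketch of the pattern under discussion, applied to a Latin-1 comparison loop: the difference is held in a local, so it is computed once and returned directly when non-zero. The class and method names are hypothetical, and the folding step is a simple 1:1 approximation using `Character.toUpperCase`/`Character.toLowerCase`, not the full Unicode case folding this PR adds.

```java
// Hypothetical illustration; not the code in StringLatin1.
class FoldedDiffOnce {

    // Simple 1:1 folding approximation (an assumption for this sketch).
    static int fold(int c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }

    // Compares two Latin-1 encoded byte arrays, ignoring case.
    static int compareFoldedLatin1(byte[] a, byte[] b) {
        int lim = Math.min(a.length, b.length);
        for (int k = 0; k < lim; k++) {
            int c1 = a[k] & 0xFF;        // Latin-1 bytes are unsigned
            int c2 = b[k] & 0xFF;
            if (c1 != c2) {
                int diff = fold(c1) - fold(c2);  // difference computed once
                if (diff != 0) {
                    return diff;
                }
            }
        }
        return a.length - b.length;      // equal prefix: shorter sorts first
    }
}
```

The JIT compiler is likely to eliminate a repeated `c1 - c2` in either form, which fits the "inconsequential" remark; the local mainly makes the single computation explicit in the source.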
>> src/java.base/share/classes/java/lang/StringUTF16.java line 605:
>>
>>> 603: int k1 = off, k2 = ooff, fk1 = 0, fk2 = 0;
>>> 604: while ((k1 < last || folded1 != null && fk1 < folded1.length) &&
>>> 605: (k2 < olast || folded2 != null && fk2 < folded2.length)) {
>>
>> Use ArraysSupport.mismatch to quickly scan past identical sequences (the byte index will need to be converted to a char index).
>
> It seems we only use it for the case-aware comparison in the existing string comparison. I just wonder whether we really want to apply it to the case-insensitive comparison?
>
> static int compareTo(byte[] value, byte[] other, int len1, int len2) {
>     int lim = Math.min(len1, len2);
>     int k = ArraysSupport.mismatch(value, other, lim);
>     return (k < 0) ? len1 - len2 : getChar(value, k) - getChar(other, k);
> }
I expect most strings to consist predominantly of ordinary characters that need no folding, so mismatch can rapidly skip over the identical leading portion and find the first byte/character that actually needs to be checked for folding.
I would expect it to help speed up the case-sensitive comparisons too.
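A sketch of the suggested approach, assuming UTF-16 data stored big-endian in a `byte[]` (the JDK's `StringUTF16` uses the platform's native byte order) and using the public `java.util.Arrays.mismatch` in place of the internal `ArraysSupport.mismatch`. The mismatching byte index is halved to get a char index, backed up by one if it would split a surrogate pair, and only the remaining tail is compared with folding. All names are hypothetical, and folding is a simple 1:1 mapping; the `folded1`/`folded2` buffers the PR uses for one-to-many folds are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not the code in StringUTF16.
class MismatchFoldCompare {

    // Reads the char at char index 'index' (big-endian layout assumed).
    static char getChar(byte[] val, int index) {
        int i = index << 1;
        return (char) (((val[i] & 0xFF) << 8) | (val[i + 1] & 0xFF));
    }

    // Simple 1:1 folding approximation, not full Unicode case folding.
    static int fold(int cp) {
        return Character.toLowerCase(Character.toUpperCase(cp));
    }

    // Code point at char index, combining a valid surrogate pair.
    static int codePointAt(byte[] val, int index, int len) {
        char hi = getChar(val, index);
        if (Character.isHighSurrogate(hi) && index + 1 < len) {
            char lo = getChar(val, index + 1);
            if (Character.isLowSurrogate(lo)) {
                return Character.toCodePoint(hi, lo);
            }
        }
        return hi;
    }

    // len1/len2 are lengths in chars; the arrays hold UTF-16BE bytes.
    static int compareToIgnoreCase(byte[] value, byte[] other, int len1, int len2) {
        int limBytes = Math.min(len1, len2) << 1;
        // Skip the identical prefix; mismatch reports a byte index.
        int byteIdx = Arrays.mismatch(value, 0, limBytes, other, 0, limBytes);
        if (byteIdx < 0) {
            return len1 - len2;              // identical up to the shorter length
        }
        int k = byteIdx >> 1;                // byte index -> char index
        // Do not restart in the middle of a surrogate pair.
        if (k > 0 && Character.isHighSurrogate(getChar(value, k - 1))) {
            k--;
        }
        int k1 = k, k2 = k;
        while (k1 < len1 && k2 < len2) {
            int cp1 = codePointAt(value, k1, len1);
            int cp2 = codePointAt(other, k2, len2);
            if (cp1 != cp2) {
                int diff = fold(cp1) - fold(cp2);
                if (diff != 0) {
                    return diff;
                }
            }
            k1 += Character.charCount(cp1);
            k2 += Character.charCount(cp2);
        }
        // One side is exhausted: the one with fewer remaining chars sorts first.
        return (len1 - k1) - (len2 - k2);
    }
}
```

Skipping the identical prefix is safe because identical code points fold identically; the same argument carries over to full (one-to-many) folding as long as folding is applied per code point and the restart point falls on a code point boundary, which is what the surrogate back-up step ensures.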
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2471124837
PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2471118690