RFR: 8321283: Reuse StringLatin1::equals in regionMatches
Claes Redestad
redestad at openjdk.org
Wed Nov 26 20:14:16 UTC 2025
On Sun, 3 Dec 2023 05:58:27 GMT, Francesco Nigro <duke at openjdk.org> wrote:
>> src/java.base/share/classes/java/lang/String.java line 2186:
>>
>>> 2184: if (coder == otherCoder) {
>>> 2185: if (ooffset == 0 && toffset == 0 && len == (tv.length >> coder) && ov.length == tv.length) {
>>> 2186: return StringLatin1.equals(tv, ov);
>>
>> A few questions on this:
>>
>> - Should this also handle `UTF-16`?
>> - Could we save a comparison by `OR`ing offsets?
>> - The `equals` method will do the `tv` and `ov` length comparisons themselves, could we remove that check here or is the method call overhead too large?
>> - Do you happen to have results for `make test TEST="micro:java.lang.StringComparisons"`?
>>
>> Suggestion:
>>
>> if ((ooffset | toffset) == 0 && len == (tv.length >> coder)) {
>> return coder == LATIN1
>> ? StringLatin1.equals(tv, ov)
>> : StringUTF16.equals(tv, ov);
>> }
>
>> Could we save a comparison by ORing offsets?
>
> I'm ok to use the or to save an additional offset comparison, good idea!
>
>> Should this also handle UTF-16?
>
> Currently String::equals doesn't use the UTF16 variant of the intrinsic, see https://github.com/openjdk/jdk/blob/949846986f572dfb82912e7d71e7bfd37a90871e/src/java.base/share/classes/java/lang/String.java#L1879-L1880; I think it's because checking byte per byte equality (from a semantic perspective), still hold. I didn't checked what are the other usage for the UTF16 version, but @cl4es could clarify it .
>
>> The equals method will do the tv and ov length comparisons themselves, could we remove that check here or is the method call overhead too large?
>
> I didn't remove it because region equality won't return (always) false, as equals would do, in such case, but will compare the (If existing) minimal common available length (till the specified len) between the two.
>
>> Do you happen to have results for make test TEST="micro:java.lang.StringComparisons"?
>
> Not yet, I have performed a microbenchmark myself to verify that with small strings (below 16 bytes), equals was much better (both by enabling and disabling the mismatch intrinsics), but I will do it, thanks!
>
> I'm missing as well to verify the bytecode size of the method after my changes, to make sure I didn't make it non-inlineable.
Yes, since the `coder` is checked first we know the bytes compared are either both latin1 or both UTF16 so there's no need for a UTF-16-aware char-by-char comparison here. In fact the `StringUTF16.equals` method looks completely unused.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16933#discussion_r1413697276
More information about the core-libs-dev
mailing list