RFR: 8285255: refine StringLatin1.regionMatchesCI_UTF16 [v3]

Wed Apr 20 21:22:24 UTC 2022

On Wed, 20 Apr 2022 21:08:19 GMT, XenoAmess <duke at openjdk.java.net> wrote:

>> Some thoughts after reading 8285001: Simplify StringLatin1.regionMatches https://github.com/openjdk/jdk/pull/8292/
>> 
>>             if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>>                 continue;
>>             }
>> 
>> should be changed to 
>> 
>>             if (((u1 == c1) ? CharacterDataLatin1.instance.toLowerCase(c1) : c1) == Character.toLowerCase(u2)) {
>>                 continue;
>>             }
>> 
>> as:
>> 
>> 1. c1 is LATIN1, so CharacterDataLatin1.instance.toLowerCase seems faster.
>> 2. Because c1 is LATIN1, if u1 != c1 then c1 is already lowercase and doesn't need a lowercase calculation.
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
> 
>   remove = check

Can you run the JMH benchmark against the code before either change (or an existing JDK)?
It would be interesting to quantify the improvements of going straight to Latin1.
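
For reference, the two comparisons being measured can be sketched outside java.lang roughly as below. This is only an illustration under stated assumptions: CharacterDataLatin1 is package-private, so Character.toLowerCase stands in for it here, and the class and method names (CaseFoldSketch, originalMatch, proposedMatch) are hypothetical and not part of the PR.

```java
class CaseFoldSketch {
    // Comparison as in the existing code: lowercase both upper-cased chars.
    static boolean originalMatch(char c1, char c2) {
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    // Comparison as proposed in the PR; c1 is assumed to be a Latin1 char (<= 0xFF).
    static boolean proposedMatch(char c1, char c2) {
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        // Stand-in (assumption): the PR calls CharacterDataLatin1.instance.toLowerCase(c1),
        // which is not accessible outside java.lang.
        char l1 = (u1 == c1) ? Character.toLowerCase(c1) : c1;
        return l1 == Character.toLowerCase(u2);
    }

    public static void main(String[] args) {
        // A few sample pairs: plain ASCII, the Kelvin sign (U+212A), and a mismatch.
        char[][] pairs = { {'a', 'A'}, {'K', '\u212A'}, {'a', 'b'} };
        for (char[] p : pairs) {
            System.out.println(originalMatch(p[0], p[1]) + " " + proposedMatch(p[0], p[1]));
        }
    }
}
```

The sketch only shows what the two expressions compute; it says nothing about their relative speed, which is what the JMH run is for.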

(Current hardware architectures and their parallelism are hard to understand well.
They do clever things with branch prediction, potentially executing both paths speculatively
and then discarding the results of the path not taken. The existing code for toLower and
toUpper already includes a branch or two; adding one more branch to the sequence likely
can't be optimized.)

These interactions at the instruction level are why measuring is important.
Thanks

-------------

PR: https://git.openjdk.java.net/jdk/pull/8308