RFR: 8285255: refine StringLatin1.regionMatchesCI_UTF16 [v3]
Roger Riggs
rriggs at openjdk.java.net
Wed Apr 20 21:22:24 UTC 2022
On Wed, 20 Apr 2022 21:08:19 GMT, XenoAmess <duke at openjdk.java.net> wrote:
>> some thoughts after watching 8285001: Simplify StringLatin1.regionMatches https://github.com/openjdk/jdk/pull/8292/
>>
>> if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>>     continue;
>> }
>>
>> should be changed to
>>
>> if (((u1 == c1) ? CharacterDataLatin1.instance.toLowerCase(c1) : c1) == Character.toLowerCase(u2)) {
>>     continue;
>> }
>>
>> as:
>>
>> 1. c1 is LATIN1, so CharacterDataLatin1.instance.toLowerCase seems faster.
>> 2. because c1 is LATIN1, if u1 != c1 then c1 is already lowercase and doesn't need a lowercase calculation.
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
>
> remove = check
Can you run the JMH benchmark against the code before either change (or an existing JDK)?
It would be interesting to quantify the improvements of going straight to Latin1.
(Current hardware architectures and their parallelism are hard to understand well.
They do clever things with branch prediction, potentially executing both paths speculatively
and then discarding the path that was not taken. The existing code for toLower and toUpper already includes a branch or two; adding one more branch to the sequence likely can't be optimized.)
These interactions at the instruction level are why measuring is important.
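For reference, the two per-character steps under discussion can be sketched roughly as below. This is only an illustration under stated assumptions, not the JDK code and not a JMH benchmark: the class and method names are hypothetical, the public Character.toLowerCase stands in for the package-private CharacterDataLatin1.instance.toLowerCase, and the nanoTime loop is a crude sanity check whose numbers should not be trusted in place of a proper JMH run.

```java
public class RegionMatchStepSketch {

    // Hypothetical functional interface so both variants can share one driver.
    interface Step {
        boolean test(char c1, char c2);
    }

    // Per-character step modelled on the comparison before the change.
    static boolean stepBefore(char c1, char c2) {
        if (c1 == c2) return true;
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) return true;
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }

    // Per-character step modelled on the proposed change; c1 is assumed
    // to be a Latin-1 value (0..255). Uses the public Character API as a
    // stand-in for CharacterDataLatin1.instance.toLowerCase.
    static boolean stepProposed(char c1, char c2) {
        if (c1 == c2) return true;
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) return true;
        char l1 = (u1 == c1) ? Character.toLowerCase(c1) : c1;
        return l1 == Character.toLowerCase(u2);
    }

    // Runs one variant over a small grid of character pairs and reports
    // the match count and elapsed time. The count is returned so the
    // work cannot be discarded as dead code.
    static long run(String label, Step step, int rounds) {
        long matches = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (char c1 = 0; c1 < 256; c1++) {
                for (char c2 = 0; c2 < 512; c2++) {
                    if (step.test(c1, c2)) matches++;
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        System.out.println(label + ": " + matches + " matches, " + elapsed + " ns");
        return matches;
    }

    public static void main(String[] args) {
        run("before", RegionMatchStepSketch::stepBefore, 20);
        run("proposed", RegionMatchStepSketch::stepProposed, 20);
    }
}
```

A loop like this has no warm-up control or forking, so any difference it shows may be noise; the JMH comparison against the unmodified code is still the number that matters.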
Thanks
-------------
PR: https://git.openjdk.java.net/jdk/pull/8308