RFR: 8345465: Fix performance regression on x64 after JDK-8345120 [v3]

Thu Dec 5 18:15:39 UTC 2024

On Thu, 5 Dec 2024 11:43:16 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> This PR proposes to fix a performance regression (on x64 platforms) for 32-bit strings introduced by [JDK-8345120](https://bugs.openjdk.org/browse/JDK-8345120).
>> 
>> The PR also fixes a performance regression in the benchmarks caused by using the wrong type for `MemorySegment`.
>> 
>> Regrettably, this PR uses different code paths for various architectures. This gives optimum performance for all platforms at the expense of slightly more code complexity.
>> 
>> Base (macOS, M1) (Before https://github.com/openjdk/jdk/pull/22451)
>> 
>> 
>> Benchmark                               (size)  Mode  Cnt    Score   Error  Units
>> InternalStrLen.changedElementQuad            1  avgt   30    2.057 ± 0.012  ns/op
>> InternalStrLen.changedElementQuad            4  avgt   30    3.776 ± 0.031  ns/op
>> InternalStrLen.changedElementQuad           16  avgt   30    6.690 ± 0.060  ns/op
>> InternalStrLen.changedElementQuad          251  avgt   30   48.581 ± 0.764  ns/op
>> InternalStrLen.changedElementQuad         1024  avgt   30  196.188 ± 3.484  ns/op
>> InternalStrLen.chunkedDouble                 1  avgt   30    1.903 ± 0.013  ns/op
>> InternalStrLen.chunkedDouble                 4  avgt   30    3.446 ± 0.025  ns/op
>> InternalStrLen.chunkedDouble                16  avgt   30    5.759 ± 0.062  ns/op
>> InternalStrLen.chunkedDouble               251  avgt   30   26.892 ± 0.141  ns/op
>> InternalStrLen.chunkedDouble              1024  avgt   30   72.940 ± 1.562  ns/op
>> InternalStrLen.chunkedSingle                 1  avgt   30    1.897 ± 0.015  ns/op
>> InternalStrLen.chunkedSingle                 4  avgt   30    5.357 ± 0.560  ns/op
>> InternalStrLen.chunkedSingle                16  avgt   30    3.821 ± 0.052  ns/op
>> InternalStrLen.chunkedSingle               251  avgt   30   19.482 ± 0.190  ns/op
>> InternalStrLen.chunkedSingle              1024  avgt   30   38.938 ± 0.411  ns/op
>> InternalStrLen.chunkedSingleMisaligned       1  avgt   30    2.230 ± 0.147  ns/op
>> InternalStrLen.chunkedSingleMisaligned       4  avgt   30    5.424 ± 0.688  ns/op
>> InternalStrLen.chunkedSingleMisaligned      16  avgt   30    9.573 ± 0.063  ns/op
>> InternalStrLen.chunkedSingleMisaligned     251  avgt   30   22.242 ± 0.182  ns/op
>> InternalStrLen.chunkedSingleMisaligned    1024  avgt   30   45.442 ± 0.252  ns/op
>> InternalStrLen.elementByteMisaligned         1  avgt   30    1.616 ± 0.041  ns/op
>> InternalStrLen.elementByteMisaligned         4  avgt   30    2.982 ± 0.018  ns/op
>> InternalStrLen.elementByteMis...
>
> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Improve short string cases

I'm a little bit confused by the numbers, which are for AArch64, while the patch fixes a regression on x64. Can you share any numbers on x64? Do you have an idea why long scanning doesn't help on x64?
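For context on "long scanning": the idea is to read a single-byte string eight bytes at a time and use a bit trick to test whether any byte in that word is the NUL terminator, dropping to byte-wise checks only for the word that matched and for the short tail. Below is a minimal sketch of that technique, assuming JDK 22 or later (final FFM API); `ChunkedStrlenSketch` and `strlenChunked` are hypothetical names, and this is not the code in `StringSupport`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ChunkedStrlenSketch {

    // SWAR ("SIMD within a register") test: non-zero iff at least one byte of x is 0x00.
    // The result does not depend on byte order, so it works on any platform.
    static boolean hasZeroByte(long x) {
        return ((x - 0x0101010101010101L) & ~x & 0x8080808080808080L) != 0;
    }

    // Hypothetical helper: length of a NUL-terminated single-byte string that starts at
    // fromOffset, scanning at most 'limit' bytes. Returns -1 if no terminator is found.
    static long strlenChunked(MemorySegment segment, long fromOffset, long limit) {
        long end = fromOffset + limit;
        long offset = fromOffset;
        // Word loop: inspect 8 bytes per iteration while a whole long still fits.
        // JAVA_LONG_UNALIGNED permits reads at any offset, including heap byte[] segments.
        while (offset + Long.BYTES <= end) {
            long word = segment.get(ValueLayout.JAVA_LONG_UNALIGNED, offset);
            if (hasZeroByte(word)) {
                break; // the terminator is inside this word; locate it exactly below
            }
            offset += Long.BYTES;
        }
        // Byte loop: pinpoints the terminator within the matching word, or scans the tail.
        for (; offset < end; offset++) {
            if (segment.get(ValueLayout.JAVA_BYTE, offset) == 0) {
                return offset - fromOffset;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // allocateFrom(String) writes the UTF-8 bytes followed by a NUL terminator.
            MemorySegment s = arena.allocateFrom("hello, foreign memory");
            System.out.println(strlenChunked(s, 0, s.byteSize())); // prints 21
        }
    }
}
```

Whether the word loop pays off depends on how the JIT compiles the unaligned 8-byte loads and the SWAR arithmetic on a given CPU, which is why per-platform numbers matter here.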

src/java.base/share/classes/jdk/internal/foreign/StringSupport.java line 134:

> 132:         segment.checkBounds(fromOffset, length);
> 133:         if (length < 3) {
> 134:             switch ((int) length) {

How much do things like this actually help? I'd think that the added bytecode size might adversely affect inlining as well.
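One way to answer "how much does this help" is to benchmark two shapes that compute the same result, one with and one without the small-length switch. The sketch below assumes a plain byte-wise strlen over a `MemorySegment` (JDK 22 or later); the class and method names are hypothetical, and since the full method around line 134 is not quoted here, this only mirrors the `if (length < 3) switch` shape rather than reproducing it.

```java
import java.lang.foreign.MemorySegment;
import java.util.Objects;

import static java.lang.foreign.ValueLayout.JAVA_BYTE;

public class ShortStringSwitchSketch {

    // Plain variant: one loop handles every length, keeping the method's bytecode small.
    static long strlenLoop(MemorySegment seg, long fromOffset, long length) {
        Objects.checkFromIndexSize(fromOffset, length, seg.byteSize());
        for (long i = 0; i < length; i++) {
            if (seg.get(JAVA_BYTE, fromOffset + i) == 0) {
                return i;
            }
        }
        return -1; // no terminator within [fromOffset, fromOffset + length)
    }

    // Variant with an explicit fast path for lengths 0..2. It returns the same results
    // as strlenLoop but adds bytecode to the entry method, which may affect inlining.
    static long strlenSwitch(MemorySegment seg, long fromOffset, long length) {
        Objects.checkFromIndexSize(fromOffset, length, seg.byteSize());
        if (length < 3) {
            return switch ((int) length) {
                case 0 -> -1L;
                case 1 -> seg.get(JAVA_BYTE, fromOffset) == 0 ? 0L : -1L;
                default -> seg.get(JAVA_BYTE, fromOffset) == 0 ? 0L
                        : seg.get(JAVA_BYTE, fromOffset + 1) == 0 ? 1L : -1L;
            };
        }
        return strlenLoop(seg, fromOffset, length);
    }

    public static void main(String[] args) {
        byte[] data = {'a', 0, 'b', 'c', 0};
        MemorySegment seg = MemorySegment.ofArray(data);
        // Both variants must agree for every prefix length.
        for (long len = 0; len <= data.length; len++) {
            System.out.println(len + ": " + strlenLoop(seg, 0, len) + " " + strlenSwitch(seg, 0, len));
        }
    }
}
```

Running each variant from a JMH benchmark with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` would show both the timing difference and whether the larger method is still inlined at its call sites.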

-------------

PR Review: https://git.openjdk.org/jdk/pull/22539#pullrequestreview-2482534043
PR Review Comment: https://git.openjdk.org/jdk/pull/22539#discussion_r1871877536