RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v4]

Emanuel Peter epeter at openjdk.org
Fri Sep 5 15:14:12 UTC 2025


On Mon, 1 Sep 2025 15:29:16 GMT, Vladimir Ivanov <vaivanov at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays
>
> Later alignment improves performance a little bit. Current numbers are:
> SRF | size | jdk26 | patched with "+optFill" | patched/jdk26
> ArraysFill.testByteFill | 16 | 151937.634 | 175045.819 | 1.15
> ArraysFill.testByteFill | 31 | 125661.092 | 211226.668 | 1.68
> ArraysFill.testByteFill | 250 | 57599.684 | 123670.638 | 2.15
> ArraysFill.testByteFill | 266 | 44617.505 | 147306.352 | 3.30
> ArraysFill.testByteFill | 511 | 61541.499 | 129234.48 | 2.10
> ArraysFill.testByteFill | 2047 | 32073.997 | 41503.438 | 1.29
> ArraysFill.testByteFill | 2048 | 31729.263 | 41977.271 | 1.32
> ArraysFill.testByteFill | 8195 | 10620.363 | 10911.334 | 1.03
> ArraysFill.testIntFill | 16 | 144924.577 | 264101.45 | 1.82
> ArraysFill.testIntFill | 31 | 128877.207 | 211225.233 | 1.64
> ArraysFill.testIntFill | 250 | 73785.182 | 79204.674 | 1.07
> ArraysFill.testIntFill | 266 | 67703.171 | 75436.831 | 1.11
> ArraysFill.testIntFill | 511 | 39489.095 | 36011.078 | 0.91
> ArraysFill.testIntFill | 2047 | 11431.835 | 10509.545 | 0.92
> ArraysFill.testIntFill | 2048 | 11178.661 | 10882.991 | 0.97
> ArraysFill.testIntFill | 8195 | 2629.065 | 2601.19 | 0.99
> ArraysFill.testLongFill | 16 | 211218.892 | 211250.585 | 1.00
> ArraysFill.testLongFill | 31 | 133026.186 | 137374.876 | 1.03
> ArraysFill.testLongFill | 250 | 42907.745 | 42937.988 | 1.00
> ArraysFill.testLongFill | 266 | 41935.645 | 41920.801 | 1.00
> ArraysFill.testLongFill | 511 | 23217.606 | 23227.904 | 1.00
> ArraysFill.testLongFill | 2047 | 6083.099 | 6083.384 | 1.00
> ArraysFill.testLongFill | 2048 | 5751.203 | 5753.409 | 1.00
> ArraysFill.testLongFill | 8195 | 612.17 | 612.634 | 1.00
> ArraysFill.testShortFill | 16 | 151917.079 | 352122.571 | 2.32
> ArraysFill.testShortFill | 31 | 138000.217 | 226271.221 | 1.64
> ArraysFill.testShortFill | 250 | 58641.362 | 99043.571 | 1.69
> ArraysFill.testShortFill | 266 | 90499.649 | 93200.335 | 1.03
> ArraysFill.testShortFill | 511 | 64958.462 | 77930.734 | 1.20
> ArraysFill.testShortFill | 2047 | 21577.954 | 21210.006 | 0.98
> ArraysFill.testShortFill | 2048 | 21538.005 | 21429.382 | 0.99
> ArraysFill.testShortFill | 8195 | 5883.097 | 5775.499 | 0.98

@IvaVladimir Both in the PR description and in your later results there are some significant regressions (-10%) as well as improvements. Can you please explain what is going on there?
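
For reference, the rows in question can be picked out by recomputing the patched/jdk26 ratio from the quoted table. This is only a minimal sketch: it assumes the scores are throughput (higher is better), the 0.95 cutoff is an arbitrary illustrative threshold, and the helper name `regressions` is hypothetical. Only four rows from the table are copied in.

```python
# Sketch: flag benchmark rows whose patched/baseline ratio falls below a cutoff.
# Assumption: scores are throughput, so a ratio below 1.0 means a slowdown.

def regressions(rows, threshold=0.95):
    """Return (benchmark, size, ratio) for rows with patched/baseline < threshold."""
    flagged = []
    for name, size, baseline, patched in rows:
        ratio = patched / baseline
        if ratio < threshold:
            flagged.append((name, size, round(ratio, 2)))
    return flagged

# (benchmark, size, jdk26 score, patched "+optFill" score), copied from the table
ROWS = [
    ("ArraysFill.testByteFill", 266, 44617.505, 147306.352),
    ("ArraysFill.testIntFill", 511, 39489.095, 36011.078),
    ("ArraysFill.testIntFill", 2047, 11431.835, 10509.545),
    ("ArraysFill.testLongFill", 16, 211218.892, 211250.585),
]

for name, size, ratio in regressions(ROWS):
    print(f"{name} size={size} patched/jdk26={ratio}")
```

With these four rows, only the two testIntFill entries (sizes 511 and 2047) fall below the cutoff.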

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26747#issuecomment-3258712616


More information about the hotspot-dev mailing list