RFR: 8363858: [perf] OptimizeFill may use wide set of intrinsics [v3]

Emanuel Peter epeter at openjdk.org
Tue Sep 9 05:49:21 UTC 2025


On Mon, 8 Sep 2025 21:09:51 GMT, Vladimir Ivanov <vaivanov at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>> 
>>   8363858: [perf] OptimizeFill may use wide set of intrinsics
>
> Perf data for the Xeon 6740E looks as follows:
> 
> Benchmark (Xeon 6740E) | size | jdk_def | jdk_OptFill | jdk_OptFill/jdk_def
> -- | -- | -- | -- | --
> ArraysFill.testByteFill | 16 | 152113.563 | 173028.75 | 1.14
> ArraysFill.testByteFill | 31 | 125889.446 | 212458.124 | 1.69
> ArraysFill.testByteFill | 250 | 57942.562 | 148391.738 | 2.56
> ArraysFill.testByteFill | 266 | 44883.928 | 156986.22 | 3.50
> ArraysFill.testByteFill | 511 | 61848.425 | 130192.732 | 2.11
> ArraysFill.testByteFill | 2047 | 32242.521 | 39893.863 | 1.24
> ArraysFill.testByteFill | 2048 | 31918.795 | 40665.974 | 1.27
> ArraysFill.testByteFill | 8195 | 10685.801 | 10126.615 | 0.95
> ArraysFill.testIntFill | 16 | 145059.116 | 318660.232 | 2.20
> ArraysFill.testIntFill | 31 | 131312.049 | 227632.469 | 1.73
> ArraysFill.testIntFill | 250 | 73997.421 | 81060.479 | 1.10
> ArraysFill.testIntFill | 266 | 68072.273 | 77967.322 | 1.15
> ArraysFill.testIntFill | 511 | 39691.774 | 45220.274 | 1.14
> ArraysFill.testIntFill | 2047 | 11499.726 | 11295.527 | 0.98
> ArraysFill.testIntFill | 2048 | 11240.285 | 11419.196 | 1.02
> ArraysFill.testIntFill | 8195 | 2758.273 | 1310.374 | 0.48
> ArraysFill.testLongFill | 16 | 212459.292 | 212458.565 | 1.00
> ArraysFill.testLongFill | 31 | 131924.591 | 137124.526 | 1.04
> ArraysFill.testLongFill | 250 | 43105.961 | 43131.914 | 1.00
> ArraysFill.testLongFill | 266 | 42149.578 | 42154.248 | 1.00
> ArraysFill.testLongFill | 511 | 23358.361 | 23358.681 | 1.00
> ArraysFill.testLongFill | 2047 | 6120.952 | 6121.333 | 1.00
> ArraysFill.testLongFill | 2048 | 5781.826 | 5788.489 | 1.00
> ArraysFill.testLongFill | 8195 | 615.994 | 616.218 | 1.00
> ArraysFill.testShortFill | 16 | 152050.701 | 353826.527 | 2.33
> ArraysFill.testShortFill | 31 | 136798.898 | 212330.48 | 1.55
> ArraysFill.testShortFill | 250 | 58773.76 | 99592.044 | 1.69
> ArraysFill.testShortFill | 266 | 91052.769 | 93735.404 | 1.03
> ArraysFill.testShortFill | 511 | 65312.819 | 77820.206 | 1.19
> ArraysFill.testShortFill | 2047 | 21704.419 | 20440.256 | 0.94
> ArraysFill.testShortFill | 2048 | 21657.535 | 21625.922 | 1.00
> ArraysFill.testShortFill | 8195 | 5920.221 | 5872.366 | 0.99
> 
> I.e., most test cases report a better score with the intrinsic code.
> The reported drop of up to 2x (for example, ArraysFill.testIntFill, size=8195) relates to the store_split metric and should be fixed by PR 26747.

@IvaVladimir Thanks for the numbers and explanations!
In that case, it seems that [JDK-8365290](https://bugs.openjdk.org/browse/JDK-8365290) is a regression of [JDK-8363858](https://bugs.openjdk.org/browse/JDK-8363858), and should be linked as such. I made some changes to reflect that. If you disagree, please let me know :)
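
For reference, the last column of the quoted table appears to be the jdk_OptFill score divided by the jdk_def score, so values below 1.00 mark the cases where the intrinsic path scores lower. A minimal sketch of that arithmetic, using three rows copied from the table; the class and method names are hypothetical, and it assumes higher scores are better, as the quoted text implies:

```java
import java.util.Locale;

public class FillRatio {
    // Ratio of the jdk_OptFill score to the jdk_def score (the table's last column).
    // Assumption: both scores are throughput-style numbers where higher is better.
    static double ratio(double jdkDef, double jdkOptFill) {
        return jdkOptFill / jdkDef;
    }

    // Format with two decimals, matching the precision used in the table.
    static String format(double r) {
        return String.format(Locale.ROOT, "%.2f", r);
    }

    public static void main(String[] args) {
        // ArraysFill.testByteFill, size 266: largest gain in the table
        System.out.println(format(ratio(44883.928, 156986.22)));   // prints 3.50
        // ArraysFill.testIntFill, size 8195: the roughly 2x drop discussed above
        System.out.println(format(ratio(2758.273, 1310.374)));     // prints 0.48
        // ArraysFill.testLongFill, size 16: effectively unchanged
        System.out.println(format(ratio(212459.292, 212458.565))); // prints 1.00
    }
}
```

Read this way, the testIntFill size=8195 row (0.48) is the one that motivates linking the two issues.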

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26974#issuecomment-3268961045


More information about the hotspot-dev mailing list