RFR: 8363858: [perf] OptimizeFill may use wide set of intrinsics [v3]

Vladimir Ivanov vaivanov at openjdk.org
Mon Sep 8 21:12:18 UTC 2025


On Wed, 3 Sep 2025 22:11:17 GMT, Vladimir Ivanov <vaivanov at openjdk.org> wrote:

>> Default mode should use OptimizeFill=true option for the SRF platform.
>
> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
> 
>   8363858: [perf] OptimizeFill may use wide set of intrinsics

Perf data for the Xeon 6740E is as follows (higher score is better):

Benchmark (Xeon 6740E) | size | jdk_def | jdk_OptFill | OptFill/def
-- | -- | -- | -- | --
ArraysFill.testByteFill | 16 | 152113.563 | 173028.75 | 1.14
ArraysFill.testByteFill | 31 | 125889.446 | 212458.124 | 1.69
ArraysFill.testByteFill | 250 | 57942.562 | 148391.738 | 2.56
ArraysFill.testByteFill | 266 | 44883.928 | 156986.22 | 3.50
ArraysFill.testByteFill | 511 | 61848.425 | 130192.732 | 2.11
ArraysFill.testByteFill | 2047 | 32242.521 | 39893.863 | 1.24
ArraysFill.testByteFill | 2048 | 31918.795 | 40665.974 | 1.27
ArraysFill.testByteFill | 8195 | 10685.801 | 10126.615 | 0.95
ArraysFill.testIntFill | 16 | 145059.116 | 318660.232 | 2.20
ArraysFill.testIntFill | 31 | 131312.049 | 227632.469 | 1.73
ArraysFill.testIntFill | 250 | 73997.421 | 81060.479 | 1.10
ArraysFill.testIntFill | 266 | 68072.273 | 77967.322 | 1.15
ArraysFill.testIntFill | 511 | 39691.774 | 45220.274 | 1.14
ArraysFill.testIntFill | 2047 | 11499.726 | 11295.527 | 0.98
ArraysFill.testIntFill | 2048 | 11240.285 | 11419.196 | 1.02
ArraysFill.testIntFill | 8195 | 2758.273 | 1310.374 | 0.48
ArraysFill.testLongFill | 16 | 212459.292 | 212458.565 | 1.00
ArraysFill.testLongFill | 31 | 131924.591 | 137124.526 | 1.04
ArraysFill.testLongFill | 250 | 43105.961 | 43131.914 | 1.00
ArraysFill.testLongFill | 266 | 42149.578 | 42154.248 | 1.00
ArraysFill.testLongFill | 511 | 23358.361 | 23358.681 | 1.00
ArraysFill.testLongFill | 2047 | 6120.952 | 6121.333 | 1.00
ArraysFill.testLongFill | 2048 | 5781.826 | 5788.489 | 1.00
ArraysFill.testLongFill | 8195 | 615.994 | 616.218 | 1.00
ArraysFill.testShortFill | 16 | 152050.701 | 353826.527 | 2.33
ArraysFill.testShortFill | 31 | 136798.898 | 212330.48 | 1.55
ArraysFill.testShortFill | 250 | 58773.76 | 99592.044 | 1.69
ArraysFill.testShortFill | 266 | 91052.769 | 93735.404 | 1.03
ArraysFill.testShortFill | 511 | 65312.819 | 77820.206 | 1.19
ArraysFill.testShortFill | 2047 | 21704.419 | 20440.256 | 0.94
ArraysFill.testShortFill | 2048 | 21657.535 | 21625.922 | 1.00
ArraysFill.testShortFill | 8195 | 5920.221 | 5872.366 | 0.99

That is, most test cases report a better score with the intrinsic code.
The reported drop of about 2x (for example, ArraysFill.testIntFill at size=8195) is related to the store_split metric and should be fixed by PR 26747.
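
For reference, the last column of the table is just the jdk_OptFill score divided by the jdk_def score, so values above 1.00 favor the intrinsic and values below 1.00 are regressions. A minimal sketch of that arithmetic, using two rows copied from the table; the class and method names (FillRatio, ratio, format) are made up for illustration and are not part of the PR or of the benchmark:

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: recomputes the ratio column.
public class FillRatio {
    // Ratio of the OptimizeFill score to the default score.
    // Higher score is better, so a ratio > 1.0 means the intrinsic wins.
    static double ratio(double jdkDef, double jdkOptFill) {
        return jdkOptFill / jdkDef;
    }

    // Two decimal places, as in the table.
    static String format(double r) {
        return String.format(Locale.ROOT, "%.2f", r);
    }

    public static void main(String[] args) {
        // {jdk_def, jdk_OptFill} values copied from the table.
        double[] byteFill266 = {44883.928, 156986.22};
        double[] intFill8195 = {2758.273, 1310.374};

        System.out.println("testByteFill/266: " + format(ratio(byteFill266[0], byteFill266[1])));
        System.out.println("testIntFill/8195: " + format(ratio(intFill8195[0], intFill8195[1])));
    }
}
```

The second row is the regression mentioned above: a ratio just under 0.5 corresponds to the roughly 2x drop.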

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26974#issuecomment-3268030499


More information about the hotspot-dev mailing list