RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v4]
    Emanuel Peter 
    epeter at openjdk.org
       
    Tue Sep  9 06:00:13 UTC 2025
    
    
  
On Sat, 30 Aug 2025 16:04:00 GMT, Vladimir Ivanov <vaivanov at openjdk.org> wrote:
>> On the SRF platform, runs with the intrinsic enabled show a ~2x score drop in the ArrayFill test for several sizes, caused by a large number of 'MEM_UOPS_RETIRED.SPLIT_STORES' events. In the 'good' case, ArraysFill.testCharFill with size=8195 reports numbers like
>> MEM_UOPS_RETIRED.SPLIT_LOADS | 22.6711
>> MEM_UOPS_RETIRED.SPLIT_STORES | 4.0859
>> while for the 'bad' case these metrics are
>> MEM_UOPS_RETIRED.SPLIT_LOADS | 69.1785
>> MEM_UOPS_RETIRED.SPLIT_STORES | 259200.3659
>> 
>> With alignment on the cache size no score drops due to split_stores but small reduction may be reported due to extra 
>> SRF 6740E | Size | orig | patched | patched/orig
>> -- | -- | -- | -- | --
>> ArraysFill.testByteFill | 16 | 152031.2 | 157001.2 | 1.03
>> ArraysFill.testByteFill | 31 | 125795.9 | 177399.2 | 1.41
>> ArraysFill.testByteFill | 250 | 57961.69 | 120981.9 | 2.09
>> ArraysFill.testByteFill | 266 | 44900.15 | 147893.8 | 3.29
>> ArraysFill.testByteFill | 511 | 61908.17 | 129830.1 | 2.10
>> ArraysFill.testByteFill | 2047 | 32255.51 | 41986.6 | 1.30
>> ArraysFill.testByteFill | 2048 | 31928.97 | 42154.3 | 1.32
>> ArraysFill.testByteFill | 8195 | 10690.15 | 11036.3 | 1.03
>> ArraysFill.testIntFill | 16 | 145030.7 | 318796.9 | 2.20
>> ArraysFill.testIntFill | 31 | 134138.4 | 212487 | 1.58
>> ArraysFill.testIntFill | 250 | 74179.23 | 79522.66 | 1.07
>> ArraysFill.testIntFill | 266 | 68112.72 | 60116.49 | 0.88
>> ArraysFill.testIntFill | 511 | 39693.28 | 36225.09 | 0.91
>> ArraysFill.testIntFill | 2047 | 11504.14 | 10616.91 | 0.92
>> ArraysFill.testIntFill | 2048 | 11244.71 | 10969.14 | 0.98
>> ArraysFill.testIntFill | 8195 | 2751.289 | 2692.216 | 0.98
>> ArraysFill.testLongFill | 16 | 212532.5 | 212526 | 1.00
>> ArraysFill.testLongFill | 31 | 137432.4 | 137283.3 | 1.00
>> ArraysFill.testLongFill | 250 | 43185 | 43159.78 | 1.00
>> ArraysFill.testLongFill | 266 | 42172.22 | 42170.5 | 1.00
>> ArraysFill.testLongFill | 511 | 23370.15 | 23370.86 | 1.00
>> ArraysFill.testLongFill | 2047 | 6123.008 | 6122.73 | 1.00
>> ArraysFill.testLongFill | 2048 | 5793.722 | 5792.855 | 1.00
>> ArraysFill.testLongFill | 8195 | 616.552 | 616.585 | 1.00
>> ArraysFill.testShortFill | 16 | 152088.6 | 265646.1 | 1.75
>> ArraysFill.testShortFill | 31 | 137369.8 | 185596.4 | 1.35
>> ArraysFill.testShortFill | 250 | 58872.03 | 99621.15 | 1.69
>> ArraysFill.testShortFill | 266 | 91085.31 | 93746.62 | 1.03
>> ArraysFill.testShortFill | 511 | 65331.96 | 78003.83 | 1.19
>> ArraysFill.testShortFill | 2047 | 21716.32 | 21216.81 | 0.98
>> ArraysFill.testShortFill...
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays
If I understood our conversation [here](https://github.com/openjdk/jdk/pull/26974#issuecomment-3268030499) correctly, this is a regression fix, especially for `ArraysFill.testIntFill` with size `8195`. There, you got a `0.48` regression; here you now report `0.98`. I am a little confused about what the baseline is in your current patch. Can you make it more explicit what `orig` refers to? Is it the "original state" before #26974, or the master state before this issue here, #26747?
Also, your PR comment abruptly stops mid-sentence:
> but small reduction may be reported due to extra
Can you please fix that?
The benchmark also does not seem very reliable. It may help to run more forks of the benchmark, so that you get many executions and a more stable measurement. It is quite possible that some of the ±10% regressions are due to alignment, for example.
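To make the alignment point concrete, here is a minimal sketch (a toy model, not HotSpot code) that counts how many back-to-back stores of a fixed width straddle a cache-line boundary for a given start address. The class name `SplitStoreModel`, the 64-byte line size, the 32-byte store width, and the start offsets are all assumptions chosen for illustration; they are not taken from the patch.

```java
public class SplitStoreModel {
    // Assumed cache-line size in bytes (illustrative, not read from hardware).
    static final int CACHE_LINE = 64;

    // Counts stores of `width` bytes, issued back to back starting at `base`,
    // that cross a cache-line boundary while covering at most `length` bytes.
    // Any tail shorter than `width` is ignored in this toy model.
    static long countSplitStores(long base, int width, long length) {
        long splits = 0;
        for (long addr = base; addr + width <= base + length; addr += width) {
            long firstLine = addr / CACHE_LINE;
            long lastLine = (addr + width - 1) / CACHE_LINE;
            if (firstLine != lastLine) {
                splits++; // this store touches two cache lines
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long bytes = 8195L * 2; // payload of a char[8195], as in testCharFill
        // Hypothetical start offsets: 0 is line-aligned, 16 is not a multiple of 32.
        System.out.println("aligned start:   " + countSplitStores(0, 32, bytes));
        System.out.println("unaligned start: " + countSplitStores(16, 32, bytes));
    }
}
```

In this model a start address that is a multiple of the store width never splits, while an offset of 16 makes every second 32-byte store cross a line. This is one way a change in where the array happens to be placed between forks could move a score by several percent.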
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26747#issuecomment-3268986227
    
    