RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v7]
Volodymyr Paprotski vpaprotski at openjdk.org
Fri Sep 26 23:01:30 UTC 2025

On Wed, 24 Sep 2025 15:41:48 GMT, Vladimir Ivanov <vaivanov at openjdk.org> wrote:
>> On the SRF platform, after PR https://github.com/openjdk/jdk/pull/26974, the fill intrinsics are used by default.
>> For some types/sizes, the ArrayFill test scores drop by ~2x due to a large number of 'MEM_UOPS_RETIRED.SPLIT_STORES' events. For example, the 'good' case for ArraysFill.testCharFill with size=8195 reports numbers like
>> MEM_UOPS_RETIRED.SPLIT_LOADS | 22.6711
>> MEM_UOPS_RETIRED.SPLIT_STORES | 4.0859
>> while for the 'bad' case these metrics are
>> MEM_UOPS_RETIRED.SPLIT_LOADS | 69.1785
>> MEM_UOPS_RETIRED.SPLIT_STORES | 259200.3659
>> 
>> With alignment to the cache line size, there are no score drops due to split stores, but a small reduction may be reported for 'good' cases due to the extra instructions in the intrinsic. The default option set was used for testing, with '-XX:-OptimizeFill' for compiled code and '-XX:+OptimizeFill' to force the intrinsic.
>> SRF 6740E | Size | compiled code| patched intrinsic| patched/compiled
>> -- | -- | -- | -- | --
>> ArraysFill.testByteFill | 16 | 152031.2 | 157001.2 | 1.03
>> ArraysFill.testByteFill | 31 | 125795.9 | 177399.2 | 1.41
>> ArraysFill.testByteFill | 250 | 57961.69 | 120981.9 | 2.09
>> ArraysFill.testByteFill | 266 | 44900.15 | 147893.8 | 3.29
>> ArraysFill.testByteFill | 511 | 61908.17 | 129830.1 | 2.10
>> ArraysFill.testByteFill | 2047 | 32255.51 | 41986.6 | 1.30
>> ArraysFill.testByteFill | 2048 | 31928.97 | 42154.3 | 1.32
>> ArraysFill.testByteFill | 8195 | 10690.15 | 11036.3 | 1.03
>> ArraysFill.testIntFill | 16 | 145030.7 | 318796.9 | 2.20
>> ArraysFill.testIntFill | 31 | 134138.4 | 212487 | 1.58
>> ArraysFill.testIntFill | 250 | 74179.23 | 79522.66 | 1.07
>> ArraysFill.testIntFill | 266 | 68112.72 | 60116.49 | 0.88
>> ArraysFill.testIntFill | 511 | 39693.28 | 36225.09 | 0.91
>> ArraysFill.testIntFill | 2047 | 11504.14 | 10616.91 | 0.92
>> ArraysFill.testIntFill | 2048 | 11244.71 | 10969.14 | 0.98
>> ArraysFill.testIntFill | 8195 | 2751.289 | 2692.216 | 0.98
>> ArraysFill.testLongFill | 16 | 212532.5 | 212526 | 1.00
>> ArraysFill.testLongFill | 31 | 137432.4 | 137283.3 | 1.00
>> ArraysFill.testLongFill | 250 | 43185 | 43159.78 | 1.00
>> ArraysFill.testLongFill | 266 | 42172.22 | 42170.5 | 1.00
>> ArraysFill.testLongFill | 511 | 23370.15 | 23370.86 | 1.00
>> ArraysFill.testLongFill | 2047 | 6123.008 | 6122.73 | 1.00
>> ArraysFill.testLongFill | 2048 | 5793.722 | 5792.855 | 1.00
>> ArraysFill.testLongFill | 8195 | 616.552 | 616.585 | 1.00
>> ArraysFill.testShortFill | 16 | 152088.6 | 265646.1 | 1.75
>> ArraysFill.testShortFill | 31 | 1...
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays
I've spent some time staring at the assembler; everything looks correct to me, so I'm adding my checkmark.
My only "complaint" is that I wish for a wider-scope fix, but that's clearly out of scope here. It sounds like that is what @eme64 is already thinking about, though. (This is quite the multi-dimensional optimization problem: arch, size, type, code size, call-site constants, etc.)
With that in mind, I am fine with an incremental fix just for 'EnableX86ECoreOpts' models.
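
For readers following along, the split-store effect from the quoted description can be modeled with a small sketch. This is not the intrinsic's code: the class and method names are hypothetical, and the 32-byte store width, 64-byte cache line and start addresses are assumptions chosen only for illustration. It counts how many fixed-width stores straddle a cache line for a given start address, and how many bytes a prologue would need to fill to reach line alignment.

```java
// Hypothetical model, not HotSpot code: all names and sizes are illustrative assumptions.
final class SplitStoreModel {

    // Counts full-width stores that cross a cache-line boundary when filling
    // totalBytes starting at startAddr with stores of storeWidth bytes.
    static int countSplitStores(long startAddr, int totalBytes, int storeWidth, int lineSize) {
        int splits = 0;
        for (int off = 0; off + storeWidth <= totalBytes; off += storeWidth) {
            long first = (startAddr + off) / lineSize;                  // line of first byte
            long last = (startAddr + off + storeWidth - 1) / lineSize;  // line of last byte
            if (first != last) {
                splits++;
            }
        }
        return splits;
    }

    // Bytes to fill before startAddr reaches the next cache-line boundary
    // (0 if it is already aligned).
    static int bytesToAlign(long startAddr, int lineSize) {
        return (int) ((lineSize - (startAddr % lineSize)) % lineSize);
    }

    public static void main(String[] args) {
        // Assumed: 64-byte cache line, 32-byte vector stores, 256-byte fill.
        System.out.println("unaligned start (16): " + countSplitStores(16, 256, 32, 64));
        System.out.println("aligned start (64):   " + countSplitStores(64, 256, 32, 64));
        System.out.println("prologue bytes at 16: " + bytesToAlign(16, 64));
    }
}
```

In this model, a start address that is not a multiple of the store width makes every second store straddle a line, while a line-aligned start produces none, at the cost of the extra prologue work mentioned in the description.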
-------------
Marked as reviewed by vpaprotski (Committer).
PR Review: https://git.openjdk.org/jdk/pull/26747#pullrequestreview-3274154812
    
    