RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v5]

Vladimir Ivanov vaivanov at openjdk.org
Wed Sep 10 16:25:19 UTC 2025


On Wed, 10 Sep 2025 16:19:33 GMT, Vladimir Ivanov <vaivanov at openjdk.org> wrote:

>> On the SRF platform, runs with the intrinsic report a ~2x score drop for the ArrayFill test at several sizes, caused by a large number of 'MEM_UOPS_RETIRED.SPLIT_STORES' events. The 'good' case for ArraysFill.testCharFill with size=8195 reports numbers like
>> MEM_UOPS_RETIRED.SPLIT_LOADS | 22.6711
>> MEM_UOPS_RETIRED.SPLIT_STORES | 4.0859
>> while for the 'bad' case these metrics are
>> MEM_UOPS_RETIRED.SPLIT_LOADS | 69.1785
>> MEM_UOPS_RETIRED.SPLIT_STORES | 259200.3659
>> 
>> With alignment to the cache line size there are no score drops due to split_stores, but a small reduction may be reported due to extra 
>> SRF 6740E | Size | orig | patched | patched/orig
>> -- | -- | -- | -- | --
>> ArraysFill.testByteFill | 16 | 152031.2 | 157001.2 | 1.03
>> ArraysFill.testByteFill | 31 | 125795.9 | 177399.2 | 1.41
>> ArraysFill.testByteFill | 250 | 57961.69 | 120981.9 | 2.09
>> ArraysFill.testByteFill | 266 | 44900.15 | 147893.8 | 3.29
>> ArraysFill.testByteFill | 511 | 61908.17 | 129830.1 | 2.10
>> ArraysFill.testByteFill | 2047 | 32255.51 | 41986.6 | 1.30
>> ArraysFill.testByteFill | 2048 | 31928.97 | 42154.3 | 1.32
>> ArraysFill.testByteFill | 8195 | 10690.15 | 11036.3 | 1.03
>> ArraysFill.testIntFill | 16 | 145030.7 | 318796.9 | 2.20
>> ArraysFill.testIntFill | 31 | 134138.4 | 212487 | 1.58
>> ArraysFill.testIntFill | 250 | 74179.23 | 79522.66 | 1.07
>> ArraysFill.testIntFill | 266 | 68112.72 | 60116.49 | 0.88
>> ArraysFill.testIntFill | 511 | 39693.28 | 36225.09 | 0.91
>> ArraysFill.testIntFill | 2047 | 11504.14 | 10616.91 | 0.92
>> ArraysFill.testIntFill | 2048 | 11244.71 | 10969.14 | 0.98
>> ArraysFill.testIntFill | 8195 | 2751.289 | 2692.216 | 0.98
>> ArraysFill.testLongFill | 16 | 212532.5 | 212526 | 1.00
>> ArraysFill.testLongFill | 31 | 137432.4 | 137283.3 | 1.00
>> ArraysFill.testLongFill | 250 | 43185 | 43159.78 | 1.00
>> ArraysFill.testLongFill | 266 | 42172.22 | 42170.5 | 1.00
>> ArraysFill.testLongFill | 511 | 23370.15 | 23370.86 | 1.00
>> ArraysFill.testLongFill | 2047 | 6123.008 | 6122.73 | 1.00
>> ArraysFill.testLongFill | 2048 | 5793.722 | 5792.855 | 1.00
>> ArraysFill.testLongFill | 8195 | 616.552 | 616.585 | 1.00
>> ArraysFill.testShortFill | 16 | 152088.6 | 265646.1 | 1.75
>> ArraysFill.testShortFill | 31 | 137369.8 | 185596.4 | 1.35
>> ArraysFill.testShortFill | 250 | 58872.03 | 99621.15 | 1.69
>> ArraysFill.testShortFill | 266 | 91085.31 | 93746.62 | 1.03
>> ArraysFill.testShortFill | 511 | 65331.96 | 78003.83 | 1.19
>> ArraysFill.testShortFill | 2047 | 21716.32 | 21216.81 | 0.98
>> ArraysFill.testShortFill...
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
> 
>   JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays

The alignment loop was changed to a counted loop that aligns to 32 bytes. The main loop stores 32-byte values, so 32-byte alignment is enough to avoid split_stores. Reported scores on the Xeon 6740E for the compiled version vs. the intrinsic are:

SRF 6740E | Size | jdk26, comp | patched | patched/jdk26 comp
-- | -- | -- | -- | --
ArraysFill.testByteFill | 16 | 152077.563 | 168535.8 | 1.11
ArraysFill.testByteFill | 31 | 125702.057 | 212504.2 | 1.69
ArraysFill.testByteFill | 250 | 57965.263 | 127806.6 | 2.20
ArraysFill.testByteFill | 266 | 44901.157 | 145742 | 3.25
ArraysFill.testByteFill | 511 | 61918.237 | 109917.3 | 1.78
ArraysFill.testByteFill | 2047 | 32222.184 | 40348.55 | 1.25
ArraysFill.testByteFill | 2048 | 31930.607 | 35773.5 | 1.12
ArraysFill.testByteFill | 8195 | 10690.434 | 10709.71 | 1.00
ArraysFill.testIntFill | 16 | 144979.804 | 289781.8 | 2.00
ArraysFill.testIntFill | 31 | 133495.302 | 212475.7 | 1.59
ArraysFill.testIntFill | 250 | 74178.893 | 80775.39 | 1.09
ArraysFill.testIntFill | 266 | 68009.933 | 78090.68 | 1.15
ArraysFill.testIntFill | 511 | 39688.805 | 45553.15 | 1.15
ArraysFill.testIntFill | 2047 | 11504.203 | 11282.22 | 0.98
ArraysFill.testIntFill | 2048 | 11245.331 | 11512.12 | 1.02
ArraysFill.testIntFill | 8195 | 2692.649 | 2654.157 | 0.99
ArraysFill.testLongFill | 16 | 212541.769 | 212508.9 | 1.00
ArraysFill.testLongFill | 31 | 137729.599 | 137624.3 | 1.00
ArraysFill.testLongFill | 250 | 43162.979 | 43155.4 | 1.00
ArraysFill.testLongFill | 266 | 42173.88 | 42156.26 | 1.00
ArraysFill.testLongFill | 511 | 23364.859 | 23367.6 | 1.00
ArraysFill.testLongFill | 2047 | 6122.745 | 6123.296 | 1.00
ArraysFill.testLongFill | 2048 | 5792.552 | 5772.727 | 1.00
ArraysFill.testLongFill | 8195 | 616.62 | 616.257 | 1.00
ArraysFill.testShortFill | 16 | 152176.336 | 354182.7 | 2.33
ArraysFill.testShortFill | 31 | 137527.651 | 227688.8 | 1.66
ArraysFill.testShortFill | 250 | 58930.645 | 99614.52 | 1.69
ArraysFill.testShortFill | 266 | 91088.72 | 93755.19 | 1.03
ArraysFill.testShortFill | 511 | 65332.79 | 70824.73 | 1.08
ArraysFill.testShortFill | 2047 | 21713.296 | 22289.87 | 1.03
ArraysFill.testShortFill | 2048 | 21667.468 | 21021.92 | 0.97
ArraysFill.testShortFill | 8195 | 5922.318 | 5886.738 | 0.99
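
To illustrate why 32-byte alignment is sufficient, here is a minimal Java sketch of the arithmetic only. It is not the intrinsic's code: the class name, method names, the example address, and the 64-byte cache line size are assumptions made for illustration. It computes how many elements a prologue would fill before the address reaches a 32-byte boundary, and checks whether a 32-byte store at a given address spans two cache lines.

```java
public class FillAlignmentSketch {
    // Width of one main-loop store, as stated in the comment above.
    static final int VECTOR_BYTES = 32;
    // Assumed cache line size; not stated in the original message.
    static final int CACHE_LINE_BYTES = 64;

    // Number of elements to fill one by one until the address is 32-byte aligned.
    // Assumes the address is already aligned to the element size.
    public static long prologueElements(long address, int elementBytes) {
        long misalignment = address & (VECTOR_BYTES - 1);
        long bytesToBoundary = (VECTOR_BYTES - misalignment) & (VECTOR_BYTES - 1);
        return bytesToBoundary / elementBytes;
    }

    // True if a 32-byte store starting at this address crosses a cache line boundary.
    public static boolean isSplitStore(long address) {
        return (address & (CACHE_LINE_BYTES - 1)) + VECTOR_BYTES > CACHE_LINE_BYTES;
    }

    public static void main(String[] args) {
        long base = 0x1030L;   // hypothetical address of the first element of a char[]
        int elementBytes = 2;  // sizeof(char) in Java
        long skip = prologueElements(base, elementBytes);
        long aligned = base + skip * elementBytes;
        System.out.println("prologue elements: " + skip);
        System.out.println("split store at base: " + isSplitStore(base));
        System.out.println("split store at aligned address: " + isSplitStore(aligned));
    }
}
```

A 32-byte-aligned address has an offset of either 0 or 32 within a 64-byte line, so a 32-byte store starting there always ends at or before the line boundary.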

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26747#issuecomment-3275659629


More information about the hotspot-dev mailing list