RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v7]
Vladimir Ivanov
vaivanov at openjdk.org
Wed Oct 1 19:47:34 UTC 2025
On Wed, 1 Oct 2025 19:12:34 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>>
>> JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5920:
>
>> 5918: if (EnableX86ECoreOpts) {
>> 5919: // align 'big' arrays to cache lines to minimize split_stores
>> 5920: cmpptr(count, 96 << shift);
>
> What is `96`?
Two trends were identified for buffer filling:
- filling up to the cache line boundary in 4-byte steps reduces performance;
- operating on whole cache lines improves performance.
According to experiments on a Xeon 6740E, 96 is a good compromise. For small arrays it is better to ignore split stores and fill with bigger elements.
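The trade-off above can be sketched in plain C++. This is only an illustration under stated assumptions, not the stub code from the patch: the function name `fill_words`, the 64-byte cache line size, the treatment of the threshold as a count of 4-byte words, and the use of "at or above the threshold" as the condition are all hypothetical choices made for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative values only; neither constant is taken from the patch.
constexpr std::size_t kCacheLineBytes = 64;       // assumed cache line size
constexpr std::size_t kAlignThresholdWords = 96;  // assumed unit: 4-byte words

// Fills `words` 4-byte words at `dst` (assumed 4-byte aligned) with `value`.
// Returns how many words were written one at a time to reach a cache line
// boundary; 0 means the alignment step was skipped or not needed.
std::size_t fill_words(std::uint32_t* dst, std::size_t words, std::uint32_t value) {
    constexpr std::size_t kWordsPerLine = kCacheLineBytes / sizeof(std::uint32_t);
    std::size_t head = 0;
    if (words >= kAlignThresholdWords) {
        // Big array: pay for a slow 4-byte head so that the wide stores
        // below start on a cache line boundary and never straddle two lines.
        std::size_t misalign = reinterpret_cast<std::uintptr_t>(dst) % kCacheLineBytes;
        if (misalign != 0) {
            head = (kCacheLineBytes - misalign) / sizeof(std::uint32_t);
        }
    }
    std::size_t i = 0;
    // Head: 4-byte steps up to the boundary (empty for small arrays).
    for (; i < head; ++i) dst[i] = value;
    // Main loop: one cache line of words per iteration, standing in for
    // the wide vector stores a real stub would use.
    for (; i + kWordsPerLine <= words; i += kWordsPerLine) {
        for (std::size_t k = 0; k < kWordsPerLine; ++k) dst[i + k] = value;
    }
    // Tail: remaining words that do not fill a whole cache line.
    for (; i < words; ++i) dst[i] = value;
    return head;
}
```

For a small array the head loop is skipped entirely, so the wide stores may split across cache lines, but the array is too short for that cost to outweigh the 4-byte alignment steps.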
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 6014:
>
>> 6012: BIND(L_fill_4_bytes);
>> 6013: subptr(count, 1 << shift);
>> 6014: jccb(Assembler::greaterEqual, L_fill_4_bytes_loop);
>
> I don't think it works correctly because you can come here from lines 5998-5999 where `count` becomes negative.
Testing for tier1, tier2 and tier3 was OK. I will review this part one more time.
Do you have a test scenario that may reproduce this issue?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26747#discussion_r2395693046
PR Review Comment: https://git.openjdk.org/jdk/pull/26747#discussion_r2395701015