RFR: 8365290: [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays [v7]

Vladimir Ivanov vaivanov at openjdk.org
Wed Oct 1 19:47:34 UTC 2025


On Wed, 1 Oct 2025 19:12:34 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   JDK-8365290 [perf] x86 ArrayFill intrinsic generates SPLIT_STORE for unaligned arrays
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5920:
> 
>> 5918:         if (EnableX86ECoreOpts) {
>> 5919:             // align 'big' arrays to cache lines to minimize split_stores
>> 5920:             cmpptr(count, 96 << shift);
> 
> What is `96`?

Two trends were identified for buffer filling:
 - filling up to the cache line boundary 4 bytes at a time reduces performance;
 - operating on whole cache lines improves performance.
According to experiments on a Xeon 6740E, 96 is a good compromise. For small arrays it is better to ignore split stores and fill with bigger elements.
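
To illustrate the trade-off, here is a minimal standalone C++ sketch of the idea, not the intrinsic itself. The names `head_elements` and `fill_u32`, the 64-byte line size, and the use of 96 as a byte threshold are all assumptions made for illustration; the exact units of `96 << shift` in the stub are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants, for illustration only.
constexpr std::size_t kCacheLine = 64;       // assumed cache line size in bytes
constexpr std::size_t kAlignThreshold = 96;  // assumed threshold, in bytes

// Number of 4-byte elements to store one by one before the bulk loop.
// Returns 0 for small fills, where paying for alignment is assumed not
// to be worth it, and for destinations that are already line-aligned.
inline std::size_t head_elements(std::uintptr_t addr, std::size_t bytes) {
    assert(addr % sizeof(std::uint32_t) == 0);  // precondition: element-aligned
    if (bytes < kAlignThreshold) return 0;
    const std::size_t misalign = addr % kCacheLine;
    if (misalign == 0) return 0;
    return (kCacheLine - misalign) / sizeof(std::uint32_t);
}

// Fills 'count' 32-bit elements: scalar head up to the next cache line
// boundary (large fills only), then one cache line per iteration, then
// a scalar tail.
inline void fill_u32(std::uint32_t* dst, std::size_t count, std::uint32_t value) {
    constexpr std::size_t per_line = kCacheLine / sizeof(std::uint32_t);
    const std::size_t head = head_elements(
        reinterpret_cast<std::uintptr_t>(dst), count * sizeof(std::uint32_t));

    std::size_t i = 0;
    for (; i < head; ++i) dst[i] = value;         // head: element by element
    for (; i + per_line <= count; i += per_line)  // bulk: whole lines
        for (std::size_t j = 0; j < per_line; ++j) dst[i + j] = value;
    for (; i < count; ++i) dst[i] = value;        // tail: element by element
}
```

Below the threshold the head loop is skipped entirely, so small arrays go straight to the wide stores and accept a possible split store, which matches the behavior described above.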

> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 6014:
> 
>> 6012:   BIND(L_fill_4_bytes);
>> 6013:   subptr(count, 1 << shift);
>> 6014:   jccb(Assembler::greaterEqual, L_fill_4_bytes_loop);
> 
> I don't think it works correctly because you can come here from lines 5998-5999 where `count` becomes negative.

Testing for tier1, tier2 and tier3 was OK. I will review this part one more time.
Do you have a test scenario that may reproduce this issue?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26747#discussion_r2395693046
PR Review Comment: https://git.openjdk.org/jdk/pull/26747#discussion_r2395701015

