RFR: 8349452: Fix performance regression for Arrays.fill() with AVX512 [v2]

Fri Nov 21 22:07:08 UTC 2025

On Fri, 21 Nov 2025 01:13:47 GMT, Srinivas Vamsi Parasa <sparasa at openjdk.org> wrote:

>> The goal of this PR is to fix the performance regression in Arrays.fill() x86 stubs caused by masked AVX stores. The fix is to replace the masked AVX stores with store instructions without masks (i.e. unmasked stores). `fill32_masked()` and `fill64_masked()` stubs are replaced with `fill32_unmasked()` and `fill64_unmasked()` respectively.
>> 
>> To speed up unmasked stores, array fills for sizes < 64 bytes are broken down into sequences of 32B, 16B, 8B, 4B, 2B and 1B stores, depending on the size.
>> 
>> 
>> ### **Performance comparison for byte array fills in a loop for 1 million times**
>> 
>> UseAVX=3 ByteArray Size | +OptimizeFill (Masked store stub) [secs] | -OptimizeFill (No stub) [secs] | This PR: +OptimizeFill (Unmasked store stub) [secs]
>> -- | -- | -- | --
>> 1 | 0.46 | 0.14 | 0.263
>> 2 | 0.46 | 0.16 | 0.264
>> 5 | 0.46 | 0.29 | 0.30
>> 10 | 0.46 | 0.58 | 0.32
>> 15 | 0.46 | 0.42 | 0.276
>> 16 | 0.46 | 0.46 | 0.32
>> 17 | 0.21 | 0.5 | 0.3
>> 20 | 0.21 | 0.37 | 0.3
>> 25 | 0.21 | 0.59 | 0.288
>> 31 | 0.21 | 0.53 | 0.284
>> 32 | 0.21 | 0.58 | 0.322
>> 35 | 0.5 | 0.77 | 0.29
>> 40 | 0.5 | 0.61 | 0.367
>> 45 | 0.5 | 0.52 | 0.324
>> 48 | 0.5 | 0.66 | 0.368
>> 49 | 0.22 | 0.69 | 0.342
>> 50 | 0.22 | 0.78 | 0.346
>> 55 | 0.22 | 0.67 | 0.3
>> 60 | 0.22 | 0.67 | 0.322
>> 64 | 0.22 | 0.82 | 0.362
>> 70 | 0.51 | 1.1 | 0.32
>> 80 | 0.49 | 0.89 | 0.37
>> 90 | 0.225 | 0.68 | 0.343
>> 100 | 0.54 | 1.09 | 0.41
>> 110 | 0.6 | 0.98 | 0.36
>> 120 | 0.26 | 0.75 | 0.386
>> 128 | 0.266 | 1.1 | 0.402
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   undo size check for fill64_masked
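
For readers following the thread, the size breakdown described in the quoted summary (fills below 64 bytes split into 32B, 16B, 8B, 4B, 2B and 1B stores) can be modelled roughly as follows. This is only a sketch in plain C++, not the stub code from the PR: the helper name `fill_tail_unmasked` is hypothetical, `memcpy` stands in for the fixed-width store instructions, and selecting one store per set bit of the size is an assumption about how the sequence might be chosen.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not HotSpot code: fills n (< 64) bytes at dst with
// value, using one plain fixed-width store per set bit of n, widest first.
static void fill_tail_unmasked(uint8_t* dst, uint8_t value, size_t n) {
    uint8_t pattern[32];
    std::memset(pattern, value, sizeof(pattern)); // stands in for the broadcast vector register

    for (size_t width = 32; width >= 1; width >>= 1) {
        if (n & width) {
            std::memcpy(dst, pattern, width);     // 32B, 16B, 8B, 4B, 2B or 1B store
            dst += width;
        }
    }
}
```

For example, under this model a 45-byte fill becomes a 32B store followed by 8B, 4B and 1B stores, with no mask register involved.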

The pre-submit test failures seem to be unrelated to the PR changes. A fresh merge with tip might resolve them.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9245:

> 9243: }
> 9244: 
> 9245: void MacroAssembler::fill32_unmasked(uint shift, Register dst, int disp, XMMRegister xmm,

This could be called fill32_tail. It would also be good to replace fill32_masked with fill32_tail throughout.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9305:

> 9303: }
> 9304: 
> 9305: void MacroAssembler::fill64_unmasked(uint shift, Register dst, int disp,

This could be called fill64_tail. It would also be good to replace fill64_masked with fill64_tail throughout.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9362:

> 9360:     jcc(Assembler::greater, L_fill_64_bytes);
> 9361:     fill32_unmasked(shift, to, 0, xtmp, count, rtmp);
> 9362:     jmp(L_exit);

Instead of repeating fill32_unmasked multiple times, you could jmp to, say, L_fill_32_tail and have the fill32_unmasked code there only once.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9380:

> 9378:     bind(L_fill_96_bytes);
> 9379:     cmpq(count, 96 >> shift);
> 9380:     jcc(Assembler::greater, L_fill_128_bytes);

If fill32_unmasked and fill64_unmasked are each emitted only once, as suggested above, you may be able to retain the jccb.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9383:

> 9381:     fill64(to, 0, xtmp);
> 9382:     subq(count, 64 >> shift);
> 9383:     fill32_unmasked(shift, to, 64, xtmp, count, rtmp);

Instead of repeating fill64_unmasked multiple times, you could jmp to, say, L_fill_64_tail and have the fill64_unmasked code there only once.
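
To illustrate the shape of the shared-tail suggestion, here is a rough plain C++ model rather than MacroAssembler code. The function name `fill_small`, the 128-byte limit, and the way `to` and `count` are adjusted before jumping are assumptions made for the sketch. The label names follow the ones mentioned in the comments above. Each size class does its full-width stores and then jumps to a single copy of the tail code, instead of expanding the tail inline on every path.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model, not HotSpot code: fills count (<= 128) bytes at `to`.
// memcpy stands in for the vector and scalar store instructions.
static void fill_small(uint8_t* to, uint8_t value, size_t count) {
    uint8_t vec[64];
    std::memset(vec, value, sizeof(vec)); // stands in for xtmp holding the broadcast value

    if (count > 96) goto L_fill_128_bytes;
    if (count > 64) goto L_fill_96_bytes;
    if (count > 32) goto L_fill_64_bytes;
    goto L_fill_32_tail;

L_fill_128_bytes:                         // 97..128 bytes
    std::memcpy(to, vec, 64);
    std::memcpy(to + 64, vec, 32);
    to += 96; count -= 96;
    goto L_fill_32_tail;

L_fill_96_bytes:                          // 65..96 bytes
    std::memcpy(to, vec, 64);
    to += 64; count -= 64;
    goto L_fill_32_tail;

L_fill_64_bytes:                          // 33..64 bytes
    std::memcpy(to, vec, 32);
    to += 32; count -= 32;
    // falls through to the shared tail

L_fill_32_tail:                           // single copy of the 0..32 byte tail
    for (size_t width = 32; width >= 1; width >>= 1) {
        if (count & width) {
            std::memcpy(to, vec, width);
            to += width;
        }
    }
}
```

One trade-off in this model is that the destination pointer has to be advanced before the jump, whereas the current code passes the displacement as an immediate. Whether that is acceptable in the stub is for the author to judge.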

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28442#issuecomment-3564755769
PR Review Comment: https://git.openjdk.org/jdk/pull/28442#discussion_r2551064898
PR Review Comment: https://git.openjdk.org/jdk/pull/28442#discussion_r2551066035
PR Review Comment: https://git.openjdk.org/jdk/pull/28442#discussion_r2551070541
PR Review Comment: https://git.openjdk.org/jdk/pull/28442#discussion_r2551074095
PR Review Comment: https://git.openjdk.org/jdk/pull/28442#discussion_r2551071856