RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false [v7]

Mon Nov 24 09:22:48 UTC 2025

On Fri, 21 Nov 2025 16:15:53 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Refine the count types to pass mac and win builds
>>   
>>   Signed-off-by: Patrick Zhang <patrick at os.amperecomputing.com>
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6236:
> 
>> 6234:   // The zero_blocks routine has already performed the necessary
>> 6235:   // adjustments to r10 and r11, ensuring they are correctly set
>> 6236:   // for subsequent processing.
> 
> Suggestion:
> 
>   // A few words remain. zero_blocks() has adjusted r10 so that it
>   // points to the remaining words and adjusted the count in r11.

Updated accordingly

> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6268:
> 
>> 6266:   // and necessary unrolled str/stp expanding when the condition is not met.
>> 6267:   // This approach also helps prevent sudden increases in code cache size
>> 6268:   // when zeroing large memory areas in many places.
> 
> Suggestion:
> 
>   // There is no need to check UseBlockZeroing here because that is
>   // delegated to the zero_blocks stub. The code here is inlined, so
>   // it is important to keep it small.

Updated

> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6273:
> 
>> 6271:     result = zero_words(r10, r11);
>> 6272:   } else {
>> 6273: #ifndef PRODUCT
> 
> What is this change for?

The reason of why I swapped the `if` and `else` code block is:
1). Initially, I intended to add a check for `UseBlockZeroing` to determine whether to call `zero_words_reg_reg`. Swapping the `if` and `else` branches makes it easier to compare the behavior with and without this additional condition.
2). Later, we decided not to check `UseBlockZeroing` here but I still didn't roll back this change because the comments of warning `There is no need to check UseBlockZeroing..` should be placed before such an if condition, instead of the old one.

> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6281:
> 
>> 6279: #endif
>> 6280:     // Use 16 words as the block size which is 128 bytes on 64-bit systems.
>> 6281:     // A complete loop body will be 8 STPs unrolled there.
> 
> Suggestion:
> 
>     // Use 16 words (128 bytes) as the block size.

Updated

> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6282:
> 
>> 6280:     // Use 16 words as the block size which is 128 bytes on 64-bit systems.
>> 6281:     // A complete loop body will be 8 STPs unrolled there.
>> 6282:     const int block_size = 16;
> 
> Naming this constant `block_size` only adds to any confusion, IMO.

I wondered why `MacroAssembler::zero_words` uses 16 words to do `stp` unrolling, while `generate_zero_blocks()` 8 words (`const int MacroAssembler::zero_words_block_size = 8;`), so defined this variable to compare `8 vs 16` but did not find obvious performance difference. 

Regarding the var name `block_size`, could `unroll` or `unroll_words` be better?

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 694:
> 
>> 692:       // Process words with length exceeding the predefined
>> 693:       // block size threshold. The loop body will be unrolled based on
>> 694:       // the number of STPs calculated below.
> 
> Suggestion:
> 
>       // Process any remaining blocks not handled by the stub.

Updated

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554847408
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554861246
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2555316021
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554863484
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554943081
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554946033