RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false [v7]
Patrick Zhang
qpzhang at openjdk.org
Mon Nov 24 09:22:48 UTC 2025
On Fri, 21 Nov 2025 16:15:53 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Refine the count types to pass mac and win builds
>>
>> Signed-off-by: Patrick Zhang <patrick at os.amperecomputing.com>
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6236:
>
>> 6234: // The zero_blocks routine has already performed the necessary
>> 6235: // adjustments to r10 and r11, ensuring they are correctly set
>> 6236: // for subsequent processing.
>
> Suggestion:
>
> // A few words remain. zero_blocks() has adjusted r10 so that it
> // points to the remaining words and adjusted the count in r11.
Updated accordingly
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6268:
>
>> 6266: // and necessary unrolled str/stp expanding when the condition is not met.
>> 6267: // This approach also helps prevent sudden increases in code cache size
>> 6268: // when zeroing large memory areas in many places.
>
> Suggestion:
>
> // There is no need to check UseBlockZeroing here because that is
> // delegated to the zero_blocks stub. The code here is inlined, so
> // it is important to keep it small.
Updated
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6273:
>
>> 6271: result = zero_words(r10, r11);
>> 6272: } else {
>> 6273: #ifndef PRODUCT
>
> What is this change for?
The reason of why I swapped the `if` and `else` code block is:
1). Initially, I intended to add a check for `UseBlockZeroing` to determine whether to call `zero_words_reg_reg`. Swapping the `if` and `else` branches makes it easier to compare the behavior with and without this additional condition.
2). Later, we decided not to check `UseBlockZeroing` here but I still didn't roll back this change because the comments of warning `There is no need to check UseBlockZeroing..` should be placed before such an if condition, instead of the old one.
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6281:
>
>> 6279: #endif
>> 6280: // Use 16 words as the block size which is 128 bytes on 64-bit systems.
>> 6281: // A complete loop body will be 8 STPs unrolled there.
>
> Suggestion:
>
> // Use 16 words (128 bytes) as the block size.
Updated
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6282:
>
>> 6280: // Use 16 words as the block size which is 128 bytes on 64-bit systems.
>> 6281: // A complete loop body will be 8 STPs unrolled there.
>> 6282: const int block_size = 16;
>
> Naming this constant `block_size` only adds to any confusion, IMO.
I wondered why `MacroAssembler::zero_words` uses 16 words to do `stp` unrolling, while `generate_zero_blocks()` 8 words (`const int MacroAssembler::zero_words_block_size = 8;`), so defined this variable to compare `8 vs 16` but did not find obvious performance difference.
Regarding the var name `block_size`, could `unroll` or `unroll_words` be better?
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 694:
>
>> 692: // Process words with length exceeding the predefined
>> 693: // block size threshold. The loop body will be unrolled based on
>> 694: // the number of STPs calculated below.
>
> Suggestion:
>
> // Process any remaining blocks not handled by the stub.
Updated
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554847408
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554861246
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2555316021
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554863484
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554943081
PR Review Comment: https://git.openjdk.org/jdk/pull/26917#discussion_r2554946033
More information about the hotspot-dev
mailing list