RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false [v2]
Andrew Haley
aph at openjdk.org
Mon Sep 1 10:14:42 UTC 2025
On Fri, 29 Aug 2025 11:34:26 GMT, Patrick Zhang <qpzhang at openjdk.org> wrote:
>> In AArch64 port, `UseBlockZeroing` is by default set to true and `BlockZeroingLowLimit` is initialized to 256. If `DC ZVA` is supported, `BlockZeroingLowLimit` is later updated to `4 * VM_Version::zva_length()`. When `UseBlockZeroing` is set to false, all related conditional checks should ignore `BlockZeroingLowLimit`. However, the function `MacroAssembler::zero_words(Register base, uint64_t cnt)` still evaluates the lower limit and bases its code generation logic on it, which appears to be an incomplete conditional check.
>>
>> This PR,
>> 1. In `MacroAssembler::zero_words(Register base, uint64_t cnt)`, added the checking of `UseBlockZeroing` to the if-cond `cnt > (uint64_t)BlockZeroingLowLimit / BytesPerWord`, strengthened the condition.
>> 2. In `MacroAssembler::zero_words(Register ptr, Register cnt)`, check `UseBlockZeroing` before checking the conditions of calling the stub function `zero_blocks`, which wraps the `DC ZVA` related instructions and works as the inner part of `zero_words`. Refined code and comments.
>> 3. For `generate_zero_blocks()`, removed the `UseBlockZeroing` checking and added an assertion, moved unrolled `STP` code-gen out to the caller side
>> 4. Added a warning message for if UseBlockZeroing is false and BlockZeroingLowLimit gets manually configured.
>> 5. Added more testing sizes to test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
>>
>> These changes improved the if-conds in `zero_words` functions around `BlockZeroingLowLimit`, ignore it if `UseBlockZeroing` is false. Performance tests are done on the bundled JMH `vm.compiler.ClearMemory`, and `vm.gc.RawAllocationRate` including `arrayTest` and `instanceTest`.
>>
>> Tests include,
>> 1. The wall time of `zero_words_reg_imm` got significantly improved under a particularly designed test case: `-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8`, `size=32` (`arrayTest` and `instanceTest`), the average wall time per call dropped from 309 ns (baseline) to 65 ns (patched), about -80%. The average call count also decreased from 335 to 202, in a 30s run. For example, `jdk/bin/java -jar images/test/micro/benchmarks.jar RawAllocationRate.arrayTest_C1 -bm thrpt -gc false -wi 0 -w 30 -i 1 -r 30 -t 1 -f 1 -tu s -jvmArgs "-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8" -p size=32`.
>> 2. `JMH RawAllocationRate` shows no obvious regression results. In details, patched vs baseline shows average ~70% positive impact, but ratios are minor around +0.5%, since the generated instruction sequences g...
>
> Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> Roll back main changes on zero_words_reg_reg and generate_zero_blocks
>
> Signed-off-by: Patrick Zhang <patrick at os.amperecomputing.com>
It's difficult for anyone to predict all the possibilities of `-XX` command-line arguments that users might try, despite them not making any sense.
To begin with, please add this short patch, then see if any of this PR provides an advantage.
diff --git a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
index 9321dd0542e..14a584c5106 100644
--- a/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp
@@ -446,6 +446,11 @@ void VM_Version::initialize() {
FLAG_SET_DEFAULT(UseBlockZeroing, false);
}
+ if (!UseBlockZeroing && !FLAG_IS_DEFAULT(BlockZeroingLowLimit)) {
+ warning("BlockZeroingLowLimit has been ignored because UseBlockZeroing is disabled");
+ FLAG_SET_DEFAULT(BlockZeroingLowLimit, 4 * VM_Version::zva_length());
+ }
+
if (VM_Version::supports_sve2()) {
if (FLAG_IS_DEFAULT(UseSVE)) {
FLAG_SET_DEFAULT(UseSVE, 2);
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26917#issuecomment-3241755892
More information about the hotspot-dev
mailing list