RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false

Andrew Dinn adinn at openjdk.org
Tue Aug 26 13:29:35 UTC 2025


On Sun, 24 Aug 2025 16:23:19 GMT, Patrick Zhang <qpzhang at openjdk.org> wrote:

> In AArch64 port, `UseBlockZeroing` is by default set to true and `BlockZeroingLowLimit` is initialized to 256. If `DC ZVA` is supported, `BlockZeroingLowLimit` is later updated to `4 * VM_Version::zva_length()`. When `UseBlockZeroing` is set to false, all related conditional checks should ignore `BlockZeroingLowLimit`. However, the function `MacroAssembler::zero_words(Register base, uint64_t cnt)` still evaluates the lower limit and bases its code generation logic on it, which appears to be an incomplete conditional check.
> 
> This PR,
> 1. In `MacroAssembler::zero_words(Register base, uint64_t cnt)`, added the checking of `UseBlockZeroing` to the if-cond `cnt > (uint64_t)BlockZeroingLowLimit / BytesPerWord`, strengthened the condition.
> 2. In `MacroAssembler::zero_words(Register ptr, Register cnt)`, check `UseBlockZeroing`  before checking the conditions of calling the stub function `zero_blocks`, which wraps the `DC ZVA` related instructions and works as the inner part of `zero_words`. Refined code and comments.
> 3. For `generate_zero_blocks()`, removed the `UseBlockZeroing` checking and added an assertion, moved unrolled `STP` code-gen out to the caller side
> 4. Added a warning message for if UseBlockZeroing is false and BlockZeroingLowLimit gets manually configured.
> 5. Added more testing sizes to test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java
> 
> These changes improved the if-conds in `zero_words` functions around `BlockZeroingLowLimit`, ignore it if `UseBlockZeroing` is false. Performance tests are done on the bundled JMH `vm.compiler.ClearMemory`, and `vm.gc.RawAllocationRate` including `arrayTest` and `instanceTest`.
> 
> Tests include,
> 1. The wall time of `zero_words_reg_imm` got significantly improved under a particularly designed test case: `-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8`, `size=32` (`arrayTest` and `instanceTest`), the average wall time per call dropped from 309 ns (baseline) to 65 ns (patched), about -80%. The average call count also decreased from 335 to 202, in a 30s run. For example, `jdk/bin/java -jar images/test/micro/benchmarks.jar RawAllocationRate.arrayTest_C1 -bm thrpt -gc false -wi 0 -w 30 -i 1 -r 30 -t 1 -f 1 -tu s -jvmArgs "-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8" -p size=32`.
> 2. `JMH RawAllocationRate` shows no obvious regression results. In details, patched vs baseline shows average ~70% positive impact, but ratios are minor around +0.5%, since the generated instruction sequences got almost same as baseline, ...

@cnqpzhang If you look back at the history of this code you will see that you are undoing a change that was made deliberately by @theRealAph. Your patch may improve the specific test case you have provided but at the cost of a significant and unacceptable increase in code cache use for all cases.

The comment at the head of the code you have edited makes this point explicitly. The reasoning behind that comment is available in the JIRA history and associated review comments. The relevant issue is

  https://bugs.openjdk.org/browse/JDK-8179444

and the corresponding review thread starts with 

  https://mail.openjdk.org/pipermail/hotspot-dev/2017-April/026742.html

and continues with

  https://mail.openjdk.org/pipermail/hotspot-dev/2017-May/026766.html

I don't recommend integrating this change.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26917#issuecomment-3224177760


More information about the hotspot-dev mailing list