RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false [v2]

Patrick Zhang qpzhang at openjdk.org
Fri Aug 29 11:34:26 UTC 2025


> In the AArch64 port, `UseBlockZeroing` is set to true by default and `BlockZeroingLowLimit` is initialized to 256. If `DC ZVA` is supported, `BlockZeroingLowLimit` is later updated to `4 * VM_Version::zva_length()`. When `UseBlockZeroing` is set to false, all related conditional checks should ignore `BlockZeroingLowLimit`. However, the function `MacroAssembler::zero_words(Register base, uint64_t cnt)` still evaluates the lower limit and bases its code generation logic on it, which appears to be an incomplete conditional check.
> 
> This PR:
> 1. In `MacroAssembler::zero_words(Register base, uint64_t cnt)`, adds a `UseBlockZeroing` check to the condition `cnt > (uint64_t)BlockZeroingLowLimit / BytesPerWord`, strengthening it.
> 2. In `MacroAssembler::zero_words(Register ptr, Register cnt)`, checks `UseBlockZeroing` before checking the conditions for calling the stub function `zero_blocks`, which wraps the `DC ZVA` related instructions and works as the inner part of `zero_words`. Code and comments are refined as well.
> 3. For `generate_zero_blocks()`, removes the `UseBlockZeroing` check, adds an assertion, and moves the unrolled `STP` code generation out to the caller side.
> 4. Adds a warning message for the case where `UseBlockZeroing` is false and `BlockZeroingLowLimit` is manually configured.
> 5. Adds more test sizes to `test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java`.
> 
> These changes improve the conditions around `BlockZeroingLowLimit` in the `zero_words` functions so that the limit is ignored when `UseBlockZeroing` is false. Performance was tested with the bundled JMH benchmarks `vm.compiler.ClearMemory` and `vm.gc.RawAllocationRate`, including `arrayTest` and `instanceTest`.
> 
> Tests include:
> 1. The wall time of `zero_words_reg_imm` improved significantly under a specially designed test case, `-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8` with `size=32` (`arrayTest` and `instanceTest`): the average wall time per call dropped from 309 ns (baseline) to 65 ns (patched), about -80%. The average call count also decreased from 335 to 202 in a 30s run. For example, `jdk/bin/java -jar images/test/micro/benchmarks.jar RawAllocationRate.arrayTest_C1 -bm thrpt -gc false -wi 0 -w 30 -i 1 -r 30 -t 1 -f 1 -tu s -jvmArgs "-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8" -p size=32`.
> 2. `JMH RawAllocationRate` shows no obvious regression. In detail, patched vs baseline shows an average ~70% positive impact, but the ratios are minor, around +0.5%, since the generated instruction sequences are almost the same as the baseline, ...
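
The strengthened condition (item 1) and the flag-consistency warning (item 4) in the quoted description can be illustrated with a minimal standalone sketch. This is not the HotSpot code: `ZeroingFlags`, `wants_block_zeroing`, and `should_warn` are hypothetical names, the struct fields only model the `UseBlockZeroing` and `BlockZeroingLowLimit` flags, and the word size of 8 bytes is an assumption for a 64-bit AArch64 build.

```cpp
#include <cstdint>

// Hypothetical model of the two flags discussed above; not HotSpot types.
struct ZeroingFlags {
  bool use_block_zeroing;            // models UseBlockZeroing
  uint64_t block_zeroing_low_limit;  // models BlockZeroingLowLimit, in bytes
  bool low_limit_set_by_user;        // models "limit given on the command line"
};

// Assumed word size in bytes on a 64-bit AArch64 build.
constexpr uint64_t kBytesPerWord = 8;

// Item 1: the low limit is consulted only when block zeroing is enabled.
// cnt_words is the number of words to zero.
bool wants_block_zeroing(const ZeroingFlags& f, uint64_t cnt_words) {
  return f.use_block_zeroing &&
         cnt_words > f.block_zeroing_low_limit / kBytesPerWord;
}

// Item 4: a manually configured low limit has no effect when block zeroing
// is disabled, so that combination is worth a warning.
bool should_warn(const ZeroingFlags& f) {
  return !f.use_block_zeroing && f.low_limit_set_by_user;
}
```

With the default limit of 256 bytes, the threshold in this model is 32 words, so 33 words would take the block-zeroing path and 32 would not. With `-XX:-UseBlockZeroing`, no count takes that path regardless of the configured limit.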

Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision:

  Roll back main changes on zero_words_reg_reg and generate_zero_blocks
  
  Signed-off-by: Patrick Zhang <patrick at os.amperecomputing.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26917/files
  - new: https://git.openjdk.org/jdk/pull/26917/files/98ee2799..14c18f7f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26917&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26917&range=00-01

  Stats: 74 lines in 2 files changed: 24 ins; 19 del; 31 mod
  Patch: https://git.openjdk.org/jdk/pull/26917.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26917/head:pull/26917

PR: https://git.openjdk.org/jdk/pull/26917

