RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false

Patrick Zhang qpzhang at openjdk.org
Sun Aug 24 16:28:18 UTC 2025


In the AArch64 port, `UseBlockZeroing` is set to true by default and `BlockZeroingLowLimit` is initialized to 256. If `DC ZVA` is supported, `BlockZeroingLowLimit` is later updated to `4 * VM_Version::zva_length()`. When `UseBlockZeroing` is false, all related conditional checks should ignore `BlockZeroingLowLimit`. However, `MacroAssembler::zero_words(Register base, uint64_t cnt)` still evaluates the lower limit and bases its code generation on it, which looks like an incorrect condition.
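To make the problem concrete, here is a minimal C++ sketch of the baseline behavior described above, using a simplified flag model. `ZeroingFlags`, `apply_zva_low_limit` and `baseline_exceeds_low_limit` are hypothetical helpers, not HotSpot code. The real `zero_words` emits instructions rather than returning a boolean, and `zva_length` is assumed to be in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the AArch64 flags described above. These are plain
// C++ helpers for illustration, not HotSpot's actual code or data types.
constexpr uint64_t kBytesPerWord = 8;  // BytesPerWord on a 64-bit port

struct ZeroingFlags {
  bool use_block_zeroing = true;           // models UseBlockZeroing (default: true)
  uint64_t block_zeroing_low_limit = 256;  // models BlockZeroingLowLimit, in bytes
};

// Models the later adjustment: when DC ZVA is supported, the low limit
// becomes 4 * zva_length (zva_length assumed to be the DC ZVA block size in bytes).
void apply_zva_low_limit(ZeroingFlags& flags, bool dc_zva_supported,
                         uint64_t zva_length) {
  if (dc_zva_supported) {
    flags.block_zeroing_low_limit = 4 * zva_length;
  }
}

// Models the baseline test in zero_words(Register base, uint64_t cnt):
// true when the constant word count exceeds BlockZeroingLowLimit / BytesPerWord.
// use_block_zeroing is deliberately never read, mirroring the reported problem.
bool baseline_exceeds_low_limit(const ZeroingFlags& flags, uint64_t cnt_words) {
  assert(kBytesPerWord <= flags.block_zeroing_low_limit);  // limit covers >= 1 word
  return cnt_words > flags.block_zeroing_low_limit / kBytesPerWord;
}
```

For example, with an illustrative `zva_length` of 128 bytes the limit becomes 512 bytes (64 words), so a 65-word constant count still exceeds it even after `use_block_zeroing` has been cleared.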

This PR:
1. In `MacroAssembler::zero_words(Register base, uint64_t cnt)`, adds a `UseBlockZeroing` check to the condition `cnt > (uint64_t)BlockZeroingLowLimit / BytesPerWord`, strengthening it.
2. In `MacroAssembler::zero_words(Register ptr, Register cnt)`, checks `UseBlockZeroing` before evaluating the conditions for calling the stub function `zero_blocks`, which wraps the `DC ZVA`-related instructions and serves as the inner part of `zero_words`.
3. In `generate_zero_blocks()`, replaces the `UseBlockZeroing` check with an assertion and moves the unrolled `STP` code generation out to the caller side.
4. Adds a warning if `UseBlockZeroing` is false and `BlockZeroingLowLimit` is configured manually.
5. Adds more test sizes to `test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java`.
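The decision logic of items 1, 2 and 4 can be sketched as plain predicates. This is a minimal C++ model under simplifying assumptions, not the actual patch: the function names are hypothetical, and `other_stub_conditions_hold` stands in for whatever conditions `zero_words(Register ptr, Register cnt)` already checked before calling `zero_blocks`, which are not spelled out here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the patched decisions in items 1, 2 and 4.
// Function names and parameters are illustrative, not the actual patch.
constexpr uint64_t kBytesPerWord = 8;  // BytesPerWord on a 64-bit port

// Item 1: zero_words(Register base, uint64_t cnt). The low-limit comparison
// is only evaluated when UseBlockZeroing is on; with it off, this returns
// false for every count, so BlockZeroingLowLimit has no influence.
bool patched_exceeds_low_limit(bool use_block_zeroing,
                               uint64_t low_limit_bytes, uint64_t cnt_words) {
  if (!use_block_zeroing) {
    return false;
  }
  assert(low_limit_bytes >= kBytesPerWord);  // limit must cover at least one word
  return cnt_words > low_limit_bytes / kBytesPerWord;
}

// Item 2: zero_words(Register ptr, Register cnt). UseBlockZeroing is checked
// first; only then are the pre-existing stub-call conditions considered.
bool should_call_zero_blocks(bool use_block_zeroing,
                             bool other_stub_conditions_hold) {
  if (!use_block_zeroing) {
    return false;  // skip the DC ZVA-based zero_blocks stub entirely
  }
  return other_stub_conditions_hold;
}

// Item 4: warn when BlockZeroingLowLimit is set explicitly although
// UseBlockZeroing is false, since the limit then has no effect.
bool should_warn_low_limit_ignored(bool use_block_zeroing,
                                   bool low_limit_set_explicitly) {
  return !use_block_zeroing && low_limit_set_explicitly;
}
```

Modeling the gate as an early return makes the ordering explicit: once `UseBlockZeroing` is false, neither the low limit nor the stub conditions are consulted.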

These changes fix the conditions in the `zero_words` functions that involve `BlockZeroingLowLimit`, so that the limit is ignored when `UseBlockZeroing` is false. Performance tests were run with the bundled JMH benchmarks `vm.compiler.ClearMemory` and `vm.gc.RawAllocationRate` (including `arrayTest` and `instanceTest`).

Tests include:
1. The wall time of `zero_words_reg_imm` improved significantly in a specially designed test case: `-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8` with `size=24` (`arrayTest` and `instanceTest`). The average wall time per call dropped from 281 ns (baseline) to 63 ns (patched), a reduction of about 78%. The average call count also decreased, from 350 to 220, in a 30 s run. An example invocation: `jdk/bin/java -jar images/test/micro/benchmarks.jar RawAllocationRate.arrayTest_C1 -wi 2 -w 30 -i 1 -r 30 -t 1 -f 1`.
2. JMH `RawAllocationRate` shows no obvious regressions. In detail, about 70% of the patched-vs-baseline comparisons show a positive impact, but the gains are minor (around +0.5%) because the generated instruction sequences are almost the same as the baseline, or differ only slightly in the checks against the various low limits. This is the expected outcome.
3. Also ran jtreg tier1 tests on Ampere Altra, AmpereOne, Graviton2 and Graviton3, and tier2 on Altra. No new issues were found. The GHA sanity checks passed as well.
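The relative changes in item 1 follow from the reported averages: the wall time changes by (63 - 281) / 281 ≈ -77.6%, and the call count by (220 - 350) / 350 ≈ -37.1%. The hypothetical helpers below only check that arithmetic. They also show why the test case stresses the baseline: with `-XX:BlockZeroingLowLimit=8` and an 8-byte word, the threshold in `cnt > (uint64_t)BlockZeroingLowLimit / BytesPerWord` is a single word, so nearly every constant-count zeroing exceeds the limit even though `UseBlockZeroing` is off.

```cpp
#include <cassert>
#include <cstdint>

// Relative change, in percent, from a baseline measurement to a patched one.
double percent_change(double baseline, double patched) {
  assert(baseline != 0.0);
  return (patched - baseline) / baseline * 100.0;
}

// Threshold in words for a BlockZeroingLowLimit given in bytes, assuming
// BytesPerWord == 8 (64-bit AArch64).
uint64_t low_limit_in_words(uint64_t low_limit_bytes) {
  return low_limit_bytes / 8;
}
```

For instance, `percent_change(281.0, 63.0)` is about -77.6, and `low_limit_in_words(8)` is 1.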

-------------

Commit messages:
 - 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false

Changes: https://git.openjdk.org/jdk/pull/26917/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26917&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8365991
  Stats: 152 lines in 4 files changed: 85 ins; 31 del; 36 mod
  Patch: https://git.openjdk.org/jdk/pull/26917.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26917/head:pull/26917

PR: https://git.openjdk.org/jdk/pull/26917


More information about the hotspot-dev mailing list