RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false [v7]

Patrick Zhang qpzhang at openjdk.org
Fri Oct 17 04:23:05 UTC 2025


On Fri, 17 Oct 2025 04:19:42 GMT, Patrick Zhang <qpzhang at openjdk.org> wrote:

>> Issue: 
>> In the AArch64 port, `UseBlockZeroing` is set to true by default and `BlockZeroingLowLimit` is initialized to 256. If `DC ZVA` is supported, `BlockZeroingLowLimit` is later updated to `4 * VM_Version::zva_length()`. When `UseBlockZeroing` is set to false, all related conditional checks should ignore `BlockZeroingLowLimit`. However, the function `MacroAssembler::zero_words(Register base, uint64_t cnt)` still evaluates the lower limit and bases its code generation logic on it, so the conditional check is incomplete.
>> 
>> This PR:
>> 1. Reset `BlockZeroingLowLimit` to its default (`4 * VM_Version::zva_length()`, or 256), with a warning message, if it was manually changed from the default while `UseBlockZeroing` is disabled.
>> 2. Added necessary comments in `MacroAssembler::zero_words(Register base, uint64_t cnt)` and `MacroAssembler::zero_words(Register ptr, Register cnt)` to explain why we do not check `UseBlockZeroing` in the outer part of these functions. Instead, the decision is delegated to the stub function `zero_blocks`, which encapsulates the DC ZVA instructions and serves as the inner implementation of `zero_words`. This approach helps better control the increase in code cache size during array or object instance initialization.
>> 3. Added more test sizes to `test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java` to better cover scenarios involving smaller arrays and objects.
>> 
>> Tests:
>> 1. Performance tests on the bundled JMH `vm.compiler.ClearMemory`, and `vm.gc.RawAllocationRate` (including `arrayTest` and `instanceTest`) showed no obvious regression. Negative tests with `jdk/bin/java -jar images/test/micro/benchmarks.jar RawAllocationRate.arrayTest_C1 -bm thrpt -gc false -wi 0 -w 30 -i 1 -r 30 -t 1 -f 1 -tu s -jvmArgs "-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8" -p size=32` demonstrated good wall times on `zero_words_reg_imm` calls, as expected.
>> 2. Jtreg tier1 tests on Ampere Altra, AmpereOne, Graviton2 and Graviton3; tier2 on Altra. No new issues found. The GHA Sanity Checks passed.
>
> Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Refine the count types to pass mac and win builds
>   
>   Signed-off-by: Patrick Zhang <patrick at os.amperecomputing.com>
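
The reset rule in item 1 of the quoted description can be modelled in a few lines. This is only a standalone sketch, not the HotSpot change itself: `ZeroingConfig` and every function below are hypothetical names made up for illustration, `low_limit_set_by_user` stands in for "the flag was changed from its default", and keeping the user's value when `UseBlockZeroing` is enabled is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for the relevant VM flags and CPU properties.
// None of these names exist in HotSpot; they only model the description.
struct ZeroingConfig {
  bool use_block_zeroing;         // models UseBlockZeroing
  bool low_limit_set_by_user;     // models "changed from the default"
  std::int64_t low_limit_bytes;   // models BlockZeroingLowLimit
  bool dc_zva_supported;          // whether DC ZVA is available
  std::int64_t zva_length_bytes;  // models VM_Version::zva_length()
};

// Default described in the PR text: 4 * zva_length with DC ZVA, else 256.
constexpr std::int64_t default_low_limit(const ZeroingConfig& c) {
  return c.dc_zva_supported ? 4 * c.zva_length_bytes : 256;
}

// A user-supplied limit is ignored (and would trigger the warning)
// only when block zeroing is disabled.
constexpr bool needs_reset(const ZeroingConfig& c) {
  return !c.use_block_zeroing && c.low_limit_set_by_user;
}

// Assumption: with block zeroing enabled, the configured value is kept.
constexpr std::int64_t effective_low_limit(const ZeroingConfig& c) {
  return needs_reset(c) ? default_low_limit(c) : c.low_limit_bytes;
}
```

Under this model, `-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8` on a machine with a 64-byte ZVA length ends up with an effective limit of 256.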

Added [test/hotspot/gtest/aarch64/test_MacroAssembler_zero_words.cpp](https://github.com/openjdk/jdk/pull/26917/files#diff-6698530e8c1fe55cec7c20ca36725bde034b3cd74b25ccee77cc50f66dabf16c) to measure the impact of different low limits and cleared word counts on the wall time of `MacroAssembler::zero_words` and compare the resulting differences.

Running the test and comparing the wall times shows that raising the low limit from a smaller value back to the default of 256 improves the measured `zero_words` wall time: about 11.6x on clear_4_words (289 ns vs. 25 ns) and about 1.6x on clear_16_words (170 ns vs. 107 ns).

$ make run-test TEST="gtest:MacroAssemblerZeroWordsTest"

> Clear 4 words with lower limit 8, zero_words wall time (ns): 289
> Clear 4 words with lower limit 256, zero_words wall time (ns): 25
> Clear 16 words with lower limit 64, zero_words wall time (ns): 170
> Clear 16 words with lower limit 256, zero_words wall time (ns): 107
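
For reference, the ratios follow directly from the four wall times listed above. The snippet below is only a small arithmetic check; the `speedup` helper and the constant names are introduced here for illustration and are not part of the gtest.

```cpp
// Wall times in nanoseconds, copied from the four result lines above.
constexpr double kClear4Limit8Ns = 289.0;
constexpr double kClear4Limit256Ns = 25.0;
constexpr double kClear16Limit64Ns = 170.0;
constexpr double kClear16Limit256Ns = 107.0;

// Ratio of the slower wall time to the faster one.
constexpr double speedup(double slow_ns, double fast_ns) {
  return slow_ns / fast_ns;
}

// 289 / 25 = 11.56 and 170 / 107 is just under 1.59.
constexpr double kClear4Speedup = speedup(kClear4Limit8Ns, kClear4Limit256Ns);
constexpr double kClear16Speedup = speedup(kClear16Limit64Ns, kClear16Limit256Ns);
```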

See below for the detailed run log, including the generated code sequences under various conditions:


Test selection 'gtest:MacroAssemblerZeroWordsTest', will run:
* gtest:MacroAssemblerZeroWordsTest/server

Running test 'gtest:MacroAssemblerZeroWordsTest/server'
Note: Google Test filter = MacroAssemblerZeroWordsTest*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from MacroAssemblerZeroWordsTest
[ RUN      ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_8B_vm
--------------------------------------------------------------------------------
udf     #0
  0x0000400011c56a40:   mov     x10, #0x6fb0                    // #28592
  0x0000400011c56a44:   movk    x10, #0xab05, lsl #16
  0x0000400011c56a48:   movk    x10, #0xaaaa, lsl #32
  0x0000400011c56a4c:   orr     x11, xzr, #0x4
  0x0000400011c56a50:   subs    x8, x11, #0x8
  0x0000400011c56a54:   b.cc    0x0000400011c56a5c  // b.lo, b.ul, b.last
  0x0000400011c56a58:   bl      Stub::Stub Generator zero_blocks_stub
  0x0000400011c56a5c:   tbz     w11, #2, 0x0000400011c56a68
  0x0000400011c56a60:   stp     xzr, xzr, [x10], #16
  0x0000400011c56a64:   stp     xzr, xzr, [x10], #16
  0x0000400011c56a68:   tbz     w11, #1, 0x0000400011c56a70
  0x0000400011c56a6c:   stp     xzr, xzr, [x10], #16
  0x0000400011c56a70:   tbz     w11, #0, 0x0000400011c56a78
  0x0000400011c56a74:   str     xzr, [x10]
--------------------------------------------------------------------------------

Clear 4 words with lower limit 8, zero_words wall time (ns): 289
[       OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_8B_vm (2 ms)
[ RUN      ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_256B_vm
--------------------------------------------------------------------------------
udf     #0
  0x0000400011c57400:   mov     x10, #0x6fb0                    // #28592
  0x0000400011c57404:   movk    x10, #0xab05, lsl #16
  0x0000400011c57408:   movk    x10, #0xaaaa, lsl #32
  0x0000400011c5740c:   stp     xzr, xzr, [x10]
  0x0000400011c57410:   stp     xzr, xzr, [x10, #16]
--------------------------------------------------------------------------------

Clear 4 words with lower limit 256, zero_words wall time (ns): 25
[       OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_256B_vm (0 ms)
[ RUN      ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_64B_vm
--------------------------------------------------------------------------------
udf     #0
  0x0000400011c57400:   mov     x10, #0x6fe0                    // #28640
  0x0000400011c57404:   movk    x10, #0xab05, lsl #16
  0x0000400011c57408:   movk    x10, #0xaaaa, lsl #32
  0x0000400011c5740c:   orr     x11, xzr, #0x10
  0x0000400011c57410:   subs    x8, x11, #0x8
  0x0000400011c57414:   b.cc    0x0000400011c5741c  // b.lo, b.ul, b.last
  0x0000400011c57418:   bl      Stub::Stub Generator zero_blocks_stub
  0x0000400011c5741c:   tbz     w11, #2, 0x0000400011c57428
  0x0000400011c57420:   stp     xzr, xzr, [x10], #16
  0x0000400011c57424:   stp     xzr, xzr, [x10], #16
  0x0000400011c57428:   tbz     w11, #1, 0x0000400011c57430
  0x0000400011c5742c:   stp     xzr, xzr, [x10], #16
  0x0000400011c57430:   tbz     w11, #0, 0x0000400011c57438
  0x0000400011c57434:   str     xzr, [x10]
--------------------------------------------------------------------------------

Clear 16 words with lower limit 64, zero_words wall time (ns): 170
[       OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_64B_vm (0 ms)
[ RUN      ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_256B_vm
--------------------------------------------------------------------------------
udf     #0
  0x0000400011c57400:   mov     x10, #0x6fe0                    // #28640
  0x0000400011c57404:   movk    x10, #0xab05, lsl #16
  0x0000400011c57408:   movk    x10, #0xaaaa, lsl #32
  0x0000400011c5740c:   stp     xzr, xzr, [x10]
  0x0000400011c57410:   stp     xzr, xzr, [x10, #16]
  0x0000400011c57414:   stp     xzr, xzr, [x10, #32]
  0x0000400011c57418:   stp     xzr, xzr, [x10, #48]
  0x0000400011c5741c:   stp     xzr, xzr, [x10, #64]
  0x0000400011c57420:   stp     xzr, xzr, [x10, #80]
  0x0000400011c57424:   stp     xzr, xzr, [x10, #96]
  0x0000400011c57428:   stp     xzr, xzr, [x10, #112]
  0x0000400011c5742c:   add     x10, x10, #0x80
--------------------------------------------------------------------------------

Clear 16 words with lower limit 256, zero_words wall time (ns): 107
[       OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_256B_vm (0 ms)
[----------] 4 tests from MacroAssemblerZeroWordsTest (110 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (110 ms total)
[  PASSED  ] 4 tests.
Finished running test 'gtest:MacroAssemblerZeroWordsTest/server'
Test report is stored in build-pr/test-results/gtest_MacroAssemblerZeroWordsTest_server

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR  SKIP   
   gtest:MacroAssemblerZeroWordsTest/server              4     4     0     0     0   
==============================
TEST SUCCESS
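
The four listings above are consistent with a simple threshold on the constant word count. The sketch below models that decision under stated assumptions: 8-byte words, a comparison of the form `cnt >= limit / 8`, and one `stp` per two words on the inline path. It is inferred from the log rather than copied from the HotSpot sources, all names are hypothetical, and the handling of an odd trailing word is a guess since the log only shows even counts.

```cpp
#include <cstdint>

constexpr std::uint64_t kBytesPerWord = 8;  // AArch64 word size assumed here

// True when the generated code takes the longer sequence that may call
// zero_blocks (the "subs / b.cc / bl ... tbz" shape in the log);
// false when it emits the short unrolled run of stp instructions.
constexpr bool takes_stub_capable_path(std::uint64_t cnt_words,
                                       std::uint64_t low_limit_bytes) {
  return cnt_words >= low_limit_bytes / kBytesPerWord;
}

// Number of stores on the unrolled path: one stp per pair of words.
// Assumption: an odd trailing word costs one extra store.
constexpr std::uint64_t inline_store_count(std::uint64_t cnt_words) {
  return (cnt_words + 1) / 2;
}
```

With these assumptions, 4 words against a limit of 8 and 16 words against a limit of 64 both take the longer path, while a limit of 256 yields 2 and 8 `stp` instructions respectively, matching the listings.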

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26917#issuecomment-3413812045

