RFR: 8365991: AArch64: Ignore BlockZeroingLowLimit when UseBlockZeroing is false [v7]
Patrick Zhang
qpzhang at openjdk.org
Fri Oct 17 04:23:05 UTC 2025
On Fri, 17 Oct 2025 04:19:42 GMT, Patrick Zhang <qpzhang at openjdk.org> wrote:
>> Issue:
>> In the AArch64 port, `UseBlockZeroing` is set to true by default and `BlockZeroingLowLimit` is initialized to 256. If `DC ZVA` is supported, `BlockZeroingLowLimit` is later updated to `4 * VM_Version::zva_length()`. When `UseBlockZeroing` is set to false, all related conditional checks should ignore `BlockZeroingLowLimit`. However, the function `MacroAssembler::zero_words(Register base, uint64_t cnt)` still evaluates the lower limit and bases its code generation logic on it, which seems to be an incomplete conditional check.
>>
>> This PR:
>> 1. Reset `BlockZeroingLowLimit` to `4 * VM_Version::zva_length()` or 256, with a warning message, if it was manually changed from the default while `UseBlockZeroing` is disabled.
>> 2. Added necessary comments in `MacroAssembler::zero_words(Register base, uint64_t cnt)` and `MacroAssembler::zero_words(Register ptr, Register cnt)` to explain why we do not check `UseBlockZeroing` in the outer part of these functions. Instead, the decision is delegated to the stub function `zero_blocks`, which encapsulates the DC ZVA instructions and serves as the inner implementation of `zero_words`. This approach helps better control the increase in code cache size during array or object instance initialization.
>> 3. Added more test sizes to `test/micro/org/openjdk/bench/vm/gc/RawAllocationRate.java` to better cover scenarios involving smaller arrays and objects.
>>
>> Tests:
>> 1. Performance tests on the bundled JMH benchmarks `vm.compiler.ClearMemory` and `vm.gc.RawAllocationRate` (including `arrayTest` and `instanceTest`) showed no obvious regression. Negative tests with `jdk/bin/java -jar images/test/micro/benchmarks.jar RawAllocationRate.arrayTest_C1 -bm thrpt -gc false -wi 0 -w 30 -i 1 -r 30 -t 1 -f 1 -tu s -jvmArgs "-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8" -p size=32` demonstrated good wall times on `zero_words_reg_imm` calls, as expected.
>> 2. Jtreg tier1 tests on Ampere Altra, AmpereOne, Graviton2 and Graviton3, and tier2 on Altra: no new issues found. GHA Sanity Checks passed.
>
> Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision:
>
> Refine the count types to pass mac and win builds
>
> Signed-off-by: Patrick Zhang <patrick at os.amperecomputing.com>
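As a rough illustration of item 1 in the quoted description, the flag handling can be sketched in standalone C++. This is not the HotSpot code: `ZeroingFlags`, `default_low_limit`, `normalize_low_limit`, and the warning text are hypothetical names made up for the sketch, and `zva_length` is assumed to be 0 when `DC ZVA` is unsupported.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins for the VM flags (not the HotSpot declarations).
struct ZeroingFlags {
  bool use_block_zeroing;           // models UseBlockZeroing
  int64_t block_zeroing_low_limit;  // models BlockZeroingLowLimit, in bytes
  bool low_limit_is_default;        // false if the user set the limit explicitly
};

// Default limit: 4 * zva_length when DC ZVA is available, otherwise 256.
// Assumption for this sketch: zva_length == 0 means "DC ZVA unsupported".
int64_t default_low_limit(int64_t zva_length) {
  assert(zva_length >= 0);
  return zva_length > 0 ? 4 * zva_length : 256;
}

// Returns true if a user-supplied limit was discarded.
bool normalize_low_limit(ZeroingFlags& flags, int64_t zva_length) {
  const bool user_set = !flags.low_limit_is_default;
  if (!flags.use_block_zeroing && user_set) {
    // Hypothetical message text; the real warning may be worded differently.
    std::fprintf(stderr,
                 "warning: BlockZeroingLowLimit is ignored because "
                 "UseBlockZeroing is disabled\n");
    flags.block_zeroing_low_limit = default_low_limit(zva_length);
    flags.low_limit_is_default = true;
    return true;
  }
  if (flags.low_limit_is_default) {
    // Untouched flag: pick up the platform-dependent default.
    flags.block_zeroing_low_limit = default_low_limit(zva_length);
  }
  return false;
}
```

With `-XX:-UseBlockZeroing -XX:BlockZeroingLowLimit=8` and a 64-byte ZVA block, this model would discard the 8 and fall back to 256.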
Added [test/hotspot/gtest/aarch64/test_MacroAssembler_zero_words.cpp](https://github.com/openjdk/jdk/pull/26917/files#diff-6698530e8c1fe55cec7c20ca36725bde034b3cd74b25ccee77cc50f66dabf16c) to measure the impact of different low limits and cleared word counts on the wall time of `MacroAssembler::zero_words` and compare the resulting differences.
Run the test and compare the wall times. Resetting the low limit from a lower value to the default of 256 improves codegen efficiency by about 11x on clear_4_words (289 ns vs. 25 ns) and about 1.6x on clear_16_words (170 ns vs. 107 ns).
$ make run-test TEST="gtest:MacroAssemblerZeroWordsTest"
> Clear 4 words with lower limit 8, zero_words wall time (ns): 289
> Clear 4 words with lower limit 256, zero_words wall time (ns): 25
> Clear 16 words with lower limit 64, zero_words wall time (ns): 170
> Clear 16 words with lower limit 256, zero_words wall time (ns): 107
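The ratios can be re-derived from the four wall times above with a few lines of C++. The helper names `speedup` and `print_speedups` are only for this sketch; the numbers are the ones printed by the test.

```cpp
#include <cassert>
#include <cstdio>

// Ratio of the wall time with a lowered limit to the wall time with the
// default limit of 256; values above 1 mean the default is faster.
double speedup(double lowered_limit_ns, double default_limit_ns) {
  assert(default_limit_ns > 0.0);
  return lowered_limit_ns / default_limit_ns;
}

void print_speedups() {
  std::printf("clear_4_words:  %.2fx\n", speedup(289.0, 25.0));   // 289 / 25
  std::printf("clear_16_words: %.2fx\n", speedup(170.0, 107.0));  // 170 / 107
}
```

These are single-run numbers from one machine, so the ratios are indicative rather than precise.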
See below for the detailed run log, including the generated code sequences under various conditions:
Test selection 'gtest:MacroAssemblerZeroWordsTest', will run:
* gtest:MacroAssemblerZeroWordsTest/server
Running test 'gtest:MacroAssemblerZeroWordsTest/server'
Note: Google Test filter = MacroAssemblerZeroWordsTest*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from MacroAssemblerZeroWordsTest
[ RUN ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_8B_vm
--------------------------------------------------------------------------------
udf #0
0x0000400011c56a40: mov x10, #0x6fb0 // #28592
0x0000400011c56a44: movk x10, #0xab05, lsl #16
0x0000400011c56a48: movk x10, #0xaaaa, lsl #32
0x0000400011c56a4c: orr x11, xzr, #0x4
0x0000400011c56a50: subs x8, x11, #0x8
0x0000400011c56a54: b.cc 0x0000400011c56a5c // b.lo, b.ul, b.last
0x0000400011c56a58: bl Stub::Stub Generator zero_blocks_stub
0x0000400011c56a5c: tbz w11, #2, 0x0000400011c56a68
0x0000400011c56a60: stp xzr, xzr, [x10], #16
0x0000400011c56a64: stp xzr, xzr, [x10], #16
0x0000400011c56a68: tbz w11, #1, 0x0000400011c56a70
0x0000400011c56a6c: stp xzr, xzr, [x10], #16
0x0000400011c56a70: tbz w11, #0, 0x0000400011c56a78
0x0000400011c56a74: str xzr, [x10]
--------------------------------------------------------------------------------
Clear 4 words with lower limit 8, zero_words wall time (ns): 289
[ OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_8B_vm (2 ms)
[ RUN ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_256B_vm
--------------------------------------------------------------------------------
udf #0
0x0000400011c57400: mov x10, #0x6fb0 // #28592
0x0000400011c57404: movk x10, #0xab05, lsl #16
0x0000400011c57408: movk x10, #0xaaaa, lsl #32
0x0000400011c5740c: stp xzr, xzr, [x10]
0x0000400011c57410: stp xzr, xzr, [x10, #16]
--------------------------------------------------------------------------------
Clear 4 words with lower limit 256, zero_words wall time (ns): 25
[ OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_32B_with_lowlimit_256B_vm (0 ms)
[ RUN ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_64B_vm
--------------------------------------------------------------------------------
udf #0
0x0000400011c57400: mov x10, #0x6fe0 // #28640
0x0000400011c57404: movk x10, #0xab05, lsl #16
0x0000400011c57408: movk x10, #0xaaaa, lsl #32
0x0000400011c5740c: orr x11, xzr, #0x10
0x0000400011c57410: subs x8, x11, #0x8
0x0000400011c57414: b.cc 0x0000400011c5741c // b.lo, b.ul, b.last
0x0000400011c57418: bl Stub::Stub Generator zero_blocks_stub
0x0000400011c5741c: tbz w11, #2, 0x0000400011c57428
0x0000400011c57420: stp xzr, xzr, [x10], #16
0x0000400011c57424: stp xzr, xzr, [x10], #16
0x0000400011c57428: tbz w11, #1, 0x0000400011c57430
0x0000400011c5742c: stp xzr, xzr, [x10], #16
0x0000400011c57430: tbz w11, #0, 0x0000400011c57438
0x0000400011c57434: str xzr, [x10]
--------------------------------------------------------------------------------
Clear 16 words with lower limit 64, zero_words wall time (ns): 170
[ OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_64B_vm (0 ms)
[ RUN ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_256B_vm
--------------------------------------------------------------------------------
udf #0
0x0000400011c57400: mov x10, #0x6fe0 // #28640
0x0000400011c57404: movk x10, #0xab05, lsl #16
0x0000400011c57408: movk x10, #0xaaaa, lsl #32
0x0000400011c5740c: stp xzr, xzr, [x10]
0x0000400011c57410: stp xzr, xzr, [x10, #16]
0x0000400011c57414: stp xzr, xzr, [x10, #32]
0x0000400011c57418: stp xzr, xzr, [x10, #48]
0x0000400011c5741c: stp xzr, xzr, [x10, #64]
0x0000400011c57420: stp xzr, xzr, [x10, #80]
0x0000400011c57424: stp xzr, xzr, [x10, #96]
0x0000400011c57428: stp xzr, xzr, [x10, #112]
0x0000400011c5742c: add x10, x10, #0x80
--------------------------------------------------------------------------------
Clear 16 words with lower limit 256, zero_words wall time (ns): 107
[ OK ] MacroAssemblerZeroWordsTest.UseBZ_clear_128B_with_lowlimit_256B_vm (0 ms)
[----------] 4 tests from MacroAssemblerZeroWordsTest (110 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (110 ms total)
[ PASSED ] 4 tests.
Finished running test 'gtest:MacroAssemblerZeroWordsTest/server'
Test report is stored in build-pr/test-results/gtest_MacroAssemblerZeroWordsTest_server
==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR SKIP
gtest:MacroAssemblerZeroWordsTest/server 4 4 0 0 0
==============================
TEST SUCCESS
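The four code sequences in the log are consistent with a simple threshold rule: a constant-size clear that fits under the low limit is unrolled into `stp` stores, and anything larger takes the longer register-count path with the conditional `zero_blocks` call and the `tbz` tail. The model below is a simplified sketch of that rule, not the HotSpot source. `ZeroStrategy`, `choose_strategy`, and `inline_stp_count` are hypothetical names, and the exact boundary comparison (`<=`) is an assumption, since the log does not exercise the boundary case.

```cpp
#include <cassert>
#include <cstdint>

// Which shape of code the model expects for a constant word count.
enum class ZeroStrategy {
  InlineStores,       // straight-line stp xzr, xzr sequence
  RegisterCountPath   // count in a register, optional stub call, tbz tail
};

constexpr uint64_t kBytesPerWord = 8;  // 64-bit words on AArch64

// Assumed rule: unroll when the word count fits under the limit (in words).
ZeroStrategy choose_strategy(uint64_t word_count, uint64_t low_limit_bytes) {
  assert(low_limit_bytes >= kBytesPerWord);
  return word_count <= low_limit_bytes / kBytesPerWord
             ? ZeroStrategy::InlineStores
             : ZeroStrategy::RegisterCountPath;
}

// Each stp clears two words, so an unrolled clear needs word_count / 2 of
// them (plus one str when word_count is odd).
uint64_t inline_stp_count(uint64_t word_count) {
  return word_count / 2;
}
```

For the four test cases this gives the register-count path for (4 words, limit 8) and (16 words, limit 64), and inline stores with 2 and 8 `stp` instructions for (4 words, limit 256) and (16 words, limit 256), matching the disassembly.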
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26917#issuecomment-3413812045