RFR: JDK-8323497: On x64, use 32-bit immediate moves for narrow klass base if possible [v3]
Quan Anh Mai
qamai at openjdk.org
Tue Feb 20 11:27:57 UTC 2024
On Tue, 20 Feb 2024 06:31:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> On x64, we always use the long form of mov immediate to load the klass base into a register. If the klass base fits into 32 bits, we could use the short form and save four instruction bytes.
>>
>> Before: mov uses 10 instruction bytes:
>>
>>
>> ;; decode_klass_not_null
>> 0x00007f8b089e51c4: movabs $0x82000000,%r11
>> 0x00007f8b089e51ce: add %r11,%r10
>>
>>
>> Now: mov uses 6 instruction bytes:
>>
>>
>> ;; decode_klass_not_null
>> 0x00007fbe609e51c4: mov $0x82000000,%r11d
>> 0x00007fbe609e51ca: add %r11,%r10
>>
>>
>> Note that this optimization does not depend on zero-based addressing, and therefore we change class space reservation: we now always look in low-address regions first.
>>
>> ----------
>>
>> Tests: tier1 (GHA), tier 2 on x64 linux
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Merge branch 'openjdk:master' into use-32bit-immediate-moves-on-x64-for-klass-encoding-base
> - Merge branch 'openjdk:master' into use-32bit-immediate-moves-on-x64-for-klass-encoding-base
> - remove obsolete comment
> - use-32bit-immediate-moves-on-x64-for-klass-encoding-base
I have taken a look and left some comments. Thanks.
src/hotspot/cpu/x86/compressedKlass_x86.cpp line 44:
> 42: result = reserve_address_space_for_zerobased_encoding(size, aslr);
> 43: }
> 44: } else {
This should be:
if (result == nullptr) {
// If we cannot use zero-based encoding (when CDS is enabled), optimizing for an
...
}
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5596:
> 5594: if (CompressedKlassPointers::base() != nullptr) {
> 5595: // Uses 32-bit mov if base is small enough
> 5596: movptr(tmp, (intptr_t)CompressedKlassPointers::base());
If we can reserve the base in the low 2G, this could be optimized further to `addq(r, CompressedKlassPointers::base())`, and if `LogKlassAlignmentInBytes` is 3, i.e. 8-byte alignment (as asserted in `decode_and_move_klass_not_null` below), we can shorten the whole sequence into `leaq(r, Address(noreg, r, Address::times_8, CompressedKlassPointers::base()))`
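To illustrate why the bound here is the low 2G rather than the low 4G, here is a minimal standalone sketch. The helper names are hypothetical and are not HotSpot APIs; they only model how x64 treats a 32-bit field. A `mov` into a 32-bit register zero-extends, while the immediate of a 64-bit `add` and the displacement of `lea` are sign-extended.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only, not part of HotSpot.

// mov r32, imm32 zero-extends into the full 64-bit register,
// so any base below 2^32 can use the short mov.
constexpr bool fits_in_uimm32(uint64_t base) {
  return base <= UINT32_MAX;
}

// add r64, imm32 and the disp32 of lea are sign-extended,
// so folding the base into them needs base < 2^31.
constexpr bool fits_in_simm32(uint64_t base) {
  return base <= static_cast<uint64_t>(INT32_MAX);
}

// Models what the CPU does with a 32-bit immediate or displacement
// when it is used in a 64-bit add or lea.
constexpr uint64_t sign_extend32(uint32_t imm) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(imm)));
}
```

With the base from the disassembly in the PR description, `0x82000000`, the first check passes and the second fails, so that base could use the 32-bit `mov` but not the folded `addq` or `leaq` form.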
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5620:
> 5618: } else {
> 5619: xorq(dst, dst);
> 5620: }
This can also be a single instruction when the base is smaller than $2^{31}$.
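A rough model of the arithmetic behind that suggestion, assuming a shift of 3 and a base-plus-scaled-index decode. The function names are made up for this sketch, and the two-step form is only an approximation of what the current code emits.

```cpp
#include <cstdint>

// Assumed shift of 3 (8-byte Klass alignment), i.e. a scale of times_8.
constexpr unsigned kShift = 3;

// Two-step form: materialize the base in dst (mov or xor),
// then lea dst, [dst + src*8].
constexpr uint64_t decode_two_step(uint64_t base, uint32_t narrow_klass) {
  uint64_t dst = base;
  return dst + (static_cast<uint64_t>(narrow_klass) << kShift);
}

// One-step form: lea dst, [src*8 + disp32].
// The displacement is a signed 32-bit field that the CPU sign-extends.
constexpr uint64_t decode_one_step(int32_t disp, uint32_t narrow_klass) {
  return (static_cast<uint64_t>(narrow_klass) << kShift)
       + static_cast<uint64_t>(static_cast<int64_t>(disp));
}
```

For a base such as `0x40000000` both forms give the same address. For a base at or above 2^31 the displacement would sign-extend to a negative value and the one-step form would decode to the wrong address, which is why the bound matters.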
-------------
PR Review: https://git.openjdk.org/jdk/pull/17340#pullrequestreview-1890167182
PR Review Comment: https://git.openjdk.org/jdk/pull/17340#discussion_r1495647551
PR Review Comment: https://git.openjdk.org/jdk/pull/17340#discussion_r1495654933
PR Review Comment: https://git.openjdk.org/jdk/pull/17340#discussion_r1495657740