RFR: JDK-8323497: On x64, use 32-bit immediate moves for narrow klass base if possible [v3]

Quan Anh Mai qamai at openjdk.org
Tue Feb 20 11:27:57 UTC 2024


On Tue, 20 Feb 2024 06:31:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> On x64, we always use the long form of mov immediate to load the klass base into a register. If the klass base fits into 32 bits, we could use the short form and save four instruction bytes. 
>> 
>> Before: mov uses 10 instruction bytes:
>> 
>> 
>>    ;; decode_klass_not_null
>>    0x00007f8b089e51c4:   movabs $0x82000000,%r11
>>    0x00007f8b089e51ce:   add    %r11,%r10
>> 
>> 
>> Now: mov uses 6 instruction bytes:
>> 
>> 
>>    ;; decode_klass_not_null
>>    0x00007fbe609e51c4:   mov    $0x82000000,%r11d
>>    0x00007fbe609e51ca:   add    %r11,%r10
>> 
>> 
>> Note that this optimization does not depend on zero-based addressing, and therefore we change class space reservation: we now always look in low-address regions first.
>> 
>> ----------
>> 
>> Tests: tier1 (GHA), tier 2 on x64 linux
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
> 
>  - Merge branch 'openjdk:master' into use-32bit-immediate-moves-on-x64-for-klass-encoding-base
>  - Merge branch 'openjdk:master' into use-32bit-immediate-moves-on-x64-for-klass-encoding-base
>  - remove obsolete comment
>  - use-32bit-immediate-moves-on-x64-for-klass-encoding-base

I have taken a look and left some comments. Thanks.

src/hotspot/cpu/x86/compressedKlass_x86.cpp line 44:

> 42:       result = reserve_address_space_for_zerobased_encoding(size, aslr);
> 43:     }
> 44:   } else {

This should be:

    if (result == nullptr) {
        // If we cannot use zero-based encoding (when CDS is enabled), optimizing for an
        ...
    }

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5596:

> 5594:   if (CompressedKlassPointers::base() != nullptr) {
> 5595:     // Uses 32-bit mov if base is small enough
> 5596:     movptr(tmp, (intptr_t)CompressedKlassPointers::base());

If we can reserve the base in the low 2G, this could be optimized further to `addq(r, CompressedKlassPointers::base())`. And if `LogKlassAlignmentInBytes` corresponds to 8-byte alignment, i.e. a scale of `Address::times_8` (as asserted in `decode_and_move_klass_not_null` below), we can shorten the whole sequence into `leaq(r, Address(noreg, r, Address::times_8, CompressedKlassPointers::base()))`.
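
To illustrate why the low-2G limit matters here: `mov r32, imm32` zero-extends into the 64-bit register, so it works for any base below 2^32, whereas the immediate of `add r64, imm32` and the displacement of `lea` are sign-extended, so they only work for a base below 2^31. A minimal sketch of that classification follows; `BaseForm` and `classify_base` are hypothetical names for illustration, not HotSpot code.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not part of HotSpot).
enum class BaseForm {
  Disp32OrAddImm32,  // base < 2^31: usable as sign-extended imm32/disp32 (add, lea)
  Mov32,             // base < 2^32: mov r32, imm32 zero-extends to 64 bits
  Mov64              // otherwise: needs the 10-byte movabs with a 64-bit immediate
};

constexpr BaseForm classify_base(uint64_t base) {
  if (base < (uint64_t{1} << 31)) return BaseForm::Disp32OrAddImm32;
  if (base < (uint64_t{1} << 32)) return BaseForm::Mov32;
  return BaseForm::Mov64;
}

// The base from the disassembly in the PR description is above 2^31,
// so it qualifies for the 32-bit mov but not for add/lea with an immediate.
static_assert(classify_base(0x82000000ULL) == BaseForm::Mov32,
              "0x82000000 fits zero-extended, but not sign-extended");
```

So the base shown in the PR description would still need the `mov` plus `add` pair; the shorter forms only apply once the reservation lands below 2G.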

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5620:

> 5618:     } else {
> 5619:       xorq(dst, dst);
> 5620:     }

This can also be 1 instruction when the base is smaller than $2^{31}$.
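
As a rough model of what I mean, assuming a shift of 3 (matching `Address::times_8`): the one-instruction form would be a single `lea` that folds the base into the 32-bit displacement. The sketch below only simulates the address arithmetic in plain C++; the function names are hypothetical and this is not the assembler code itself.

```cpp
#include <cstdint>

// The CPU sign-extends a 32-bit displacement to 64 bits before adding it.
constexpr uint64_t sign_extend_disp32(uint32_t disp) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(disp)));
}

// Two-step form: materialize the base in dst, then lea dst, [dst + src*8].
constexpr uint64_t decode_two_step(uint64_t base, uint32_t narrow_klass) {
  uint64_t dst = base;
  return dst + (static_cast<uint64_t>(narrow_klass) << 3);
}

// One-step form: lea dst, [src*8 + disp32], with the base as the displacement.
// Only the low 32 bits of the base can be encoded, and they are sign-extended.
constexpr uint64_t decode_one_lea(uint64_t base, uint32_t narrow_klass) {
  return (static_cast<uint64_t>(narrow_klass) << 3) +
         sign_extend_disp32(static_cast<uint32_t>(base));
}

// Below 2^31 both forms agree.
static_assert(decode_one_lea(0x40000000ULL, 0x10) ==
              decode_two_step(0x40000000ULL, 0x10), "forms agree below 2^31");
// At or above 2^31 the sign extension corrupts the result.
static_assert(decode_one_lea(0x82000000ULL, 0x10) !=
              decode_two_step(0x82000000ULL, 0x10), "forms differ above 2^31");
```

The second assertion is why the condition is 2^31 rather than 2^32.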

-------------

PR Review: https://git.openjdk.org/jdk/pull/17340#pullrequestreview-1890167182
PR Review Comment: https://git.openjdk.org/jdk/pull/17340#discussion_r1495647551
PR Review Comment: https://git.openjdk.org/jdk/pull/17340#discussion_r1495654933
PR Review Comment: https://git.openjdk.org/jdk/pull/17340#discussion_r1495657740

