RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates

Quan Anh Mai qamai at openjdk.org
Fri Nov 3 19:53:06 UTC 2023


On Fri, 3 Nov 2023 16:00:08 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate.
> 
> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically.
> 
> For example, sample branch profiling hunk from C1 tier3 on x86_64:
> 
> 
> Before:
>    0x00007f269065ed02:   test   %edx,%edx
>    0x00007f269065ed04:   movabs $0x7f260a4ddd68,%rax  ;   {metadata(method data for {method} …
>    0x00007f269065ed0e:   movabs $0x138,%rsi
>  ╭ 0x00007f269065ed18:   je     0x00007f269065ed24
>  │ 0x00007f269065ed1a:   movabs $0x148,%rsi
>  ↘ 0x00007f269065ed24:   mov    (%rax,%rsi,1),%rdi
>    0x00007f269065ed28:   lea    0x1(%rdi),%rdi
>    0x00007f269065ed2c:   mov    %rdi,(%rax,%rsi,1)
>    0x00007f269065ed30:   je     0x00007f269065ed4e     
> 
> After:
>    0x00007f1370dcd782:   test   %edx,%edx
>    0x00007f1370dcd784:   movabs $0x7f12f64ddd68,%rax   ;   {metadata(method data for {method} …
>    0x00007f1370dcd78e:   mov    $0x138,%esi
>  ╭ 0x00007f1370dcd793:   je     0x00007f1370dcd79a        
>  │ 0x00007f1370dcd795:   mov    $0x148,%esi
>  ↘ 0x00007f1370dcd79a:   mov    (%rax,%rsi,1),%rdi        
>    0x00007f1370dcd79e:   lea    0x1(%rdi),%rdi
>    0x00007f1370dcd7a2:   mov    %rdi,(%rax,%rsi,1)       
>    0x00007f1370dcd7a6:   je     0x00007f1370dcd7c4 
> 
> 
> We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes.
> 
> This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`.
> 
> 
> # Before
>  nmethod code size         :   430328 bytes
>  nmethod code size         :   467032 bytes
>  nmethod code size         :   908936 bytes
>  nmethod code size         :  1267816 bytes
> 
> # After
>  nmethod code size         :   429616 bytes (-0.1%)
>  nmethod code size         :   466344 bytes (-0.1%)
>  nmethod code size         :   897144 bytes (-1.3%)
>  nmethod code size         :  1256216 bytes (-0.9%)
> 
> 
> There are two wrinkles:
>   1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this patch work. Note that x86_64 does not actually define `movslq reg64, imm32`, this is a regular `mov...

Can we create `MacroAssembler::mov64` that does the branching instead, I think it is more natural there. And things that need 8-byte immediates will call into `Assembler::mov64`.

Thanks.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2576:

> 2574: #ifdef _LP64
> 2575:   if (is_simm32(src)) {
> 2576:     movslq(dst, checked_cast<int32_t>(src));

Why not just `movq`? there is no `movslq r, i` so this is kind of confusing.

src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1818:

> 1816:   void mov_metadata(Address  dst, Metadata* obj, Register rscratch);
> 1817: 
> 1818:   void mov_ptrslot(Register dst, intptr_t val);

I believe the convention here would be `movptr_imm64`

-------------

PR Review: https://git.openjdk.org/jdk/pull/16497#pullrequestreview-1713429561
PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1382136858
PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1382138403


More information about the hotspot-dev mailing list