RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v4]

Aleksey Shipilev shade at openjdk.org
Tue Nov 14 09:02:42 UTC 2023


On Mon, 13 Nov 2023 09:30:19 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Noticed this while doing C1 work, but the issue is more generic. If you look at generated x86 code, you will sometimes notice `movabs` with small immediates. That `movabs` actually carries the full-blown 64-bit immediate.
>> 
>> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten `movptr(reg, imm)` when the immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code in general.
>> 
>> For example, here is a sample branch profiling hunk from C1 tier 3 on x86_64:
>> 
>> 
>> Before:
>>    0x00007f269065ed02:   test   %edx,%edx
>>    0x00007f269065ed04:   movabs $0x7f260a4ddd68,%rax  ;   {metadata(method data for {method} …
>>    0x00007f269065ed0e:   movabs $0x138,%rsi
>>  ╭ 0x00007f269065ed18:   je     0x00007f269065ed24
>>  │ 0x00007f269065ed1a:   movabs $0x148,%rsi
>>  ↘ 0x00007f269065ed24:   mov    (%rax,%rsi,1),%rdi
>>    0x00007f269065ed28:   lea    0x1(%rdi),%rdi
>>    0x00007f269065ed2c:   mov    %rdi,(%rax,%rsi,1)
>>    0x00007f269065ed30:   je     0x00007f269065ed4e     
>> 
>> After:
>>    0x00007f1370dcd782:   test   %edx,%edx
>>    0x00007f1370dcd784:   movabs $0x7f12f64ddd68,%rax   ;   {metadata(method data for {method} …
>>    0x00007f1370dcd78e:   mov    $0x138,%esi
>>  ╭ 0x00007f1370dcd793:   je     0x00007f1370dcd79a        
>>  │ 0x00007f1370dcd795:   mov    $0x148,%esi
>>  ↘ 0x00007f1370dcd79a:   mov    (%rax,%rsi,1),%rdi        
>>    0x00007f1370dcd79e:   lea    0x1(%rdi),%rdi
>>    0x00007f1370dcd7a2:   mov    %rdi,(%rax,%rsi,1)       
>>    0x00007f1370dcd7a6:   je     0x00007f1370dcd7c4 
>> 
>> 
>> We can use shorter 32-bit immediate moves. In the hunk above, this saves 10 bytes: each `movabs` -> `mov` change shrinks a 10-byte instruction to 5 bytes. But this is not limited to the profiling code. There are nearly 1% code space savings on larger tests in C2. For example, on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`:
>> 
>> 
>> # Before
>>  nmethod code size         :   430328 bytes
>>  nmethod code size         :   467032 bytes
>>  nmethod code size         :   908936 bytes
>>  nmethod code size         :  1267816 bytes
>> 
>> # After
>>  nmethod code size         :   429616 bytes (-0.1%)
>>  nmethod code size         :   466344 bytes (-0.1%)
>>  nmethod code size         :   897144 bytes (-1.3%)
>>  nmethod code size         :  1256216 bytes (-0.9%)
>> 
>> 
>> There are two wrinkles:
>>   1. Current `movslq(Register, int32_t)` is broken and protected by `Sh...
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
> 
>  - Remove the movslq declaration as well
>  - Merge branch 'master' into JDK-8319406-shorter-movptr-32
>  - Enlighs
>  - Remove new imm64 method completely, inline at use
>  - Easy review feedback
>  - Merge branch 'master' into JDK-8319406-shorter-movptr-32
>  - Fix

Thanks! Testing passes, so I am integrating.
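
For readers following along, the encoding choice behind the `movabs` -> `mov` change in the quoted description can be sketched roughly as below. This is only an illustration, not the code from the pull request: `MovEncoding`, `pick_encoding`, `encoded_size` and `bytes_saved_in_listing` are hypothetical names, and the sign-extended middle case is an assumption about how negative 32-bit values could be handled. The byte counts are the usual x86_64 encoding lengths for these instruction forms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; this is not HotSpot's MacroAssembler API.
enum class MovEncoding {
  Mov32ZeroExtended,  // mov $imm32, %r32: the CPU clears the upper 32 bits
  Mov64SignExtended,  // REX.W mov $imm32, %r64: imm32 is sign-extended
  Movabs64            // movabs $imm64, %r64: full 64-bit immediate
};

// Pick the shortest form that still loads the same 64-bit value.
constexpr MovEncoding pick_encoding(int64_t imm) {
  if (imm >= 0 && imm <= INT64_C(0xFFFFFFFF)) {
    return MovEncoding::Mov32ZeroExtended;
  }
  if (imm >= INT32_MIN && imm <= INT32_MAX) {
    return MovEncoding::Mov64SignExtended;  // assumption: covers small negatives
  }
  return MovEncoding::Movabs64;
}

// Encoded length in bytes; reg is the hardware register number (0..15).
constexpr int encoded_size(MovEncoding e, int reg) {
  switch (e) {
    case MovEncoding::Mov32ZeroExtended:
      return reg >= 8 ? 6 : 5;  // r8..r15 need a REX prefix
    case MovEncoding::Mov64SignExtended:
      return 7;                 // REX.W + opcode + ModRM + imm32
    case MovEncoding::Movabs64:
      return 10;                // REX.W + opcode + imm64
  }
  return 0;  // unreachable for valid enum values
}

// The quoted listing loads 0x138 and 0x148 into %rsi (register number 6).
inline int bytes_saved_in_listing() {
  const int rsi = 6;
  const int before = 2 * encoded_size(MovEncoding::Movabs64, rsi);
  const int after = encoded_size(pick_encoding(0x138), rsi) +
                    encoded_size(pick_encoding(0x148), rsi);
  assert(after <= before);  // the shorter forms must never be longer
  return before - after;
}
```

Under these assumptions, `bytes_saved_in_listing()` returns 10, which matches the address ranges in the quoted listings: 0x2e bytes from the `test` to the final `je` before, 0x24 bytes after.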

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16497#issuecomment-1809790048


More information about the hotspot-dev mailing list