RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates

Aleksey Shipilev shade at openjdk.org
Fri Nov 3 18:35:23 UTC 2023


Noticed this while doing C1 work, but the issue is more generic. If you look at generated x86_64 code, you will sometimes notice `movabs` with small immediates. That `movabs` carries the full 64-bit immediate regardless of the value, which makes it a 10-byte instruction.

Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten `movptr(reg, imm)` when the immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code in general.

For example, here is a sample branch profiling hunk from C1 tier3 on x86_64:


Before:
   0x00007f269065ed02:   test   %edx,%edx
   0x00007f269065ed04:   movabs $0x7f260a4ddd68,%rax  ;   {metadata(method data for {method} …
   0x00007f269065ed0e:   movabs $0x138,%rsi
 ╭ 0x00007f269065ed18:   je     0x00007f269065ed24
 │ 0x00007f269065ed1a:   movabs $0x148,%rsi
 ↘ 0x00007f269065ed24:   mov    (%rax,%rsi,1),%rdi
   0x00007f269065ed28:   lea    0x1(%rdi),%rdi
   0x00007f269065ed2c:   mov    %rdi,(%rax,%rsi,1)
   0x00007f269065ed30:   je     0x00007f269065ed4e     

After:
   0x00007f1370dcd782:   test   %edx,%edx
   0x00007f1370dcd784:   movabs $0x7f12f64ddd68,%rax   ;   {metadata(method data for {method} …
   0x00007f1370dcd78e:   mov    $0x138,%esi
 ╭ 0x00007f1370dcd793:   je     0x00007f1370dcd79a        
 │ 0x00007f1370dcd795:   mov    $0x148,%esi
 ↘ 0x00007f1370dcd79a:   mov    (%rax,%rsi,1),%rdi        
   0x00007f1370dcd79e:   lea    0x1(%rdi),%rdi
   0x00007f1370dcd7a2:   mov    %rdi,(%rax,%rsi,1)       
   0x00007f1370dcd7a6:   je     0x00007f1370dcd7c4 


We can use shorter 32-bit immediate moves. In the hunk above, each of the two small-immediate moves shrinks from 10 to 5 bytes, which saves 10 bytes in total.
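
To make the size difference concrete, here is a minimal standalone C++ sketch of the three x86_64 encodings involved. It is not the HotSpot `Assembler` code: `pick_form` and `encode_mov_imm` are hypothetical names made up for illustration, and the selection order (zero-extending 32-bit form first, then sign-extending, then full 64-bit) is an assumption about one reasonable policy, not a description of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not HotSpot's Assembler API.
enum class MovForm { ZeroExt32, SignExt32, Full64 };

constexpr bool fits_uint32(int64_t v) { return v >= 0 && v <= INT64_C(0xFFFFFFFF); }
constexpr bool fits_int32(int64_t v)  { return v >= INT32_MIN && v <= INT32_MAX; }

// Assumed policy: prefer the shortest encoding that still produces the
// same 64-bit register value.
constexpr MovForm pick_form(int64_t imm) {
  return fits_uint32(imm) ? MovForm::ZeroExt32
       : fits_int32(imm)  ? MovForm::SignExt32
       :                    MovForm::Full64;
}

// Encode "load imm into 64-bit register reg" (reg is the hardware
// register number, 0..15; e.g. 0 = rax, 6 = rsi).
std::vector<uint8_t> encode_mov_imm(int reg, int64_t imm) {
  assert(reg >= 0 && reg < 16);
  std::vector<uint8_t> out;
  const bool ext = reg >= 8;                          // needs REX.B
  const uint8_t low = static_cast<uint8_t>(reg & 7);
  auto emit_imm = [&](int nbytes) {                   // little-endian immediate
    for (int i = 0; i < nbytes; i++) {
      out.push_back(static_cast<uint8_t>(static_cast<uint64_t>(imm) >> (8 * i)));
    }
  };
  switch (pick_form(imm)) {
    case MovForm::ZeroExt32:
      // mov r32, imm32: [REX.B] B8+rd id. Writing a 32-bit register
      // zero-extends into the full 64-bit register.
      if (ext) out.push_back(0x41);
      out.push_back(static_cast<uint8_t>(0xB8 | low));
      emit_imm(4);
      break;
    case MovForm::SignExt32:
      // mov r/m64, imm32 (sign-extended): REX.W C7 /0 id.
      out.push_back(static_cast<uint8_t>(0x48 | (ext ? 0x01 : 0x00)));
      out.push_back(0xC7);
      out.push_back(static_cast<uint8_t>(0xC0 | low));
      emit_imm(4);
      break;
    case MovForm::Full64:
      // movabs r64, imm64: REX.W B8+rd io.
      out.push_back(static_cast<uint8_t>(0x48 | (ext ? 0x01 : 0x00)));
      out.push_back(static_cast<uint8_t>(0xB8 | low));
      emit_imm(8);
      break;
  }
  return out;
}
```

With this sketch, `encode_mov_imm(6, 0x138)` (register 6 is `rsi`) yields the 5-byte sequence `be 38 01 00 00`, matching the `mov $0x138,%esi` in the "After" listing, while a metadata address such as `0x7f260a4ddd68` still takes the 10-byte form.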

This is not limited to the profiling code. There are observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`.


# Before
 nmethod code size         :   430328 bytes
 nmethod code size         :   467032 bytes
 nmethod code size         :   908936 bytes
 nmethod code size         :  1267816 bytes

# After
 nmethod code size         :   429616 bytes (-0.2%)
 nmethod code size         :   466344 bytes (-0.1%)
 nmethod code size         :   897144 bytes (-1.3%)
 nmethod code size         :  1256216 bytes (-0.9%)


There are two wrinkles:
  1. The current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this patch work. Note that x86_64 does not actually define `movslq reg64, imm32`; this is a regular `mov reg64, imm32`, and it matches our current `movq(Register, int32_t)`.
  2. There is at least one place in HotSpot -- IC calls -- that expects the synthetic `movptr` to always have the same length, because it is used as the IC slot. I had to introduce a special method in `MacroAssembler` to handle it. I looked through the other uses of `movptr(Register, intptr_t)`, and none of them look suspicious. (I don't like the name "mov_ptrslot" all that much, suggestions welcome.)
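
To illustrate the second wrinkle, here is a small standalone C++ sketch of why a patchable slot has to keep the fixed 10-byte encoding. The names `emit_ptr_slot`, `patch_ptr_slot` and `read_ptr_slot` are hypothetical and the byte-vector "code buffer" is a stand-in; this is not the actual `MacroAssembler` method, only a sketch under the assumption that the slot is later overwritten in place with an arbitrary 64-bit value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout of a fixed-size pointer slot: REX.W, B8+rd, imm64.
constexpr std::size_t kSlotSize = 10;
constexpr std::size_t kSlotImmOffset = 2;

// Always emit the full 10-byte movabs, even if imm would fit in 32 bits,
// so that any 64-bit value can be patched in later. Returns the slot offset.
std::size_t emit_ptr_slot(std::vector<uint8_t>& code, int reg, uint64_t imm) {
  assert(reg >= 0 && reg < 16);
  const std::size_t start = code.size();
  code.push_back(static_cast<uint8_t>(0x48 | (reg >= 8 ? 0x01 : 0x00)));
  code.push_back(static_cast<uint8_t>(0xB8 | (reg & 7)));
  for (int i = 0; i < 8; i++) {
    code.push_back(static_cast<uint8_t>(imm >> (8 * i)));
  }
  return start;
}

// Overwrite the immediate in place. This only works because the slot is
// known to hold a 64-bit immediate; a shortened 5-byte mov would leave
// no room for a full pointer.
void patch_ptr_slot(std::vector<uint8_t>& code, std::size_t slot, uint64_t new_imm) {
  assert(slot + kSlotSize <= code.size());
  assert((code[slot] & 0xF8) == 0x48);      // REX.W prefix expected
  assert((code[slot + 1] & 0xF8) == 0xB8);  // mov r64, imm64 opcode expected
  for (int i = 0; i < 8; i++) {
    code[slot + kSlotImmOffset + i] = static_cast<uint8_t>(new_imm >> (8 * i));
  }
}

// Read the current immediate back from the slot.
uint64_t read_ptr_slot(const std::vector<uint8_t>& code, std::size_t slot) {
  assert(slot + kSlotSize <= code.size());
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) {
    v |= static_cast<uint64_t>(code[slot + kSlotImmOffset + i]) << (8 * i);
  }
  return v;
}
```

In this sketch, emitting a slot with a small placeholder value and later patching it to a full 64-bit address leaves the code size unchanged, which is the property the patching code depends on. A length-optimizing `movptr` would break that assumption whenever the placeholder happens to fit in 32 bits.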

Additional testing:
 - [ ] Linux x86_64 server fastdebug, `tier1 tier2 tier3 tier4`

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/16497/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319406
  Stats: 26 lines in 3 files changed: 19 ins; 3 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/16497.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16497/head:pull/16497

PR: https://git.openjdk.org/jdk/pull/16497

