RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates
Aleksey Shipilev
shade at openjdk.org
Tue Nov 7 12:38:35 UTC 2023
On Fri, 3 Nov 2023 19:46:26 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:
>> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate.
>>
>> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically.
>>
>> For example, sample branch profiling hunk from C1 tier3 on x86_64:
>>
>>
>> Before:
>> 0x00007f269065ed02: test %edx,%edx
>> 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} …
>> 0x00007f269065ed0e: movabs $0x138,%rsi
>> ╭ 0x00007f269065ed18: je 0x00007f269065ed24
>> │ 0x00007f269065ed1a: movabs $0x148,%rsi
>> ↘ 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi
>> 0x00007f269065ed28: lea 0x1(%rdi),%rdi
>> 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1)
>> 0x00007f269065ed30: je 0x00007f269065ed4e
>>
>> After:
>> 0x00007f1370dcd782: test %edx,%edx
>> 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} …
>> 0x00007f1370dcd78e: mov $0x138,%esi
>> ╭ 0x00007f1370dcd793: je 0x00007f1370dcd79a
>> │ 0x00007f1370dcd795: mov $0x148,%esi
>> ↘ 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi
>> 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi
>> 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1)
>> 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4
>>
>>
>> We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes.
>>
>> This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`.
>>
>>
>> # Before
>> nmethod code size : 430328 bytes
>> nmethod code size : 467032 bytes
>> nmethod code size : 908936 bytes
>> nmethod code size : 1267816 bytes
>>
>> # After
>> nmethod code size : 429616 bytes (-0.1%)
>> nmethod code size : 466344 bytes (-0.1%)
>> nmethod code size : 897144 bytes (-1.3%)
>> nmethod code size : 1256216 bytes (-0.9%)
>>
>>
>> There are two wrinkles:
>> 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this...
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2576:
>
>> 2574: #ifdef _LP64
>> 2575: if (is_simm32(src)) {
>> 2576: movslq(dst, checked_cast<int32_t>(src));
>
> Why not just `movq`? there is no `movslq r, i` so this is kind of confusing.
Right. There is no point in trying to fix `movslq`. Replaced with `movq`.
> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1818:
>
>> 1816: void mov_metadata(Address dst, Metadata* obj, Register rscratch);
>> 1817:
>> 1818: void mov_ptrslot(Register dst, intptr_t val);
>
> I believe the convention here would be `movptr_imm64`
Good name, renamed.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1384844360
PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1384845944
More information about the hotspot-dev
mailing list