RFR: 8362504: AArch64: Replace MOVZ+MOVK+MOVK with ADRP+ADD
Andrew Dinn
adinn at openjdk.org
Tue Aug 19 15:02:41 UTC 2025
On Tue, 19 Aug 2025 10:46:15 GMT, Fei Gao <fgao at openjdk.org> wrote:
>>> Do you think these numbers are trustworthy?
>>
>> Yes, but it's a microarchitecture-dependent optimization, and it's just a single case. I'm seeing virtually identical times on Apple M1 between these:
>>
>>
>> #define ACTION1 \
>> "movz x0, #1234; " \
>> "movk x0, #1234, lsl #16; " \
>> "movk x0, #1234, lsl #32; " \
>> "movz x2, #1234; " \
>> "movk x2, #1234, lsl #16; " \
>> "movk x2, #1234, lsl #32; " \
>> "add x1, x2, x0; " \
>>
>> #define ACTION2 \
>> "adrp x0, . + 20480 * 4096; " \
>> "add x0, x0, #48; " \
>> "adrp x2, . + 20480 * 4096; " \
>> "add x2, x2, #48; " \
>> "add x1, x2, x0; " \
>>
>> 96,642,308 cycles:u # 2.858 GHz
>> 702,095,662 instructions:u # 7.26 insn per cycle
>>
>> 103,939,352 cycles:u # 2.930 GHz
>> 502,095,644 instructions:u # 4.83 insn per cycle
>>
>> All of this stuff is pretty marginal. I can at least accept that `adrp; add` is shorter and therefore better.
>>
>> But I do not look forward to a blizzard of such changes.
>
> @theRealAph That does make sense. Thanks for running the experimental tests — much appreciated!
>
> I've disabled this reachability-based optimization during AOT code dumping in the new commit as suggested by @adinn .
>
> Could you please take a look? Thanks again.
@fg1417 How does this code relate to the far_jump and far_call code? Is there an overlap in functionality here that we need to simplify?
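
For readers following the thread, here is a minimal sketch of the arithmetic behind the two sequences being compared. It is not HotSpot code: the helper names (`adrp_reachable`, `adrp_add_split`, `mov48_split`) are hypothetical, and it relies only on the architectural facts that ADRP takes a signed 21-bit page delta (roughly +/-4 GiB around the current page) and that ADD supplies the low 12 bits of the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the MacroAssembler API. */

/* ADRP encodes a signed 21-bit page delta, so the target's 4 KiB page must
   lie within [-2^20, 2^20) pages of the page that contains pc. */
static bool adrp_reachable(uint64_t pc, uint64_t target) {
    int64_t page_delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    return page_delta >= -(INT64_C(1) << 20) &&
           page_delta < (INT64_C(1) << 20);
}

/* Split a reachable target into the ADRP page delta and the 12-bit
   immediate that the following ADD contributes. */
static void adrp_add_split(uint64_t pc, uint64_t target,
                           int64_t *page_delta, uint32_t *lo12) {
    assert(adrp_reachable(pc, target));
    *page_delta = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    *lo12 = (uint32_t)(target & 0xFFF);
}

/* Split a 48-bit address into the three 16-bit immediates used by
   MOVZ, MOVK lsl #16 and MOVK lsl #32. This form needs no reachability
   check because it does not depend on pc. */
static void mov48_split(uint64_t target, uint16_t chunk[3]) {
    chunk[0] = (uint16_t)(target & 0xFFFF);
    chunk[1] = (uint16_t)((target >> 16) & 0xFFFF);
    chunk[2] = (uint16_t)((target >> 32) & 0xFFFF);
}
```

With `pc = 0x10000000` and `target = pc + 20480 * 4096 + 48`, `adrp_add_split` yields a page delta of 20480 and a low-12 offset of 48, matching the operands in the quoted `ACTION2` macro. The three-instruction form is position-independent, whereas the two-instruction form is valid only while the target stays within ADRP range of the emitting code.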
-------------
PR Comment: https://git.openjdk.org/jdk/pull/26653#issuecomment-3201115396