RFR: 8362504: AArch64: Replace MOVZ+MOVK+MOVK with ADRP+ADD

Andrew Dinn adinn at openjdk.org
Tue Aug 19 15:02:41 UTC 2025


On Tue, 19 Aug 2025 10:46:15 GMT, Fei Gao <fgao at openjdk.org> wrote:

>>> Do you think these numbers are trustworthy?
>> 
>> Yes, but it's a microarchitecture-dependent optimization, and it's just a single case. I'm seeing virtually identical times on Apple M1 between these:
>> 
>> 
>> #define ACTION1                                 \
>>     "movz  x0, #1234; "                         \
>>     "movk  x0, #1234, lsl #16; "                \
>>     "movk  x0, #1234, lsl #32; "                \
>>     "movz  x2, #1234; "                         \
>>     "movk  x2, #1234, lsl #16; "                \
>>     "movk  x2, #1234, lsl #32; "                \
>>     "add   x1, x2, x0; "
>> 
>> #define ACTION2                                 \
>>     "adrp  x0, . + 20480 * 4096; "              \
>>     "add   x0, x0, #48; "                       \
>>     "adrp  x2, . + 20480 * 4096; "              \
>>     "add   x2, x2, #48; "                       \
>>     "add   x1, x2, x0; "
>> 
>> 
>> 
>> 
>>         96,642,308      cycles:u                         #    2.858 GHz                       
>>        702,095,662      instructions:u                   #    7.26  insn per cycle            
>>  
>>        103,939,352      cycles:u                         #    2.930 GHz                       
>>        502,095,644      instructions:u                   #    4.83  insn per cycle            
>> 
>> 
>> 
>> All of this stuff is pretty marginal. I can at least accept that `adrp; add` is shorter and therefore better.
>> 
>> But I do not look forward to a blizzard of such changes.
>
>> All of this stuff is pretty marginal. I can at least accept that `adrp; add` is shorter and therefore better.
>> 
>> But I do not look forward to a blizzard of such changes.
> 
> @theRealAph That does make sense. Thanks for running the experimental tests — much appreciated!
> 
> I've disabled this reachability-based optimization during AOT code dumping in the new commit as suggested by @adinn .
> 
> Could you please take a look? Thanks again.

@fg1417 How does this code relate to the far_jump and far_call code? Is there an overlap in functionality here that we need to simplify?
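
For reference, here is a minimal sketch of the reachability condition that decides whether `adrp; add` can stand in for `movz; movk; movk`. It relies only on the architectural encoding: ADRP carries a signed 21-bit page delta over 4 KiB pages (roughly +/-4 GiB around the instruction), and the trailing ADD supplies the low 12 bits. The function names are hypothetical and this is not the HotSpot code, just an illustration of the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural constants: 4 KiB pages, signed 21-bit page immediate. */
#define ADRP_PAGE_SHIFT 12
#define ADRP_IMM_BITS   21

/* Signed distance, in pages, from the page of `pc` to the page of `target`. */
static int64_t page_delta(uint64_t pc, uint64_t target) {
    return (int64_t)(target >> ADRP_PAGE_SHIFT)
         - (int64_t)(pc >> ADRP_PAGE_SHIFT);
}

/* True if `target` can be materialized at `pc` with ADRP + ADD
   (hypothetical helper, named for illustration only). */
static bool adrp_add_reachable(uint64_t pc, uint64_t target) {
    int64_t limit = (int64_t)1 << (ADRP_IMM_BITS - 1);
    int64_t delta = page_delta(pc, target);
    return delta >= -limit && delta < limit;
}

/* Low 12 bits of the address, i.e. the immediate for the trailing ADD. */
static uint32_t add_lo12(uint64_t target) {
    return (uint32_t)(target & 0xfff);
}

/* Value left in the destination register after ADRP + ADD runs at `pc`. */
static uint64_t adrp_add_result(uint64_t pc, uint64_t target) {
    assert(adrp_add_reachable(pc, target));
    uint64_t page = (pc & ~(uint64_t)0xfff)
                  + ((uint64_t)page_delta(pc, target) << ADRP_PAGE_SHIFT);
    return page + add_lo12(target);
}
```

With the operands from the benchmark above (`. + 20480 * 4096` followed by `#48`), the delta is 20480 pages, well inside the 2^20-page limit, so the two-instruction form applies. A target that fails this check still needs the three-instruction `movz; movk; movk` sequence.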

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26653#issuecomment-3201115396