RFR: 8362504: AArch64: Replace MOVZ+MOVK+MOVK with ADRP+ADD

Wed Aug 6 10:20:02 UTC 2025

On Wed, 6 Aug 2025 09:13:30 GMT, Fei Gao <fgao at openjdk.org> wrote:

> If the relocation or target address is guaranteed to reside within the CodeCache, we can safely replace a `movz + movk + movk` sequence with a more compact and efficient `adrp + add` instruction pair.
> 
> In `MacroAssembler::mov(Register r, Address dest)`, this replacement can be applied if any of the following rules hold:
> 
> 1. The relocation type indicates that the address resides within the CodeCache and the necessary patching logic is provided in `fix_relocation_after_move()`.
> 2. The target address is fixed (i.e., does not require relocation) and is within the reachable range for `adrp`.
> 
> The patch performs the filtering in `is_relocated_within_codecache()` and `is_adrp_reachable()` to ensure this optimization is applied safely and selectively.

I agree with Andrew that this may well be slower even if it does save on code size. So, the trade-off being made here is not clear-cut.

I also think we need to consider and allow for save and restore of adapter, stub and, in upcoming releases, nmethod code using the AOT cache. Both code cache to code cache (runtime) calls and code cache to foreign (external) calls may require a one-off offset adjustment during AOT code loading. This can happen because 1) code generated at some given address and saved from the Assembly VM may be restored in a Production VM at a different address and 2) references from generated code to a method/function address in some given external C library loaded in the Assembly VM may need adjustment to allow for the library and associated function/method being loaded at a different address.

This relocation at reload should not be an issue for generated calls to the runtime since, wherever the saved code is reloaded, the call offset will still be less than the cache range and that will be less than 2GB. So, we can always patch an 'adrp + add' with another 'adrp + add'. However, it is significant for foreign calls, where restoration at a different place in the cache and/or randomization of library load addresses implies that a positive reachability decision made at Assembly time might no longer hold in a Production run. In that case an 'adrp + add' pair could not simply be patched to a 'movz + movk + movk' triple, because the triple needs one more instruction slot than the pair occupies.
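
To make the failure mode concrete, here is a minimal standalone sketch of the range check involved. The helper name `adrp_reachable` and the example addresses are invented for illustration and are not HotSpot code. The only architectural fact it relies on is that ADRP encodes a signed 21-bit count of 4KB pages relative to the page of the instruction, i.e. a reach of roughly +/-4GB.

```cpp
#include <cstdint>

// Hypothetical helper (not the HotSpot implementation): returns true if
// `target` can be materialized from an ADRP located at `pc`.
// ADRP adds a signed 21-bit page count to the 4KB page containing `pc`.
static bool adrp_reachable(uint64_t pc, uint64_t target) {
  // Compare 4KB page numbers; the low 12 bits are supplied by the ADD.
  int64_t page_delta = static_cast<int64_t>(target >> 12) -
                       static_cast<int64_t>(pc >> 12);
  const int64_t limit = INT64_C(1) << 20;  // 2^20 pages == 4GB
  return page_delta >= -limit && page_delta < limit;
}
```

With these made-up numbers, code at `0x7f0000000000` calling a library function 1GB away passes the check, but the same code restored in a run where the library happens to be loaded 8GB away fails it, so the decision taken in the Assembly VM cannot be trusted in the Production VM.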

The solution is to make the compiler always generate the 3-instruction load when compiling in an Assembly VM and otherwise generate the 2-instruction load based on reachability, i.e. AOT code won't 'benefit' from this patch but runtime-generated code will (assuming it is a benefit).

So, the reachability method needs to return false if `AOTCodeCache::is_on_for_dump()` returns true.
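
As a rough sketch of the shape I have in mind (standalone, not a proposed patch): the `dumping_aot_code` parameter stands in for the result of `AOTCodeCache::is_on_for_dump()`, and `LoadSeq` and `choose_load_sequence` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not HotSpot code.
enum class LoadSeq { AdrpAdd, MovzMovkMovk };

// `dumping_aot_code` stands in for AOTCodeCache::is_on_for_dump().
static LoadSeq choose_load_sequence(bool dumping_aot_code,
                                    uint64_t pc, uint64_t target) {
  // Code saved to the AOT cache may be reloaded at a different address,
  // so never commit to the short form while dumping.
  if (dumping_aot_code) {
    return LoadSeq::MovzMovkMovk;
  }
  // Otherwise decide on ADRP range: signed 21-bit count of 4KB pages.
  int64_t page_delta = static_cast<int64_t>(target >> 12) -
                       static_cast<int64_t>(pc >> 12);
  const int64_t limit = INT64_C(1) << 20;
  bool reachable = page_delta >= -limit && page_delta < limit;
  return reachable ? LoadSeq::AdrpAdd : LoadSeq::MovzMovkMovk;
}
```

The point of putting the dump check first is that the 3-instruction form can always be patched to any address at load time, whereas the 2-instruction form cannot grow in place.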

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26653#issuecomment-3159151855