RFR: 8343789: Move mutable nmethod data out of CodeCache [v9]

Mon Feb 3 11:44:56 UTC 2025

On Fri, 24 Jan 2025 20:37:32 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

>> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache.
>> 
>> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density.
>> 
>> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
>> 
>> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark):
>> - nmethod_count:134000, total_compilation_time: 510460ms
>> - total allocation time (malloc_mutable/malloc_immutable/CodeCache_alloc): 62ms/114ms/6333ms
>> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>> 
>> Functional testing: jtreg on arm/aarch/x86.
>> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.
>> 
>> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
>
> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Force the use of movk in combination with adrp and ldr instructions to address scenarios
>   where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp
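
For reference, the ±4GB limit comes from the `adrp` encoding: a signed 21-bit immediate counted in 4KB pages relative to the page of the current PC. A minimal sketch of that range check follows; `reachable_by_adrp` is a hypothetical helper written for illustration, not a HotSpot function.

```cpp
#include <cstdint>

// Hypothetical helper (not HotSpot code): returns true if the 4KB page that
// contains `target` can be reached from `pc` with a single adrp instruction.
// adrp carries a signed 21-bit page offset, so the page delta must lie in
// [-2^20, 2^20 - 1], which corresponds to roughly +/-4GB.
bool reachable_by_adrp(uint64_t pc, uint64_t target) {
  const int64_t page_delta =
      static_cast<int64_t>(target >> 12) - static_cast<int64_t>(pc >> 12);
  const int64_t limit = int64_t(1) << 20;
  return page_delta >= -limit && page_delta < limit;
}
```

A buffer returned by `os::malloc` has no guaranteed placement relative to the CodeCache, so a check like this can fail for it, which is the case the extra `movk` covers.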

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1422:

> 1420:     bool force_movk = true; // movk is important if the target can be more than 4GB away
> 1421:     adrp(dest, const_addr, offset, force_movk);
> 1422:     ldr(dest, Address(dest, offset));

I wonder if this really is the best way to do it. It's not clear to me that there is any advantage to using `adrp` in this case rather than a simple `mov(scratch, const_addr); ldr(dest, Address(scratch));`. The `mov` would produce `movz; movk; movk`, which almost certainly executes in a single cycle, followed by a load without an offset, which is a single micro-op rather than the two micro-ops of a load+offset. All we've gained for this complication is a small improvement in code density rather than a performance improvement. I'd go with simplicity.
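
To illustrate the `movz; movk; movk` sequence, here is a rough sketch that models it in plain C++, assuming the address fits in 48 bits. The names `split_address48` and `materialize48` are hypothetical and exist only for this example; they are not part of the macro assembler.

```cpp
#include <array>
#include <cstdint>

// Hypothetical illustration: split a 48-bit address into the three 16-bit
// immediates used by movz (bits 0-15), movk lsl #16 and movk lsl #32.
std::array<uint16_t, 3> split_address48(uint64_t addr) {
  return {static_cast<uint16_t>(addr),
          static_cast<uint16_t>(addr >> 16),
          static_cast<uint16_t>(addr >> 32)};
}

// Models the register contents after movz; movk; movk.
uint64_t materialize48(const std::array<uint16_t, 3>& imm) {
  uint64_t reg = imm[0];  // movz: write bits 0-15, zero everything else
  // movk lsl #16: replace bits 16-31, keep the other bits
  reg = (reg & ~(0xffffULL << 16)) | (static_cast<uint64_t>(imm[1]) << 16);
  // movk lsl #32: replace bits 32-47, keep the other bits
  reg = (reg & ~(0xffffULL << 32)) | (static_cast<uint64_t>(imm[2]) << 32);
  return reg;
}
```

The result is an absolute address, so it does not depend on the distance between the code and the malloc'ed buffer, and the following `ldr` needs no offset.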

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1939240122