RFR: 8343789: Move mutable nmethod data out of CodeCache [v13]
Boris Ulasevich
bulasevich at openjdk.org
Mon Mar 3 20:11:59 UTC 2025
On Thu, 27 Feb 2025 14:31:31 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
>> This change relocates mutable data (such as relocations, metadata, and JVMCI data) out of the nmethod. It follows the recent PR #18984, which relocated immutable nmethod data out of the CodeCache.
>>
>> OOPs were initially moved to a new mutable data blob, but then moved back to the nmethod due to performance issues in DaCapo benchmarks on AArch64 with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false, so compiled code has to load oops from the oops table, which takes three instructions when the data is remote).
>>
>> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
>>
>> The numbers: immutable data constitutes **~30%** of the nmethod; mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark):
>> - nmethod_count:134000, total_compilation_time: 510460ms
>> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
>> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>>
>> Functional testing: jtreg on arm/aarch/x86.
>> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.
>>
>> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
>
> Boris Ulasevich has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:
>
> - cleanup
> - returning oops back to nmethods. jtreg: Ok, performance: Ok. todo: cleanup
> - Address review comments: cleanup, move fields to avoid padding, fix CodeBlob purge to call os::free, fix nmethod::print, update Layout description
> - add a separate adrp_movk function to support targets located more than 4GB away
> - Force the use of movk in combination with adrp and ldr instructions to address scenarios
> where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp
> - Fixing TestFindInstMemRecursion test failure with the -XX:+StressReflectiveCode option:
> _relocation_size can exceed 64KB; in this case _metadata_offset does not fit into int16.
> Fix: use the _oops_size int16 field to calculate the metadata offset
> - removing dead code
> - a bit of cleanup and addressing review suggestions
> - rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup
> - remove _code_end_offset
> - ... and 4 more: https://git.openjdk.org/jdk/compare/3c9d64eb...56c0cc78
As agreed, I moved oops back to the nmethod, significantly reducing the change. All AArch64-specific modifications (the long-load encoding with adrp+movk+ldr and its relocation patching) were reverted.
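For readers unfamiliar with the AArch64 detail here: adrp can only form a page address within roughly ±4GB of the instruction, so oops kept inside the nmethod stay within adrp's reach of the code, while os::malloc'ed mutable data may not, which is why the longer adrp+movk+ldr sequence mentioned above was needed. Below is a minimal standalone sketch of that reachability check (illustrative only, not the HotSpot implementation; the addresses in main are made up):

```c++
// Sketch of the reachability question behind the reverted adrp+movk+ldr change:
// adrp forms a page address from a signed 21-bit page offset, i.e. roughly
// +/-4GB around the instruction. Targets returned by os::malloc can fall
// outside that window, which is why remote data needed a third instruction.
#include <cstdint>
#include <cstdio>

static bool adrp_reachable(uint64_t insn_addr, uint64_t target_addr) {
  // Compare 4KB page numbers; adrp's immediate is a signed 21-bit page delta.
  int64_t page_delta = (int64_t)(target_addr >> 12) - (int64_t)(insn_addr >> 12);
  return page_delta >= -(INT64_C(1) << 20) && page_delta < (INT64_C(1) << 20);
}

int main() {
  uint64_t code = 0x0000007f80000000;  // hypothetical CodeCache address
  std::printf("%d\n", adrp_reachable(code, code + (1ULL << 30)));  // 1: within +/-4GB
  std::printf("%d\n", adrp_reachable(code, code + (8ULL << 30)));  // 0: beyond adrp range
  return 0;
}
```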
Testing results:
- Builds: AArch & x86, client build, GraalVM build
- jtreg (hotspot & jdk tier1-3): G1/ZGC/Shenandoah/Xcomp/TieredStopAtLevel=3/-TieredCompilation - No regressions
- DaCapo & Renaissance benchmarks - No regressions
Here is the PrintNMethodStatistics printout. It shows a significant reduction in CodeCache usage for a large application (the Renaissance Dotty benchmark).
Statistics for 20625 bytecoded nmethods for C1:
total size = 121587728 (100%)
in CodeCache = 80406760 (66.130653%)
header = 4950000 (6.156199%)
constants = 640 (0.000796%)
main code = 69890600 (86.921303%)
stub code = 4923768 (6.123575%)
oops = 476752 (0.592925%)
mutable data = 10163920 (8.359330%)
relocation = 6810824 (67.009811%)
metadata = 3353096 (32.990185%)
immutable data = 31017048 (25.510014%)
dependencies = 606216 (1.954461%)
nul chk table = 724344 (2.335309%)
handler table = 222464 (0.717231%)
scopes pcs = 15817888 (50.997398%)
scopes data = 13646136 (43.995602%)
Statistics for 8290 bytecoded nmethods for C2 | Statistics for 8442 bytecoded nmethods for JVMCI
total size = 66679688 (100%) | total size = 46208136 (100%)
in CodeCache = 26004920 (38.999763%) | in CodeCache = 19489616 (42.177887%)
header = 1989600 (7.650860%) | header = 2026080 (10.395690%)
constants = 1920 (0.007383%) | constants = 540288 (2.772184%)
main code = 20949456 (80.559586%) | main code = 14737620 (75.617805%)
stub code = 2702064 (10.390588%) | stub code = 1904548 (9.772117%)
oops = 295560 (1.136554%) | oops = 213544 (1.095681%)
mutable data = 6564928 (9.845469%) | mutable data = 4168848 (9.021892%)
relocation = 3542736 (53.964584%) | relocation = 1671384 (40.092228%)
                                  | JVMCI data = 202608 (4.860048%)
metadata = 3022192 (46.035416%) | metadata = 2294856 (55.047726%)
immutable data = 34109840 (51.154766%) | immutable data = 22549672 (48.800220%)
dependencies = 988000 (2.896525%) | dependencies = 460104 (2.040402%)
nul chk table = 554680 (1.626158%) | nul chk table = 618888 (2.744554%)
handler table = 1787424 (5.240201%) | handler table = 20664 (0.091638%)
scopes pcs = 16152224 (47.353561%) | scopes pcs = 10965040 (48.626163%)
scopes data = 14627512 (42.883556%) | scopes data = 7746888 (34.354771%)
                                    | speculations = 2738088 (12.142474%)
By moving mutable data out of the CodeCache, we reduce CodeCache usage by the following percentages:
- C1: 10163920/(10163920+80406760) = 11%
- C2: 6564928/(6564928+26004920) = 20%
- JVMCI: 4168848/(4168848+19489616) = 18%
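For clarity, those reductions are computed as mutable data divided by (mutable data + what remains in the CodeCache), using the byte counts from the statistics above. A tiny C++ check of the arithmetic (illustrative only):

```c++
#include <cstdio>

int main() {
  // Sizes in bytes, copied from the PrintNMethodStatistics output above.
  // Mutable data now lives outside the CodeCache, so the relative reduction is
  // mutable / (mutable + in CodeCache).
  struct Row { const char* name; double mutable_data; double in_code_cache; };
  const Row rows[] = {
      {"C1",    10163920.0, 80406760.0},
      {"C2",     6564928.0, 26004920.0},
      {"JVMCI",  4168848.0, 19489616.0},
  };
  for (const Row& r : rows) {
    double reduction = r.mutable_data / (r.mutable_data + r.in_code_cache);
    std::printf("%-6s %.0f%%\n", r.name, reduction * 100.0);  // prints ~11%, ~20%, ~18%
  }
  return 0;
}
```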
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2695425491
PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2695429860