RFR: 8316694: Implement relocation of nmethod within CodeCache [v7]

Wed Mar 26 13:06:26 UTC 2025

On Mon, 24 Mar 2025 23:00:36 GMT, Chad Rakoczy <duke at openjdk.org> wrote:

>> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186).
>> 
>> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation.
>> 
>> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release.
>
> Chad Rakoczy has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Relocate nmethod at safepoint
>  - Fix windows build

I have only skimmed through what you are doing but what I have read makes me worried from a GC point of view. In general, I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles.
It might be that some of my concerns are false because this is more of a drive by review to sanity check if you thought about the GC implications. These are just random things on top of my head.
1) You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up.
2) Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up.
3) I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values.
4) I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it?
5) You don't check if the nmethod is_unloading() when cloning it. That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed
6) Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning.
7) By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive a safepoint. For example, if a GC safepoint runs first, the GC might decide to unload the nmethod. It then traverses all known pointers to stale nmethods, and cleans them up so that nobody is referring to the nmethod any longer. Naturally, the GC won't know that there is a stale _nm pointer embedded into your VM operation. When you start messing around with it you enter a use-after-free situation and we will blow up.
8) What are the consequences of copying the deoptimization generation? I don't know!
9) Sometimes the method() is null when using Truffle.
10) Since you don't hold the Compile_lock across the safepoint, it's not obvious to me that you can't get a not_installed nmethod. Can you? I don't know what the consequences are of cloning one of those. The target nmethod will start off as not_installed, but I don't know that it will be made in_use.
11) These new special nmethods call post_init after installing the nmethod in the Method, while normally the order is reversed. While this may or may not be okay, it introduces a new anomaly where new special nmethods are being special.

In general, every time that we have ever introduced "special nmethods" that work in different ways to "normal" nmethods, it has been a huge pain. With this approach to nmethod relocation, every time somebody adds a stateful field to nmethod, one will have to think very carefully about the impact on this cloning, and how that can end up affecting class redefinition, GC, etc. I really don't think we want this extra mental overhead, unless the motivation is exceptionally good.

-------------

Changes requested by eosterlund (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-2717095140