RFR: 8316694: Implement relocation of nmethod within CodeCache [v7]

Fri Apr 25 20:56:51 UTC 2025

On Fri, 11 Apr 2025 14:43:00 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> I have only skimmed through what you are doing but what I have read makes me worried from a GC point of view. In general, I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles.
>> It might be that some of my concerns are false because this is more of a drive by review to sanity check if you thought about the GC implications. These are just random things on top of my head.
>> 1) You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up.
>> 2) Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up.
>> 3) I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values.
>> 4) I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it?
>> 5) You don't check if the nmethod is_unloading() when cloning it. That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed
>> 6) Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning.
>> 7) By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive...
>
>> Hi @fisk,
>> 
>> Thank you for the very valuable comment. It has point we have not thought about.
>> 
>> > I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles.
>> 
>> It's not clear to me what you mean "special nmethods". IMO we don't introduce any special nmethods. From my point of view, a normal nmethod is an nmethod for a ordinary Java method. Nmethods for non-ordinary Java methods are special, e.g. native nmethods or method handle linkers(JDK-8263377). I think normal nmethods should be relocatable within CodeCache.
> 
> I mean nmethods with a subtly different life cycle where usual invariants/expectations don't hold. Like method handle intrinsics and enter special intrinsics for example. Used to have a different life cycle for OSR nmethods too.
> 
>> > You can't just copy oops.
>> 
>> Yes, this is the main issue at the moment. Can we do this at a safepoint?
> 
> I don't think it solves much. You can't stash away a pointer to the nmethod, roll to a safepoint, and expect the nmethod to not be freed. Even if you did, you still can't copy the oops.
> 
> If we are to do this, I think you want to apply nmethod entry barriers first. That stabilizes the oops.
> 
>> > I'm worried about copying the nmethod epoch counters
>> 
>> We should clear them. If not, it is a bug.
> 
> I'd like to change copying from opt-out to opt-in instead; that would make me feel more comfortable. Then perhaps you can share initialization code that sets up the initial state of the nmethod exactly in the same way as normal nmethods.
> 
> I didn't check but you need to take the Compile_lock and verify dependencies too if you didn't do that, I think, so you don't race with deoptimization.
> 
>> > You don't check if the nmethod is_unloading() when cloning it.
>> 
>> Should such nmethods be not entrant? We don't relocate not entrant nmethods.
> 
> is_not_entrant doesn't imply is_unloading.
> 
>> > What are the consequences of copying the deoptimization generation?
>> 
>> What do you mean?
> 
> I mean is it safe to racingly copy the deoptmization generation when there is concurrent deoptimization? This is why I'd prefer copying to be opt-in rather than opt-out so we don't have to stare at every single field and wonder what will happen when a new nmethod "inherits" state from a different nmethod in interesting races. I want it to work as much as possible as normal nmethod installation, starting with a state as close as possible to when the original nmethod was created, as opp...

@fisk Thank you for the valuable feedback. Here is a more detailed response to the concerns you brought up:

> 1 You can't just copy oops. Propagating stale pointers to a new nmethod is not valid and will make the GC vomit. The GC assumes that it can traverse a snapshot of nmethods, and that new nmethods created after that snapshot, will have sane valid oops initially, and hence do not need fixing. Copying stale oops to a new nmethod would violate those invariants and inevitably blow up.

Instead of tracking the nmethod pointer, which could become stale, I updated the code to use a methodHandle. I believe the methodHandle ensures the method remains valid, so we can then relocate its corresponding nmethod. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.cpp#L106-L110)

> 2 Class redefinition tracks in an external data structure which nmethods contained metadata that we want to eventually throw away. This is done to avoid walking the entire code cache just to keep tabs on the one nmethod that still uses the old metadata. If we clone the nmethod without putting it in said data structure, we will blow up.

The relocated nmethod is added as a dependent nmethod on all of the MethodHandles and InstanceKlass in its dependency scope. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1543-L1564)

> 3 I'm worried about the initial state of the nmethod entry barrier guard value being copied from the source nmethod, instead of having the initial value we expect for newly created nmethods. It means that the initial invocation will not get the nmethod entry barrier callback. The GC traverses the nmethods assuming that new nmethods created during the traversal will not start off with weird stale values.

The source nmethod's entry barrier is now called before copying. I believe this disarms the barrier and resets the guard value, so the nmethod is safe to copy. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1530)
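
As a rough illustration of why the barrier has to run before the copy, here is a toy C++ model. It is not HotSpot code: `ToyGC`, `ToyNMethod`, `run_entry_barrier` and `clone_after_barrier` are hypothetical names, object addresses are plain integers, and giving the clone the current phase as its guard value is an assumption of the model rather than a statement about what HotSpot does.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy model only; these are not HotSpot types.
struct ToyGC {
    int phase = 0;                             // current GC phase
    std::unordered_map<int, int> forwarding;   // old address -> new address

    // Start a new phase in which the object at `from` now lives at `to`.
    void relocate_object(int from, int to) {
        ++phase;
        forwarding[from] = to;
    }

    // Single-step lookup; chained forwarding is deliberately not modelled.
    int resolve(int addr) const {
        auto it = forwarding.find(addr);
        return it == forwarding.end() ? addr : it->second;
    }
};

struct ToyNMethod {
    int guard_phase;         // phase for which the embedded oops were last fixed
    std::vector<int> oops;   // embedded object addresses

    bool is_armed(const ToyGC& gc) const { return guard_phase != gc.phase; }
};

// Entry barrier: heal stale oops, then disarm by recording the current phase.
void run_entry_barrier(ToyNMethod& nm, const ToyGC& gc) {
    if (!nm.is_armed(gc)) return;
    for (int& oop : nm.oops) oop = gc.resolve(oop);
    nm.guard_phase = gc.phase;
}

// Clone only after the barrier has stabilized the source's oops.
ToyNMethod clone_after_barrier(ToyNMethod& src, const ToyGC& gc) {
    run_entry_barrier(src, gc);
    assert(!src.is_armed(gc));
    return ToyNMethod{gc.phase, src.oops};
}
```

In this model, copying `src.oops` without first calling `run_entry_barrier` would hand the clone an address that the collector has already moved away from, which is the failure mode described in the review.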

> 4 I'm worried about copying the nmethod epoch counters used by virtual threads to mark which nmethods have been found on-stack. Copying it implies that this nmethod has been found on-stack even though it never has. To me, the implications are unknown, but perhaps you thought about it?

Copying this value was not intentional. It should be correctly set to the default value now. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1441)

> 5 You don't check if the nmethod is_unloading() when cloning it. That means you can create a new nmethod that has dead oops from the get go - that cannot be allowed

I added this check to ensure the nmethod is not unloading, and removed the not-entrant check since is_unloading() implies not entrant. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1583-L1585)

> 6 Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning.

I’m still investigating the JVMCI speculation data and how the nmethod mirror is used. I will follow up when I have a clearer understanding.

> 7 By running the operation in a safepoint you a) introduce an obvious latency problem, b) create a new source for stale nmethod pointers that will become stale and burn. The _nm of the safepoint operation might not survive a safepoint. For example, if a GC safepoint runs first, the GC might decide to unload the nmethod. It then traverses all known pointers to stale nmethods, and cleans them up so that nobody is referring to the nmethod any longer. Naturally, the GC won't know that there is a stale _nm pointer embedded into your VM operation. When you start messing around with it you enter a use-after-free situation and we will blow up.

a) Due to the nature of oops, it seems a safepoint is necessary. I do not see a fix for the latency problem.
b) For the stale nmethod pointer issue, I updated the code to use a methodHandle. The reasoning is the same as for number 1.
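
To make the difference between caching an nmethod pointer and re-resolving it concrete, here is a small C++ sketch. Everything in it is hypothetical: `ToyMethod`, `ToyMethodHandle` and `ToyRelocateOp` only stand in for the real types, and a `std::shared_ptr` is used as a crude substitute for a handle that keeps the method alive.

```cpp
#include <memory>

struct ToyNMethod { int id; };

struct ToyMethod {
    std::unique_ptr<ToyNMethod> code;   // current compiled code, may be cleared
};

// Stand-in for a handle that keeps the method alive (assumption of this sketch).
using ToyMethodHandle = std::shared_ptr<ToyMethod>;

struct ToyRelocateOp {
    ToyMethodHandle method;   // the operation stores the method, not the nmethod
    bool relocated = false;

    // Runs "at the safepoint": the nmethod is looked up here, never cached.
    void doit() {
        ToyNMethod* nm = method->code.get();
        if (nm == nullptr) return;             // unloaded in the meantime: skip
        int new_id = nm->id + 1;               // read the source before replacing it
        method->code = std::make_unique<ToyNMethod>(ToyNMethod{new_id});
        relocated = true;
    }
};
```

If the compiled code is cleared between creating the operation and running it, `doit()` sees a null pointer and does nothing, instead of dereferencing freed memory.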

> 8 What are the consequences of copying the deoptimization generation? I don't know!

This was unintentional and the value is no longer copied. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1440)
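
The opt-in copying suggested in the review can be sketched as follows. This is a toy C++ model, not the actual nmethod layout: the field names `on_stack_epoch` and `deopt_generation` are placeholders for the epoch counters and deoptimization generation discussed above, and `relocate_copy` is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

struct ToyNMethod {
    // Compilation output: reasonable to carry over to the clone.
    std::vector<std::uint8_t> code;
    int comp_level = 0;

    // Per-instance runtime state: a new nmethod should start from defaults.
    std::uint64_t on_stack_epoch = 0;
    std::uint64_t deopt_generation = 0;
};

// Opt-in copy: start from a default-initialized nmethod and copy only the
// fields that are explicitly listed here. Anything not listed keeps its
// default, so a newly added field is not inherited by accident.
ToyNMethod relocate_copy(const ToyNMethod& src) {
    ToyNMethod dst;
    dst.code = src.code;
    dst.comp_level = src.comp_level;
    return dst;
}
```

With an opt-out approach (copy everything, then reset selected fields) forgetting a reset silently inherits state. With the opt-in form above, forgetting a field leaves it at its initial value.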

> 9 Sometimes the method() is null when using Truffle.

I added a null check before updating the nmethod reference to help avoid this. As mentioned earlier, I do not have much knowledge of Truffle/JVMCI, so I will follow up on this when I have a better understanding.

> 10 Since you don't hold the Compile_lock across the safepoint, it's not obvious to me that you can't get a not_installed nmethod. Can you? I don't know what the consequences are of cloning one of those. The target nmethod will start off as not_installed, but I don't know that it will be made in_use.

I updated the code to hold the Compile_lock to ensure we do not relocate nmethods during construction. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.hpp#L93-L98)
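
A minimal sketch of the locking idea, assuming a single lock that both installation and relocation take. This is a toy C++ model: `toy_compile_lock` merely stands in for Compile_lock, and `ToyState`, `install` and `can_relocate` are hypothetical names.

```cpp
#include <mutex>

enum class ToyState { not_installed, in_use, not_entrant };

struct ToyNMethod {
    ToyState state = ToyState::not_installed;
};

std::mutex toy_compile_lock;   // stand-in for Compile_lock (assumption)

// Installation publishes the nmethod while holding the lock.
void install(ToyNMethod& nm) {
    std::lock_guard<std::mutex> guard(toy_compile_lock);
    nm.state = ToyState::in_use;
}

// Relocation takes the same lock, so in this model it either sees an
// installed nmethod or one it has to skip.
bool can_relocate(const ToyNMethod& nm) {
    std::lock_guard<std::mutex> guard(toy_compile_lock);
    return nm.state == ToyState::in_use;
}
```

In this model a `not_installed` nmethod is never selected for relocation, which is the situation raised in point 10.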

> 11 These new special nmethods call post_init after installing the nmethod in the Method, while normally the order is reversed. While this may or may not be okay, it introduces a new anomaly where new special nmethods are being special

I moved the post_init call to be more in line with the other constructors. Creating a “special” nmethod was not the intention, and I agree the relocation should follow the normal creation path where possible. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1522)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2831412968