<div dir="ltr"><div dir="ltr">On Tue, Jun 20, 2023 at 4:38 AM David Holmes <<a href="mailto:david.holmes@oracle.com">david.holmes@oracle.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 19/06/2023 11:46 pm, Jaroslav Bachorík wrote:<br>
> Hi,<br>
> <br>
> On Thu, Jun 1, 2023 at 5:49 PM <<a href="mailto:daniel.daugherty@oracle.com" target="_blank">daniel.daugherty@oracle.com</a> <br>
> <mailto:<a href="mailto:daniel.daugherty@oracle.com" target="_blank">daniel.daugherty@oracle.com</a>>> wrote:<br>
> <br>
> Hmmm... seems like concurrent class unloading has revealed a situation<br>
> in some<br>
> code where class refs should have been held, but were not held because<br>
> it's very<br>
> difficult to do, and the code (mostly) got away with it for a long<br>
> time...<br>
> <br>
> David's quote from the JNI spec makes things very, very clear:<br>
> <br>
> > "A field or method ID does not prevent the VM from unloading the<br>
> class from<br>
> > which the ID has been derived. After the class is unloaded, the<br>
> method or<br>
> > field ID becomes invalid and may not be passed to any function<br>
> taking<br>
> such<br>
> > an ID. The native code, therefore, must make sure to:<br>
> > - keep a live reference to the underlying class, or<br>
> > - recompute the method or field ID<br>
> ><br>
> > if it intends to use a method or field ID for an extended period of<br>
> time."<br>
> <br>
> and the JVM/TI spec does depend on the JNI spec for certain pieces<br>
> like the<br>
> semantics of jmethodID.<br>
> <br>
> I agree with David that fixing this is going to be difficult and a major<br>
> effort which is why we've all only chipped at this boulder before...<br>
> <br>
> <br>
> I apologize for being a stickler, but it seems that the concurrent class <br>
> unloading and the ephemeral nature of jmethodIDs could cause problems <br>
> even for a much less exotic usecase of the JVMTI GetStackTrace <br>
> (<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace</a>>).<br>
> IIUC, the stacktrace consists of stack frames and each stack frame <br>
> contains a jmethodID identifying the associated Java method. Further on, <br>
> obtaining the stacktrace will not create references to corresponding <br>
> classes so the jmethodIDs can go 'bad' at any given moment, without an <br>
> official way to make sure such a 'bad' jmethodID would not crash JVM.<br>
<br>
Yep all correct. The jvmtiFrameInfo contains the jmethodID as applicable <br>
when the stacktrace was taken. If the current scope is not ensuring <br>
liveness then those method ids could be invalidated at any time.<br>
<br>
I agree this is a hole in the JVM TI specification, but it is not a hole <br>
I know how to patch. Additional API are needed here along with changes <br>
to the validity requirements of jMethodIds.<br></blockquote><div><br></div><div>Thanks! Yes, this makes sense. I just wanted to make sure my perception was correct.</div><div><br></div><div>Cheers,</div><div><br></div><div>-JB-</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Cheers,<br>
David<br>
-----<br>
<br>
> Cheers,<br>
> <br>
> -JB-<br>
> <br>
> <br>
> Dan<br>
> <br>
> <br>
> On 6/1/23 4:28 AM, David Holmes wrote:<br>
> > On 1/06/2023 5:16 pm, Jaroslav Bachorík wrote:<br>
> >> Hi David,<br>
> >><br>
> >> On Thu, Jun 1, 2023 at 3:56 AM David Holmes<br>
> <<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a> <mailto:<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a>><br>
> >> <mailto:<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a><br>
> <mailto:<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a>>>> wrote:<br>
> >><br>
> >> Hi Jaroslav,<br>
> >><br>
> >> On 31/05/2023 9:12 pm, Jaroslav Bachorík wrote:<br>
> >> > Dear Team,<br>
> >> ><br>
> >> > I've been investigating the unusual JVM crashes occurring in<br>
> >> JVMTI calls<br>
> >> > on a J9 JVM. During my investigation, I scrutinized the<br>
> >> `jmethodID`<br>
> >> > definition closely, available here: [jmethodID<br>
> >> ><br>
> >><br>
> definition](<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>><br>
> >><br>
> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>>><br>
> >><br>
> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>><br>
> >><br>
> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>>>>).<br>
> >><br>
> >> ><br>
> >> > To paraphrase, the definition suggests that `jmethodID`<br>
> >> identifies a<br>
> >> > Java method, initializer, or constructor. These<br>
> identifiers, once<br>
> >> > returned by JVM TI functions and events, can be safely<br>
> stored.<br>
> >> However,<br>
> >> > when the class is unloaded, they become invalid,<br>
> rendering them<br>
> >> > unsuitable for use.<br>
> >> ><br>
> >> > My interpretation is that the JVMTI user should verify the<br>
> >> validity of a<br>
> >> > `jmethodID` value before using it to prevent potential<br>
> crashes.<br>
> >> Would<br>
> >> > you agree with this interpretation?<br>
> >><br>
> >> Not quite - as you note you can't verify the jmethodID<br>
> validity.<br>
> >> What<br>
> >> the user needs to do, in line with what Dan was saying, is<br>
> ensure<br>
> >> that<br>
> >> they keep track of the classes to which the methods belong<br>
> and keep<br>
> >> them<br>
> >> alive if necessary. Now that may be easier said than done, but<br>
> >> that is<br>
> >> the gist of it. This comes from the JNI spec:<br>
> >><br>
> >> "A field or method ID does not prevent the VM from unloading<br>
> the<br>
> >> class<br>
> >> from which the ID has been derived. After the class is<br>
> unloaded, the<br>
> >> method or field ID becomes invalid and may not be passed to any<br>
> >> function<br>
> >> taking such an ID. The native code, therefore, must make<br>
> sure to:<br>
> >><br>
> >> keep a live reference to the underlying class, or<br>
> >> recompute the method or field ID<br>
> >><br>
> >> if it intends to use a method or field ID for an extended<br>
> period of<br>
> >> time."<br>
> >><br>
> >> > This sounds like a sensible requirement, but its practical<br>
> >> application<br>
> >> > remains unclear. As far as I know, methods can be unloaded<br>
> >> concurrently<br>
> >> > to the native code executing JVMTI functions. This<br>
> introduces a<br>
> >> > potential race condition where the JVM unloads the<br>
> methods during<br>
> >> the<br>
> >> > check->use flow, making it only a partial solution. To<br>
> complicate<br>
> >> > matters further, no method exists to confirm whether a<br>
> >> `jmethodID` is valid.<br>
> >> ><br>
> >> > Theoretically, we could monitor the `CompiledMethodUnload`<br>
> >> event to<br>
> >> > track the validity state, creating a constantly expanding<br>
> set of<br>
> >> > unloaded `jmethodID` values or a bloom filter, if one<br>
> does not<br>
> >> care<br>
> >> > about few potential false positives. This strategy, however,<br>
> >> doesn't<br>
> >> > address the potential race condition, and it could even<br>
> >> exacerbate it<br>
> >> > due to possible event delays. This delay might mistakenly<br>
> >> validate a<br>
> >> > `jmethodID` value that has already been unloaded, but for<br>
> >> which the<br>
> >> > event hasn't been delivered yet.<br>
> >> ><br>
> >> > Honestly, I don't see a way to use `jmethodID` safely<br>
> unless the<br>
> >> code<br>
> >> > using it suspends the entire JVM and doesn't resume until<br>
> it's<br>
> >> finished<br>
> >> > with that `jmethodID`. Any other approach might lead to JVM<br>
> >> crashes, as<br>
> >> > we've observed with J9.<br>
> >> ><br>
> >> > Lastly, it's noteworthy that Hotspot takes meticulous<br>
> measures to<br>
> >> ensure<br>
> >> > that using jmethodIDs for unloaded methods doesn't crash the<br>
> >> JVM and<br>
> >> > even provides useful information. This observation has<br>
> led me to<br>
> >> > question whether the documentation aligns with the Hotspot<br>
> >> > implementation, especially given that following closely the<br>
> >> > documentation appears to increase the risk associated<br>
> with the<br>
> >> use of<br>
> >> > `jmethodID` values.<br>
> >><br>
> >> There have been folk who wanted to make this area more<br>
> user-friendly<br>
> >> but<br>
> >> that shouldn't be mistaken for moving towards a world where<br>
> >> jmethodIDs<br>
> >> are always safe to use.<br>
> >><br>
> >><br>
> >> Yes, I see your point. Unfortunately, this confirms my worries that<br>
> >> using AsyncGetCallTrace (ASGCT) on a system strictly adhering to<br>
> the<br>
> >> JVMTI spec of jmethoID is not really possible without risking<br>
> random<br>
> >> and quite frequent crashes on systems with concurrent class<br>
> unloading<br>
> >> enabled.<br>
> >> FTR, ASGCT will record the stack trace as a list of frames, each<br>
> one<br>
> >> containing the corresponding jmethodID value. Considering that the<br>
> >> most common usage of ASGCT is in a signal handler it makes it<br>
> >> impossible to use JVMTI calls to resolve the holder class and<br>
> create<br>
> >> a strong reference to prevent it from being unloaded.<br>
> >> And even if this would be possible we would need to figure out when<br>
> >> to release the class reference when it is no more needed - and<br>
> it is<br>
> >> not really clear how we could do that reliably, leaving us with the<br>
> >> option of holding the class references indefinitely or risking<br>
> >> crashing JVM.<br>
> ><br>
> > I would have to agree you cannot use jmethodID's for that<br>
> purpose, not<br>
> > without also recording (and keeping-alive) the associated<br>
> classes. As<br>
> > to when you would clear such references I can't say as I don't know<br>
> > how the ASGCT stack record would be used.<br>
> ><br>
> >> I want to emphasize that not being able to resolve additional<br>
> details<br>
> >> for a jmethodID pointing to a method of an unloaded class is not an<br>
> >> issue, as long as the JVMTI call does not crash. I think that<br>
> >> <a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a><br>
> <<a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a>><br>
> >> <<a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a><br>
> <<a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a>>> did address exactly the<br>
> >> problem of concurrent class unloading causing races in the code<br>
> that<br>
> >> is checking for validity of jmethodID and then using it.<br>
> ><br>
> > Yes but internal to the JVMTI implementation. The basic null check<br>
> > that was used was found to be insufficient when used with ZGC and so<br>
> > we would crash more often than was considered reasonable. There is a<br>
> > quoted comment in the code in<br>
> > <a href="https://bugs.openjdk.org/browse/JDK-8268088" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268088</a><br>
> <<a href="https://bugs.openjdk.org/browse/JDK-8268088" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268088</a>>:<br>
> ><br>
> > // These aren't deallocated and are going to look like a leak, but<br>
> > that's<br>
> > // needed because we can't really get rid of jmethodIDs because we<br>
> > don't<br>
> > // know when native code is going to stop using them. The spec<br>
> says<br>
> > that<br>
> > // they're "invalid" but existing programs likely rely on their<br>
> being<br>
> > // NULL after class unloading.<br>
> ><br>
> > that kind of sums up the position of trying to accommodate "bad<br>
> code"<br>
> > in a reasonable way.<br>
> ><br>
> >> Can this be summarize in a way that the user is not guaranteed<br>
> to get<br>
> >> any additional information for an invalid jmethodID but it would be<br>
> >> really nice for JVM not to crash when jmethodID becomes invalid as<br>
> >> there is no way for the user to check for its validity in an atomic<br>
> >> manner<br>
> >> - and yes, even calling GetMethodDeclaringClass in order to obtain<br>
> >> the class one could create a strong reference is a subject to racy<br>
> >> behaviour so it really can not be used as a workaround.<br>
> ><br>
> > The only real solution IMO would be to encode jmethodIDs in a way<br>
> that<br>
> > validity can be tracked and then have all JVMTI methods be able to<br>
> > check that validity and guarantee atomicity of the method such that<br>
> > the id can't become invalid whilst in use. Whether that is at all<br>
> > feasible/practical I do not know - but it sounds like a major effort<br>
> > to me.<br>
> ><br>
> > Cheers,<br>
> > David<br>
> ><br>
> >> Cheers,<br>
> >><br>
> >> -JB-<br>
> >><br>
> >><br>
> >> Cheers,<br>
> >> David<br>
> >><br>
> >> > I welcome your thoughts and perspectives on this matter.<br>
> >> ><br>
> >> > Best regards,<br>
> >> ><br>
> >> > Jaroslav<br>
> >><br>
> <br>
</blockquote></div></div>