<div dir="ltr"><div dir="ltr">On Tue, Jun 20, 2023 at 4:38 AM David Holmes <<a href="mailto:david.holmes@oracle.com">david.holmes@oracle.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 19/06/2023 11:46 pm, Jaroslav Bachorík wrote:<br>

> Hi,<br>

> <br>

> On Thu, Jun 1, 2023 at 5:49 PM <<a href="mailto:daniel.daugherty@oracle.com" target="_blank">daniel.daugherty@oracle.com</a> <br>

> <mailto:<a href="mailto:daniel.daugherty@oracle.com" target="_blank">daniel.daugherty@oracle.com</a>>> wrote:<br>

> <br>

>     Hmmm... seems like concurrent class unloading has revealed a situation<br>

>     in some<br>

>     code where class refs should have been held, but were not held because<br>

>     it's very<br>

>     difficult to do, and the code (mostly) got away with it for a long<br>

>     time...<br>

> <br>

>     David's quote from the JNI spec makes things very, very clear:<br>

> <br>

>       > "A field or method ID does not prevent the VM from unloading the<br>

>     class from<br>

>       > which the ID has been derived. After the class is unloaded, the<br>

>     method or<br>

>       > field ID becomes invalid and may not be passed to any function<br>

>     taking<br>

>     such<br>

>       > an ID. The native code, therefore, must make sure to:<br>

>       >   - keep a live reference to the underlying class, or<br>

>       >   - recompute the method or field ID<br>

>       ><br>

>       > if it intends to use a method or field ID for an extended period of<br>

>     time."<br>

> <br>

>     and the JVM/TI spec does depend on the JNI spec for certain pieces<br>

>     like the<br>

>     semantics of jmethodID.<br>

> <br>

>     I agree with David that fixing this is going to be difficult and a major<br>

>     effort which is why we've all only chipped at this boulder before...<br>

> <br>

> <br>

> I apologize for being a stickler, but it seems that the concurrent class <br>

> unloading and the ephemeral nature of jmethodIDs could cause problems <br>

> even for a much less exotic usecase of the JVMTI GetStackTrace <br>

> (<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace</a>>).<br>

> IIUC, the stacktrace consists of stack frames and each stack frame <br>

> contains a jmethodID identifying the associated Java method. Further on, <br>

> obtaining the stacktrace will not create references to corresponding <br>

> classes so the jmethodIDs can go 'bad' at any given moment, without an <br>

> official way to make sure such a 'bad' jmethodID would not crash JVM.<br>

<br>

Yep all correct. The jvmtiFrameInfo contains the jmethodID as applicable <br>

when the stacktrace was taken. If the current scope is not ensuring <br>

liveness then those method ids could be invalidated at any time.<br>

<br>

I agree this is a hole in the JVM TI specification, but it is not a hole <br>

I know how to patch. Additional API are needed here along with changes <br>

to the validity requirements of jMethodIds.<br></blockquote><div><br></div><div>Thanks! Yes, this makes sense. I just wanted to make sure my perception was correct.</div><div><br></div><div>Cheers,</div><div><br></div><div>-JB-</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Cheers,<br>

David<br>

-----<br>

<br>

> Cheers,<br>

> <br>

> -JB-<br>

> <br>

> <br>

>     Dan<br>

> <br>

> <br>

>     On 6/1/23 4:28 AM, David Holmes wrote:<br>

>      > On 1/06/2023 5:16 pm, Jaroslav Bachorík wrote:<br>

>      >> Hi David,<br>

>      >><br>

>      >> On Thu, Jun 1, 2023 at 3:56 AM David Holmes<br>

>     <<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a> <mailto:<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a>><br>

>      >> <mailto:<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a><br>

>     <mailto:<a href="mailto:david.holmes@oracle.com" target="_blank">david.holmes@oracle.com</a>>>> wrote:<br>

>      >><br>

>      >>     Hi Jaroslav,<br>

>      >><br>

>      >>     On 31/05/2023 9:12 pm, Jaroslav Bachorík wrote:<br>

>      >>      > Dear Team,<br>

>      >>      ><br>

>      >>      > I've been investigating the unusual JVM crashes occurring in<br>

>      >>     JVMTI calls<br>

>      >>      > on a J9 JVM. During my investigation, I scrutinized the<br>

>      >> `jmethodID`<br>

>      >>      > definition closely, available here: [jmethodID<br>

>      >>      ><br>

>      >><br>

>     definition](<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>><br>

>      >><br>

>     <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>>><br>

>      >><br>

>     <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>><br>

>      >><br>

>     <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a> <<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID" rel="noreferrer" target="_blank">https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID</a>>>>).<br>

>      >><br>

>      >>      ><br>

>      >>      > To paraphrase, the definition suggests that `jmethodID`<br>

>      >> identifies a<br>

>      >>      > Java method, initializer, or constructor. These<br>

>     identifiers, once<br>

>      >>      > returned by JVM TI functions and events, can be safely<br>

>     stored.<br>

>      >>     However,<br>

>      >>      > when the class is unloaded, they become invalid,<br>

>     rendering them<br>

>      >>      > unsuitable for use.<br>

>      >>      ><br>

>      >>      > My interpretation is that the JVMTI user should verify the<br>

>      >>     validity of a<br>

>      >>      > `jmethodID` value before using it to prevent potential<br>

>     crashes.<br>

>      >>     Would<br>

>      >>      > you agree with this interpretation?<br>

>      >><br>

>      >>     Not quite - as you note you can't verify the jmethodID<br>

>     validity.<br>

>      >> What<br>

>      >>     the user needs to do, in line with what Dan was saying, is<br>

>     ensure<br>

>      >> that<br>

>      >>     they keep track of the classes to which the methods belong<br>

>     and keep<br>

>      >>     them<br>

>      >>     alive if necessary. Now that may be easier said than done, but<br>

>      >> that is<br>

>      >>     the gist of it. This comes from the JNI spec:<br>

>      >><br>

>      >>     "A field or method ID does not prevent the VM from unloading<br>

>     the<br>

>      >> class<br>

>      >>     from which the ID has been derived. After the class is<br>

>     unloaded, the<br>

>      >>     method or field ID becomes invalid and may not be passed to any<br>

>      >>     function<br>

>      >>     taking such an ID. The native code, therefore, must make<br>

>     sure to:<br>

>      >><br>

>      >>           keep a live reference to the underlying class, or<br>

>      >>           recompute the method or field ID<br>

>      >><br>

>      >>     if it intends to use a method or field ID for an extended<br>

>     period of<br>

>      >>     time."<br>

>      >><br>

>      >>      > This sounds like a sensible requirement, but its practical<br>

>      >>     application<br>

>      >>      > remains unclear. As far as I know, methods can be unloaded<br>

>      >>     concurrently<br>

>      >>      > to the native code executing JVMTI functions. This<br>

>     introduces a<br>

>      >>      > potential race condition where the JVM unloads the<br>

>     methods during<br>

>      >>     the<br>

>      >>      > check->use flow, making it only a partial solution. To<br>

>     complicate<br>

>      >>      > matters further, no method exists to confirm whether a<br>

>      >>     `jmethodID` is valid.<br>

>      >>      ><br>

>      >>      > Theoretically, we could monitor the `CompiledMethodUnload`<br>

>      >> event to<br>

>      >>      > track the validity state, creating a constantly expanding<br>

>     set of<br>

>      >>      > unloaded `jmethodID` values or a bloom filter, if one<br>

>     does not<br>

>      >> care<br>

>      >>      > about few potential false positives. This strategy, however,<br>

>      >> doesn't<br>

>      >>      > address the potential race condition, and it could even<br>

>      >>     exacerbate it<br>

>      >>      > due to possible event delays. This delay might mistakenly<br>

>      >> validate a<br>

>      >>      > `jmethodID` value that has already been unloaded, but for<br>

>      >> which the<br>

>      >>      > event hasn't been delivered yet.<br>

>      >>      ><br>

>      >>      > Honestly, I don't see a way to use `jmethodID` safely<br>

>     unless the<br>

>      >>     code<br>

>      >>      > using it suspends the entire JVM and doesn't resume until<br>

>     it's<br>

>      >>     finished<br>

>      >>      > with that `jmethodID`. Any other approach might lead to JVM<br>

>      >>     crashes, as<br>

>      >>      > we've observed with J9.<br>

>      >>      ><br>

>      >>      > Lastly, it's noteworthy that Hotspot takes meticulous<br>

>     measures to<br>

>      >>     ensure<br>

>      >>      > that using jmethodIDs for unloaded methods doesn't crash the<br>

>      >> JVM and<br>

>      >>      > even provides useful information. This observation has<br>

>     led me to<br>

>      >>      > question whether the documentation aligns with the Hotspot<br>

>      >>      > implementation, especially given that following closely the<br>

>      >>      > documentation appears to increase the risk associated<br>

>     with the<br>

>      >>     use of<br>

>      >>      > `jmethodID` values.<br>

>      >><br>

>      >>     There have been folk who wanted to make this area more<br>

>     user-friendly<br>

>      >>     but<br>

>      >>     that shouldn't be mistaken for moving towards a world where<br>

>      >> jmethodIDs<br>

>      >>     are always safe to use.<br>

>      >><br>

>      >><br>

>      >> Yes, I see your point. Unfortunately, this confirms my worries that<br>

>      >> using AsyncGetCallTrace (ASGCT) on a system strictly adhering to<br>

>     the<br>

>      >> JVMTI spec of jmethoID is not really possible without risking<br>

>     random<br>

>      >> and quite frequent crashes on systems with concurrent class<br>

>     unloading<br>

>      >> enabled.<br>

>      >> FTR, ASGCT will record the stack trace as a list of frames, each<br>

>     one<br>

>      >> containing the corresponding jmethodID value. Considering that the<br>

>      >> most common usage of ASGCT is in a signal handler it makes it<br>

>      >> impossible to use JVMTI calls to resolve the holder class and<br>

>     create<br>

>      >> a strong reference to prevent it from being unloaded.<br>

>      >> And even if this would be possible we would need to figure out when<br>

>      >> to release the class reference when it is no more needed - and<br>

>     it is<br>

>      >> not really clear how we could do that reliably, leaving us with the<br>

>      >> option of holding the class references indefinitely or risking<br>

>      >> crashing JVM.<br>

>      ><br>

>      > I would have to agree you cannot use jmethodID's for that<br>

>     purpose, not<br>

>      > without also recording (and keeping-alive) the associated<br>

>     classes. As<br>

>      > to when you would clear such references I can't say as I don't know<br>

>      > how the ASGCT stack record would be used.<br>

>      ><br>

>      >> I want to emphasize that not being able to resolve additional<br>

>     details<br>

>      >> for a jmethodID pointing to a method of an unloaded class is not an<br>

>      >> issue, as long as the JVMTI call does not crash. I think that<br>

>      >> <a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a><br>

>     <<a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a>><br>

>      >> <<a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a><br>

>     <<a href="https://bugs.openjdk.org/browse/JDK-8268364" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268364</a>>> did address exactly the<br>

>      >> problem of concurrent class unloading causing races in the code<br>

>     that<br>

>      >> is checking for validity of jmethodID and then using it.<br>

>      ><br>

>      > Yes but internal to the JVMTI implementation. The basic null check<br>

>      > that was used was found to be insufficient when used with ZGC and so<br>

>      > we would crash more often than was considered reasonable. There is a<br>

>      > quoted comment in the code in<br>

>      > <a href="https://bugs.openjdk.org/browse/JDK-8268088" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268088</a><br>

>     <<a href="https://bugs.openjdk.org/browse/JDK-8268088" rel="noreferrer" target="_blank">https://bugs.openjdk.org/browse/JDK-8268088</a>>:<br>

>      ><br>

>      >   // These aren't deallocated and are going to look like a leak, but<br>

>      > that's<br>

>      >   // needed because we can't really get rid of jmethodIDs because we<br>

>      > don't<br>

>      >   // know when native code is going to stop using them. The spec<br>

>     says<br>

>      > that<br>

>      >   // they're "invalid" but existing programs likely rely on their<br>

>     being<br>

>      >   // NULL after class unloading.<br>

>      ><br>

>      > that kind of sums up the position of trying to accommodate "bad<br>

>     code"<br>

>      > in a reasonable way.<br>

>      ><br>

>      >> Can this be summarize in a way that the user is not guaranteed<br>

>     to get<br>

>      >> any additional information for an invalid jmethodID but it would be<br>

>      >> really nice for JVM not to crash when jmethodID becomes invalid as<br>

>      >> there is no way for the user to check for its validity in an atomic<br>

>      >> manner<br>

>      >> - and yes, even calling GetMethodDeclaringClass in order to obtain<br>

>      >> the class one could create a strong reference is a subject to racy<br>

>      >> behaviour so it really can not be used as a workaround.<br>

>      ><br>

>      > The only real solution IMO would be to encode jmethodIDs in a way<br>

>     that<br>

>      > validity can be tracked and then have all JVMTI methods be able to<br>

>      > check that validity and guarantee atomicity of the method such that<br>

>      > the id can't become invalid whilst in use. Whether that is at all<br>

>      > feasible/practical I do not know - but it sounds like a major effort<br>

>      > to me.<br>

>      ><br>

>      > Cheers,<br>

>      > David<br>

>      ><br>

>      >> Cheers,<br>

>      >><br>

>      >> -JB-<br>

>      >><br>

>      >><br>

>      >>     Cheers,<br>

>      >>     David<br>

>      >><br>

>      >>      > I welcome your thoughts and perspectives on this matter.<br>

>      >>      ><br>

>      >>      > Best regards,<br>

>      >>      ><br>

>      >>      > Jaroslav<br>

>      >><br>

> <br>

</blockquote></div></div>