Clarifying jmethodID Usage and Potential JVM Crashes

Tue Jun 20 02:38:29 UTC 2023

On 19/06/2023 11:46 pm, Jaroslav Bachorík wrote:
> Hi,
> 
> On Thu, Jun 1, 2023 at 5:49 PM <daniel.daugherty at oracle.com 
> <mailto:daniel.daugherty at oracle.com>> wrote:
> 
>     Hmmm... seems like concurrent class unloading has revealed a situation
>     in some
>     code where class refs should have been held, but were not held because
>     it's very
>     difficult to do, and the code (mostly) got away with it for a long
>     time...
> 
>     David's quote from the JNI spec makes things very, very clear:
> 
>       > "A field or method ID does not prevent the VM from unloading the
>     class from
>       > which the ID has been derived. After the class is unloaded, the
>     method or
>       > field ID becomes invalid and may not be passed to any function
>     taking
>     such
>       > an ID. The native code, therefore, must make sure to:
>       >   - keep a live reference to the underlying class, or
>       >   - recompute the method or field ID
>       >
>       > if it intends to use a method or field ID for an extended period of
>     time."
> 
>     and the JVM/TI spec does depend on the JNI spec for certain pieces
>     like the
>     semantics of jmethodID.
> 
>     I agree with David that fixing this is going to be difficult and a major
>     effort which is why we've all only chipped at this boulder before...
> 
> 
> I apologize for being a stickler, but it seems that the concurrent class 
> unloading and the ephemeral nature of jmethodIDs could cause problems 
> even for a much less exotic usecase of the JVMTI GetStackTrace 
> (https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace>).
> IIUC, the stacktrace consists of stack frames and each stack frame 
> contains a jmethodID identifying the associated Java method. Further on, 
> obtaining the stacktrace will not create references to corresponding 
> classes so the jmethodIDs can go 'bad' at any given moment, without an 
> official way to make sure such a 'bad' jmethodID would not crash JVM.

Yep all correct. The jvmtiFrameInfo contains the jmethodID as applicable 
when the stacktrace was taken. If the current scope is not ensuring 
liveness then those method ids could be invalidated at any time.

I agree this is a hole in the JVM TI specification, but it is not a hole 
I know how to patch. Additional API are needed here along with changes 
to the validity requirements of jMethodIds.

Cheers,
David
-----

> Cheers,
> 
> -JB-
> 
> 
>     Dan
> 
> 
>     On 6/1/23 4:28 AM, David Holmes wrote:
>      > On 1/06/2023 5:16 pm, Jaroslav Bachorík wrote:
>      >> Hi David,
>      >>
>      >> On Thu, Jun 1, 2023 at 3:56 AM David Holmes
>     <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>      >> <mailto:david.holmes at oracle.com
>     <mailto:david.holmes at oracle.com>>> wrote:
>      >>
>      >>     Hi Jaroslav,
>      >>
>      >>     On 31/05/2023 9:12 pm, Jaroslav Bachorík wrote:
>      >>      > Dear Team,
>      >>      >
>      >>      > I've been investigating the unusual JVM crashes occurring in
>      >>     JVMTI calls
>      >>      > on a J9 JVM. During my investigation, I scrutinized the
>      >> `jmethodID`
>      >>      > definition closely, available here: [jmethodID
>      >>      >
>      >>
>     definition](https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID>
>      >>
>     <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID>>
>      >>
>     <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID>
>      >>
>     <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID>>>).
>      >>
>      >>      >
>      >>      > To paraphrase, the definition suggests that `jmethodID`
>      >> identifies a
>      >>      > Java method, initializer, or constructor. These
>     identifiers, once
>      >>      > returned by JVM TI functions and events, can be safely
>     stored.
>      >>     However,
>      >>      > when the class is unloaded, they become invalid,
>     rendering them
>      >>      > unsuitable for use.
>      >>      >
>      >>      > My interpretation is that the JVMTI user should verify the
>      >>     validity of a
>      >>      > `jmethodID` value before using it to prevent potential
>     crashes.
>      >>     Would
>      >>      > you agree with this interpretation?
>      >>
>      >>     Not quite - as you note you can't verify the jmethodID
>     validity.
>      >> What
>      >>     the user needs to do, in line with what Dan was saying, is
>     ensure
>      >> that
>      >>     they keep track of the classes to which the methods belong
>     and keep
>      >>     them
>      >>     alive if necessary. Now that may be easier said than done, but
>      >> that is
>      >>     the gist of it. This comes from the JNI spec:
>      >>
>      >>     "A field or method ID does not prevent the VM from unloading
>     the
>      >> class
>      >>     from which the ID has been derived. After the class is
>     unloaded, the
>      >>     method or field ID becomes invalid and may not be passed to any
>      >>     function
>      >>     taking such an ID. The native code, therefore, must make
>     sure to:
>      >>
>      >>           keep a live reference to the underlying class, or
>      >>           recompute the method or field ID
>      >>
>      >>     if it intends to use a method or field ID for an extended
>     period of
>      >>     time."
>      >>
>      >>      > This sounds like a sensible requirement, but its practical
>      >>     application
>      >>      > remains unclear. As far as I know, methods can be unloaded
>      >>     concurrently
>      >>      > to the native code executing JVMTI functions. This
>     introduces a
>      >>      > potential race condition where the JVM unloads the
>     methods during
>      >>     the
>      >>      > check->use flow, making it only a partial solution. To
>     complicate
>      >>      > matters further, no method exists to confirm whether a
>      >>     `jmethodID` is valid.
>      >>      >
>      >>      > Theoretically, we could monitor the `CompiledMethodUnload`
>      >> event to
>      >>      > track the validity state, creating a constantly expanding
>     set of
>      >>      > unloaded `jmethodID` values or a bloom filter, if one
>     does not
>      >> care
>      >>      > about few potential false positives. This strategy, however,
>      >> doesn't
>      >>      > address the potential race condition, and it could even
>      >>     exacerbate it
>      >>      > due to possible event delays. This delay might mistakenly
>      >> validate a
>      >>      > `jmethodID` value that has already been unloaded, but for
>      >> which the
>      >>      > event hasn't been delivered yet.
>      >>      >
>      >>      > Honestly, I don't see a way to use `jmethodID` safely
>     unless the
>      >>     code
>      >>      > using it suspends the entire JVM and doesn't resume until
>     it's
>      >>     finished
>      >>      > with that `jmethodID`. Any other approach might lead to JVM
>      >>     crashes, as
>      >>      > we've observed with J9.
>      >>      >
>      >>      > Lastly, it's noteworthy that Hotspot takes meticulous
>     measures to
>      >>     ensure
>      >>      > that using jmethodIDs for unloaded methods doesn't crash the
>      >> JVM and
>      >>      > even provides useful information. This observation has
>     led me to
>      >>      > question whether the documentation aligns with the Hotspot
>      >>      > implementation, especially given that following closely the
>      >>      > documentation appears to increase the risk associated
>     with the
>      >>     use of
>      >>      > `jmethodID` values.
>      >>
>      >>     There have been folk who wanted to make this area more
>     user-friendly
>      >>     but
>      >>     that shouldn't be mistaken for moving towards a world where
>      >> jmethodIDs
>      >>     are always safe to use.
>      >>
>      >>
>      >> Yes, I see your point. Unfortunately, this confirms my worries that
>      >> using AsyncGetCallTrace (ASGCT) on a system strictly adhering to
>     the
>      >> JVMTI spec of jmethoID is not really possible without risking
>     random
>      >> and quite frequent crashes on systems with concurrent class
>     unloading
>      >> enabled.
>      >> FTR, ASGCT will record the stack trace as a list of frames, each
>     one
>      >> containing the corresponding jmethodID value. Considering that the
>      >> most common usage of ASGCT is in a signal handler it makes it
>      >> impossible to use JVMTI calls to resolve the holder class and
>     create
>      >> a strong reference to prevent it from being unloaded.
>      >> And even if this would be possible we would need to figure out when
>      >> to release the class reference when it is no more needed - and
>     it is
>      >> not really clear how we could do that reliably, leaving us with the
>      >> option of holding the class references indefinitely or risking
>      >> crashing JVM.
>      >
>      > I would have to agree you cannot use jmethodID's for that
>     purpose, not
>      > without also recording (and keeping-alive) the associated
>     classes. As
>      > to when you would clear such references I can't say as I don't know
>      > how the ASGCT stack record would be used.
>      >
>      >> I want to emphasize that not being able to resolve additional
>     details
>      >> for a jmethodID pointing to a method of an unloaded class is not an
>      >> issue, as long as the JVMTI call does not crash. I think that
>      >> https://bugs.openjdk.org/browse/JDK-8268364
>     <https://bugs.openjdk.org/browse/JDK-8268364>
>      >> <https://bugs.openjdk.org/browse/JDK-8268364
>     <https://bugs.openjdk.org/browse/JDK-8268364>> did address exactly the
>      >> problem of concurrent class unloading causing races in the code
>     that
>      >> is checking for validity of jmethodID and then using it.
>      >
>      > Yes but internal to the JVMTI implementation. The basic null check
>      > that was used was found to be insufficient when used with ZGC and so
>      > we would crash more often than was considered reasonable. There is a
>      > quoted comment in the code in
>      > https://bugs.openjdk.org/browse/JDK-8268088
>     <https://bugs.openjdk.org/browse/JDK-8268088>:
>      >
>      >   // These aren't deallocated and are going to look like a leak, but
>      > that's
>      >   // needed because we can't really get rid of jmethodIDs because we
>      > don't
>      >   // know when native code is going to stop using them. The spec
>     says
>      > that
>      >   // they're "invalid" but existing programs likely rely on their
>     being
>      >   // NULL after class unloading.
>      >
>      > that kind of sums up the position of trying to accommodate "bad
>     code"
>      > in a reasonable way.
>      >
>      >> Can this be summarize in a way that the user is not guaranteed
>     to get
>      >> any additional information for an invalid jmethodID but it would be
>      >> really nice for JVM not to crash when jmethodID becomes invalid as
>      >> there is no way for the user to check for its validity in an atomic
>      >> manner
>      >> - and yes, even calling GetMethodDeclaringClass in order to obtain
>      >> the class one could create a strong reference is a subject to racy
>      >> behaviour so it really can not be used as a workaround.
>      >
>      > The only real solution IMO would be to encode jmethodIDs in a way
>     that
>      > validity can be tracked and then have all JVMTI methods be able to
>      > check that validity and guarantee atomicity of the method such that
>      > the id can't become invalid whilst in use. Whether that is at all
>      > feasible/practical I do not know - but it sounds like a major effort
>      > to me.
>      >
>      > Cheers,
>      > David
>      >
>      >> Cheers,
>      >>
>      >> -JB-
>      >>
>      >>
>      >>     Cheers,
>      >>     David
>      >>
>      >>      > I welcome your thoughts and perspectives on this matter.
>      >>      >
>      >>      > Best regards,
>      >>      >
>      >>      > Jaroslav
>      >>
>