Clarifying jmethodID Usage and Potential JVM Crashes

Thu Jun 1 15:49:37 UTC 2023

Hmmm... seems like concurrent class unloading has revealed a situation 
in some
code where class refs should have been held, but were not held because 
it's very
difficult to do, and the code (mostly) got away with it for a long time...

David's quote from the JNI spec makes things very, very clear:

 > "A field or method ID does not prevent the VM from unloading the 
class from
 > which the ID has been derived. After the class is unloaded, the method or
 > field ID becomes invalid and may not be passed to any function taking 
such
 > an ID. The native code, therefore, must make sure to:
 >   - keep a live reference to the underlying class, or
 >   - recompute the method or field ID
 >
 > if it intends to use a method or field ID for an extended period of 
time."

and the JVM/TI spec does depend on the JNI spec for certain pieces like the
semantics of jmethodID.

I agree with David that fixing this is going to be difficult and a major
effort which is why we've all only chipped at this boulder before...

Dan

On 6/1/23 4:28 AM, David Holmes wrote:
> On 1/06/2023 5:16 pm, Jaroslav Bachorík wrote:
>> Hi David,
>>
>> On Thu, Jun 1, 2023 at 3:56 AM David Holmes <david.holmes at oracle.com 
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Jaroslav,
>>
>>     On 31/05/2023 9:12 pm, Jaroslav Bachorík wrote:
>>      > Dear Team,
>>      >
>>      > I've been investigating the unusual JVM crashes occurring in
>>     JVMTI calls
>>      > on a J9 JVM. During my investigation, I scrutinized the 
>> `jmethodID`
>>      > definition closely, available here: [jmethodID
>>      >
>> definition](https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID 
>> <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID> 
>> <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID 
>> <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID>>). 
>>
>>      >
>>      > To paraphrase, the definition suggests that `jmethodID` 
>> identifies a
>>      > Java method, initializer, or constructor. These identifiers, once
>>      > returned by JVM TI functions and events, can be safely stored.
>>     However,
>>      > when the class is unloaded, they become invalid, rendering them
>>      > unsuitable for use.
>>      >
>>      > My interpretation is that the JVMTI user should verify the
>>     validity of a
>>      > `jmethodID` value before using it to prevent potential crashes.
>>     Would
>>      > you agree with this interpretation?
>>
>>     Not quite - as you note you can't verify the jmethodID validity. 
>> What
>>     the user needs to do, in line with what Dan was saying, is ensure 
>> that
>>     they keep track of the classes to which the methods belong and keep
>>     them
>>     alive if necessary. Now that may be easier said than done, but 
>> that is
>>     the gist of it. This comes from the JNI spec:
>>
>>     "A field or method ID does not prevent the VM from unloading the 
>> class
>>     from which the ID has been derived. After the class is unloaded, the
>>     method or field ID becomes invalid and may not be passed to any
>>     function
>>     taking such an ID. The native code, therefore, must make sure to:
>>
>>           keep a live reference to the underlying class, or
>>           recompute the method or field ID
>>
>>     if it intends to use a method or field ID for an extended period of
>>     time."
>>
>>      > This sounds like a sensible requirement, but its practical
>>     application
>>      > remains unclear. As far as I know, methods can be unloaded
>>     concurrently
>>      > to the native code executing JVMTI functions. This introduces a
>>      > potential race condition where the JVM unloads the methods during
>>     the
>>      > check->use flow, making it only a partial solution. To complicate
>>      > matters further, no method exists to confirm whether a
>>     `jmethodID` is valid.
>>      >
>>      > Theoretically, we could monitor the `CompiledMethodUnload` 
>> event to
>>      > track the validity state, creating a constantly expanding set of
>>      > unloaded `jmethodID` values or a bloom filter, if one does not 
>> care
>>      > about few potential false positives. This strategy, however, 
>> doesn't
>>      > address the potential race condition, and it could even
>>     exacerbate it
>>      > due to possible event delays. This delay might mistakenly 
>> validate a
>>      > `jmethodID` value that has already been unloaded, but for 
>> which the
>>      > event hasn't been delivered yet.
>>      >
>>      > Honestly, I don't see a way to use `jmethodID` safely unless the
>>     code
>>      > using it suspends the entire JVM and doesn't resume until it's
>>     finished
>>      > with that `jmethodID`. Any other approach might lead to JVM
>>     crashes, as
>>      > we've observed with J9.
>>      >
>>      > Lastly, it's noteworthy that Hotspot takes meticulous measures to
>>     ensure
>>      > that using jmethodIDs for unloaded methods doesn't crash the 
>> JVM and
>>      > even provides useful information. This observation has led me to
>>      > question whether the documentation aligns with the Hotspot
>>      > implementation, especially given that following closely the
>>      > documentation appears to increase the risk associated with the
>>     use of
>>      > `jmethodID` values.
>>
>>     There have been folk who wanted to make this area more user-friendly
>>     but
>>     that shouldn't be mistaken for moving towards a world where 
>> jmethodIDs
>>     are always safe to use.
>>
>>
>> Yes, I see your point. Unfortunately, this confirms my worries that 
>> using AsyncGetCallTrace (ASGCT) on a system strictly adhering to the 
>> JVMTI spec of jmethoID is not really possible without risking random 
>> and quite frequent crashes on systems with concurrent class unloading 
>> enabled.
>> FTR, ASGCT will record the stack trace as a list of frames, each one 
>> containing the corresponding jmethodID value. Considering that the 
>> most common usage of ASGCT is in a signal handler it makes it 
>> impossible to use JVMTI calls to resolve the holder class and create 
>> a strong reference to prevent it from being unloaded.
>> And even if this would be possible we would need to figure out when 
>> to release the class reference when it is no more needed - and it is 
>> not really clear how we could do that reliably, leaving us with the 
>> option of holding the class references indefinitely or risking 
>> crashing JVM.
>
> I would have to agree you cannot use jmethodID's for that purpose, not 
> without also recording (and keeping-alive) the associated classes. As 
> to when you would clear such references I can't say as I don't know 
> how the ASGCT stack record would be used.
>
>> I want to emphasize that not being able to resolve additional details 
>> for a jmethodID pointing to a method of an unloaded class is not an 
>> issue, as long as the JVMTI call does not crash. I think that 
>> https://bugs.openjdk.org/browse/JDK-8268364 
>> <https://bugs.openjdk.org/browse/JDK-8268364> did address exactly the 
>> problem of concurrent class unloading causing races in the code that 
>> is checking for validity of jmethodID and then using it.
>
> Yes but internal to the JVMTI implementation. The basic null check 
> that was used was found to be insufficient when used with ZGC and so 
> we would crash more often than was considered reasonable. There is a 
> quoted comment in the code in 
> https://bugs.openjdk.org/browse/JDK-8268088:
>
>   // These aren't deallocated and are going to look like a leak, but 
> that's
>   // needed because we can't really get rid of jmethodIDs because we 
> don't
>   // know when native code is going to stop using them. The spec says 
> that
>   // they're "invalid" but existing programs likely rely on their being
>   // NULL after class unloading.
>
> that kind of sums up the position of trying to accommodate "bad code" 
> in a reasonable way.
>
>> Can this be summarize in a way that the user is not guaranteed to get 
>> any additional information for an invalid jmethodID but it would be 
>> really nice for JVM not to crash when jmethodID becomes invalid as 
>> there is no way for the user to check for its validity in an atomic 
>> manner
>> - and yes, even calling GetMethodDeclaringClass in order to obtain 
>> the class one could create a strong reference is a subject to racy 
>> behaviour so it really can not be used as a workaround.
>
> The only real solution IMO would be to encode jmethodIDs in a way that 
> validity can be tracked and then have all JVMTI methods be able to 
> check that validity and guarantee atomicity of the method such that 
> the id can't become invalid whilst in use. Whether that is at all 
> feasible/practical I do not know - but it sounds like a major effort 
> to me.
>
> Cheers,
> David
>
>> Cheers,
>>
>> -JB-
>>
>>
>>     Cheers,
>>     David
>>
>>      > I welcome your thoughts and perspectives on this matter.
>>      >
>>      > Best regards,
>>      >
>>      > Jaroslav
>>