RFR: 8313816: Accessing jmethodID might lead to spurious crashes

Thu Nov 16 20:33:54 UTC 2023

On Wed, 15 Nov 2023 05:35:49 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please, review this fix for a corner case handling of `jmethodID` values.
>> 
>> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. 
>> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version.
>> 
>> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version.
>> However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. 
>> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. 
>> 
>> This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. 
>> 
>> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.
>> 
>> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_
>
> src/hotspot/share/oops/instanceKlass.cpp line 531:
> 
>> 529: 
>> 530: void InstanceKlass::deallocate_methods(ClassLoaderData* loader_data,
>> 531:                                        Array<Method*>* methods, InstanceKlass* klass) {
> 
> An explicit boolean parameter would be cleaner/clearer.

I just removed the `klass` argument. It is not really used anyway.

> src/hotspot/share/oops/instanceKlass.cpp line 542:
> 
>> 540:       if (klass) {
>> 541:         jmethodID jmid = method->find_jmethod_id_or_null();
>> 542:         // Do the pointer maintenance before releasing the metadata, just in case
> 
> I assume there should be a period after 'case`. But just in case of what?

The code was moved to`method.cpp` and this particular comment line became obsolete

> src/hotspot/share/oops/instanceKlass.cpp line 549:
> 
>> 547:         if (jmid != nullptr && *((Method**)jmid) == method) {
>> 548:           *((Method**)jmid) = nullptr;
>> 549:         }
> 
> This should be abstracted behind a utility function e.g. `method->clear_jmethod_id()`.

Done

> src/hotspot/share/oops/method.cpp line 2277:
> 
>> 2275:   }
>> 2276: }
>> 2277: 
> 
> Can this race with redefinition?

The cleanup of previous versions is executed in VM_Operation at a safepoint - therefore we should be safe against races with class redefinitions. 
I am adding an assert to `clear_jmethod_id()` to check for being at a safepoint.

> src/hotspot/share/oops/method.hpp line 730:
> 
>> 728:   // so handles are not used to avoid deadlock.
>> 729:   jmethodID find_jmethod_id_or_null()               {
>> 730:     return method_holder() != nullptr ? method_holder()->jmethod_id_or_null(this) : nullptr;
> 
> If `method_holder()` is null at this point what does that mean for the lifecycle of the Method?

Please, ignore this part of code for the time being. I had a crash in CI which was pointing vaguely to this code - unfortunately, the hs_err.log files are not preserved in the test archives and I am not able to reproduce the failure locally. I need to debug the crash and make sure I understand the root cause.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394642247
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394647034
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394647173
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395100685
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395102362