RFR: 8313816: Accessing jmethodID might lead to spurious crashes
Jaroslav Bachorik
jbachorik at openjdk.org
Thu Nov 16 20:33:54 UTC 2023
On Wed, 15 Nov 2023 05:35:49 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> Please, review this fix for a corner case handling of `jmethodID` values.
>>
>> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version.
>> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version.
>>
>> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version.
>> However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory.
>> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded.
>>
>> This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions.
>>
>> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.
>>
>> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_
>
> src/hotspot/share/oops/instanceKlass.cpp line 531:
>
>> 529:
>> 530: void InstanceKlass::deallocate_methods(ClassLoaderData* loader_data,
>> 531: Array<Method*>* methods, InstanceKlass* klass) {
>
> An explicit boolean parameter would be cleaner/clearer.
I just removed the `klass` argument. It is not really used anyway.
> src/hotspot/share/oops/instanceKlass.cpp line 542:
>
>> 540: if (klass) {
>> 541: jmethodID jmid = method->find_jmethod_id_or_null();
>> 542: // Do the pointer maintenance before releasing the metadata, just in case
>
> I assume there should be a period after 'case`. But just in case of what?
The code was moved to`method.cpp` and this particular comment line became obsolete
> src/hotspot/share/oops/instanceKlass.cpp line 549:
>
>> 547: if (jmid != nullptr && *((Method**)jmid) == method) {
>> 548: *((Method**)jmid) = nullptr;
>> 549: }
>
> This should be abstracted behind a utility function e.g. `method->clear_jmethod_id()`.
Done
> src/hotspot/share/oops/method.cpp line 2277:
>
>> 2275: }
>> 2276: }
>> 2277:
>
> Can this race with redefinition?
The cleanup of previous versions is executed in VM_Operation at a safepoint - therefore we should be safe against races with class redefinitions.
I am adding an assert to `clear_jmethod_id()` to check for being at a safepoint.
> src/hotspot/share/oops/method.hpp line 730:
>
>> 728: // so handles are not used to avoid deadlock.
>> 729: jmethodID find_jmethod_id_or_null() {
>> 730: return method_holder() != nullptr ? method_holder()->jmethod_id_or_null(this) : nullptr;
>
> If `method_holder()` is null at this point what does that mean for the lifecycle of the Method?
Please, ignore this part of code for the time being. I had a crash in CI which was pointing vaguely to this code - unfortunately, the hs_err.log files are not preserved in the test archives and I am not able to reproduce the failure locally. I need to debug the crash and make sure I understand the root cause.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394642247
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394647034
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394647173
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395100685
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395102362
More information about the serviceability-dev
mailing list