RFR: 8281615: Deadlock caused by jdwp agent [v2]

Chris Plummer cjplummer at openjdk.java.net
Tue Feb 15 20:22:11 UTC 2022


On Tue, 15 Feb 2022 17:48:43 GMT, Zhengyu Gu <zgu at openjdk.org> wrote:

>> There are scenarios that JDWP agent can deadlock on `classTrackLock` monitor. Following is the scenario in bug report.
>> 
>> **Java Thread** 
>> -   loads a class and post `JVMTI_EVENT_CLASS_PREPARE` event
>> - JDWP event callback handler calls `classTrack_processUnloads()` to handle the event.
>> - `classTrack_processUnloads()` takes `classTrackLock` lock, then tries to allocate a new bag under the lock.
>> - bag allocation code calls` jvmtiAllocate()`, which may be blocked by ongoing safepoint due to state transition.
>> 
>> If the safepoint is GC safepoint (prior to JDK16) or `VM_JvmtiPostObjectFree`  safepoint (JDK16 or later)
>> 
>> **VM Thread**
>> - post `JVMTI_EVENT_OBJECT_FREE`
>> - JDWP event callback handler calls `cbTrackingObjectFree()` to handle the event
>> - `cbTrackingObjectFree()` tries to acquire `classTrackLock` lock, leads to deadlock
>> 
>> From my research, there are three events that may be posted at safepoints, `JVMTI_EVENT_GARBAGE_COLLECTION_START`, `JVMTI_EVENT_GARBAGE_COLLECTION_FINISH` and  `JVMTI_EVENT_OBJECT_FREE`, but only  `JVMTI_EVENT_OBJECT_FREE` is relevant to JDWP agent.
>> 
>> The solution I purpose here, is simply move allocation/deallocation code outside of `classTrackLock` lock.
>> 
>> 
>> Test:
>> - [x] tier1 
>> - [x] vmTestbase_nsk_jdi
>> - [x] vmTestbase_nsk_jdwp
>> - [x] vmTestbase_nsk_jvmti
>
> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   David and Chris' comments

src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c line 88:

> 86:     jboolean retry = JNI_FALSE;
> 87:     do {
> 88:       // Avoid unnecessary allocations when class track has yet been activated.

I don't think this comment is accurate. It's not that we want to avoid unnecessary allocations. We want to avoid unnecessary tracking of loaded classes.

src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c line 89:

> 87:     do {
> 88:       // Avoid unnecessary allocations when class track has yet been activated.
> 89:       if (deletedSignatures != NULL) {

It's now kind of hard to read what used to be the `deletedSignatures == NULL` case. I think you can first just check for NULL outside the lock and then return. It will make it clear that it is very low overhead in this case.

src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c line 111:

> 109:       }
> 110:       debugMonitorExit(classTrackLock);
> 111:     } while (retry == JNI_TRUE);

I don't think the retry is necessary. Nor is the check for `deletedSignatures != NULL`. I assume you are trying to cover the case where initially deletedSignatures was NULL (so no bag was allocated), but is not NULL by the time you grab the lock. If you return when NULL like I suggested above, you won't have this issue. Note that if not NULL, there is no way another thread can get into this same code in clear it. This is because all callers first grab the handlerLock. Grabbing of the classTrackLock here is only done to synchronize with cbTrackingObjectFree(), not with other callers of classTrack_processUnloads().

src/jdk.jdwp.agent/share/native/libjdwp/classTrack.c line 112:

> 110:       debugMonitorExit(classTrackLock);
> 111:     } while (retry == JNI_TRUE);
> 112:     bagDestroyBag(new_bag);

One other comment about `classTrack_processUnloads()` in general that also applies to the original version. It seems pretty inefficient. Every time the debug agent gets an event it is called. On almost ever call it is returning and empty back and allocating a new one. It seems a check for `bagSize(deletedSignatures) == 0` and returning NULL if true would help performance. I also believe this can be done outside of the lock (would like David's opinion on this).

-------------

PR: https://git.openjdk.java.net/jdk/pull/7461


More information about the serviceability-dev mailing list