RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks
David Holmes
david.holmes at oracle.com
Mon Oct 29 06:20:32 UTC 2018
Hi Robbin,
On 29/10/2018 6:08 AM, Robbin Ehn wrote:
> Hi Dan,
>
> Thanks for looking at this, here is the update:
> Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
> Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
I can't say I really understand the change in protocol here and why all
the cancel operations are no longer needed. I see the handshake VM
operations reusing the initial "threads list" but I'm unclear how they
might be affected if additional threads are added to the system before
the Threads_lock is acquired?
A couple of specific comments:
src/hotspot/share/runtime/handshake.hpp
cancel_inner() is dead now.
---
src/hotspot/share/runtime/handshake.cpp
This was an odd looking for loop before your change and now looks even
more strange:
for ( ; JavaThread *thr = jtiwh.next(); ) {
can it not simply be a more normal looking:
for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {
?
---
Thanks,
David
> /Robbin
>
> On 26/10/2018 17:38, Daniel D. Daugherty wrote:
>> On 10/26/18 10:33 AM, Robbin Ehn wrote:
>>> Hi, please review.
>>>
>>> When the VM thread executes a handshake it uses different
>>> ThreadsLists during
>>> the execution. A JavaThread that is armed for the handshake when it
>>> is already
>>> in the exit path in VM will cancel the handshake. Even if the VM
>>> thread cannot
>>> see this thread after the initial ThreadsList which where used for
>>> arming, the
>>> handshake can progress when the exiting thread cancels the handshake.
>>>
>>> But if a third thread takes a ThreadsList where the exiting
>>> JavaThread is present and tries to execute a VM operation, hence
>>> waiting on VM thread to finish the handshake, the JavaThread in the
>>> exit path can never reach the handshake cancellation point. VM thread
>>> cannot finishes the handshake and the third thread is stuck waiting
>>> on the VM thread.
>>>
>>> To allow holding a ThreadsList when executing a VM operation we
>>> instead let the
>>> VM thread use the same ThreadsList over the entire handshake making
>>> all armed
>>> threads visible to the VM thread at all time. And if VM thread spots
>>> a terminated thread it will count that thread is already done by only
>>> clearing
>>> it's operation.
>>>
>>> Passes local stress testing, t1-5 and the deadlock is no longer
>>> reproduce-able.
>>> Added a jtreg handshake + thread suspend test as a reproducer.
>>>
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
>>> Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
>>
>> src/hotspot/share/runtime/handshake.hpp
>> No comments.
>>
>> src/hotspot/share/runtime/handshake.cpp
>> L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
>> L359: assert(Thread::current()->is_VM_thread(), "should call
>> from vm thread");
>> Both calls to handshake_process_by_vmthread() which calls this
>> function are made with the Threads_lock held:
>>
>> MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
>>
>> Looks like the lock is grabbed because of
>> possibly_vmthread_can_process_handshake() which asserts:
>>
>> L351: // An externally suspended thread cannot be resumed
>> while the
>> L352: // Threads_lock is held so it is safe.
>> L353: // Note that this method is allowed to produce false
>> positives.
>> L354: assert(Threads_lock->owned_by_self(), "Not holding
>> Threads_lock.");
>> L355: if (target->is_ext_suspended()) {
>> L356: return true;
>> L357: }
>>
>> Also looks like vmthread_can_process_handshake() needs the
>> Threads_lock for the same externally suspended thread check.
>>
>> So I was going to ask that you add:
>>
>> assert(Threads_lock->owned_by_self(), "Not holding
>> Threads_lock.");
>>
>> after L359, but how about a comment instead:
>>
>> // Threads_lock must be held here, but that is assert()ed in
>> // possibly_vmthread_can_process_handshake().
>>
>>
>> src/hotspot/share/runtime/thread.hpp
>> No comments.
>>
>> src/hotspot/share/runtime/thread.cpp
>> No comments.
>>
>> src/hotspot/share/runtime/threadSMR.cpp
>> No comments.
>>
>> test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
>> Very nice test! It specifically exercises ThreadLocalHandshakes
>> with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
>> only ran into this bug by accident (JDK-8212152) so I like the
>> targeted test.
>>
>> L49: while(!exit_now) {
>> nit - please add a space before '('
>>
>> L51: for (int i = 0; i < _threads.length; i+=2) {
>> L58: for (int i = 0; i < _threads.length; i+=2) {
>> nit - please added spaces around '+='
>>
>> So why every other thread? A comment would be good...
>>
>> L52: wb.handshakeWalkStack(null, true);
>> I'm guessing the 'null' parameter means current thread, but
>> that's a guess on my part. A comment would be good.
>>
>> L82: for (int i = 0; i < _threads.length; i++) {
>> L83: _threads[i].join();
>> L84: }
>> Thanks for cleaning up the test_threads. That will make
>> the JTREG thread sweeper happy. However, you don't save
>> the test_exit_thread references and you don't clean those
>> up either. Yes, I realize that they are supposed to exit,
>> but if something hangs up on exit, I'd rather have a join()
>> hang failure in this test's code than have the JTREG thread
>> sweeper catch it.
>>
>> Dan
>>
>>>
>>> Thanks, Robbin
>>
More information about the serviceability-dev
mailing list