RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks
Robbin Ehn
robbin.ehn at oracle.com
Sun Oct 28 20:08:36 UTC 2018
Hi Dan,
Thanks for looking at this, here is the update:
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
/Robbin
On 26/10/2018 17:38, Daniel D. Daugherty wrote:
> On 10/26/18 10:33 AM, Robbin Ehn wrote:
>> Hi, please review.
>>
>> When the VM thread executes a handshake it uses different ThreadsLists during
>> the execution. A JavaThread that is armed for the handshake when it is already
>> in the exit path in VM will cancel the handshake. Even if the VM thread cannot
>> see this thread after the initial ThreadsList which where used for arming, the
>> handshake can progress when the exiting thread cancels the handshake.
>>
>> But if a third thread takes a ThreadsList where the exiting JavaThread is
>> present and tries to execute a VM operation, hence waiting on VM thread to
>> finish the handshake, the JavaThread in the exit path can never reach the
>> handshake cancellation point. VM thread cannot finishes the handshake and the
>> third thread is stuck waiting on the VM thread.
>>
>> To allow holding a ThreadsList when executing a VM operation we instead let the
>> VM thread use the same ThreadsList over the entire handshake making all armed
>> threads visible to the VM thread at all time. And if VM thread spots a
>> terminated thread it will count that thread is already done by only clearing
>> it's operation.
>>
>> Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
>> Added a jtreg handshake + thread suspend test as a reproducer.
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
>> Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
>
> src/hotspot/share/runtime/handshake.hpp
> No comments.
>
> src/hotspot/share/runtime/handshake.cpp
> L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
> L359: assert(Thread::current()->is_VM_thread(), "should call from vm
> thread");
> Both calls to handshake_process_by_vmthread() which calls this
> function are made with the Threads_lock held:
>
> MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
>
> Looks like the lock is grabbed because of
> possibly_vmthread_can_process_handshake() which asserts:
>
> L351: // An externally suspended thread cannot be resumed while the
> L352: // Threads_lock is held so it is safe.
> L353: // Note that this method is allowed to produce false positives.
> L354: assert(Threads_lock->owned_by_self(), "Not holding
> Threads_lock.");
> L355: if (target->is_ext_suspended()) {
> L356: return true;
> L357: }
>
> Also looks like vmthread_can_process_handshake() needs the
> Threads_lock for the same externally suspended thread check.
>
> So I was going to ask that you add:
>
> assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
>
> after L359, but how about a comment instead:
>
> // Threads_lock must be held here, but that is assert()ed in
> // possibly_vmthread_can_process_handshake().
>
>
> src/hotspot/share/runtime/thread.hpp
> No comments.
>
> src/hotspot/share/runtime/thread.cpp
> No comments.
>
> src/hotspot/share/runtime/threadSMR.cpp
> No comments.
>
> test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
> Very nice test! It specifically exercises ThreadLocalHandshakes
> with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
> only ran into this bug by accident (JDK-8212152) so I like the
> targeted test.
>
> L49: while(!exit_now) {
> nit - please add a space before '('
>
> L51: for (int i = 0; i < _threads.length; i+=2) {
> L58: for (int i = 0; i < _threads.length; i+=2) {
> nit - please added spaces around '+='
>
> So why every other thread? A comment would be good...
>
> L52: wb.handshakeWalkStack(null, true);
> I'm guessing the 'null' parameter means current thread, but
> that's a guess on my part. A comment would be good.
>
> L82: for (int i = 0; i < _threads.length; i++) {
> L83: _threads[i].join();
> L84: }
> Thanks for cleaning up the test_threads. That will make
> the JTREG thread sweeper happy. However, you don't save
> the test_exit_thread references and you don't clean those
> up either. Yes, I realize that they are supposed to exit,
> but if something hangs up on exit, I'd rather have a join()
> hang failure in this test's code than have the JTREG thread
> sweeper catch it.
>
> Dan
>
>>
>> Thanks, Robbin
>
More information about the serviceability-dev
mailing list