RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Oct 26 15:38:51 UTC 2018


On 10/26/18 10:33 AM, Robbin Ehn wrote:
> Hi, please review.
>
> When the VM thread executes a handshake it uses different ThreadsLists 
> during
> the execution. A JavaThread that is armed for the handshake when it is 
> already
> in the exit path in VM will cancel the handshake. Even if the VM 
> thread cannot
> see this thread after the initial ThreadsList which where used for 
> arming, the
> handshake can progress when the exiting thread cancels the handshake.
>
> But if a third thread takes a ThreadsList where the exiting JavaThread 
> is present and tries to execute a VM operation, hence waiting on VM 
> thread to finish the handshake, the JavaThread in the exit path can 
> never reach the handshake cancellation point. VM thread cannot 
> finishes the handshake and the third thread is stuck waiting on the VM 
> thread.
>
> To allow holding a ThreadsList when executing a VM operation we 
> instead let the
> VM thread use the same ThreadsList over the entire handshake making 
> all armed
> threads visible to the VM thread at all time. And if VM thread spots a 
> terminated thread it will count that thread is already done by only 
> clearing
> it's operation.
>
> Passes local stress testing, t1-5 and the deadlock is no longer 
> reproduce-able.
> Added a jtreg handshake + thread suspend test as a reproducer.
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
> Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/

src/hotspot/share/runtime/handshake.hpp
     No comments.

src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call from 
vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         function are made with the Threads_lock held:

         MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);

         Looks like the lock is grabbed because of
         possibly_vmthread_can_process_handshake() which asserts:

         L351:   // An externally suspended thread cannot be resumed 
while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false 
positives.
         L354:   assert(Threads_lock->owned_by_self(), "Not holding 
Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }

         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.

         So I was going to ask that you add:

         assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");

         after L359, but how about a comment instead:

         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().


src/hotspot/share/runtime/thread.hpp
     No comments.

src/hotspot/share/runtime/thread.cpp
     No comments.

src/hotspot/share/runtime/threadSMR.cpp
     No comments.

test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.

     L49:         while(!exit_now) {
         nit - please add a space before '('

     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='

         So why every other thread? A comment would be good...

     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.

     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.

Dan

>
> Thanks, Robbin



More information about the serviceability-dev mailing list