RFR: 8212933: Thread-SMR: requesting a VM operation whilst holding a ThreadsListHandle can cause deadlocks

Robbin Ehn robbin.ehn at oracle.com
Wed Oct 31 07:04:18 UTC 2018


Thanks Serguei, Robbin

On 10/29/18 6:35 PM, serguei.spitsyn at oracle.com wrote:
> Hi Robbin,
> 
> +1
> 
> Thanks,
> Serguei
> 
> On 10/29/18 06:52, Daniel D. Daugherty wrote:
>> On 10/28/18 4:08 PM, Robbin Ehn wrote:
>>> Hi Dan,
>>>
>>> Thanks for looking at this, here is the update:
>>> Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
>>
>> src/hotspot/share/runtime/handshake.cpp
>>     No comments.
>>
>> test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
>>     No comments.
>>
>> Thumbs up!
>>
>> Thanks for making the updates.
>>
>> Dan
>>
>>
>>
>>> Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/
>>>
>>> /Robbin
>>>
>>> On 26/10/2018 17:38, Daniel D. Daugherty wrote:
>>>> On 10/26/18 10:33 AM, Robbin Ehn wrote:
>>>>> Hi, please review.
>>>>>
>>>>> When the VM thread executes a handshake it uses different ThreadsLists during
>>>>> the execution. A JavaThread that is armed for the handshake when it is already
>>>>> in the exit path in VM will cancel the handshake. Even if the VM thread cannot
>>>>> see this thread after the initial ThreadsList which where used for arming, the
>>>>> handshake can progress when the exiting thread cancels the handshake.
>>>>>
>>>>> But if a third thread takes a ThreadsList where the exiting JavaThread is 
>>>>> present and tries to execute a VM operation, hence waiting on VM thread to 
>>>>> finish the handshake, the JavaThread in the exit path can never reach the 
>>>>> handshake cancellation point. VM thread cannot finishes the handshake and 
>>>>> the third thread is stuck waiting on the VM thread.
>>>>>
>>>>> To allow holding a ThreadsList when executing a VM operation we instead let 
>>>>> the
>>>>> VM thread use the same ThreadsList over the entire handshake making all armed
>>>>> threads visible to the VM thread at all time. And if VM thread spots a 
>>>>> terminated thread it will count that thread is already done by only clearing
>>>>> it's operation.
>>>>>
>>>>> Passes local stress testing, t1-5 and the deadlock is no longer 
>>>>> reproduce-able.
>>>>> Added a jtreg handshake + thread suspend test as a reproducer.
>>>>>
>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
>>>>> Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/
>>>>
>>>> src/hotspot/share/runtime/handshake.hpp
>>>>      No comments.
>>>>
>>>> src/hotspot/share/runtime/handshake.cpp
>>>>      L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
>>>>      L359:   assert(Thread::current()->is_VM_thread(), "should call from vm 
>>>> thread");
>>>>          Both calls to handshake_process_by_vmthread() which calls this
>>>>          function are made with the Threads_lock held:
>>>>
>>>>          MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);
>>>>
>>>>          Looks like the lock is grabbed because of
>>>>          possibly_vmthread_can_process_handshake() which asserts:
>>>>
>>>>          L351:   // An externally suspended thread cannot be resumed while the
>>>>          L352:   // Threads_lock is held so it is safe.
>>>>          L353:   // Note that this method is allowed to produce false 
>>>> positives.
>>>>          L354:   assert(Threads_lock->owned_by_self(), "Not holding 
>>>> Threads_lock.");
>>>>          L355:   if (target->is_ext_suspended()) {
>>>>          L356:     return true;
>>>>          L357:   }
>>>>
>>>>          Also looks like vmthread_can_process_handshake() needs the
>>>>          Threads_lock for the same externally suspended thread check.
>>>>
>>>>          So I was going to ask that you add:
>>>>
>>>>          assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
>>>>
>>>>          after L359, but how about a comment instead:
>>>>
>>>>          // Threads_lock must be held here, but that is assert()ed in
>>>>          // possibly_vmthread_can_process_handshake().
>>>>
>>>>
>>>> src/hotspot/share/runtime/thread.hpp
>>>>      No comments.
>>>>
>>>> src/hotspot/share/runtime/thread.cpp
>>>>      No comments.
>>>>
>>>> src/hotspot/share/runtime/threadSMR.cpp
>>>>      No comments.
>>>>
>>>> test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
>>>>      Very nice test! It specifically exercises ThreadLocalHandshakes
>>>>      with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
>>>>      only ran into this bug by accident (JDK-8212152) so I like the
>>>>      targeted test.
>>>>
>>>>      L49:         while(!exit_now) {
>>>>          nit - please add a space before '('
>>>>
>>>>      L51:             for (int i = 0; i < _threads.length; i+=2) {
>>>>      L58:             for (int i = 0; i < _threads.length; i+=2) {
>>>>          nit - please added spaces around '+='
>>>>
>>>>          So why every other thread? A comment would be good...
>>>>
>>>>      L52:                 wb.handshakeWalkStack(null, true);
>>>>          I'm guessing the 'null' parameter means current thread, but
>>>>          that's a guess on my part. A comment would be good.
>>>>
>>>>      L82:         for (int i = 0; i < _threads.length; i++) {
>>>>      L83:             _threads[i].join();
>>>>      L84:         }
>>>>          Thanks for cleaning up the test_threads. That will make
>>>>          the JTREG thread sweeper happy. However, you don't save
>>>>          the test_exit_thread references and you don't clean those
>>>>          up either. Yes, I realize that they are supposed to exit,
>>>>          but if something hangs up on exit, I'd rather have a join()
>>>>          hang failure in this test's code than have the JTREG thread
>>>>          sweeper catch it.
>>>>
>>>> Dan
>>>>
>>>>>
>>>>> Thanks, Robbin
>>>>
>>
> 


More information about the serviceability-dev mailing list