RFR 8234613: JavaThread can escape back to Java from an ongoing handshake

Sat Nov 23 01:12:20 UTC 2019

Hi Dan,

On 11/22/19 6:10 PM, Daniel D. Daugherty wrote:
> Hi Patricio,
>
>
> On 11/22/19 1:25 PM, Patricio Chilano wrote:
>> Hi,
>>
>> This patch aims to address a current bug where, given the right 
>> combination of handshakes and external suspend/resume, a JavaThread 
>> can transition from a safe state back to Java without blocking for a 
>> still-in-progress handshake. In the description of the bug I added an 
>> example, tracing the state changes of the JavaThread as it goes 
>> through the different transitions until it escapes the handshake. 
>> Currently, the window of time for this issue to happen is so small 
>> that we do not see actual failures running tests. Running test 
>> SuspendAtExit.java and adding some small delay before restoring the 
>> JavaThread state in java_suspend_self_with_safepoint_check() can 
>> demonstrate the issue.
>> The proposed fix is to check again if we have a pending/in-progress 
>> handshake operation after executing ~ThreadInVMForHandshake().
>>
>> Tested with mach5, tiers1-6 on all platforms (Linux, macOS, Windows 
>> and Solaris).
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8234613
>> Webrev: http://cr.openjdk.java.net/~pchilanomate/8234613/v01/webrev/
>
> src/hotspot/share/runtime/handshake.cpp
>     No comments.
>
> Thumbs up!
>
> Nice job on the write up in the bug.
Thanks!  : )

> I think I grok the fix. This is very much like suspend thread loops.
> As we are coming out of our block after being resumed, we have to
> check for another pending suspend request that was made after we
> were resumed and while we were in the process of unblocking... These
> async protocols are tricky.
>
> One last question: If a delay is added to the existing baseline code,
> would SuspendAtExit.java fail? I'm trying to figure out if this race
> is possible without your pending work for JDK-8232733.
Yes, it can fail in the current baseline. It is just unlikely, so you 
have to manually add a short sleep to see it. For example, if I just add 
the line os::naked_short_nanosleep(100000) to the current baseline, 
before set_thread_state_fence(state) in 
JavaThread::java_suspend_self_with_safepoint_check(), that test crashes 
on my Mac after 1-2 attempts. I tried to play with the timing a little 
bit on Linux too but couldn't make it fail. 8232733 will 
just make this issue more visible, since the JavaThread that is being 
handshaked could be resumed at any time during the handshake. Today, a 
JavaThread can only be resumed either before or after the handshake. So 
the issue only appears for the "before" case, when the VMThread trying 
to process the handshake sees that the JavaThread is blocked right after 
it is resumed but before its original state is restored (combined with 
the fact that the JavaThread suspended itself while polling inside the 
~ThreadInVMForHandshake() destructor).
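
To make the interleaving easier to follow, here is a minimal 
single-threaded sketch of the "before" case. It is a toy model and not 
HotSpot code: ToyThread, vm_thread_try_handshake() and 
transition_back_to_java() are hypothetical names, the VMThread's racy 
observation is replayed as a plain function call at the point where the 
window opens, and "blocking until the handshake completes" is modeled by 
simply clearing a flag.

```cpp
#include <cassert>

// Toy model only; none of these names exist in HotSpot.
enum class State { InJava, InVM, Blocked };

struct ToyThread {
    State state = State::InVM;
    bool handshake_in_progress = false;
};

// Stand-in for the VMThread: it only starts processing the handshake on
// behalf of the target if it observes the target in a safe (Blocked) state.
void vm_thread_try_handshake(ToyThread& t) {
    if (t.state == State::Blocked) {
        t.handshake_in_progress = true;
    }
}

// Replays the problematic interleaving in a fixed order.
// Returns true if the thread reaches Java while a handshake is still
// in progress, i.e. it "escaped".
bool transition_back_to_java(ToyThread& t, bool recheck_after_restore) {
    // The thread suspended itself while polling, so it looks Blocked.
    t.state = State::Blocked;

    // The thread has just been resumed, but its original state is not yet
    // restored. This is the window in which the VMThread can see Blocked.
    vm_thread_try_handshake(t);

    // The original state is restored.
    t.state = State::InVM;

    // Sketch of the proposed fix: check again for a pending/in-progress
    // handshake before continuing. Waiting is modeled by clearing the flag.
    if (recheck_after_restore && t.handshake_in_progress) {
        t.handshake_in_progress = false;
    }

    t.state = State::InJava;
    return t.handshake_in_progress;
}
```

With recheck_after_restore set to false the function returns true (the 
thread is back in Java while the handshake is still marked in progress); 
with it set to true the function returns false.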

Thanks for reviewing this Dan!

Patricio
> Dan
>
>
>>
>> Thanks,
>> Patricio
>