RFR: 8233549: Thread interrupted state must only be accessed when not in a safepoint-safe state

Thu Nov 14 22:21:39 UTC 2019

Hi Serguei,

Thanks for taking a look.

On 15/11/2019 4:04 am, serguei.spitsyn at oracle.com wrote:
> Hi David,
> 
> It looks good to me.
> A couple of nits below.
> 
> http://cr.openjdk.java.net/~dholmes/8233549/webrev/src/hotspot/share/prims/jvmtiRawMonitor.cpp.frames.html
> 
> 236 if (self->is_Java_thread()) {
> 237 JavaThread* jt = (JavaThread*) self;
> 238 // Transition to VM so we can check interrupt state
> 239 ThreadInVMfromNative tivm(jt);
> 240 if (jt->is_interrupted(true)) {
> 241 ret = M_INTERRUPTED;
> 242 } else {
> 243 ThreadBlockInVM tbivm(jt);
> 244 jt->set_suspend_equivalent();
> 245 if (millis <= 0) {
> 246 self->_ParkEvent->park();
> 247 } else {
> 248 self->_ParkEvent->park(millis);
> 249 }
> 250 }
> 251 // Return to VM before post-check of interrupt state
> 252 if (jt->is_interrupted(true)) {
> 253 ret = M_INTERRUPTED;
> 254 }
> 255 } else {
> 
> 
> It seems, the fragment at lines 251-254 needs to bebefore the line 250.
> It will add more clarity to this code.

No, it has to be after line 250 as that is when we will hit the TBIVM 
destructor and so return to _thread_in_vm which is the state needed to 
read the interrupted field. Dan commented on the above and I changed it 
slightly by moving the comment:

 > 250   // Return to VM before post-check of interrupt state
 > 251 }
 > 252 if (jt->is_interrupted(true)) {
 > 253   ret = M_INTERRUPTED;
 > 254 }

>   412   if (self->is_Java_thread()) {
> 413 JavaThread* jt = (JavaThread*)self;
> 414 jt->set_suspend_equivalent();
>   415     for (;;) {
>   416       if (!jt->handle_special_suspend_equivalent_condition()) {
>   417         break;
> 418 } else {
> 419 // We've been suspended whilst waiting and so we have to
> 420 // relinquish the raw monitor until we are resumed. Of course
> 421 // after reacquiring we have to re-check for suspension again.
> 422 // Suspension requires we are _thread_blocked, and we also have to
> 423 // recheck for being interrupted.
>   424         simple_exit(jt);
> 425 {
> 426 ThreadInVMfromNative tivm(jt);
> 427 {
> 428 ThreadBlockInVM tbivm(jt);
>   429             jt->java_suspend_self();
> 430 }
> 431 if (jt->is_interrupted(true)) {
> 432 ret = M_INTERRUPTED;
> 433 }
> 434 }
>   435         simple_enter(jt);
>   436         jt->set_suspend_equivalent();
>   437       }
>   ...
> 
> This code can be simplified a little bit.
> The line:
> 
> 414 jt->set_suspend_equivalent();
> 
> can be placed before line 416.
> Then this line can be removed:
> 
>   436         jt->set_suspend_equivalent();

Yes you're right. I was trying to preserve the original loop structure, 
but then had to add the additional set_suspend_equivalent for the first 
iteration. But I can instead just move the existing one to the top of 
the loop.

Webrev updated in place.

Thanks,
David
-----

> 
> Thanks,
> Serguei
> 
> 
> On 11/11/19 20:52, David Holmes wrote:
>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>
>> In JDK-8229516 I moved the interrupted state of a thread from the 
>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>> overlooked a critical aspect, which is that to access the field of a 
>> Java object the JavaThread must not be in a safepoint-safe state** - 
>> otherwise the oop, and anything referenced there from could be 
>> relocated by the GC whilst the JavaThread is accessing it. This 
>> manifested in a number of tests using JVM TI Agent threads and JVM TI 
>> RawMonitors because the JavaThread's were marked _thread_blocked and 
>> hence safepoint-safe, and we read a non-zero value for the interrupted 
>> field even though we had never been interrupted.
>>
>> This problem existed in all the code that checks for interruption when 
>> "waiting":
>>
>> - Parker::park (the code underpinning 
>> java.util.concurrent.LockSupport.park())
>>
>> To fix this code I simply deleted a late check of the interrupted 
>> field. The check was not needed because if an interrupt has occurred 
>> then we will find the ParkEvent in a signalled state.
>>
>> - ObjectMonitor::wait
>>
>> Here the late check of the interrupted state is essential as we reset 
>> the ParkEvent after an earlier check of the interrupted state. But the 
>> fix was simply achieved by moving the check slightly earlier before we 
>> use ThreadBlockInVm to become _thread_blocked.
>>
>> - RawMonitor::wait
>>
>> This fix was much more involved. The RawMonitor code directly 
>> transitions the JavaThread from _thread_in_Native to _thread_blocked. 
>> This is safe from a safepoint perspective because they are equivalent 
>> safepoint-safe states. To allow access to the interrupted field I have 
>> to transition from native to _thread_in_vm, and that has to be done by 
>> proper thread-state transitions to ensure correct access to the oop 
>> and its fields. Having done that I can then use ThreadBlockInVM for 
>> the transitions to blocked. However, as the old code noted it can't 
>> use proper thread-state transitions as this will lead to deadlocks 
>> with the VMThread that can also use RawMonitors when executing various 
>> event callbacks. To deal with that we have to note that the real 
>> constraint is that the JavaThread cannot block at a safepoint whilst 
>> it holds the RawMonitor. Hence the fix was push all the interrupt 
>> checking code and the thread-state transitions to the lowest level of 
>> RawMonitorWait, around the final park() call, after we have enqueued 
>> the waiter and released the monitor. That avoids any deadlock 
>> possibility.
>>
>> I also added checks to is_interrupted/interrupted to ensure they are 
>> only called by a thread in a suitable state. This should only be the 
>> VMThread (as a consequence of the Thread.stop implementation occurring 
>> at a safepoint and issuing a JavaThread::interrupt() call to unblock 
>> the target); or a JavaThread that is not _thread_in_native or 
>> _thread_blocked.
>>
>> Testing: (still finalizing)
>>  - tiers 1 - 6 (Oracle platforms)
>>  - Local Linux testing
>>   - vmTestbase/nsk/monitoring/
>>   - vmTestbase/nsk/jdwp
>>   - vmTestbase/nsk/jdb/
>>   - vmTestbase/nsk/jdi/
>>   - vmTestbase/nsk/jvmti/
>>   - serviceability/jvmti/
>>   - serviceability/jdwp
>>   - JDK: java/lang/management
>>          com/sun/management
>>
>> ** Note that this applies to all accesses we make via code in 
>> javaClasses.*. For this particular code I thought about adding a guard 
>> in JavaThread::threadObj() but it turns out when we generate a crash 
>> report we access the Thread's name() field and that can happen when in 
>> any state, so we'd always trigger a secondary assertion failure during 
>> error reporting if we did that. Note that accessing name() can still 
>> easily lead to secondary assertions failures as I discovered when 
>> trying to debug this and print the thread name out - I would see an 
>> is_instance assertion fail checking that the Thread name() is an 
>> instance of java.lang.String!
>>
>> Thanks,
>> David
>> -----
>