RFR: 8233549: Thread interrupted state must only be accessed when not in a safepoint-safe state
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Nov 14 22:33:34 UTC 2019
> Webrev updated in place.
Thumbs up.
Dan
On 11/14/19 5:21 PM, David Holmes wrote:
> Hi Serguei,
>
> Thanks for taking a look.
>
> On 15/11/2019 4:04 am, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> It looks good to me.
>> A couple of nits below.
>>
>> http://cr.openjdk.java.net/~dholmes/8233549/webrev/src/hotspot/share/prims/jvmtiRawMonitor.cpp.frames.html
>>
>>
>> 236 if (self->is_Java_thread()) {
>> 237 JavaThread* jt = (JavaThread*) self;
>> 238 // Transition to VM so we can check interrupt state
>> 239 ThreadInVMfromNative tivm(jt);
>> 240 if (jt->is_interrupted(true)) {
>> 241 ret = M_INTERRUPTED;
>> 242 } else {
>> 243 ThreadBlockInVM tbivm(jt);
>> 244 jt->set_suspend_equivalent();
>> 245 if (millis <= 0) {
>> 246 self->_ParkEvent->park();
>> 247 } else {
>> 248 self->_ParkEvent->park(millis);
>> 249 }
>> 250 }
>> 251 // Return to VM before post-check of interrupt state
>> 252 if (jt->is_interrupted(true)) {
>> 253 ret = M_INTERRUPTED;
>> 254 }
>> 255 } else {
>>
>>
>> It seems, the fragment at lines 251-254 needs to bebefore the line 250.
>> It will add more clarity to this code.
>
> No, it has to be after line 250 as that is when we will hit the TBIVM
> destructor and so return to _thread_in_vm which is the state needed to
> read the interrupted field. Dan commented on the above and I changed
> it slightly by moving the comment:
>
> > 250 // Return to VM before post-check of interrupt state
> > 251 }
> > 252 if (jt->is_interrupted(true)) {
> > 253 ret = M_INTERRUPTED;
> > 254 }
>
>
>> 412 if (self->is_Java_thread()) {
>> 413 JavaThread* jt = (JavaThread*)self;
>> 414 jt->set_suspend_equivalent();
>> 415 for (;;) {
>> 416 if (!jt->handle_special_suspend_equivalent_condition()) {
>> 417 break;
>> 418 } else {
>> 419 // We've been suspended whilst waiting and so we have to
>> 420 // relinquish the raw monitor until we are resumed. Of course
>> 421 // after reacquiring we have to re-check for suspension again.
>> 422 // Suspension requires we are _thread_blocked, and we also have to
>> 423 // recheck for being interrupted.
>> 424 simple_exit(jt);
>> 425 {
>> 426 ThreadInVMfromNative tivm(jt);
>> 427 {
>> 428 ThreadBlockInVM tbivm(jt);
>> 429 jt->java_suspend_self();
>> 430 }
>> 431 if (jt->is_interrupted(true)) {
>> 432 ret = M_INTERRUPTED;
>> 433 }
>> 434 }
>> 435 simple_enter(jt);
>> 436 jt->set_suspend_equivalent();
>> 437 }
>> ...
>>
>> This code can be simplified a little bit.
>> The line:
>>
>> 414 jt->set_suspend_equivalent();
>>
>> can be placed before line 416.
>> Then this line can be removed:
>>
>> 436 jt->set_suspend_equivalent();
>
> Yes you're right. I was trying to preserve the original loop
> structure, but then had to add the additional set_suspend_equivalent
> for the first iteration. But I can instead just move the existing one
> to the top of the loop.
>
> Webrev updated in place.
>
> Thanks,
> David
> -----
>
>>
>> Thanks,
>> Serguei
>>
>>
>> On 11/11/19 20:52, David Holmes wrote:
>>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>>
>>> In JDK-8229516 I moved the interrupted state of a thread from the
>>> osThread in the VM to the java.lang.Thread instance. In doing that I
>>> overlooked a critical aspect, which is that to access the field of a
>>> Java object the JavaThread must not be in a safepoint-safe state** -
>>> otherwise the oop, and anything referenced there from could be
>>> relocated by the GC whilst the JavaThread is accessing it. This
>>> manifested in a number of tests using JVM TI Agent threads and JVM
>>> TI RawMonitors because the JavaThread's were marked _thread_blocked
>>> and hence safepoint-safe, and we read a non-zero value for the
>>> interrupted field even though we had never been interrupted.
>>>
>>> This problem existed in all the code that checks for interruption
>>> when "waiting":
>>>
>>> - Parker::park (the code underpinning
>>> java.util.concurrent.LockSupport.park())
>>>
>>> To fix this code I simply deleted a late check of the interrupted
>>> field. The check was not needed because if an interrupt has occurred
>>> then we will find the ParkEvent in a signalled state.
>>>
>>> - ObjectMonitor::wait
>>>
>>> Here the late check of the interrupted state is essential as we
>>> reset the ParkEvent after an earlier check of the interrupted state.
>>> But the fix was simply achieved by moving the check slightly earlier
>>> before we use ThreadBlockInVm to become _thread_blocked.
>>>
>>> - RawMonitor::wait
>>>
>>> This fix was much more involved. The RawMonitor code directly
>>> transitions the JavaThread from _thread_in_Native to
>>> _thread_blocked. This is safe from a safepoint perspective because
>>> they are equivalent safepoint-safe states. To allow access to the
>>> interrupted field I have to transition from native to _thread_in_vm,
>>> and that has to be done by proper thread-state transitions to ensure
>>> correct access to the oop and its fields. Having done that I can
>>> then use ThreadBlockInVM for the transitions to blocked. However, as
>>> the old code noted it can't use proper thread-state transitions as
>>> this will lead to deadlocks with the VMThread that can also use
>>> RawMonitors when executing various event callbacks. To deal with
>>> that we have to note that the real constraint is that the JavaThread
>>> cannot block at a safepoint whilst it holds the RawMonitor. Hence
>>> the fix was push all the interrupt checking code and the
>>> thread-state transitions to the lowest level of RawMonitorWait,
>>> around the final park() call, after we have enqueued the waiter and
>>> released the monitor. That avoids any deadlock possibility.
>>>
>>> I also added checks to is_interrupted/interrupted to ensure they are
>>> only called by a thread in a suitable state. This should only be the
>>> VMThread (as a consequence of the Thread.stop implementation
>>> occurring at a safepoint and issuing a JavaThread::interrupt() call
>>> to unblock the target); or a JavaThread that is not
>>> _thread_in_native or _thread_blocked.
>>>
>>> Testing: (still finalizing)
>>> - tiers 1 - 6 (Oracle platforms)
>>> - Local Linux testing
>>> - vmTestbase/nsk/monitoring/
>>> - vmTestbase/nsk/jdwp
>>> - vmTestbase/nsk/jdb/
>>> - vmTestbase/nsk/jdi/
>>> - vmTestbase/nsk/jvmti/
>>> - serviceability/jvmti/
>>> - serviceability/jdwp
>>> - JDK: java/lang/management
>>> com/sun/management
>>>
>>> ** Note that this applies to all accesses we make via code in
>>> javaClasses.*. For this particular code I thought about adding a
>>> guard in JavaThread::threadObj() but it turns out when we generate a
>>> crash report we access the Thread's name() field and that can happen
>>> when in any state, so we'd always trigger a secondary assertion
>>> failure during error reporting if we did that. Note that accessing
>>> name() can still easily lead to secondary assertions failures as I
>>> discovered when trying to debug this and print the thread name out -
>>> I would see an is_instance assertion fail checking that the Thread
>>> name() is an instance of java.lang.String!
>>>
>>> Thanks,
>>> David
>>> -----
>>
More information about the serviceability-dev
mailing list