RFR: 8233549: Thread interrupted state must only be accessed when not in a safepoint-safe state

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Nov 13 14:17:58 UTC 2019


On 11/12/19 5:50 PM, David Holmes wrote:
> Hi Dan,
>
> Thanks for taking a look so quickly!

Your welcome! I figured you would prefer to get this one out of
the way quickly.


>
> On 13/11/2019 3:18 am, Daniel D. Daugherty wrote:
>> On 11/11/19 11:52 PM, David Holmes wrote:
>>> webrev: http://cr.openjdk.java.net/~dholmes/8233549/webrev/
>>
>> src/hotspot/os/posix/os_posix.cpp
>>      L2078:   // Can't access interrupt state now we are 
>> _thread_blocked. If we've been
>>      L2079:   // interrupted since we checked above then _counter 
>> will be > 0.
>>          nit - grammar. Please consider:
>>               // Can't access interrupt state now that we are 
>> _thread_blocked. If we've
>>               // been interrupted since we checked above then 
>> _counter will be > 0.
>>
>> src/hotspot/os/solaris/os_solaris.cpp
>>      L4924:   // Can't access interrupt state now we are 
>> _thread_blocked. If we've been
>>      L4925:   // interrupted since we checked above then _counter 
>> will be > 0.
>>          nit - grammar. Please consider:
>>               // Can't access interrupt state now that we are 
>> _thread_blocked. If we've
>>               // been interrupted since we checked above then 
>> _counter will be > 0.
>
> Will fix grammatical nits.
>
>> src/hotspot/share/classfile/javaClasses.cpp
>>      No comments.
>>
>> src/hotspot/share/prims/jvmtiEnv.cpp
>>      Hmmm... did the "non-JavaThread can't be interrupted" check also 
>> get
>>      pushed down?
>>      Update: Similar check is now in JvmtiRawMonitor::raw_wait().
>>
>> src/hotspot/share/prims/jvmtiRawMonitor.cpp
>>      L239:     ThreadInVMfromNative tivm(jt);
>>      L240:     if (jt->is_interrupted(true)) {
>>      L241:         ret = M_INTERRUPTED;
>>      L242:     } else {
>>      L243:       ThreadBlockInVM tbivm(jt);
>>      L244:       jt->set_suspend_equivalent();
>>      L245:       if (millis <= 0) {
>>      L246:         self->_ParkEvent->park();
>>      L247:       } else {
>>      L248:         self->_ParkEvent->park(millis);
>>      L249:       }
>>      L250:     }
>>      L251:     // Return to VM before post-check of interrupt state
>>      L252:     if (jt->is_interrupted(true)) {
>>          The comment on L251 is better between L249 and L250 since that
>>          is where 'tbivm' gets destroyed and you transition back.
>>
>>          You could have this comment before L252:
>>
>>                 // Must be in VM to safely access interrupt state:
>>
>>          if you think you really need a comment there.
>
> Will move comment up as suggested.
>
>> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>>      No comments.
>>
>> src/hotspot/share/runtime/objectMonitor.cpp
>>      You've moved the is_interrupted() check from after ThreadBlockInVM
>>      to before it. ThreadBlockInVM can block for a safepoint which 
>> widens
>>      the window for an interrupt to come in after the check on L1272 and
>>      and before the thread parks on L1286 or L1288.
>>
>>      Can this result in an unexpected park() where before we would have
>>      taken the "Intentionally empty" code path on L1283?
>>
>>      What I'm worried about is whether we've opened a window where we
>>      do Object.wait(0) and that wait() is supposed to be interrupted.
>>      However, we lose that interrupt because it arrives in the now wider
>>      window between L1272 and L1286 and we never return from the 
>> wait(0).
>>
>>      It is possible that I'm not remembering something about how 
>> interrupt()
>>      interacts with park().
>
> The interrupt() not only sets the field but also issues an unpark() to 
> the ParkEvent. So if we are interrupted whilst processing through the 
> TBIVM, the call to park() will return immediately as the ParkEvent 
> will be in the signalled state.

That was the piece I wasn't remembering. Thanks for filling in the detail.


>
>> test/hotspot/jtreg/ProblemList.txt
>>      Thanks for remembering to update the ProblemList.
>>
>> The only part I'm worried about is ObjectMonitor::wait(). If my worry is
>> baseless, then thumbs up.
>
> Worry is baseless :)

Agreed!


>
>> I have a couple of nits above. If you choose to fix those, then I don't
>> need to see another webrev.
>
> Thanks again!

You're welcome.

Dan

>
> David
> -----
>
>> Dan
>>
>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8233549
>>>
>>> In JDK-8229516 I moved the interrupted state of a thread from the 
>>> osThread in the VM to the java.lang.Thread instance. In doing that I 
>>> overlooked a critical aspect, which is that to access the field of a 
>>> Java object the JavaThread must not be in a safepoint-safe state** - 
>>> otherwise the oop, and anything referenced there from could be 
>>> relocated by the GC whilst the JavaThread is accessing it. This 
>>> manifested in a number of tests using JVM TI Agent threads and JVM 
>>> TI RawMonitors because the JavaThread's were marked _thread_blocked 
>>> and hence safepoint-safe, and we read a non-zero value for the 
>>> interrupted field even though we had never been interrupted.
>>>
>>> This problem existed in all the code that checks for interruption 
>>> when "waiting":
>>>
>>> - Parker::park (the code underpinning 
>>> java.util.concurrent.LockSupport.park())
>>>
>>> To fix this code I simply deleted a late check of the interrupted 
>>> field. The check was not needed because if an interrupt has occurred 
>>> then we will find the ParkEvent in a signalled state.
>>>
>>> - ObjectMonitor::wait
>>>
>>> Here the late check of the interrupted state is essential as we 
>>> reset the ParkEvent after an earlier check of the interrupted state. 
>>> But the fix was simply achieved by moving the check slightly earlier 
>>> before we use ThreadBlockInVm to become _thread_blocked.
>>>
>>> - RawMonitor::wait
>>>
>>> This fix was much more involved. The RawMonitor code directly 
>>> transitions the JavaThread from _thread_in_Native to 
>>> _thread_blocked. This is safe from a safepoint perspective because 
>>> they are equivalent safepoint-safe states. To allow access to the 
>>> interrupted field I have to transition from native to _thread_in_vm, 
>>> and that has to be done by proper thread-state transitions to ensure 
>>> correct access to the oop and its fields. Having done that I can 
>>> then use ThreadBlockInVM for the transitions to blocked. However, as 
>>> the old code noted it can't use proper thread-state transitions as 
>>> this will lead to deadlocks with the VMThread that can also use 
>>> RawMonitors when executing various event callbacks. To deal with 
>>> that we have to note that the real constraint is that the JavaThread 
>>> cannot block at a safepoint whilst it holds the RawMonitor. Hence 
>>> the fix was push all the interrupt checking code and the 
>>> thread-state transitions to the lowest level of RawMonitorWait, 
>>> around the final park() call, after we have enqueued the waiter and 
>>> released the monitor. That avoids any deadlock possibility.
>>>
>>> I also added checks to is_interrupted/interrupted to ensure they are 
>>> only called by a thread in a suitable state. This should only be the 
>>> VMThread (as a consequence of the Thread.stop implementation 
>>> occurring at a safepoint and issuing a JavaThread::interrupt() call 
>>> to unblock the target); or a JavaThread that is not 
>>> _thread_in_native or _thread_blocked.
>>>
>>> Testing: (still finalizing)
>>>  - tiers 1 - 6 (Oracle platforms)
>>>  - Local Linux testing
>>>   - vmTestbase/nsk/monitoring/
>>>   - vmTestbase/nsk/jdwp
>>>   - vmTestbase/nsk/jdb/
>>>   - vmTestbase/nsk/jdi/
>>>   - vmTestbase/nsk/jvmti/
>>>   - serviceability/jvmti/
>>>   - serviceability/jdwp
>>>   - JDK: java/lang/management
>>>          com/sun/management
>>>
>>> ** Note that this applies to all accesses we make via code in 
>>> javaClasses.*. For this particular code I thought about adding a 
>>> guard in JavaThread::threadObj() but it turns out when we generate a 
>>> crash report we access the Thread's name() field and that can happen 
>>> when in any state, so we'd always trigger a secondary assertion 
>>> failure during error reporting if we did that. Note that accessing 
>>> name() can still easily lead to secondary assertions failures as I 
>>> discovered when trying to debug this and print the thread name out - 
>>> I would see an is_instance assertion fail checking that the Thread 
>>> name() is an instance of java.lang.String!
>>>
>>> Thanks,
>>> David
>>> -----
>>



More information about the hotspot-runtime-dev mailing list