RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information

Thu Jun 18 08:36:23 UTC 2020

On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
> Hi David,
> 
> Both ThreadsListHandle and ResourceMarks would use `Thread::current()` 
> for their resource. It is set as default parameter in c'tor.
> Do you mean we should it explicitly in c'tor?

Yes pass current_thread so we don't do the additional unnecessary calls 
to Thread::current().

David

> 
> Thanks,
> 
> Yasumasa
> 
> 
> On 2020/06/18 13:58, David Holmes wrote:
>> Hi Yasumasa,
>>
>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>> Hi Serguei,
>>>
>>> Thanks for your comment!
>>> I uploaded new webrev:
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>
>>> I'm not sure the following change is correct.
>>> Can we assume owning_thread is not NULL at safepoint?
>>
>> We can if "owner != NULL". So that change seem fine to me.
>>
>> But given this is now only executed at a safepoint there are 
>> additional simplifications that can be made:
>>
>> - current thread determination can be simplified:
>>
>> 945   Thread* current_thread = Thread::current();
>>
>> becomes:
>>
>>     Thread* current_thread = VMThread::vm_thread();
>>     assert(current_thread == Thread::current(), "must be");
>>
>> - these comments can be removed
>>
>>   994       // Use current thread since function can be called from a
>>   995       // JavaThread or the VMThread.
>> 1053       // Use current thread since function can be called from a
>> 1054       // JavaThread or the VMThread.
>>
>> - these TLH constructions should be passing current_thread (existing bug)
>>
>> 996       ThreadsListHandle tlh;
>> 1055       ThreadsListHandle tlh;
>>
>> - All ResourceMarks should be passing current_thread (existing bug)
>>
>>
>> Aside: there is a major inconsistency between the spec and 
>> implementation for this method. I've traced the history to see how 
>> this came about from JVMDI (ref JDK-4546581) but it never resulted in 
>> the JVM TI specification clearly stating what the waiters/waiter_count 
>> means. I will file a bug to have the spec clarified to match the 
>> implementation (even though I think the implementation is what is 
>> wrong). :(
>>
>> Thanks,
>> David
>> -----
>>
>>> All tests on submit repo and serviceability/jvmti and 
>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>
>>>
>>> ```
>>>         // This monitor is owned so we have to find the owning 
>>> JavaThread.
>>>         owning_thread = 
>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>> -      // Cannot assume (owning_thread != NULL) here because this 
>>> function
>>> -      // may not have been called at a safepoint and the owning_thread
>>> -      // might not be suspended.
>>> -      if (owning_thread != NULL) {
>>> -        // The monitor's owner either has to be the current thread, 
>>> at safepoint
>>> -        // or it has to be suspended. Any of these conditions will 
>>> prevent both
>>> -        // contending and waiting threads from modifying the state of
>>> -        // the monitor.
>>> -        if (!at_safepoint && 
>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>> -          // Don't worry! This return of 
>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>> -          // will not make it back to the JVM/TI agent. The error 
>>> code will
>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>> -        }
>>> -        HandleMark hm;
>>> +      assert(owning_thread != NULL, "owning JavaThread must not be 
>>> NULL");
>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>
>>> ```
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>> Hi Yasumasa,
>>>>
>>>> This fix is not enough.
>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two 
>>>> modes: in VMop and non-VMop.
>>>> The non-VMop mode has to be removed.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>> (Change subject for RFR)
>>>>>
>>>>> Hi,
>>>>>
>>>>> I filed it to JBS and upload a webrev for it.
>>>>> Could you review it?
>>>>>
>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>   webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>
>>>>> This change has passed tests on submit repo.
>>>>> Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti 
>>>>> on Linux x64.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>> Yes. It seems we have a consensus.
>>>>>> Thank you for taking care about it.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>
>>>>>>> Go for it! :)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I wonder why 
>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not 
>>>>>>>>>>>>>>>>>>>> perform at safepoint.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the 
>>>>>>>>>>>>>>>>>>> target is not suspended:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, 
>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>    JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>>>>    jvmtiError err = 
>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, 
>>>>>>>>>>>>>>>>>>> info_ptr);
>>>>>>>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>      // Some of the critical threads were not 
>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>      VM_GetObjectMonitorUsage op(this, 
>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>      VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases when 
>>>>>>>>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from 
>>>>>>>>>>>>>>>>>> get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object 
>>>>>>>>>>>>>>>>>>>> at first [1], but it would perform concurrently.
>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be 
>>>>>>>>>>>>>>>>>>>> changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor 
>>>>>>>>>>>>>>>>>>>> before [2].
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner thread 
>>>>>>>>>>>>>>>>>>> it is either suspended or not. If it is suspended 
>>>>>>>>>>>>>>>>>>> then it cannot release the monitor. If it is not 
>>>>>>>>>>>>>>>>>>> suspended we detect that and redo the whole query at 
>>>>>>>>>>>>>>>>>>> a safepoint.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think the owner thread might resume unfortunately 
>>>>>>>>>>>>>>>>>> after suspending check.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also 
>>>>>>>>>>>>>>>>> required a safepoint but it only requires the 
>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the 
>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>> has started the process of gathering the information 
>>>>>>>>>>>>>>>> while not at a
>>>>>>>>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The code tries to make sure that it either collects data 
>>>>>>>>>>>>>>> about a monitor owned by a thread that is suspended, or 
>>>>>>>>>>>>>>> else it collects that data at a safepoint. But the owning 
>>>>>>>>>>>>>>> thread can be resumed just after the code determined it 
>>>>>>>>>>>>>>> was suspended. The monitor can then be released and the 
>>>>>>>>>>>>>>> information gathered not only stale but potentially 
>>>>>>>>>>>>>>> completely wrong as it could now be owned by a different 
>>>>>>>>>>>>>>> thread and will report that thread's entry count.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the information
>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns
>>>>>>>>>>>>>> from the safepoint that gathered the info, the target thread
>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But 
>>>>>>>>>>>>> the expectation is that the information was actually an 
>>>>>>>>>>>>> accurate snapshot of the state of the monitor at some point 
>>>>>>>>>>>>> in time. The current code does not ensure that.
>>>>>>>>>>>>
>>>>>>>>>>>> Please explain. I clearly don't understand why you think the 
>>>>>>>>>>>> info
>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the 
>>>>>>>>>>>> monitor
>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>
>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no 
>>>>>>>>>>> atomicity**. The reported owner thread may not own it any 
>>>>>>>>>>> longer when the entry count is read, so straight away you may 
>>>>>>>>>>> have the wrong entry count information. The set of threads 
>>>>>>>>>>> trying to acquire the monitor, or wait on the monitor can 
>>>>>>>>>>> change in unexpected ways. It would be possible for instance 
>>>>>>>>>>> to report the same thread as being the owner, being blocked 
>>>>>>>>>>> trying to enter the monitor, and being in the wait-set of the 
>>>>>>>>>>> monitor - apparently all at the same time!
>>>>>>>>>>>
>>>>>>>>>>> ** even if the owner is suspended we don't have complete 
>>>>>>>>>>> atomicity because threads can join the set of threads trying 
>>>>>>>>>>> to enter the monitor (unless they are all suspended).
>>>>>>>>>>
>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>>>>>>>
>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the 
>>>>>>>>>> info about
>>>>>>>>>>     the object's monitor. Since we're at a safepoint, the info 
>>>>>>>>>> that
>>>>>>>>>>     we are gathering cannot change until we return from the 
>>>>>>>>>> safepoint.
>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>
>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>
>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>
>>>>>>>>>>     So in this case, the owner info and notify info is stable,
>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>
>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>
>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>>>>>>>     thread on the entry queue that is not suspended, then it will
>>>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>     noticed that it doesn't check for suspension for the threads
>>>>>>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>>>>>>     some confusion here.
>>>>>>>>>>
>>>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>     suspended pending entry thread and a list of waiting
>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>>>     the right time.
>>>>>>>>>>
>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>
>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>>>> do that at a safepoint when the owner thread is not suspended
>>>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>>>> the different pieces of info is unstable to varying degrees.
>>>>>>>>>>
>>>>>>>>>> As for this claim:
>>>>>>>>>>
>>>>>>>>>>> It would be possible for instance to report the same thread
>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor,
>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>>>>>>> the same time! 
>>>>>>>>>>
>>>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>>>>>>> then be seen as on the entry queue or on the wait queue since
>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>>>> wait queue.
>>>>>>>>>>
>>>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>>>> will still be field instability.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>>>>>
>>>>>>>>> I agree with this.
>>>>>>>>> The advantages are:
>>>>>>>>>   - the result is stable
>>>>>>>>>   - the implementation can be simplified
>>>>>>>>>
>>>>>>>>> Performance impact is not very clear but should not be that
>>>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>>>> I'm not sure if using handshakes can make performance better.
>>>>>>>>
>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The only way to make sure you don't have stale information is
>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the 
>>>>>>>>>>>>>> doc
>>>>>>>>>>>>>> should have more clear about the possibility of returning 
>>>>>>>>>>>>>> stale
>>>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being 
>>>>>>>>>>>>>>> suspended so I can't see how this could be construed as 
>>>>>>>>>>>>>>> an agent bug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In your scenario above, you mention that the target thread 
>>>>>>>>>>>>>> was
>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the 
>>>>>>>>>>>>>> target
>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after
>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before
>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All three of those calls: SuspendThread(), 
>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent 
>>>>>>>>>>>>>> should not
>>>>>>>>>>>>>> resume the target thread while also calling 
>>>>>>>>>>>>>> GetObjectMonitorUsage().
>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent bug.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps. I was thinking more generally about an independent 
>>>>>>>>>>>>> resume, but you're right that doesn't really make a lot of 
>>>>>>>>>>>>> sense. But when the spec says nothing about suspension ...
>>>>>>>>>>>>
>>>>>>>>>>>> And it is intentional that suspension is not required. 
>>>>>>>>>>>> JVM/DI and JVM/PI
>>>>>>>>>>>> used to require suspension for these kinds of get-the-info 
>>>>>>>>>>>> APIs. JVM/TI
>>>>>>>>>>>> intentionally was designed to not require suspension.
>>>>>>>>>>>>
>>>>>>>>>>>> As I've said before, we could add a note about the data 
>>>>>>>>>>>> being potentially
>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like 
>>>>>>>>>>>> stat(2). You can
>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the 
>>>>>>>>>>>> info is current
>>>>>>>>>>>> by the time you process what you got back. Is it too much 
>>>>>>>>>>>> motherhood to
>>>>>>>>>>>> state that the data might be stale? I could go either way...
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this to 
>>>>>>>>>>>>>>> be fixed in the future without forcing/using any safepoints.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding 
>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>> handshakes in this thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst 
>>>>>>>>>>>>> the monitor is queried. In effect the operation would 
>>>>>>>>>>>>> create a per-thread safepoint.
>>>>>>>>>>>>
>>>>>>>>>>>> I "know" that, but I still need time to think about it and 
>>>>>>>>>>>> probably
>>>>>>>>>>>> see the code to see if there are holes...
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Semantically it is no different to the code actually 
>>>>>>>>>>>>> suspending the owner thread, but it can't actually do that 
>>>>>>>>>>>>> because suspends/resume don't nest.
>>>>>>>>>>>>
>>>>>>>>>>>> Yeah... we used have a suspend count back when we tracked 
>>>>>>>>>>>> internal and
>>>>>>>>>>>> external suspends separately. That was a nightmare...
>>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to 
>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the thread 
>>>>>>>>>>>>>>>>>> is sleeping [3], or when it performs in native [4].
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't 
>>>>>>>>>>>>>>>>> continue execution in the VM or in Java code.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed 
>>>>>>>>>>>>>>>>>>> common case where threads are first suspended and 
>>>>>>>>>>>>>>>>>>> then the monitors are queried.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI 
>>>>>>>>>>>>>>>>>> spec - it just says "Get information about the 
>>>>>>>>>>>>>>>>>> object's monitor."
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, nothing 
>>>>>>>>>>>>>>>>> to do with the spec.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect 
>>>>>>>>>>>>>>>>>> information in some case.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner 
>>>>>>>>>>>>>>>>>> might be just before wakeup.
>>>>>>>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() 
>>>>>>>>>>>>>>>>>> is called at safepoint in any case.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using 
>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will require 
>>>>>>>>>>>>>>>>> that the apparent owner is Handshake-safe (by entering 
>>>>>>>>>>>>>>>>> a handshake with it) before querying the monitor. This 
>>>>>>>>>>>>>>>>> would still be preferable I think to always using a 
>>>>>>>>>>>>>>>>> safepoint for the entire operation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [3] 
>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [4] 
>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the thread 
>>>>>>>>>>>>>>>>>>> reported as the owner may not be suspended at the 
>>>>>>>>>>>>>>>>>>> time we first see it, and may release the monitor, 
>>>>>>>>>>>>>>>>>>> but then it may get suspended before we call:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), 
>>>>>>>>>>>>>>>>>>> owner);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and 
>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a racy 
>>>>>>>>>>>>>>>>>>> way. This can't happen when suspension itself 
>>>>>>>>>>>>>>>>>>> requires a safepoint as the current thread won't go 
>>>>>>>>>>>>>>>>>>> to that safepoint during this code. However, if 
>>>>>>>>>>>>>>>>>>> suspension is implemented via a direct handshake with 
>>>>>>>>>>>>>>>>>>> the target thread then we have a problem.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>