RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information

Thu Jun 18 13:18:00 UTC 2020

On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
> On 2020/06/18 17:36, David Holmes wrote:
>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>> Hi David,
>>>
>>> Both ThreadsListHandle and ResourceMarks would use 
>>> `Thread::current()` for their resource. It is set as default 
>>> parameter in c'tor.
>>> Do you mean we should pass it explicitly in c'tor?
>>
>> Yes pass current_thread so we don't do the additional unnecessary 
>> calls to Thread::current().
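
To make the point about default arguments concrete, here is a minimal sketch. It uses hypothetical stand-ins (FakeThread, fake_current, FakeMark), not the real HotSpot classes, and assumes only what is described above: that the constructor's default argument performs the current-thread lookup. A default argument is evaluated on every call that omits it, so each default-constructed helper pays for one more lookup.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; these are NOT HotSpot classes.
struct FakeThread { int id; };

static int g_lookups = 0;              // counts simulated current-thread lookups
static FakeThread g_the_thread = {1};

// Plays the role of Thread::current(): every call counts as one lookup.
FakeThread* fake_current() {
  ++g_lookups;
  return &g_the_thread;
}

// Plays the role of a ResourceMark-like helper whose c'tor defaults to the
// lookup when no thread is passed in.
class FakeMark {
 public:
  explicit FakeMark(FakeThread* t = fake_current()) : _thread(t) {}
  FakeThread* thread() const { return _thread; }
 private:
  FakeThread* _thread;
};

// Three helpers built with the default argument: one lookup per helper.
int lookups_with_defaults() {
  g_lookups = 0;
  FakeMark a, b, c;
  assert(a.thread() == b.thread() && b.thread() == c.thread());
  return g_lookups;
}

// One lookup up front, then the cached pointer is passed explicitly.
int lookups_with_explicit() {
  g_lookups = 0;
  FakeThread* current = fake_current();
  FakeMark a(current), b(current), c(current);
  assert(a.thread() == current && b.thread() == current && c.thread() == current);
  return g_lookups;
}
```

Under these assumptions, lookups_with_defaults() performs three lookups and lookups_with_explicit() performs one.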
> 
> Ok, I've fixed them. Could you review again?
> 
>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/

Updates look good. One nit I missed before:

src/hotspot/share/prims/jvmtiEnv.cpp

// It need to perform at safepoint for gathering stable data

please change to:

// This needs to be performed at a safepoint to gather stable data

Thanks,
David

> 
> Thanks,
> 
> Yasumasa
> 
> 
>> David
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/06/18 13:58, David Holmes wrote:
>>>> Hi Yasumasa,
>>>>
>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>> Hi Serguei,
>>>>>
>>>>> Thanks for your comment!
>>>>> I uploaded new webrev:
>>>>>
>>>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>
>>>>> I'm not sure the following change is correct.
>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>
>>>> We can if "owner != NULL". So that change seems fine to me.
>>>>
>>>> But given this is now only executed at a safepoint there are 
>>>> additional simplifications that can be made:
>>>>
>>>> - current thread determination can be simplified:
>>>>
>>>> 945   Thread* current_thread = Thread::current();
>>>>
>>>> becomes:
>>>>
>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>     assert(current_thread == Thread::current(), "must be");
>>>>
>>>> - these comments can be removed
>>>>
>>>>   994       // Use current thread since function can be called from a
>>>>   995       // JavaThread or the VMThread.
>>>> 1053       // Use current thread since function can be called from a
>>>> 1054       // JavaThread or the VMThread.
>>>>
>>>> - these TLH constructions should be passing current_thread (existing 
>>>> bug)
>>>>
>>>> 996       ThreadsListHandle tlh;
>>>> 1055       ThreadsListHandle tlh;
>>>>
>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>
>>>>
>>>> Aside: there is a major inconsistency between the spec and 
>>>> implementation for this method. I've traced the history to see how 
>>>> this came about from JVMDI (ref JDK-4546581) but it never resulted 
>>>> in the JVM TI specification clearly stating what the 
>>>> waiters/waiter_count means. I will file a bug to have the spec 
>>>> clarified to match the implementation (even though I think the 
>>>> implementation is what is wrong). :(
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> All tests on the submit repo, serviceability/jvmti, and 
>>>>> vmTestbase/nsk/jvmti passed with this change.
>>>>>
>>>>>
>>>>> ```
>>>>>         // This monitor is owned so we have to find the owning JavaThread.
>>>>>         owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>> -      // Cannot assume (owning_thread != NULL) here because this function
>>>>> -      // may not have been called at a safepoint and the owning_thread
>>>>> -      // might not be suspended.
>>>>> -      if (owning_thread != NULL) {
>>>>> -        // The monitor's owner either has to be the current thread, at safepoint
>>>>> -        // or it has to be suspended. Any of these conditions will prevent both
>>>>> -        // contending and waiting threads from modifying the state of
>>>>> -        // the monitor.
>>>>> -        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>> -          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>> -          // will not make it back to the JVM/TI agent. The error code will
>>>>> -          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage VM op.
>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>> -        }
>>>>> -        HandleMark hm;
>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>
>>>>> ```
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> This fix is not enough.
>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two 
>>>>>> modes: in VMop and non-VMop.
>>>>>> The non-VMop mode has to be removed.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>> (Change subject for RFR)
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I filed it in JBS and uploaded a webrev for it.
>>>>>>> Could you review it?
>>>>>>>
>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>   webrev: 
>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>
>>>>>>> This change has passed tests on submit repo.
>>>>>>> Also I tested it with serviceability/jvmti and 
>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>> Thank you for taking care of it.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>>>
>>>>>>>>> Go for it! :)
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I wonder why 
>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>>>>>>>>>>> (the implementation of GetObjectMonitorUsage()) does 
>>>>>>>>>>>>>>>>>>>>>> not run at a safepoint.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the 
>>>>>>>>>>>>>>>>>>>>> target is not suspended:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>    JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>    jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>      // Some of the critical threads were not suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>      VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>      VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases 
>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not 
>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The monitor owner is first read from the monitor 
>>>>>>>>>>>>>>>>>>>>>> object [1], but this runs concurrently with 
>>>>>>>>>>>>>>>>>>>>>> other threads.
>>>>>>>>>>>>>>>>>>>>>> If the owner thread is not suspended, the owner might 
>>>>>>>>>>>>>>>>>>>>>> change to another thread in the subsequent code.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor 
>>>>>>>>>>>>>>>>>>>>>> before [2].
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner 
>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is 
>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If it 
>>>>>>>>>>>>>>>>>>>>> is not suspended we detect that and redo the whole 
>>>>>>>>>>>>>>>>>>>>> query at a safepoint.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think the owner thread might unfortunately be 
>>>>>>>>>>>>>>>>>>>> resumed after the suspension check.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also 
>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the 
>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the 
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>> has started the process of gathering the information 
>>>>>>>>>>>>>>>>>> while not at a
>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects 
>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is 
>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a 
>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just 
>>>>>>>>>>>>>>>>> after the code determined it was suspended. The monitor 
>>>>>>>>>>>>>>>>> can then be released and the information gathered not 
>>>>>>>>>>>>>>>>> only stale but potentially completely wrong as it could 
>>>>>>>>>>>>>>>>> now be owned by a different thread and will report that 
>>>>>>>>>>>>>>>>> thread's entry count.
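
The scenario described above can be modelled deterministically. This is a minimal sketch under stated assumptions: ToyMonitor, Usage, read_unsynchronized, read_at_safepoint and handoff are hypothetical names, not HotSpot code, and the `between` callback stands in for whatever other threads do in the window after the owner has been read.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of a monitor; not the HotSpot ObjectMonitor.
struct ToyMonitor {
  int owner;        // 0 means unowned, otherwise a thread id
  int entry_count;  // recursion count held by the current owner
};

struct Usage {
  int owner;
  int entry_count;
};

// Reads the fields one at a time. `between` runs after the owner has been
// read and models concurrent activity in that window.
Usage read_unsynchronized(ToyMonitor& m,
                          const std::function<void(ToyMonitor&)>& between) {
  Usage u = {0, 0};
  u.owner = m.owner;
  between(m);
  u.entry_count = m.entry_count;
  return u;
}

// Models a safepoint: all mutators are stopped, so nothing can run between
// the two reads.
Usage read_at_safepoint(const ToyMonitor& m) {
  Usage u = {m.owner, m.entry_count};
  return u;
}

// Thread 1 releases the monitor and thread 2 then enters it three times.
void handoff(ToyMonitor& m) {
  m.owner = 2;
  m.entry_count = 3;
}
```

Starting from owner 1 with an entry count of 1, read_unsynchronized(m, handoff) pairs owner 1 with thread 2's entry count of 3, a state the monitor was never in. read_at_safepoint on the same starting state returns owner 1 with count 1.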
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the 
>>>>>>>>>>>>>>>> information
>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation returns
>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target 
>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. But 
>>>>>>>>>>>>>>> the expectation is that the information was actually an 
>>>>>>>>>>>>>>> accurate snapshot of the state of the monitor at some 
>>>>>>>>>>>>>>> point in time. The current code does not ensure that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think 
>>>>>>>>>>>>>> the info
>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the 
>>>>>>>>>>>>>> monitor
>>>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no 
>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it any 
>>>>>>>>>>>>> longer when the entry count is read, so straight away you 
>>>>>>>>>>>>> may have the wrong entry count information. The set of 
>>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the 
>>>>>>>>>>>>> monitor can change in unexpected ways. It would be possible 
>>>>>>>>>>>>> for instance to report the same thread as being the owner, 
>>>>>>>>>>>>> being blocked trying to enter the monitor, and being in the 
>>>>>>>>>>>>> wait-set of the monitor - apparently all at the same time!
>>>>>>>>>>>>>
>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete 
>>>>>>>>>>>>> atomicity because threads can join the set of threads 
>>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended).
>>>>>>>>>>>>
>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>>>>>>>>>
>>>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the 
>>>>>>>>>>>> info about
>>>>>>>>>>>>     the object's monitor. Since we're at a safepoint, the 
>>>>>>>>>>>> info that
>>>>>>>>>>>>     we are gathering cannot change until we return from the 
>>>>>>>>>>>> safepoint.
>>>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>>>
>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>
>>>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>>>
>>>>>>>>>>>>     So in this case, the owner info and notify info is stable,
>>>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>>>
>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>
>>>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>>>>>>>>>     thread on the entry queue that is not suspended, then it 
>>>>>>>>>>>> will
>>>>>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>>>     noticed that it doesn't check for suspension for the 
>>>>>>>>>>>> threads
>>>>>>>>>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>>>>>>>>>     some confusion here.
>>>>>>>>>>>>
>>>>>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>>>     suspended pending entry threads and a list of waiting
>>>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>>>>>     the right time.
>>>>>>>>>>>>
>>>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>
>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended
>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>>>>>> the different pieces of info are unstable to varying degrees.
>>>>>>>>>>>>
>>>>>>>>>>>> As for this claim:
>>>>>>>>>>>>
>>>>>>>>>>>>> It would be possible for instance to report the same thread
>>>>>>>>>>>>> as being the owner, being blocked trying to enter the monitor,
>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>>>>>>>>> the same time! 
>>>>>>>>>>>>
>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since
>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>>>>>> wait queue.
>>>>>>>>>>>>
>>>>>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>>>>>> will still be field instability.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind.
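
A rough sketch of the control flow this conclusion implies: no fast path and no retry on JVMTI_ERROR_THREAD_NOT_SUSPENDED, just one operation that always gathers the data while the world is stopped. Everything here is a hypothetical toy (ToyMonitor, ToyUsage, ToyGetUsageOp, toy_execute, and a boolean flag standing in for the safepoint); it is not the webrev and does not use HotSpot or JVM TI types.

```cpp
#include <cassert>

// Hypothetical toy types; none of these are HotSpot or JVM TI definitions.
enum ToyError { TOY_OK = 0 };

struct ToyMonitor { int owner; int entry_count; int waiter_count; };
struct ToyUsage   { int owner; int entry_count; int waiter_count; };

// Stand-in for "the VM is at a safepoint"; a real VM stops mutator threads.
static bool g_at_safepoint = false;

// Plays the role of a VM operation that gathers all fields in one place.
class ToyGetUsageOp {
 public:
  ToyGetUsageOp(const ToyMonitor* m, ToyUsage* out)
      : _monitor(m), _out(out), _result(TOY_OK) {}

  void doit() {
    // All reads happen while mutators are stopped, so they form one snapshot.
    assert(g_at_safepoint && "must run at the toy safepoint");
    _out->owner        = _monitor->owner;
    _out->entry_count  = _monitor->entry_count;
    _out->waiter_count = _monitor->waiter_count;
    _result = TOY_OK;
  }

  ToyError result() const { return _result; }

 private:
  const ToyMonitor* _monitor;
  ToyUsage*         _out;
  ToyError          _result;
};

// Plays the role of executing an operation at a safepoint.
void toy_execute(ToyGetUsageOp* op) {
  g_at_safepoint = true;
  op->doit();
  g_at_safepoint = false;
}

// The entry point always goes through the operation; there is no other mode.
ToyError toy_get_object_monitor_usage(const ToyMonitor* m, ToyUsage* out) {
  ToyGetUsageOp op(m, out);
  toy_execute(&op);
  return op.result();
}
```

With a single mode, the gathering code no longer needs suspension checks, which is where the simplification would come from.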
>>>>>>>>>>>
>>>>>>>>>>> I agree with this.
>>>>>>>>>>> The advantages are:
>>>>>>>>>>>   - the result is stable
>>>>>>>>>>>   - the implementation can be simplified
>>>>>>>>>>>
>>>>>>>>>>> Performance impact is not very clear but should not be that
>>>>>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>>>>>> I'm not sure if using handshakes can make performance better.
>>>>>>>>>>
>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Serguei
>>>>>>>>>>>
>>>>>>>>>>>> Dan
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The only way to make sure you don't have stale 
>>>>>>>>>>>>>>>> information is
>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps 
>>>>>>>>>>>>>>>> the doc
>>>>>>>>>>>>>>>> should have been more clear about the possibility of 
>>>>>>>>>>>>>>>> returning stale
>>>>>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about threads being 
>>>>>>>>>>>>>>>>> suspended so I can't see how this could be construed as 
>>>>>>>>>>>>>>>>> an agent bug.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In your scenario above, you mention that the target 
>>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while the 
>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed after
>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before
>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), 
>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent 
>>>>>>>>>>>>>>>> should not
>>>>>>>>>>>>>>>> resume the target thread while also calling 
>>>>>>>>>>>>>>>> GetObjectMonitorUsage().
>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so agent 
>>>>>>>>>>>>>>>> bug.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an 
>>>>>>>>>>>>>>> independent resume, but you're right that doesn't really 
>>>>>>>>>>>>>>> make a lot of sense. But when the spec says nothing about 
>>>>>>>>>>>>>>> suspension ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And it is intentional that suspension is not required. 
>>>>>>>>>>>>>> JVM/DI and JVM/PI
>>>>>>>>>>>>>> used to require suspension for these kinds of get-the-info 
>>>>>>>>>>>>>> APIs. JVM/TI
>>>>>>>>>>>>>> intentionally was designed to not require suspension.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As I've said before, we could add a note about the data 
>>>>>>>>>>>>>> being potentially
>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like 
>>>>>>>>>>>>>> stat(2). You can
>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the 
>>>>>>>>>>>>>> info is current
>>>>>>>>>>>>>> by the time you process what you got back. Is it too much 
>>>>>>>>>>>>>> motherhood to
>>>>>>>>>>>>>> state that the data might be stale? I could go either way...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this 
>>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any 
>>>>>>>>>>>>>>>>> safepoints.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding 
>>>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>>>> handshakes in this thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread whilst 
>>>>>>>>>>>>>>> the monitor is queried. In effect the operation would 
>>>>>>>>>>>>>>> create a per-thread safepoint.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I "know" that, but I still need time to think about it and 
>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>> see the code to see if there are holes...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Semantically it is no different to the code actually 
>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do 
>>>>>>>>>>>>>>> that because suspends/resume don't nest.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yeah... we used to have a suspend count back when we tracked 
>>>>>>>>>>>>>> internal and
>>>>>>>>>>>>>> external suspends separately. That was a nightmare...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to 
>>>>>>>>>>>>>>>>>>>> check the thread state; it returns `true` when the 
>>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it is running 
>>>>>>>>>>>>>>>>>>>> native code [4].
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it can't 
>>>>>>>>>>>>>>>>>>> continue execution in the VM or in Java code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the assumed 
>>>>>>>>>>>>>>>>>>>>> common case where threads are first suspended and 
>>>>>>>>>>>>>>>>>>>>> then the monitors are queried.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I agree with this, but I could not find it in the 
>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about the 
>>>>>>>>>>>>>>>>>>>> object's monitor."
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, 
>>>>>>>>>>>>>>>>>>> nothing to do with the spec.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect 
>>>>>>>>>>>>>>>>>>>> information in some cases.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It starts by finding the owner thread, but the owner 
>>>>>>>>>>>>>>>>>>>> might be just about to wake up.
>>>>>>>>>>>>>>>>>>>> So I think it is safer if 
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() always runs at a 
>>>>>>>>>>>>>>>>>>>> safepoint.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using 
>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will require 
>>>>>>>>>>>>>>>>>>> that the apparent owner is Handshake-safe (by 
>>>>>>>>>>>>>>>>>>> entering a handshake with it) before querying the 
>>>>>>>>>>>>>>>>>>> monitor. This would still be preferable I think to 
>>>>>>>>>>>>>>>>>>> always using a safepoint for the entire operation.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [3] 
>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [4] 
>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the 
>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended 
>>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the 
>>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we call:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), 
>>>>>>>>>>>>>>>>>>>>> owner);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and 
>>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a racy 
>>>>>>>>>>>>>>>>>>>>> way. This can't happen when suspension itself 
>>>>>>>>>>>>>>>>>>>>> requires a safepoint as the current thread won't go 
>>>>>>>>>>>>>>>>>>>>> to that safepoint during this code. However, if 
>>>>>>>>>>>>>>>>>>>>> suspension is implemented via a direct handshake 
>>>>>>>>>>>>>>>>>>>>> with the target thread then we have a problem.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>