RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information

Thu Jun 18 14:04:19 UTC 2020

On 18/06/2020 11:55 pm, Daniel D. Daugherty wrote:
> On 6/18/20 9:18 AM, David Holmes wrote:
>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>>> On 2020/06/18 17:36, David Holmes wrote:
>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>> Hi David,
>>>>>
>>>>> Both ThreadsListHandle and ResourceMarks would use 
>>>>> `Thread::current()` for their resource. It is set as default 
>>>>> parameter in c'tor.
>>>>> Do you mean we should it explicitly in c'tor?
>>>>
>>>> Yes pass current_thread so we don't do the additional unnecessary 
>>>> calls to Thread::current().
>>>
>>> Ok, I've fixed them. Could you review again?
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>
>> Updates look good. One nit I missed before:
>>
>> src/hotspot/share/prims/jvmtiEnv.cpp
>>
>> // It need to perform at safepoint for gathering stable data
>>
>> please change to:
>>
>> // This need to be performed at a safepoint to gather stable data
> 
> Just a comment on this comment... I still haven't gotten to the webrev 
> yet...
> 
> Perhaps:
> 
>      // This needs to be performed at a safepoint to gather stable data.

There is a second line that continues the sentence

// because monitor owner / waiters might not be suspended.

David
-----

> Dan
>>
>> Thanks,
>> David
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> Thanks for your comment!
>>>>>>> I uploaded new webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>
>>>>>>> I'm not sure the following change is correct.
>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>
>>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>>
>>>>>> But given this is now only executed at a safepoint there are 
>>>>>> additional simplifications that can be made:
>>>>>>
>>>>>> - current thread determination can be simplified:
>>>>>>
>>>>>> 945   Thread* current_thread = Thread::current();
>>>>>>
>>>>>> becomes:
>>>>>>
>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>
>>>>>> - these comments can be removed
>>>>>>
>>>>>>   994       // Use current thread since function can be called from a
>>>>>>   995       // JavaThread or the VMThread.
>>>>>> 1053       // Use current thread since function can be called from a
>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>
>>>>>> - these TLH constructions should be passing current_thread 
>>>>>> (existing bug)
>>>>>>
>>>>>> 996       ThreadsListHandle tlh;
>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>
>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>
>>>>>>
>>>>>> Aside: there is a major inconsistency between the spec and 
>>>>>> implementation for this method. I've traced the history to see how 
>>>>>> this came about from JVMDI (ref JDK-4546581) but it never resulted 
>>>>>> in the JVM TI specification clearly stating what the 
>>>>>> waiters/waiter_count means. I will file a bug to have the spec 
>>>>>> clarified to match the implementation (even though I think the 
>>>>>> implementation is what is wrong). :(
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> All tests on submit repo and serviceability/jvmti and 
>>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>>
>>>>>>>
>>>>>>> ```
>>>>>>>         // This monitor is owned so we have to find the owning 
>>>>>>> JavaThread.
>>>>>>>         owning_thread = 
>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>> -      // Cannot assume (owning_thread != NULL) here because this 
>>>>>>> function
>>>>>>> -      // may not have been called at a safepoint and the 
>>>>>>> owning_thread
>>>>>>> -      // might not be suspended.
>>>>>>> -      if (owning_thread != NULL) {
>>>>>>> -        // The monitor's owner either has to be the current 
>>>>>>> thread, at safepoint
>>>>>>> -        // or it has to be suspended. Any of these conditions 
>>>>>>> will prevent both
>>>>>>> -        // contending and waiting threads from modifying the 
>>>>>>> state of
>>>>>>> -        // the monitor.
>>>>>>> -        if (!at_safepoint && 
>>>>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>> -          // Don't worry! This return of 
>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>> -          // will not make it back to the JVM/TI agent. The 
>>>>>>> error code will
>>>>>>> -          // get intercepted in 
>>>>>>> JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage 
>>>>>>> VM op.
>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>> -        }
>>>>>>> -        HandleMark hm;
>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not 
>>>>>>> be NULL");
>>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>
>>>>>>> ```
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> This fix is not enough.
>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two 
>>>>>>>> modes: in VMop and non-VMop.
>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>> (Change subject for RFR)
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>>> Could you review it?
>>>>>>>>>
>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>   webrev: 
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>
>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>> Also I tested it with serviceability/jvmti and 
>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>>>>>
>>>>>>>>>>> Go for it! :)
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I wonder why 
>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does 
>>>>>>>>>>>>>>>>>>>>>>>> not perform at safepoint.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the 
>>>>>>>>>>>>>>>>>>>>>>> target is not suspended:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, 
>>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>>>    JavaThread* calling_thread = 
>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>>>    jvmtiError err = 
>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, 
>>>>>>>>>>>>>>>>>>>>>>> info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>>      // Some of the critical threads were not 
>>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread, 
>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases 
>>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not 
>>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor 
>>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform 
>>>>>>>>>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner 
>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor 
>>>>>>>>>>>>>>>>>>>>>>>> before [2].
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner 
>>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is 
>>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If 
>>>>>>>>>>>>>>>>>>>>>>> it is not suspended we detect that and redo the 
>>>>>>>>>>>>>>>>>>>>>>> whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume 
>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also 
>>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the 
>>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the 
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>>> has started the process of gathering the information 
>>>>>>>>>>>>>>>>>>>> while not at a
>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects 
>>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is 
>>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a 
>>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just 
>>>>>>>>>>>>>>>>>>> after the code determined it was suspended. The 
>>>>>>>>>>>>>>>>>>> monitor can then be released and the information 
>>>>>>>>>>>>>>>>>>> gathered not only stale but potentially completely 
>>>>>>>>>>>>>>>>>>> wrong as it could now be owned by a different thread 
>>>>>>>>>>>>>>>>>>> and will report that thread's entry count.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as 
>>>>>>>>>>>>>>>>>> soon as
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the 
>>>>>>>>>>>>>>>>>> information
>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation 
>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target 
>>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. 
>>>>>>>>>>>>>>>>> But the expectation is that the information was 
>>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the 
>>>>>>>>>>>>>>>>> monitor at some point in time. The current code does 
>>>>>>>>>>>>>>>>> not ensure that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think 
>>>>>>>>>>>>>>>> the info
>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the 
>>>>>>>>>>>>>>>> monitor
>>>>>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no 
>>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it any 
>>>>>>>>>>>>>>> longer when the entry count is read, so straight away you 
>>>>>>>>>>>>>>> may have the wrong entry count information. The set of 
>>>>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the 
>>>>>>>>>>>>>>> monitor can change in unexpected ways. It would be 
>>>>>>>>>>>>>>> possible for instance to report the same thread as being 
>>>>>>>>>>>>>>> the owner, being blocked trying to enter the monitor, and 
>>>>>>>>>>>>>>> being in the wait-set of the monitor - apparently all at 
>>>>>>>>>>>>>>> the same time!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete 
>>>>>>>>>>>>>>> atomicity because threads can join the set of threads 
>>>>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ 
>>>>>>>>>>>>>> suspended:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the 
>>>>>>>>>>>>>> info about
>>>>>>>>>>>>>>     the object's monitor. Since we're at a safepoint, the 
>>>>>>>>>>>>>> info that
>>>>>>>>>>>>>>     we are gathering cannot change until we return from 
>>>>>>>>>>>>>> the safepoint.
>>>>>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the 
>>>>>>>>>>>>>> object's
>>>>>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no 
>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>>>>>>>     blocked on re-entry. I don't think that 
>>>>>>>>>>>>>> notify_waiter_count
>>>>>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     So in this case, the owner info and notify info is 
>>>>>>>>>>>>>> stable,
>>>>>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info 
>>>>>>>>>>>>>> about the
>>>>>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it 
>>>>>>>>>>>>>> finds a
>>>>>>>>>>>>>>     thread on the entry queue that is not suspended, then 
>>>>>>>>>>>>>> it will
>>>>>>>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>>>>>     noticed that it doesn't check for suspension for the 
>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>     on the waiters list so a timed Object.wait() call can 
>>>>>>>>>>>>>> cause
>>>>>>>>>>>>>>     some confusion here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>>>>>     suspended pending entry thread and a list of waiting
>>>>>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>>>>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>>>>>>>     the right time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended
>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for this claim:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It would be possible for instance to report the same thread
>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the 
>>>>>>>>>>>>>>> monitor,
>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>>>>>>>>>>> the same time! 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since
>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>>>>>>>> wait queue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>>>>>>>> will still be field instability.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>> The advantages are:
>>>>>>>>>>>>>   - the result is stable
>>>>>>>>>>>>>   - the implementation can be simplified
>>>>>>>>>>>>>
>>>>>>>>>>>>> Performance impact is not very clear but should not be that
>>>>>>>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>>>>>>>> I'm not sure if using handshakes can make performance better.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale 
>>>>>>>>>>>>>>>>>> information is
>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps 
>>>>>>>>>>>>>>>>>> the doc
>>>>>>>>>>>>>>>>>> should have more clear about the possibility of 
>>>>>>>>>>>>>>>>>> returning stale
>>>>>>>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's 
>>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be 
>>>>>>>>>>>>>>>>>>> construed as an agent bug.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target 
>>>>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while 
>>>>>>>>>>>>>>>>>> the target
>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed 
>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but 
>>>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), 
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent 
>>>>>>>>>>>>>>>>>> should not
>>>>>>>>>>>>>>>>>> resume the target thread while also calling 
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage().
>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so 
>>>>>>>>>>>>>>>>>> agent bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an 
>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't 
>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says 
>>>>>>>>>>>>>>>>> nothing about suspension ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. 
>>>>>>>>>>>>>>>> JVM/DI and JVM/PI
>>>>>>>>>>>>>>>> used to require suspension for these kinds of 
>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI
>>>>>>>>>>>>>>>> intentionally was designed to not require suspension.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As I've said before, we could add a note about the data 
>>>>>>>>>>>>>>>> being potentially
>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like 
>>>>>>>>>>>>>>>> stat(2). You can
>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the 
>>>>>>>>>>>>>>>> info is current
>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too 
>>>>>>>>>>>>>>>> much motherhood to
>>>>>>>>>>>>>>>> state that the data might be stale? I could go either 
>>>>>>>>>>>>>>>> way...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this 
>>>>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any 
>>>>>>>>>>>>>>>>>>> safepoints.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding 
>>>>>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>>>>>> handshakes in this thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread 
>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the operation 
>>>>>>>>>>>>>>>>> would create a per-thread safepoint.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it 
>>>>>>>>>>>>>>>> and probably
>>>>>>>>>>>>>>>> see the code to see if there are holes...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually 
>>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do 
>>>>>>>>>>>>>>>>> that because suspends/resume don't nest.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we 
>>>>>>>>>>>>>>>> tracked internal and
>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to 
>>>>>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the 
>>>>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it performs in 
>>>>>>>>>>>>>>>>>>>>>> native [4].
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it 
>>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the 
>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first 
>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from 
>>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about 
>>>>>>>>>>>>>>>>>>>>>> the object's monitor."
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, 
>>>>>>>>>>>>>>>>>>>>> nothing to do with the spec.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect 
>>>>>>>>>>>>>>>>>>>>>> information in some case.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner 
>>>>>>>>>>>>>>>>>>>>>> might be just before wakeup.
>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if 
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint in 
>>>>>>>>>>>>>>>>>>>>>> any case.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using 
>>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will 
>>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe 
>>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying 
>>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I think 
>>>>>>>>>>>>>>>>>>>>> to always using a safepoint for the entire operation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [3] 
>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [4] 
>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the 
>>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended 
>>>>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the 
>>>>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we 
>>>>>>>>>>>>>>>>>>>>>>> call:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), 
>>>>>>>>>>>>>>>>>>>>>>> owner);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and 
>>>>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a 
>>>>>>>>>>>>>>>>>>>>>>> racy way. This can't happen when suspension 
>>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current thread 
>>>>>>>>>>>>>>>>>>>>>>> won't go to that safepoint during this code. 
>>>>>>>>>>>>>>>>>>>>>>> However, if suspension is implemented via a 
>>>>>>>>>>>>>>>>>>>>>>> direct handshake with the target thread then we 
>>>>>>>>>>>>>>>>>>>>>>> have a problem.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>