RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information

Thu Jun 18 13:55:22 UTC 2020

On 6/18/20 9:18 AM, David Holmes wrote:
> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>> On 2020/06/18 17:36, David Holmes wrote:
>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>> Hi David,
>>>>
>>>> Both ThreadsListHandle and ResourceMarks would use 
>>>> `Thread::current()` for their resource. It is set as default 
>>>> parameter in c'tor.
>>>> Do you mean we should it explicitly in c'tor?
>>>
>>> Yes pass current_thread so we don't do the additional unnecessary 
>>> calls to Thread::current().
>>
>> Ok, I've fixed them. Could you review again?
>>
>>    http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>
> Updates look good. One nit I missed before:
>
> src/hotspot/share/prims/jvmtiEnv.cpp
>
> // It need to perform at safepoint for gathering stable data
>
> please change to:
>
> // This need to be performed at a safepoint to gather stable data

Just a comment on this comment... I still haven't gotten to the webrev 
yet...

Perhaps:

     // This needs to be performed at a safepoint to gather stable data.

Dan
>
> Thanks,
> David
>
>>
>> Thanks,
>>
>> Yasumasa
>>
>>
>>> David
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>> Hi Serguei,
>>>>>>
>>>>>> Thanks for your comment!
>>>>>> I uploaded new webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>
>>>>>> I'm not sure the following change is correct.
>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>
>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>
>>>>> But given this is now only executed at a safepoint there are 
>>>>> additional simplifications that can be made:
>>>>>
>>>>> - current thread determination can be simplified:
>>>>>
>>>>> 945   Thread* current_thread = Thread::current();
>>>>>
>>>>> becomes:
>>>>>
>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>
>>>>> - these comments can be removed
>>>>>
>>>>>   994       // Use current thread since function can be called from a
>>>>>   995       // JavaThread or the VMThread.
>>>>> 1053       // Use current thread since function can be called from a
>>>>> 1054       // JavaThread or the VMThread.
>>>>>
>>>>> - these TLH constructions should be passing current_thread 
>>>>> (existing bug)
>>>>>
>>>>> 996       ThreadsListHandle tlh;
>>>>> 1055       ThreadsListHandle tlh;
>>>>>
>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>
>>>>>
>>>>> Aside: there is a major inconsistency between the spec and 
>>>>> implementation for this method. I've traced the history to see how 
>>>>> this came about from JVMDI (ref JDK-4546581) but it never resulted 
>>>>> in the JVM TI specification clearly stating what the 
>>>>> waiters/waiter_count means. I will file a bug to have the spec 
>>>>> clarified to match the implementation (even though I think the 
>>>>> implementation is what is wrong). :(
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>>> All tests on submit repo and serviceability/jvmti and 
>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>
>>>>>>
>>>>>> ```
>>>>>>         // This monitor is owned so we have to find the owning 
>>>>>> JavaThread.
>>>>>>         owning_thread = 
>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>> -      // Cannot assume (owning_thread != NULL) here because this 
>>>>>> function
>>>>>> -      // may not have been called at a safepoint and the 
>>>>>> owning_thread
>>>>>> -      // might not be suspended.
>>>>>> -      if (owning_thread != NULL) {
>>>>>> -        // The monitor's owner either has to be the current 
>>>>>> thread, at safepoint
>>>>>> -        // or it has to be suspended. Any of these conditions 
>>>>>> will prevent both
>>>>>> -        // contending and waiting threads from modifying the 
>>>>>> state of
>>>>>> -        // the monitor.
>>>>>> -        if (!at_safepoint && 
>>>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>> -          // Don't worry! This return of 
>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>> -          // will not make it back to the JVM/TI agent. The 
>>>>>> error code will
>>>>>> -          // get intercepted in 
>>>>>> JvmtiEnv::GetObjectMonitorUsage() which
>>>>>> -          // will retry the call via a VM_GetObjectMonitorUsage 
>>>>>> VM op.
>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>> -        }
>>>>>> -        HandleMark hm;
>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must not 
>>>>>> be NULL");
>>>>>>           Handle     th(current_thread, owning_thread->threadObj());
>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>
>>>>>> ```
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> This fix is not enough.
>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two 
>>>>>>> modes: in VMop and non-VMop.
>>>>>>> The non-VMop mode has to be removed.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>
>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>> (Change subject for RFR)
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>> Could you review it?
>>>>>>>>
>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>   webrev: 
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>
>>>>>>>> This change has passed tests on submit repo.
>>>>>>>> Also I tested it with serviceability/jvmti and 
>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>>>>
>>>>>>>>>> Go for it! :)
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I wonder why 
>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does 
>>>>>>>>>>>>>>>>>>>>>>> not perform at safepoint.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the 
>>>>>>>>>>>>>>>>>>>>>> target is not suspended:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, 
>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>>    JavaThread* calling_thread = 
>>>>>>>>>>>>>>>>>>>>>> JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>>    jvmtiError err = 
>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, 
>>>>>>>>>>>>>>>>>>>>>> info_ptr);
>>>>>>>>>>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>      // Some of the critical threads were not 
>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread, 
>>>>>>>>>>>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases 
>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not 
>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor 
>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform 
>>>>>>>>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner 
>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor 
>>>>>>>>>>>>>>>>>>>>>>> before [2].
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner 
>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is 
>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If 
>>>>>>>>>>>>>>>>>>>>>> it is not suspended we detect that and redo the 
>>>>>>>>>>>>>>>>>>>>>> whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume 
>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also 
>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the 
>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the 
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>> has started the process of gathering the information 
>>>>>>>>>>>>>>>>>>> while not at a
>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects 
>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is 
>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a 
>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just 
>>>>>>>>>>>>>>>>>> after the code determined it was suspended. The 
>>>>>>>>>>>>>>>>>> monitor can then be released and the information 
>>>>>>>>>>>>>>>>>> gathered not only stale but potentially completely 
>>>>>>>>>>>>>>>>>> wrong as it could now be owned by a different thread 
>>>>>>>>>>>>>>>>>> and will report that thread's entry count.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as 
>>>>>>>>>>>>>>>>> soon as
>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the 
>>>>>>>>>>>>>>>>> information
>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation 
>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target 
>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. 
>>>>>>>>>>>>>>>> But the expectation is that the information was 
>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the 
>>>>>>>>>>>>>>>> monitor at some point in time. The current code does 
>>>>>>>>>>>>>>>> not ensure that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think 
>>>>>>>>>>>>>>> the info
>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the 
>>>>>>>>>>>>>>> monitor
>>>>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no 
>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it any 
>>>>>>>>>>>>>> longer when the entry count is read, so straight away you 
>>>>>>>>>>>>>> may have the wrong entry count information. The set of 
>>>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the 
>>>>>>>>>>>>>> monitor can change in unexpected ways. It would be 
>>>>>>>>>>>>>> possible for instance to report the same thread as being 
>>>>>>>>>>>>>> the owner, being blocked trying to enter the monitor, and 
>>>>>>>>>>>>>> being in the wait-set of the monitor - apparently all at 
>>>>>>>>>>>>>> the same time!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete 
>>>>>>>>>>>>>> atomicity because threads can join the set of threads 
>>>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ 
>>>>>>>>>>>>> suspended:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the 
>>>>>>>>>>>>> info about
>>>>>>>>>>>>>     the object's monitor. Since we're at a safepoint, the 
>>>>>>>>>>>>> info that
>>>>>>>>>>>>>     we are gathering cannot change until we return from 
>>>>>>>>>>>>> the safepoint.
>>>>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the 
>>>>>>>>>>>>> object's
>>>>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no 
>>>>>>>>>>>>> other
>>>>>>>>>>>>>     thread is suspended, then entry_count can change because
>>>>>>>>>>>>>     another thread can block on entry while we are gathering
>>>>>>>>>>>>>     info. waiter_count and waiters can change if a thread was
>>>>>>>>>>>>>     in a timed wait that has timed out and now that thread is
>>>>>>>>>>>>>     blocked on re-entry. I don't think that 
>>>>>>>>>>>>> notify_waiter_count
>>>>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     So in this case, the owner info and notify info is 
>>>>>>>>>>>>> stable,
>>>>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info 
>>>>>>>>>>>>> about the
>>>>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it 
>>>>>>>>>>>>> finds a
>>>>>>>>>>>>>     thread on the entry queue that is not suspended, then 
>>>>>>>>>>>>> it will
>>>>>>>>>>>>>     bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>>>>     noticed that it doesn't check for suspension for the 
>>>>>>>>>>>>> threads
>>>>>>>>>>>>>     on the waiters list so a timed Object.wait() call can 
>>>>>>>>>>>>> cause
>>>>>>>>>>>>>     some confusion here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     So in this case, the owner info is not stable if a thread
>>>>>>>>>>>>>     comes out of a timed wait and reenters the monitor. This
>>>>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>>>>     suspended pending entry thread and a list of waiting
>>>>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>>>>     owned by the "barger" that completely bypassed the entry
>>>>>>>>>>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>>>>>>     the right time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended
>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for this claim:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It would be possible for instance to report the same thread
>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the 
>>>>>>>>>>>>>> monitor,
>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>>>>>>>>>> the same time! 
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since
>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>>>>>>> wait queue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>>>>>>> will still be field instability.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>> The advantages are:
>>>>>>>>>>>>   - the result is stable
>>>>>>>>>>>>   - the implementation can be simplified
>>>>>>>>>>>>
>>>>>>>>>>>> Performance impact is not very clear but should not be that
>>>>>>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>>>>>>> I'm not sure if using handshakes can make performance better.
>>>>>>>>>>>
>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Serguei
>>>>>>>>>>>>
>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale 
>>>>>>>>>>>>>>>>> information is
>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps 
>>>>>>>>>>>>>>>>> the doc
>>>>>>>>>>>>>>>>> should have more clear about the possibility of 
>>>>>>>>>>>>>>>>> returning stale
>>>>>>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's 
>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be 
>>>>>>>>>>>>>>>>>> construed as an agent bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target 
>>>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while 
>>>>>>>>>>>>>>>>> the target
>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed 
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but 
>>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), 
>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent 
>>>>>>>>>>>>>>>>> should not
>>>>>>>>>>>>>>>>> resume the target thread while also calling 
>>>>>>>>>>>>>>>>> GetObjectMonitorUsage().
>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so 
>>>>>>>>>>>>>>>>> agent bug.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an 
>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't 
>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says 
>>>>>>>>>>>>>>>> nothing about suspension ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And it is intentional that suspension is not required. 
>>>>>>>>>>>>>>> JVM/DI and JVM/PI
>>>>>>>>>>>>>>> used to require suspension for these kinds of 
>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI
>>>>>>>>>>>>>>> intentionally was designed to not require suspension.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I've said before, we could add a note about the data 
>>>>>>>>>>>>>>> being potentially
>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like 
>>>>>>>>>>>>>>> stat(2). You can
>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the 
>>>>>>>>>>>>>>> info is current
>>>>>>>>>>>>>>> by the time you process what you got back. Is it too 
>>>>>>>>>>>>>>> much motherhood to
>>>>>>>>>>>>>>> state that the data might be stale? I could go either 
>>>>>>>>>>>>>>> way...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this 
>>>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any 
>>>>>>>>>>>>>>>>>> safepoints.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding 
>>>>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>>>>> handshakes in this thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread 
>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the operation 
>>>>>>>>>>>>>>>> would create a per-thread safepoint.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it 
>>>>>>>>>>>>>>> and probably
>>>>>>>>>>>>>>> see the code to see if there are holes...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Semantically it is no different to the code actually 
>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do 
>>>>>>>>>>>>>>>> that because suspends/resume don't nest.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we 
>>>>>>>>>>>>>>> tracked internal and
>>>>>>>>>>>>>>> external suspends separately. That was a nightmare...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to 
>>>>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the 
>>>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it performs in 
>>>>>>>>>>>>>>>>>>>>> native [4].
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it 
>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the 
>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first 
>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from 
>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about 
>>>>>>>>>>>>>>>>>>>>> the object's monitor."
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, 
>>>>>>>>>>>>>>>>>>>> nothing to do with the spec.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect 
>>>>>>>>>>>>>>>>>>>>> information in some case.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner 
>>>>>>>>>>>>>>>>>>>>> might be just before wakeup.
>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if 
>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint in 
>>>>>>>>>>>>>>>>>>>>> any case.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using 
>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will 
>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe 
>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying 
>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I think 
>>>>>>>>>>>>>>>>>>>> to always using a safepoint for the entire operation.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [3] 
>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [4] 
>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the 
>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended 
>>>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the 
>>>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we 
>>>>>>>>>>>>>>>>>>>>>> call:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), 
>>>>>>>>>>>>>>>>>>>>>> owner);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and 
>>>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a 
>>>>>>>>>>>>>>>>>>>>>> racy way. This can't happen when suspension 
>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current thread 
>>>>>>>>>>>>>>>>>>>>>> won't go to that safepoint during this code. 
>>>>>>>>>>>>>>>>>>>>>> However, if suspension is implemented via a 
>>>>>>>>>>>>>>>>>>>>>> direct handshake with the target thread then we 
>>>>>>>>>>>>>>>>>>>>>> have a problem.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>