RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information

Thu Jun 18 14:09:36 UTC 2020

On 19/06/2020 12:06 am, Daniel D. Daugherty wrote:
> On 6/18/20 10:04 AM, David Holmes wrote:
>> On 18/06/2020 11:55 pm, Daniel D. Daugherty wrote:
>>> On 6/18/20 9:18 AM, David Holmes wrote:
>>>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>>>>> On 2020/06/18 17:36, David Holmes wrote:
>>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Both ThreadsListHandle and ResourceMarks would use 
>>>>>>> `Thread::current()` for their resource. It is set as default 
>>>>>>> parameter in c'tor.
>>>>>>> Do you mean we should it explicitly in c'tor?
>>>>>>
>>>>>> Yes pass current_thread so we don't do the additional unnecessary 
>>>>>> calls to Thread::current().
>>>>>
>>>>> Ok, I've fixed them. Could you review again?
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>>>
>>>> Updates look good. One nit I missed before:
>>>>
>>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>>
>>>> // It need to perform at safepoint for gathering stable data
>>>>
>>>> please change to:
>>>>
>>>> // This need to be performed at a safepoint to gather stable data
>>>
>>> Just a comment on this comment... I still haven't gotten to the 
>>> webrev yet...
>>>
>>> Perhaps:
>>>
>>>      // This needs to be performed at a safepoint to gather stable data.
>>
>> There is a second line that continues the sentence
>>
>> // because monitor owner / waiters might not be suspended.
> 
> So no period at the end... but the s/need/needs/ still works. :-)

Yes missed that. Sorry blurry eyes, after midnight.

Good night ;-)

David

> Dan
> 
>>
>> David
>> -----
>>
>>> Dan
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>>>> Hi Serguei,
>>>>>>>>>
>>>>>>>>> Thanks for your comment!
>>>>>>>>> I uploaded new webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>>>
>>>>>>>>> I'm not sure the following change is correct.
>>>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>>>
>>>>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>>>>
>>>>>>>> But given this is now only executed at a safepoint there are 
>>>>>>>> additional simplifications that can be made:
>>>>>>>>
>>>>>>>> - current thread determination can be simplified:
>>>>>>>>
>>>>>>>> 945   Thread* current_thread = Thread::current();
>>>>>>>>
>>>>>>>> becomes:
>>>>>>>>
>>>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>>>
>>>>>>>> - these comments can be removed
>>>>>>>>
>>>>>>>>   994       // Use current thread since function can be called 
>>>>>>>> from a
>>>>>>>>   995       // JavaThread or the VMThread.
>>>>>>>> 1053       // Use current thread since function can be called 
>>>>>>>> from a
>>>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>>>
>>>>>>>> - these TLH constructions should be passing current_thread 
>>>>>>>> (existing bug)
>>>>>>>>
>>>>>>>> 996       ThreadsListHandle tlh;
>>>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>>>
>>>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>>>
>>>>>>>>
>>>>>>>> Aside: there is a major inconsistency between the spec and 
>>>>>>>> implementation for this method. I've traced the history to see 
>>>>>>>> how this came about from JVMDI (ref JDK-4546581) but it never 
>>>>>>>> resulted in the JVM TI specification clearly stating what the 
>>>>>>>> waiters/waiter_count means. I will file a bug to have the spec 
>>>>>>>> clarified to match the implementation (even though I think the 
>>>>>>>> implementation is what is wrong). :(
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> All tests on submit repo and serviceability/jvmti and 
>>>>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>>         // This monitor is owned so we have to find the owning 
>>>>>>>>> JavaThread.
>>>>>>>>>         owning_thread = 
>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>>> -      // Cannot assume (owning_thread != NULL) here because 
>>>>>>>>> this function
>>>>>>>>> -      // may not have been called at a safepoint and the 
>>>>>>>>> owning_thread
>>>>>>>>> -      // might not be suspended.
>>>>>>>>> -      if (owning_thread != NULL) {
>>>>>>>>> -        // The monitor's owner either has to be the current 
>>>>>>>>> thread, at safepoint
>>>>>>>>> -        // or it has to be suspended. Any of these conditions 
>>>>>>>>> will prevent both
>>>>>>>>> -        // contending and waiting threads from modifying the 
>>>>>>>>> state of
>>>>>>>>> -        // the monitor.
>>>>>>>>> -        if (!at_safepoint && 
>>>>>>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>>>> -          // Don't worry! This return of 
>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>>>> -          // will not make it back to the JVM/TI agent. The 
>>>>>>>>> error code will
>>>>>>>>> -          // get intercepted in 
>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>>>> -          // will retry the call via a 
>>>>>>>>> VM_GetObjectMonitorUsage VM op.
>>>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>>>> -        }
>>>>>>>>> -        HandleMark hm;
>>>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must 
>>>>>>>>> not be NULL");
>>>>>>>>>           Handle     th(current_thread, 
>>>>>>>>> owning_thread->threadObj());
>>>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>>>
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>
>>>>>>>>>> This fix is not enough.
>>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in 
>>>>>>>>>> two modes: in VMop and non-VMop.
>>>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>>>> (Change subject for RFR)
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>>>>> Could you review it?
>>>>>>>>>>>
>>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>>>   webrev: 
>>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>>>> Also I tested it with serviceability/jvmti and 
>>>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Serguei
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>>>>>>>
>>>>>>>>>>>>> Go for it! :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why 
>>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) 
>>>>>>>>>>>>>>>>>>>>>>>>>> does not perform at safepoint.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if 
>>>>>>>>>>>>>>>>>>>>>>>>> the target is not suspended:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, 
>>>>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>>>>>    JavaThread* calling_thread = 
>>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>>>>>    jvmtiError err = 
>>>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, 
>>>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>>>>      // Some of the critical threads were not 
>>>>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, 
>>>>>>>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases 
>>>>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not 
>>>>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor 
>>>>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform 
>>>>>>>>>>>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner 
>>>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the 
>>>>>>>>>>>>>>>>>>>>>>>>>> monitor before [2].
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner 
>>>>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is 
>>>>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. 
>>>>>>>>>>>>>>>>>>>>>>>>> If it is not suspended we detect that and redo 
>>>>>>>>>>>>>>>>>>>>>>>>> the whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume 
>>>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also 
>>>>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the 
>>>>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the 
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the 
>>>>>>>>>>>>>>>>>>>>>> information while not at a
>>>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects 
>>>>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is 
>>>>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a 
>>>>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed 
>>>>>>>>>>>>>>>>>>>>> just after the code determined it was suspended. 
>>>>>>>>>>>>>>>>>>>>> The monitor can then be released and the 
>>>>>>>>>>>>>>>>>>>>> information gathered not only stale but potentially 
>>>>>>>>>>>>>>>>>>>>> completely wrong as it could now be owned by a 
>>>>>>>>>>>>>>>>>>>>> different thread and will report that thread's 
>>>>>>>>>>>>>>>>>>>>> entry count.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as 
>>>>>>>>>>>>>>>>>>>> soon as
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the 
>>>>>>>>>>>>>>>>>>>> information
>>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation 
>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the 
>>>>>>>>>>>>>>>>>>>> target thread
>>>>>>>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. 
>>>>>>>>>>>>>>>>>>> But the expectation is that the information was 
>>>>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the 
>>>>>>>>>>>>>>>>>>> monitor at some point in time. The current code does 
>>>>>>>>>>>>>>>>>>> not ensure that.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you 
>>>>>>>>>>>>>>>>>> think the info
>>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of 
>>>>>>>>>>>>>>>>>> the monitor
>>>>>>>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no 
>>>>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it 
>>>>>>>>>>>>>>>>> any longer when the entry count is read, so straight 
>>>>>>>>>>>>>>>>> away you may have the wrong entry count information. 
>>>>>>>>>>>>>>>>> The set of threads trying to acquire the monitor, or 
>>>>>>>>>>>>>>>>> wait on the monitor can change in unexpected ways. It 
>>>>>>>>>>>>>>>>> would be possible for instance to report the same 
>>>>>>>>>>>>>>>>> thread as being the owner, being blocked trying to 
>>>>>>>>>>>>>>>>> enter the monitor, and being in the wait-set of the 
>>>>>>>>>>>>>>>>> monitor - apparently all at the same time!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have 
>>>>>>>>>>>>>>>>> complete atomicity because threads can join the set of 
>>>>>>>>>>>>>>>>> threads trying to enter the monitor (unless they are 
>>>>>>>>>>>>>>>>> all suspended).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ 
>>>>>>>>>>>>>>>> suspended:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather 
>>>>>>>>>>>>>>>> the info about
>>>>>>>>>>>>>>>>     the object's monitor. Since we're at a safepoint, 
>>>>>>>>>>>>>>>> the info that
>>>>>>>>>>>>>>>>     we are gathering cannot change until we return from 
>>>>>>>>>>>>>>>> the safepoint.
>>>>>>>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the 
>>>>>>>>>>>>>>>> object's
>>>>>>>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no 
>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>     thread is suspended, then entry_count can change 
>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>     another thread can block on entry while we are 
>>>>>>>>>>>>>>>> gathering
>>>>>>>>>>>>>>>>     info. waiter_count and waiters can change if a 
>>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>>     in a timed wait that has timed out and now that 
>>>>>>>>>>>>>>>> thread is
>>>>>>>>>>>>>>>>     blocked on re-entry. I don't think that 
>>>>>>>>>>>>>>>> notify_waiter_count
>>>>>>>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     So in this case, the owner info and notify info is 
>>>>>>>>>>>>>>>> stable,
>>>>>>>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info 
>>>>>>>>>>>>>>>> about the
>>>>>>>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it 
>>>>>>>>>>>>>>>> finds a
>>>>>>>>>>>>>>>>     thread on the entry queue that is not suspended, 
>>>>>>>>>>>>>>>> then it will
>>>>>>>>>>>>>>>>     bail out and redo the info gather at a safepoint. I 
>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>     noticed that it doesn't check for suspension for the 
>>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>>>     on the waiters list so a timed Object.wait() call 
>>>>>>>>>>>>>>>> can cause
>>>>>>>>>>>>>>>>     some confusion here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     So in this case, the owner info is not stable if a 
>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>     comes out of a timed wait and reenters the monitor. 
>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>>>>>>>     suspended pending entry thread and a list of waiting
>>>>>>>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>>>>>>>     owned by the "barger" that completely bypassed the 
>>>>>>>>>>>>>>>> entry
>>>>>>>>>>>>>>>>     queue by virtue of seeing the NULL owner field at 
>>>>>>>>>>>>>>>> exactly
>>>>>>>>>>>>>>>>     the right time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is 
>>>>>>>>>>>>>>>> stable,
>>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not 
>>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>>>>>>>>>> the different pieces of info is unstable to varying 
>>>>>>>>>>>>>>>> degrees.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As for this claim:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It would be possible for instance to report the same 
>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the 
>>>>>>>>>>>>>>>>> monitor,
>>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently 
>>>>>>>>>>>>>>>>> all at
>>>>>>>>>>>>>>>>> the same time! 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it 
>>>>>>>>>>>>>>>> can't
>>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue 
>>>>>>>>>>>>>>>> since
>>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is 
>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>>>>>>>>>> wait queue.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>>>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>>>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>>>>>>>>>> will still be field instability.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>>>> The advantages are:
>>>>>>>>>>>>>>>   - the result is stable
>>>>>>>>>>>>>>>   - the implementation can be simplified
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Performance impact is not very clear but should not be that
>>>>>>>>>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance 
>>>>>>>>>>>>>>> better.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale 
>>>>>>>>>>>>>>>>>>>> information is
>>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. 
>>>>>>>>>>>>>>>>>>>> Perhaps the doc
>>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of 
>>>>>>>>>>>>>>>>>>>> returning stale
>>>>>>>>>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's 
>>>>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be 
>>>>>>>>>>>>>>>>>>>>> construed as an agent bug.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target 
>>>>>>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while 
>>>>>>>>>>>>>>>>>>>> the target
>>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was 
>>>>>>>>>>>>>>>>>>>> resumed after
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but 
>>>>>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), 
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the 
>>>>>>>>>>>>>>>>>>>> agent should not
>>>>>>>>>>>>>>>>>>>> resume the target thread while also calling 
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage().
>>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so 
>>>>>>>>>>>>>>>>>>>> agent bug.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an 
>>>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't 
>>>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says 
>>>>>>>>>>>>>>>>>>> nothing about suspension ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. 
>>>>>>>>>>>>>>>>>> JVM/DI and JVM/PI
>>>>>>>>>>>>>>>>>> used to require suspension for these kinds of 
>>>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI
>>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the 
>>>>>>>>>>>>>>>>>> data being potentially
>>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like 
>>>>>>>>>>>>>>>>>> stat(2). You can
>>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that 
>>>>>>>>>>>>>>>>>> the info is current
>>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too 
>>>>>>>>>>>>>>>>>> much motherhood to
>>>>>>>>>>>>>>>>>> state that the data might be stale? I could go either 
>>>>>>>>>>>>>>>>>> way...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow 
>>>>>>>>>>>>>>>>>>>>> this to be fixed in the future without 
>>>>>>>>>>>>>>>>>>>>> forcing/using any safepoints.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding 
>>>>>>>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>>>>>>>> handshakes in this thread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread 
>>>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the 
>>>>>>>>>>>>>>>>>>> operation would create a per-thread safepoint.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it 
>>>>>>>>>>>>>>>>>> and probably
>>>>>>>>>>>>>>>>>> see the code to see if there are holes...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually 
>>>>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do 
>>>>>>>>>>>>>>>>>>> that because suspends/resume don't nest.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we 
>>>>>>>>>>>>>>>>>> tracked internal and
>>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used 
>>>>>>>>>>>>>>>>>>>>>>>> to check thread state, it returns `true` when 
>>>>>>>>>>>>>>>>>>>>>>>> the thread is sleeping [3], or when it performs 
>>>>>>>>>>>>>>>>>>>>>>>> in native [4].
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it 
>>>>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the 
>>>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first 
>>>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from 
>>>>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about 
>>>>>>>>>>>>>>>>>>>>>>>> the object's monitor."
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, 
>>>>>>>>>>>>>>>>>>>>>>> nothing to do with the spec.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect 
>>>>>>>>>>>>>>>>>>>>>>>> information in some case.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the 
>>>>>>>>>>>>>>>>>>>>>>>> owner might be just before wakeup.
>>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if 
>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint 
>>>>>>>>>>>>>>>>>>>>>>>> in any case.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using 
>>>>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will 
>>>>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe 
>>>>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying 
>>>>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I 
>>>>>>>>>>>>>>>>>>>>>>> think to always using a safepoint for the entire 
>>>>>>>>>>>>>>>>>>>>>>> operation.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [3] 
>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [4] 
>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the 
>>>>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be 
>>>>>>>>>>>>>>>>>>>>>>>>> suspended at the time we first see it, and may 
>>>>>>>>>>>>>>>>>>>>>>>>> release the monitor, but then it may get 
>>>>>>>>>>>>>>>>>>>>>>>>> suspended before we call:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), 
>>>>>>>>>>>>>>>>>>>>>>>>> owner);
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner 
>>>>>>>>>>>>>>>>>>>>>>>>> and proceed to query the monitor information in 
>>>>>>>>>>>>>>>>>>>>>>>>> a racy way. This can't happen when suspension 
>>>>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current 
>>>>>>>>>>>>>>>>>>>>>>>>> thread won't go to that safepoint during this 
>>>>>>>>>>>>>>>>>>>>>>>>> code. However, if suspension is implemented via 
>>>>>>>>>>>>>>>>>>>>>>>>> a direct handshake with the target thread then 
>>>>>>>>>>>>>>>>>>>>>>>>> we have a problem.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>
>