Question about GetObjectMonitorUsage() JVMTI function

Wed Jun 17 05:37:36 UTC 2020

Yes. It seems we have a consensus.
Thank you for taking care about it.

Thanks,
Serguei

On 6/16/20 18:34, David Holmes wrote:
>> Ok, may I file it to JBS and fix it? 
>
> Go for it! :)
>
> Cheers,
> David
>
> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>> Hi Dan, David and Yasumasa,
>>>
>>>
>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>> Hi Dan,
>>>>>>>>>
>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not 
>>>>>>>>>>>>>> perform at safepoint.
>>>>>>>>>>>>>
>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target 
>>>>>>>>>>>>> is not suspended:
>>>>>>>>>>>>>
>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, 
>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>    JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>>    jvmtiError err = 
>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object, info_ptr);
>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>      // Some of the critical threads were not suspended. 
>>>>>>>>>>>>> go to a safepoint and try again
>>>>>>>>>>>>>      VM_GetObjectMonitorUsage op(this, calling_thread, 
>>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>>      VMThread::execute(&op);
>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>    }
>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>
>>>>>>>>>>>> I saw this code, so I guess there are some cases when 
>>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from 
>>>>>>>>>>>> get_object_monitor_usage().
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at 
>>>>>>>>>>>>>> first [1], but it would perform concurrently.
>>>>>>>>>>>>>> If owner thread is not suspended, the owner might be 
>>>>>>>>>>>>>> changed to others in subsequent code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example, the owner might release the monitor before [2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> The expectation is that when we find an owner thread it is 
>>>>>>>>>>>>> either suspended or not. If it is suspended then it cannot 
>>>>>>>>>>>>> release the monitor. If it is not suspended we detect that 
>>>>>>>>>>>>> and redo the whole query at a safepoint.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the owner thread might resume unfortunately after 
>>>>>>>>>>>> suspending check.
>>>>>>>>>>>
>>>>>>>>>>> Yes you are right. I was thinking resuming also required a 
>>>>>>>>>>> safepoint but it only requires the Threads_lock. So yes the 
>>>>>>>>>>> code is wrong.
>>>>>>>>>>
>>>>>>>>>> Which code is wrong?
>>>>>>>>>>
>>>>>>>>>> Yes, a rogue resume can happen when the 
>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>> has started the process of gathering the information while 
>>>>>>>>>> not at a
>>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>
>>>>>>>>> The code tries to make sure that it either collects data about 
>>>>>>>>> a monitor owned by a thread that is suspended, or else it 
>>>>>>>>> collects that data at a safepoint. But the owning thread can 
>>>>>>>>> be resumed just after the code determined it was suspended. 
>>>>>>>>> The monitor can then be released and the information gathered 
>>>>>>>>> not only stale but potentially completely wrong as it could 
>>>>>>>>> now be owned by a different thread and will report that 
>>>>>>>>> thread's entry count.
>>>>>>>>
>>>>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>>>>> GetObjectMonitorUsage() returns to the caller the information
>>>>>>>> can be stale. In fact as soon as the implementation returns
>>>>>>>> from the safepoint that gathered the info, the target thread
>>>>>>>> could have moved on.
>>>>>>>
>>>>>>> That isn't the issue. That the info is stale is fine. But the 
>>>>>>> expectation is that the information was actually an accurate 
>>>>>>> snapshot of the state of the monitor at some point in time. The 
>>>>>>> current code does not ensure that.
>>>>>>
>>>>>> Please explain. I clearly don't understand why you think the info
>>>>>> returned isn't "an accurate snapshot of the state of the monitor
>>>>>> at some point in time".
>>>>>
>>>>> Because it may not be a "snapshot" at all. There is no 
>>>>> atomicity**. The reported owner thread may not own it any longer 
>>>>> when the entry count is read, so straight away you may have the 
>>>>> wrong entry count information. The set of threads trying to 
>>>>> acquire the monitor, or wait on the monitor can change in 
>>>>> unexpected ways. It would be possible for instance to report the 
>>>>> same thread as being the owner, being blocked trying to enter the 
>>>>> monitor, and being in the wait-set of the monitor - apparently all 
>>>>> at the same time!
>>>>>
>>>>> ** even if the owner is suspended we don't have complete atomicity 
>>>>> because threads can join the set of threads trying to enter the 
>>>>> monitor (unless they are all suspended).
>>>>
>>>> Consider the case when the monitor's owner is _not_ suspended:
>>>>
>>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>>     the object's monitor. Since we're at a safepoint, the info that
>>>>     we are gathering cannot change until we return from the safepoint.
>>>>     It is a snapshot and a valid one at that.
>>>>
>>>> Consider the case when the monitor's owner is suspended:
>>>>
>>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>>     thread is suspended, then entry_count can change because
>>>>     another thread can block on entry while we are gathering
>>>>     info. waiter_count and waiters can change if a thread was
>>>>     in a timed wait that has timed out and now that thread is
>>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>>     and notify_waiters can change.
>>>>
>>>>     So in this case, the owner info and notify info is stable,
>>>>     but the entry_count and waiter info is not stable.
>>>>
>>>> Consider the case when the monitor is not owned:
>>>>
>>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>>     thread on the entry queue that is not suspended, then it will
>>>>     bail out and redo the info gather at a safepoint. I just
>>>>     noticed that it doesn't check for suspension for the threads
>>>>     on the waiters list so a timed Object.wait() call can cause
>>>>     some confusion here.
>>>>
>>>>     So in this case, the owner info is not stable if a thread
>>>>     comes out of a timed wait and reenters the monitor. This
>>>>     case is no different than if a "barger" thread comes in
>>>>     after the NULL owner field is observed and enters the
>>>>     monitor. We'll return that there is no owner, a list of
>>>>     suspended pending entry thread and a list of waiting
>>>>     threads. The reality is that the object's monitor is
>>>>     owned by the "barger" that completely bypassed the entry
>>>>     queue by virtue of seeing the NULL owner field at exactly
>>>>     the right time.
>>>>
>>>> So the owner field is only stable when we have an owner. If
>>>> that owner is not suspended, then the other fields are also
>>>> stable because we gathered the info at a safepoint. If the
>>>> owner is suspended, then the owner and notify info is stable,
>>>> but the entry_count and waiter info is not stable.
>>>>
>>>> If we have a NULL owner field, then the info is only stable
>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>> That's deterministic, but not without some work.
>>>>
>>>>
>>>> Okay so only when we gather the info at a safepoint is all
>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>> do that at a safepoint when the owner thread is not suspended
>>>> or if owner == NULL and one of the entry threads is not
>>>> suspended. If either of those conditions is not true, then
>>>> the different pieces of info is unstable to varying degrees.
>>>>
>>>> As for this claim:
>>>>
>>>>> It would be possible for instance to report the same thread
>>>>> as being the owner, being blocked trying to enter the monitor,
>>>>> and being in the wait-set of the monitor - apparently all at
>>>>> the same time! 
>>>>
>>>> I can't figure out a way to make that scenario work. If the
>>>> thread is seen as the owner and is not suspended, then we
>>>> gather info at a safepoint. If it is suspended, then it can't
>>>> then be seen as on the entry queue or on the wait queue since
>>>> it is suspended. If it is seen on the entry queue and is not
>>>> suspended, then we gather info at a safepoint. If it is
>>>> suspended on the entry queue, then it can't be seen on the
>>>> wait queue.
>>>>
>>>> So the info instability of this API is bad, but it's not
>>>> quite that bad. :-) (That is a small mercy.)
>>>>
>>>>
>>>> Handshaking is not going to make this situation any better
>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>> handshake with the owner, the stability or instability of
>>>> the other fields remains the same as when SuspendThread is
>>>> used. Handshaking with all threads won't make the data as
>>>> stable as when at a safepoint because individual threads
>>>> can resume execution after doing their handshake so there
>>>> will still be field instability.
>>>>
>>>>
>>>> Short version: GetObjectMonitorUsage() should only gather
>>>> data at a safepoint. Yes, I've changed my mind.
>>>
>>> I agree with this.
>>> The advantages are:
>>>   - the result is stable
>>>   - the implementation can be simplified
>>>
>>> Performance impact is not very clear but should not be that
>>> big as suspending all the threads has some overhead too.
>>> I'm not sure if using handshakes can make performance better.
>>
>> Ok, may I file it to JBS and fix it?
>>
>> Yasumasa
>>
>>
>>> Thanks,
>>> Serguei
>>>
>>>> Dan
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> The only way to make sure you don't have stale information is
>>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc
>>>>>>>> should have more clear about the possibility of returning stale
>>>>>>>> info. That's a question for Robert F.
>>>>>>>>
>>>>>>>>
>>>>>>>>> GetObjectMonitorUsage says nothing about thread's being 
>>>>>>>>> suspended so I can't see how this could be construed as an 
>>>>>>>>> agent bug.
>>>>>>>>
>>>>>>>> In your scenario above, you mention that the target thread was
>>>>>>>> suspended, GetObjectMonitorUsage() was called while the target
>>>>>>>> was suspended, and then the target thread was resumed after
>>>>>>>> GetObjectMonitorUsage() checked for suspension, but before
>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>
>>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage()
>>>>>>>> and ResumeThread() are made by the agent and the agent should not
>>>>>>>> resume the target thread while also calling 
>>>>>>>> GetObjectMonitorUsage().
>>>>>>>> The calls were allowed to be made out of order so agent bug.
>>>>>>>
>>>>>>> Perhaps. I was thinking more generally about an independent 
>>>>>>> resume, but you're right that doesn't really make a lot of 
>>>>>>> sense. But when the spec says nothing about suspension ...
>>>>>>
>>>>>> And it is intentional that suspension is not required. JVM/DI and 
>>>>>> JVM/PI
>>>>>> used to require suspension for these kinds of get-the-info APIs. 
>>>>>> JVM/TI
>>>>>> intentionally was designed to not require suspension.
>>>>>>
>>>>>> As I've said before, we could add a note about the data being 
>>>>>> potentially
>>>>>> stale unless SuspendThread is used. I think of it like stat(2). 
>>>>>> You can
>>>>>> fetch the file's info, but there's no guarantee that the info is 
>>>>>> current
>>>>>> by the time you process what you got back. Is it too much 
>>>>>> motherhood to
>>>>>> state that the data might be stale? I could go either way...
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>> Using a handshake on the owner thread will allow this to be 
>>>>>>>>> fixed in the future without forcing/using any safepoints.
>>>>>>>>
>>>>>>>> I have to think about that which is why I'm avoiding talking about
>>>>>>>> handshakes in this thread.
>>>>>>>
>>>>>>> Effectively the handshake can "suspend" the thread whilst the 
>>>>>>> monitor is queried. In effect the operation would create a 
>>>>>>> per-thread safepoint.
>>>>>>
>>>>>> I "know" that, but I still need time to think about it and probably
>>>>>> see the code to see if there are holes...
>>>>>>
>>>>>>
>>>>>>> Semantically it is no different to the code actually suspending 
>>>>>>> the owner thread, but it can't actually do that because 
>>>>>>> suspends/resume don't nest.
>>>>>>
>>>>>> Yeah... we used have a suspend count back when we tracked 
>>>>>> internal and
>>>>>> external suspends separately. That was a nightmare...
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check 
>>>>>>>>>>>> thread state, it returns `true` when the thread is sleeping 
>>>>>>>>>>>> [3], or when it performs in native [4].
>>>>>>>>>>>
>>>>>>>>>>> Sure but if the thread is actually suspended it can't 
>>>>>>>>>>> continue execution in the VM or in Java code.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> This appears to be an optimisation for the assumed common 
>>>>>>>>>>>>> case where threads are first suspended and then the 
>>>>>>>>>>>>> monitors are queried.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec 
>>>>>>>>>>>> - it just says "Get information about the object's monitor."
>>>>>>>>>>>
>>>>>>>>>>> Yes it was just an implementation optimisation, nothing to 
>>>>>>>>>>> do with the spec.
>>>>>>>>>>>
>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information 
>>>>>>>>>>>> in some case.
>>>>>>>>>>>>
>>>>>>>>>>>> It starts with finding owner thread, but the owner might be 
>>>>>>>>>>>> just before wakeup.
>>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is 
>>>>>>>>>>>> called at safepoint in any case.
>>>>>>>>>>>
>>>>>>>>>>> Except we're moving away from safepoints to using 
>>>>>>>>>>> Handshakes, so this particular operation will require that 
>>>>>>>>>>> the apparent owner is Handshake-safe (by entering a 
>>>>>>>>>>> handshake with it) before querying the monitor. This would 
>>>>>>>>>>> still be preferable I think to always using a safepoint for 
>>>>>>>>>>> the entire operation.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [3] 
>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>>
>>>>>>>>>>>> [4] 
>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> However there is still a potential bug as the thread 
>>>>>>>>>>>>> reported as the owner may not be suspended at the time we 
>>>>>>>>>>>>> first see it, and may release the monitor, but then it may 
>>>>>>>>>>>>> get suspended before we call:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>>>>>>>
>>>>>>>>>>>>> and so we think it is still the monitor owner and proceed 
>>>>>>>>>>>>> to query the monitor information in a racy way. This can't 
>>>>>>>>>>>>> happen when suspension itself requires a safepoint as the 
>>>>>>>>>>>>> current thread won't go to that safepoint during this 
>>>>>>>>>>>>> code. However, if suspension is implemented via a direct 
>>>>>>>>>>>>> handshake with the target thread then we have a problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>