Question about GetObjectMonitorUsage() JVMTI function

Wed Jun 17 01:34:01 UTC 2020

> Ok, may I file it to JBS and fix it? 

Go for it! :)

Cheers,
David

On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>> Hi Dan, David and Yasumasa,
>>
>>
>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>> Hi Dan,
>>>>>>>>
>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>
>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does not 
>>>>>>>>>>>>> perform at safepoint.
>>>>>>>>>>>>
>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the target is 
>>>>>>>>>>>> not suspended:
>>>>>>>>>>>>
>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, 
>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>    JavaThread* calling_thread = JavaThread::current();
>>>>>>>>>>>>    jvmtiError err = get_object_monitor_usage(calling_thread, 
>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>      // Some of the critical threads were not suspended. go 
>>>>>>>>>>>> to a safepoint and try again
>>>>>>>>>>>>      VM_GetObjectMonitorUsage op(this, calling_thread, 
>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>      VMThread::execute(&op);
>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>    }
>>>>>>>>>>>>    return err;
>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>
>>>>>>>>>>> I saw this code, so I guess there are some cases when 
>>>>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from 
>>>>>>>>>>> get_object_monitor_usage().
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Monitor owner would be acquired from monitor object at 
>>>>>>>>>>>>> first [1], but it would perform concurrently.
>>>>>>>>>>>>> If owner thread is not suspended, the owner might be 
>>>>>>>>>>>>> changed to others in subsequent code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, the owner might release the monitor before [2].
>>>>>>>>>>>>
>>>>>>>>>>>> The expectation is that when we find an owner thread it is 
>>>>>>>>>>>> either suspended or not. If it is suspended then it cannot 
>>>>>>>>>>>> release the monitor. If it is not suspended we detect that 
>>>>>>>>>>>> and redo the whole query at a safepoint.
>>>>>>>>>>>
>>>>>>>>>>> I think the owner thread might resume unfortunately after 
>>>>>>>>>>> suspending check.
>>>>>>>>>>
>>>>>>>>>> Yes you are right. I was thinking resuming also required a 
>>>>>>>>>> safepoint but it only requires the Threads_lock. So yes the 
>>>>>>>>>> code is wrong.
>>>>>>>>>
>>>>>>>>> Which code is wrong?
>>>>>>>>>
>>>>>>>>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() 
>>>>>>>>> caller
>>>>>>>>> has started the process of gathering the information while not 
>>>>>>>>> at a
>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>
>>>>>>>> The code tries to make sure that it either collects data about a 
>>>>>>>> monitor owned by a thread that is suspended, or else it collects 
>>>>>>>> that data at a safepoint. But the owning thread can be resumed 
>>>>>>>> just after the code determined it was suspended. The monitor can 
>>>>>>>> then be released and the information gathered not only stale but 
>>>>>>>> potentially completely wrong as it could now be owned by a 
>>>>>>>> different thread and will report that thread's entry count.
>>>>>>>
>>>>>>> If the agent is not using SuspendThread(), then as soon as
>>>>>>> GetObjectMonitorUsage() returns to the caller the information
>>>>>>> can be stale. In fact as soon as the implementation returns
>>>>>>> from the safepoint that gathered the info, the target thread
>>>>>>> could have moved on.
>>>>>>
>>>>>> That isn't the issue. That the info is stale is fine. But the 
>>>>>> expectation is that the information was actually an accurate 
>>>>>> snapshot of the state of the monitor at some point in time. The 
>>>>>> current code does not ensure that.
>>>>>
>>>>> Please explain. I clearly don't understand why you think the info
>>>>> returned isn't "an accurate snapshot of the state of the monitor
>>>>> at some point in time".
>>>>
>>>> Because it may not be a "snapshot" at all. There is no atomicity**. 
>>>> The reported owner thread may not own it any longer when the entry 
>>>> count is read, so straight away you may have the wrong entry count 
>>>> information. The set of threads trying to acquire the monitor, or 
>>>> wait on the monitor can change in unexpected ways. It would be 
>>>> possible for instance to report the same thread as being the owner, 
>>>> being blocked trying to enter the monitor, and being in the wait-set 
>>>> of the monitor - apparently all at the same time!
>>>>
>>>> ** even if the owner is suspended we don't have complete atomicity 
>>>> because threads can join the set of threads trying to enter the 
>>>> monitor (unless they are all suspended).
>>>
>>> Consider the case when the monitor's owner is _not_ suspended:
>>>
>>>   - GetObjectMonitorUsage() uses a safepoint to gather the info about
>>>     the object's monitor. Since we're at a safepoint, the info that
>>>     we are gathering cannot change until we return from the safepoint.
>>>     It is a snapshot and a valid one at that.
>>>
>>> Consider the case when the monitor's owner is suspended:
>>>
>>>   - GetObjectMonitorUsage() will gather info about the object's
>>>     monitor while _not_ at a safepoint. Assuming that no other
>>>     thread is suspended, then entry_count can change because
>>>     another thread can block on entry while we are gathering
>>>     info. waiter_count and waiters can change if a thread was
>>>     in a timed wait that has timed out and now that thread is
>>>     blocked on re-entry. I don't think that notify_waiter_count
>>>     and notify_waiters can change.
>>>
>>>     So in this case, the owner info and notify info is stable,
>>>     but the entry_count and waiter info is not stable.
>>>
>>> Consider the case when the monitor is not owned:
>>>
>>>   - GetObjectMonitorUsage() will start to gather info about the
>>>     object's monitor while _not_ at a safepoint. If it finds a
>>>     thread on the entry queue that is not suspended, then it will
>>>     bail out and redo the info gather at a safepoint. I just
>>>     noticed that it doesn't check for suspension for the threads
>>>     on the waiters list so a timed Object.wait() call can cause
>>>     some confusion here.
>>>
>>>     So in this case, the owner info is not stable if a thread
>>>     comes out of a timed wait and reenters the monitor. This
>>>     case is no different than if a "barger" thread comes in
>>>     after the NULL owner field is observed and enters the
>>>     monitor. We'll return that there is no owner, a list of
>>>     suspended pending entry thread and a list of waiting
>>>     threads. The reality is that the object's monitor is
>>>     owned by the "barger" that completely bypassed the entry
>>>     queue by virtue of seeing the NULL owner field at exactly
>>>     the right time.
>>>
>>> So the owner field is only stable when we have an owner. If
>>> that owner is not suspended, then the other fields are also
>>> stable because we gathered the info at a safepoint. If the
>>> owner is suspended, then the owner and notify info is stable,
>>> but the entry_count and waiter info is not stable.
>>>
>>> If we have a NULL owner field, then the info is only stable
>>> if you have a non-suspended thread on the entry list. Ouch!
>>> That's deterministic, but not without some work.
>>>
>>>
>>> Okay so only when we gather the info at a safepoint is all
>>> of it a valid and stable snapshot. Unfortunately, we only
>>> do that at a safepoint when the owner thread is not suspended
>>> or if owner == NULL and one of the entry threads is not
>>> suspended. If either of those conditions is not true, then
>>> the different pieces of info is unstable to varying degrees.
>>>
>>> As for this claim:
>>>
>>>> It would be possible for instance to report the same thread
>>>> as being the owner, being blocked trying to enter the monitor,
>>>> and being in the wait-set of the monitor - apparently all at
>>>> the same time! 
>>>
>>> I can't figure out a way to make that scenario work. If the
>>> thread is seen as the owner and is not suspended, then we
>>> gather info at a safepoint. If it is suspended, then it can't
>>> then be seen as on the entry queue or on the wait queue since
>>> it is suspended. If it is seen on the entry queue and is not
>>> suspended, then we gather info at a safepoint. If it is
>>> suspended on the entry queue, then it can't be seen on the
>>> wait queue.
>>>
>>> So the info instability of this API is bad, but it's not
>>> quite that bad. :-) (That is a small mercy.)
>>>
>>>
>>> Handshaking is not going to make this situation any better
>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>> handshake with the owner, the stability or instability of
>>> the other fields remains the same as when SuspendThread is
>>> used. Handshaking with all threads won't make the data as
>>> stable as when at a safepoint because individual threads
>>> can resume execution after doing their handshake so there
>>> will still be field instability.
>>>
>>>
>>> Short version: GetObjectMonitorUsage() should only gather
>>> data at a safepoint. Yes, I've changed my mind.
>>
>> I agree with this.
>> The advantages are:
>>   - the result is stable
>>   - the implementation can be simplified
>>
>> Performance impact is not very clear but should not be that
>> big as suspending all the threads has some overhead too.
>> I'm not sure if using handshakes can make performance better.
> 
> Ok, may I file it to JBS and fix it?
> 
> Yasumasa
> 
> 
>> Thanks,
>> Serguei
>>
>>> Dan
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> The only way to make sure you don't have stale information is
>>>>>>> to use SuspendThread(), but it's not required. Perhaps the doc
>>>>>>> should have more clear about the possibility of returning stale
>>>>>>> info. That's a question for Robert F.
>>>>>>>
>>>>>>>
>>>>>>>> GetObjectMonitorUsage says nothing about thread's being 
>>>>>>>> suspended so I can't see how this could be construed as an agent 
>>>>>>>> bug.
>>>>>>>
>>>>>>> In your scenario above, you mention that the target thread was
>>>>>>> suspended, GetObjectMonitorUsage() was called while the target
>>>>>>> was suspended, and then the target thread was resumed after
>>>>>>> GetObjectMonitorUsage() checked for suspension, but before
>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>
>>>>>>> All three of those calls: SuspendThread(), GetObjectMonitorUsage()
>>>>>>> and ResumeThread() are made by the agent and the agent should not
>>>>>>> resume the target thread while also calling GetObjectMonitorUsage().
>>>>>>> The calls were allowed to be made out of order so agent bug.
>>>>>>
>>>>>> Perhaps. I was thinking more generally about an independent 
>>>>>> resume, but you're right that doesn't really make a lot of sense. 
>>>>>> But when the spec says nothing about suspension ...
>>>>>
>>>>> And it is intentional that suspension is not required. JVM/DI and 
>>>>> JVM/PI
>>>>> used to require suspension for these kinds of get-the-info APIs. 
>>>>> JVM/TI
>>>>> intentionally was designed to not require suspension.
>>>>>
>>>>> As I've said before, we could add a note about the data being 
>>>>> potentially
>>>>> stale unless SuspendThread is used. I think of it like stat(2). You 
>>>>> can
>>>>> fetch the file's info, but there's no guarantee that the info is 
>>>>> current
>>>>> by the time you process what you got back. Is it too much 
>>>>> motherhood to
>>>>> state that the data might be stale? I could go either way...
>>>>>
>>>>>
>>>>>>
>>>>>>>> Using a handshake on the owner thread will allow this to be 
>>>>>>>> fixed in the future without forcing/using any safepoints.
>>>>>>>
>>>>>>> I have to think about that which is why I'm avoiding talking about
>>>>>>> handshakes in this thread.
>>>>>>
>>>>>> Effectively the handshake can "suspend" the thread whilst the 
>>>>>> monitor is queried. In effect the operation would create a 
>>>>>> per-thread safepoint.
>>>>>
>>>>> I "know" that, but I still need time to think about it and probably
>>>>> see the code to see if there are holes...
>>>>>
>>>>>
>>>>>> Semantically it is no different to the code actually suspending 
>>>>>> the owner thread, but it can't actually do that because 
>>>>>> suspends/resume don't nest.
>>>>>
>>>>> Yeah... we used have a suspend count back when we tracked internal and
>>>>> external suspends separately. That was a nightmare...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to check 
>>>>>>>>>>> thread state, it returns `true` when the thread is sleeping 
>>>>>>>>>>> [3], or when it performs in native [4].
>>>>>>>>>>
>>>>>>>>>> Sure but if the thread is actually suspended it can't continue 
>>>>>>>>>> execution in the VM or in Java code.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> This appears to be an optimisation for the assumed common 
>>>>>>>>>>>> case where threads are first suspended and then the monitors 
>>>>>>>>>>>> are queried.
>>>>>>>>>>>
>>>>>>>>>>> I agree with this, but I could find out it from JVMTI spec - 
>>>>>>>>>>> it just says "Get information about the object's monitor."
>>>>>>>>>>
>>>>>>>>>> Yes it was just an implementation optimisation, nothing to do 
>>>>>>>>>> with the spec.
>>>>>>>>>>
>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect information in 
>>>>>>>>>>> some case.
>>>>>>>>>>>
>>>>>>>>>>> It starts with finding owner thread, but the owner might be 
>>>>>>>>>>> just before wakeup.
>>>>>>>>>>> So I think it is more safe if GetObjectMonitorUsage() is 
>>>>>>>>>>> called at safepoint in any case.
>>>>>>>>>>
>>>>>>>>>> Except we're moving away from safepoints to using Handshakes, 
>>>>>>>>>> so this particular operation will require that the apparent 
>>>>>>>>>> owner is Handshake-safe (by entering a handshake with it) 
>>>>>>>>>> before querying the monitor. This would still be preferable I 
>>>>>>>>>> think to always using a safepoint for the entire operation.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Yasumasa
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [3] 
>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>
>>>>>>>>>>> [4] 
>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> However there is still a potential bug as the thread 
>>>>>>>>>>>> reported as the owner may not be suspended at the time we 
>>>>>>>>>>>> first see it, and may release the monitor, but then it may 
>>>>>>>>>>>> get suspended before we call:
>>>>>>>>>>>>
>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>>>>>>
>>>>>>>>>>>> and so we think it is still the monitor owner and proceed to 
>>>>>>>>>>>> query the monitor information in a racy way. This can't 
>>>>>>>>>>>> happen when suspension itself requires a safepoint as the 
>>>>>>>>>>>> current thread won't go to that safepoint during this code. 
>>>>>>>>>>>> However, if suspension is implemented via a direct handshake 
>>>>>>>>>>>> with the target thread then we have a problem.
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>>