Question about GetObjectMonitorUsage() JVMTI function

Mon Jun 15 22:40:10 UTC 2020

On 6/15/20 6:14 PM, David Holmes wrote:
> Hi Dan,
>
> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>> On 6/15/20 3:26 AM, David Holmes wrote:
>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>> Hi David,
>>>>
>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I wonder why JvmtiEnvBase::get_object_monitor_usage() 
>>>>>> (implementation of GetObjectMonitorUsage()) does not perform at 
>>>>>> safepoint.
>>>>>
>>>>> GetObjectMonitorUsage will use a safepoint if the target is not 
>>>>> suspended:
>>>>>
>>>>> jvmtiError
>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* 
>>>>> info_ptr) {
>>>>>    JavaThread* calling_thread = JavaThread::current();
>>>>>    jvmtiError err = get_object_monitor_usage(calling_thread, 
>>>>> object, info_ptr);
>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>      // Some of the critical threads were not suspended. go to a 
>>>>> safepoint and try again
>>>>>      VM_GetObjectMonitorUsage op(this, calling_thread, object, 
>>>>> info_ptr);
>>>>>      VMThread::execute(&op);
>>>>>      err = op.result();
>>>>>    }
>>>>>    return err;
>>>>> } /* end GetObject */
>>>>
>>>> I saw this code, so I guess there are some cases when 
>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from 
>>>> get_object_monitor_usage().
>>>>
>>>>
>>>>>> Monitor owner would be acquired from monitor object at first [1], 
>>>>>> but it would perform concurrently.
>>>>>> If owner thread is not suspended, the owner might be changed to 
>>>>>> others in subsequent code.
>>>>>>
>>>>>> For example, the owner might release the monitor before [2].
>>>>>
>>>>> The expectation is that when we find an owner thread it is either 
>>>>> suspended or not. If it is suspended then it cannot release the 
>>>>> monitor. If it is not suspended we detect that and redo the whole 
>>>>> query at a safepoint.
>>>>
>>>> I think the owner thread might resume unfortunately after 
>>>> suspending check.
>>>
>>> Yes you are right. I was thinking resuming also required a safepoint 
>>> but it only requires the Threads_lock. So yes the code is wrong.
>>
>> Which code is wrong?
>>
>> Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
>> has started the process of gathering the information while not at a
>> safepoint. Thus the information returned by GetObjectMonitorUsage()
>> might be stale, but that's a bug in the agent code.
>
> The code tries to make sure that it either collects data about a 
> monitor owned by a thread that is suspended, or else it collects that 
> data at a safepoint. But the owning thread can be resumed just after 
> the code determined it was suspended. The monitor can then be released 
> and the information gathered not only stale but potentially completely 
> wrong as it could now be owned by a different thread and will report 
> that thread's entry count.

If the agent is not using SuspendThread(), then as soon as
GetObjectMonitorUsage() returns to the caller the information
can be stale. In fact as soon as the implementation returns
from the safepoint that gathered the info, the target thread
could have moved on.

The only way to make sure you don't have stale information is
to use SuspendThread(), but it's not required. Perhaps the doc
should have more clear about the possibility of returning stale
info. That's a question for Robert F.

> GetObjectMonitorUsage says nothing about thread's being suspended so I 
> can't see how this could be construed as an agent bug.

In your scenario above, you mention that the target thread was
suspended, GetObjectMonitorUsage() was called while the target
was suspended, and then the target thread was resumed after
GetObjectMonitorUsage() checked for suspension, but before
GetObjectMonitorUsage() was able to gather the info.

All three of those calls: SuspendThread(), GetObjectMonitorUsage()
and ResumeThread() are made by the agent and the agent should not
resume the target thread while also calling GetObjectMonitorUsage().
The calls were allowed to be made out of order so agent bug.

> Using a handshake on the owner thread will allow this to be fixed in 
> the future without forcing/using any safepoints.

I have to think about that which is why I'm avoiding talking about
handshakes in this thread.

Dan

>
> Cheers,
> David
>
>> Dan
>>
>>
>>>
>>>
>>>> JavaThread::is_ext_suspend_completed() is used to check thread 
>>>> state, it returns `true` when the thread is sleeping [3], or when 
>>>> it performs in native [4].
>>>
>>> Sure but if the thread is actually suspended it can't continue 
>>> execution in the VM or in Java code.
>>>
>>>>
>>>>> This appears to be an optimisation for the assumed common case 
>>>>> where threads are first suspended and then the monitors are queried.
>>>>
>>>> I agree with this, but I could find out it from JVMTI spec - it 
>>>> just says "Get information about the object's monitor."
>>>
>>> Yes it was just an implementation optimisation, nothing to do with 
>>> the spec.
>>>
>>>> GetObjectMonitorUsage() might return incorrect information in some 
>>>> case.
>>>>
>>>> It starts with finding owner thread, but the owner might be just 
>>>> before wakeup.
>>>> So I think it is more safe if GetObjectMonitorUsage() is called at 
>>>> safepoint in any case.
>>>
>>> Except we're moving away from safepoints to using Handshakes, so 
>>> this particular operation will require that the apparent owner is 
>>> Handshake-safe (by entering a handshake with it) before querying the 
>>> monitor. This would still be preferable I think to always using a 
>>> safepoint for the entire operation.
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> [3] 
>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>
>>>> [4] 
>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>
>>>>
>>>>
>>>>> However there is still a potential bug as the thread reported as 
>>>>> the owner may not be suspended at the time we first see it, and 
>>>>> may release the monitor, but then it may get suspended before we 
>>>>> call:
>>>>>
>>>>>   owning_thread = 
>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>
>>>>> and so we think it is still the monitor owner and proceed to query 
>>>>> the monitor information in a racy way. This can't happen when 
>>>>> suspension itself requires a safepoint as the current thread won't 
>>>>> go to that safepoint during this code. However, if suspension is 
>>>>> implemented via a direct handshake with the target thread then we 
>>>>> have a problem.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> [1] 
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>
>>>>>> [2] 
>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>
>>