RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information

Daniel D. Daugherty daniel.daugherty at oracle.com
Thu Jun 18 14:06:38 UTC 2020


On 6/18/20 10:04 AM, David Holmes wrote:
> On 18/06/2020 11:55 pm, Daniel D. Daugherty wrote:
>> On 6/18/20 9:18 AM, David Holmes wrote:
>>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>>>> On 2020/06/18 17:36, David Holmes wrote:
>>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Both ThreadsListHandle and ResourceMarks would use 
>>>>>> `Thread::current()` for their resource. It is set as default 
>>>>>> parameter in c'tor.
>>>>>> Do you mean we should it explicitly in c'tor?
>>>>>
>>>>> Yes pass current_thread so we don't do the additional unnecessary 
>>>>> calls to Thread::current().
>>>>
>>>> Ok, I've fixed them. Could you review again?
>>>>
>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>>
>>> Updates look good. One nit I missed before:
>>>
>>> src/hotspot/share/prims/jvmtiEnv.cpp
>>>
>>> // It need to perform at safepoint for gathering stable data
>>>
>>> please change to:
>>>
>>> // This need to be performed at a safepoint to gather stable data
>>
>> Just a comment on this comment... I still haven't gotten to the 
>> webrev yet...
>>
>> Perhaps:
>>
>>      // This needs to be performed at a safepoint to gather stable data.
>
> There is a second line that continues the sentence
>
> // because monitor owner / waiters might not be suspended.

So no period at the end... but the s/need/needs/ still works. :-)

Dan

>
> David
> -----
>
>> Dan
>>>
>>> Thanks,
>>> David
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> David
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>>> Hi Serguei,
>>>>>>>>
>>>>>>>> Thanks for your comment!
>>>>>>>> I uploaded new webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>>
>>>>>>>> I'm not sure the following change is correct.
>>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>>
>>>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>>>
>>>>>>> But given this is now only executed at a safepoint there are 
>>>>>>> additional simplifications that can be made:
>>>>>>>
>>>>>>> - current thread determination can be simplified:
>>>>>>>
>>>>>>> 945   Thread* current_thread = Thread::current();
>>>>>>>
>>>>>>> becomes:
>>>>>>>
>>>>>>>     Thread* current_thread = VMThread::vm_thread();
>>>>>>>     assert(current_thread == Thread::current(), "must be");
>>>>>>>
>>>>>>> - these comments can be removed
>>>>>>>
>>>>>>>   994       // Use current thread since function can be called 
>>>>>>> from a
>>>>>>>   995       // JavaThread or the VMThread.
>>>>>>> 1053       // Use current thread since function can be called 
>>>>>>> from a
>>>>>>> 1054       // JavaThread or the VMThread.
>>>>>>>
>>>>>>> - these TLH constructions should be passing current_thread 
>>>>>>> (existing bug)
>>>>>>>
>>>>>>> 996       ThreadsListHandle tlh;
>>>>>>> 1055       ThreadsListHandle tlh;
>>>>>>>
>>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>>
>>>>>>>
>>>>>>> Aside: there is a major inconsistency between the spec and 
>>>>>>> implementation for this method. I've traced the history to see 
>>>>>>> how this came about from JVMDI (ref JDK-4546581) but it never 
>>>>>>> resulted in the JVM TI specification clearly stating what the 
>>>>>>> waiters/waiter_count means. I will file a bug to have the spec 
>>>>>>> clarified to match the implementation (even though I think the 
>>>>>>> implementation is what is wrong). :(
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> All tests on submit repo and serviceability/jvmti and 
>>>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>>>
>>>>>>>>
>>>>>>>> ```
>>>>>>>>         // This monitor is owned so we have to find the owning 
>>>>>>>> JavaThread.
>>>>>>>>         owning_thread = 
>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>>> -      // Cannot assume (owning_thread != NULL) here because 
>>>>>>>> this function
>>>>>>>> -      // may not have been called at a safepoint and the 
>>>>>>>> owning_thread
>>>>>>>> -      // might not be suspended.
>>>>>>>> -      if (owning_thread != NULL) {
>>>>>>>> -        // The monitor's owner either has to be the current 
>>>>>>>> thread, at safepoint
>>>>>>>> -        // or it has to be suspended. Any of these conditions 
>>>>>>>> will prevent both
>>>>>>>> -        // contending and waiting threads from modifying the 
>>>>>>>> state of
>>>>>>>> -        // the monitor.
>>>>>>>> -        if (!at_safepoint && 
>>>>>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>>> -          // Don't worry! This return of 
>>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>>> -          // will not make it back to the JVM/TI agent. The 
>>>>>>>> error code will
>>>>>>>> -          // get intercepted in 
>>>>>>>> JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>>> -          // will retry the call via a 
>>>>>>>> VM_GetObjectMonitorUsage VM op.
>>>>>>>> -          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>>> -        }
>>>>>>>> -        HandleMark hm;
>>>>>>>> +      assert(owning_thread != NULL, "owning JavaThread must 
>>>>>>>> not be NULL");
>>>>>>>>           Handle     th(current_thread, 
>>>>>>>> owning_thread->threadObj());
>>>>>>>>           ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>>
>>>>>>>> ```
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> This fix is not enough.
>>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in 
>>>>>>>>> two modes: in VMop and non-VMop.
>>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>>> (Change subject for RFR)
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>>>> Could you review it?
>>>>>>>>>>
>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>>>   webrev: 
>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>>
>>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>>> Also I tested it with serviceability/jvmti and 
>>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Serguei
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>>>> Ok, may I file it to JBS and fix it? 
>>>>>>>>>>>>
>>>>>>>>>>>> Go for it! :)
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I wonder why 
>>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage() 
>>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) 
>>>>>>>>>>>>>>>>>>>>>>>>> does not perform at safepoint.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if 
>>>>>>>>>>>>>>>>>>>>>>>> the target is not suspended:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object, 
>>>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>>>>    JavaThread* calling_thread = 
>>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>>>>    jvmtiError err = 
>>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, 
>>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>>    if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>>>      // Some of the critical threads were not 
>>>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, 
>>>>>>>>>>>>>>>>>>>>>>>> calling_thread, object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>>>      err = op.result();
>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>    return err;
>>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases 
>>>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not 
>>>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor 
>>>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform 
>>>>>>>>>>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner 
>>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the 
>>>>>>>>>>>>>>>>>>>>>>>>> monitor before [2].
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner 
>>>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is 
>>>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. 
>>>>>>>>>>>>>>>>>>>>>>>> If it is not suspended we detect that and redo 
>>>>>>>>>>>>>>>>>>>>>>>> the whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume 
>>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also 
>>>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the 
>>>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the 
>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>>>> has started the process of gathering the 
>>>>>>>>>>>>>>>>>>>>> information while not at a
>>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by 
>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects 
>>>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is 
>>>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a 
>>>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed 
>>>>>>>>>>>>>>>>>>>> just after the code determined it was suspended. 
>>>>>>>>>>>>>>>>>>>> The monitor can then be released and the 
>>>>>>>>>>>>>>>>>>>> information gathered not only stale but potentially 
>>>>>>>>>>>>>>>>>>>> completely wrong as it could now be owned by a 
>>>>>>>>>>>>>>>>>>>> different thread and will report that thread's 
>>>>>>>>>>>>>>>>>>>> entry count.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as 
>>>>>>>>>>>>>>>>>>> soon as
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the 
>>>>>>>>>>>>>>>>>>> information
>>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation 
>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the 
>>>>>>>>>>>>>>>>>>> target thread
>>>>>>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine. 
>>>>>>>>>>>>>>>>>> But the expectation is that the information was 
>>>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the 
>>>>>>>>>>>>>>>>>> monitor at some point in time. The current code does 
>>>>>>>>>>>>>>>>>> not ensure that.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you 
>>>>>>>>>>>>>>>>> think the info
>>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of 
>>>>>>>>>>>>>>>>> the monitor
>>>>>>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no 
>>>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it 
>>>>>>>>>>>>>>>> any longer when the entry count is read, so straight 
>>>>>>>>>>>>>>>> away you may have the wrong entry count information. 
>>>>>>>>>>>>>>>> The set of threads trying to acquire the monitor, or 
>>>>>>>>>>>>>>>> wait on the monitor can change in unexpected ways. It 
>>>>>>>>>>>>>>>> would be possible for instance to report the same 
>>>>>>>>>>>>>>>> thread as being the owner, being blocked trying to 
>>>>>>>>>>>>>>>> enter the monitor, and being in the wait-set of the 
>>>>>>>>>>>>>>>> monitor - apparently all at the same time!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have 
>>>>>>>>>>>>>>>> complete atomicity because threads can join the set of 
>>>>>>>>>>>>>>>> threads trying to enter the monitor (unless they are 
>>>>>>>>>>>>>>>> all suspended).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_ 
>>>>>>>>>>>>>>> suspended:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() uses a safepoint to gather 
>>>>>>>>>>>>>>> the info about
>>>>>>>>>>>>>>>     the object's monitor. Since we're at a safepoint, 
>>>>>>>>>>>>>>> the info that
>>>>>>>>>>>>>>>     we are gathering cannot change until we return from 
>>>>>>>>>>>>>>> the safepoint.
>>>>>>>>>>>>>>>     It is a snapshot and a valid one at that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will gather info about the 
>>>>>>>>>>>>>>> object's
>>>>>>>>>>>>>>>     monitor while _not_ at a safepoint. Assuming that no 
>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>     thread is suspended, then entry_count can change 
>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>     another thread can block on entry while we are 
>>>>>>>>>>>>>>> gathering
>>>>>>>>>>>>>>>     info. waiter_count and waiters can change if a 
>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>     in a timed wait that has timed out and now that 
>>>>>>>>>>>>>>> thread is
>>>>>>>>>>>>>>>     blocked on re-entry. I don't think that 
>>>>>>>>>>>>>>> notify_waiter_count
>>>>>>>>>>>>>>>     and notify_waiters can change.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     So in this case, the owner info and notify info is 
>>>>>>>>>>>>>>> stable,
>>>>>>>>>>>>>>>     but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   - GetObjectMonitorUsage() will start to gather info 
>>>>>>>>>>>>>>> about the
>>>>>>>>>>>>>>>     object's monitor while _not_ at a safepoint. If it 
>>>>>>>>>>>>>>> finds a
>>>>>>>>>>>>>>>     thread on the entry queue that is not suspended, 
>>>>>>>>>>>>>>> then it will
>>>>>>>>>>>>>>>     bail out and redo the info gather at a safepoint. I 
>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>     noticed that it doesn't check for suspension for the 
>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>>     on the waiters list so a timed Object.wait() call 
>>>>>>>>>>>>>>> can cause
>>>>>>>>>>>>>>>     some confusion here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     So in this case, the owner info is not stable if a 
>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>     comes out of a timed wait and reenters the monitor. 
>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>     case is no different than if a "barger" thread comes in
>>>>>>>>>>>>>>>     after the NULL owner field is observed and enters the
>>>>>>>>>>>>>>>     monitor. We'll return that there is no owner, a list of
>>>>>>>>>>>>>>>     suspended pending entry thread and a list of waiting
>>>>>>>>>>>>>>>     threads. The reality is that the object's monitor is
>>>>>>>>>>>>>>>     owned by the "barger" that completely bypassed the 
>>>>>>>>>>>>>>> entry
>>>>>>>>>>>>>>>     queue by virtue of seeing the NULL owner field at 
>>>>>>>>>>>>>>> exactly
>>>>>>>>>>>>>>>     the right time.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is 
>>>>>>>>>>>>>>> stable,
>>>>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not 
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>>>>>>>>> the different pieces of info is unstable to varying 
>>>>>>>>>>>>>>> degrees.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for this claim:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It would be possible for instance to report the same 
>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the 
>>>>>>>>>>>>>>>> monitor,
>>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently 
>>>>>>>>>>>>>>>> all at
>>>>>>>>>>>>>>>> the same time! 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it 
>>>>>>>>>>>>>>> can't
>>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue 
>>>>>>>>>>>>>>> since
>>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is 
>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>>>>>>>>> wait queue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>>>>>>>>> will still be field instability.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>>> The advantages are:
>>>>>>>>>>>>>>   - the result is stable
>>>>>>>>>>>>>>   - the implementation can be simplified
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Performance impact is not very clear but should not be that
>>>>>>>>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>>>>>>>>> I'm not sure if using handshakes can make performance 
>>>>>>>>>>>>>> better.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale 
>>>>>>>>>>>>>>>>>>> information is
>>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. 
>>>>>>>>>>>>>>>>>>> Perhaps the doc
>>>>>>>>>>>>>>>>>>> should have more clear about the possibility of 
>>>>>>>>>>>>>>>>>>> returning stale
>>>>>>>>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's 
>>>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be 
>>>>>>>>>>>>>>>>>>>> construed as an agent bug.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target 
>>>>>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while 
>>>>>>>>>>>>>>>>>>> the target
>>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was 
>>>>>>>>>>>>>>>>>>> resumed after
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but 
>>>>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(), 
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the 
>>>>>>>>>>>>>>>>>>> agent should not
>>>>>>>>>>>>>>>>>>> resume the target thread while also calling 
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage().
>>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so 
>>>>>>>>>>>>>>>>>>> agent bug.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an 
>>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't 
>>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says 
>>>>>>>>>>>>>>>>>> nothing about suspension ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And it is intentional that suspension is not required. 
>>>>>>>>>>>>>>>>> JVM/DI and JVM/PI
>>>>>>>>>>>>>>>>> used to require suspension for these kinds of 
>>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI
>>>>>>>>>>>>>>>>> intentionally was designed to not require suspension.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As I've said before, we could add a note about the 
>>>>>>>>>>>>>>>>> data being potentially
>>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like 
>>>>>>>>>>>>>>>>> stat(2). You can
>>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that 
>>>>>>>>>>>>>>>>> the info is current
>>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too 
>>>>>>>>>>>>>>>>> much motherhood to
>>>>>>>>>>>>>>>>> state that the data might be stale? I could go either 
>>>>>>>>>>>>>>>>> way...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow 
>>>>>>>>>>>>>>>>>>>> this to be fixed in the future without 
>>>>>>>>>>>>>>>>>>>> forcing/using any safepoints.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding 
>>>>>>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>>>>>>> handshakes in this thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread 
>>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the 
>>>>>>>>>>>>>>>>>> operation would create a per-thread safepoint.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it 
>>>>>>>>>>>>>>>>> and probably
>>>>>>>>>>>>>>>>> see the code to see if there are holes...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually 
>>>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do 
>>>>>>>>>>>>>>>>>> that because suspends/resume don't nest.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we 
>>>>>>>>>>>>>>>>> tracked internal and
>>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used 
>>>>>>>>>>>>>>>>>>>>>>> to check thread state, it returns `true` when 
>>>>>>>>>>>>>>>>>>>>>>> the thread is sleeping [3], or when it performs 
>>>>>>>>>>>>>>>>>>>>>>> in native [4].
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it 
>>>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the 
>>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first 
>>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from 
>>>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about 
>>>>>>>>>>>>>>>>>>>>>>> the object's monitor."
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation, 
>>>>>>>>>>>>>>>>>>>>>> nothing to do with the spec.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect 
>>>>>>>>>>>>>>>>>>>>>>> information in some case.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the 
>>>>>>>>>>>>>>>>>>>>>>> owner might be just before wakeup.
>>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if 
>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint 
>>>>>>>>>>>>>>>>>>>>>>> in any case.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using 
>>>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will 
>>>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe 
>>>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying 
>>>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I 
>>>>>>>>>>>>>>>>>>>>>> think to always using a safepoint for the entire 
>>>>>>>>>>>>>>>>>>>>>> operation.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [3] 
>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671 
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [4] 
>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684 
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the 
>>>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be 
>>>>>>>>>>>>>>>>>>>>>>>> suspended at the time we first see it, and may 
>>>>>>>>>>>>>>>>>>>>>>>> release the monitor, but then it may get 
>>>>>>>>>>>>>>>>>>>>>>>> suspended before we call:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>   owning_thread = 
>>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), 
>>>>>>>>>>>>>>>>>>>>>>>> owner);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner 
>>>>>>>>>>>>>>>>>>>>>>>> and proceed to query the monitor information in 
>>>>>>>>>>>>>>>>>>>>>>>> a racy way. This can't happen when suspension 
>>>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current 
>>>>>>>>>>>>>>>>>>>>>>>> thread won't go to that safepoint during this 
>>>>>>>>>>>>>>>>>>>>>>>> code. However, if suspension is implemented via 
>>>>>>>>>>>>>>>>>>>>>>>> a direct handshake with the target thread then 
>>>>>>>>>>>>>>>>>>>>>>>> we have a problem.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [1] 
>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973 
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [2] 
>>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996 
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>



More information about the serviceability-dev mailing list