RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information
David Holmes
david.holmes at oracle.com
Thu Jun 18 14:04:19 UTC 2020
On 18/06/2020 11:55 pm, Daniel D. Daugherty wrote:
> On 6/18/20 9:18 AM, David Holmes wrote:
>> On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:
>>> On 2020/06/18 17:36, David Holmes wrote:
>>>> On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
>>>>> Hi David,
>>>>>
>>>>> Both ThreadsListHandle and ResourceMarks would use
>>>>> `Thread::current()` for their resource. It is set as default
>>>>> parameter in c'tor.
>>>>> Do you mean we should it explicitly in c'tor?
>>>>
>>>> Yes pass current_thread so we don't do the additional unnecessary
>>>> calls to Thread::current().
>>>
>>> Ok, I've fixed them. Could you review again?
>>>
>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/
>>
>> Updates look good. One nit I missed before:
>>
>> src/hotspot/share/prims/jvmtiEnv.cpp
>>
>> // It need to perform at safepoint for gathering stable data
>>
>> please change to:
>>
>> // This need to be performed at a safepoint to gather stable data
>
> Just a comment on this comment... I still haven't gotten to the webrev
> yet...
>
> Perhaps:
>
> // This needs to be performed at a safepoint to gather stable data.
There is a second line that continues the sentence
// because monitor owner / waiters might not be suspended.
David
-----
> Dan
>>
>> Thanks,
>> David
>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2020/06/18 13:58, David Holmes wrote:
>>>>>> Hi Yasumasa,
>>>>>>
>>>>>> On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
>>>>>>> Hi Serguei,
>>>>>>>
>>>>>>> Thanks for your comment!
>>>>>>> I uploaded new webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/
>>>>>>>
>>>>>>> I'm not sure the following change is correct.
>>>>>>> Can we assume owning_thread is not NULL at safepoint?
>>>>>>
>>>>>> We can if "owner != NULL". So that change seem fine to me.
>>>>>>
>>>>>> But given this is now only executed at a safepoint there are
>>>>>> additional simplifications that can be made:
>>>>>>
>>>>>> - current thread determination can be simplified:
>>>>>>
>>>>>> 945 Thread* current_thread = Thread::current();
>>>>>>
>>>>>> becomes:
>>>>>>
>>>>>> Thread* current_thread = VMThread::vm_thread();
>>>>>> assert(current_thread == Thread::current(), "must be");
>>>>>>
>>>>>> - these comments can be removed
>>>>>>
>>>>>> 994 // Use current thread since function can be called from a
>>>>>> 995 // JavaThread or the VMThread.
>>>>>> 1053 // Use current thread since function can be called from a
>>>>>> 1054 // JavaThread or the VMThread.
>>>>>>
>>>>>> - these TLH constructions should be passing current_thread
>>>>>> (existing bug)
>>>>>>
>>>>>> 996 ThreadsListHandle tlh;
>>>>>> 1055 ThreadsListHandle tlh;
>>>>>>
>>>>>> - All ResourceMarks should be passing current_thread (existing bug)
>>>>>>
>>>>>>
>>>>>> Aside: there is a major inconsistency between the spec and
>>>>>> implementation for this method. I've traced the history to see how
>>>>>> this came about from JVMDI (ref JDK-4546581) but it never resulted
>>>>>> in the JVM TI specification clearly stating what the
>>>>>> waiters/waiter_count means. I will file a bug to have the spec
>>>>>> clarified to match the implementation (even though I think the
>>>>>> implementation is what is wrong). :(
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> All tests on submit repo and serviceability/jvmti and
>>>>>>> vmTestbase/nsk/jvmti have been passed with this change.
>>>>>>>
>>>>>>>
>>>>>>> ```
>>>>>>> // This monitor is owned so we have to find the owning
>>>>>>> JavaThread.
>>>>>>> owning_thread =
>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
>>>>>>> - // Cannot assume (owning_thread != NULL) here because this
>>>>>>> function
>>>>>>> - // may not have been called at a safepoint and the
>>>>>>> owning_thread
>>>>>>> - // might not be suspended.
>>>>>>> - if (owning_thread != NULL) {
>>>>>>> - // The monitor's owner either has to be the current
>>>>>>> thread, at safepoint
>>>>>>> - // or it has to be suspended. Any of these conditions
>>>>>>> will prevent both
>>>>>>> - // contending and waiting threads from modifying the
>>>>>>> state of
>>>>>>> - // the monitor.
>>>>>>> - if (!at_safepoint &&
>>>>>>> !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
>>>>>>> - // Don't worry! This return of
>>>>>>> JVMTI_ERROR_THREAD_NOT_SUSPENDED
>>>>>>> - // will not make it back to the JVM/TI agent. The
>>>>>>> error code will
>>>>>>> - // get intercepted in
>>>>>>> JvmtiEnv::GetObjectMonitorUsage() which
>>>>>>> - // will retry the call via a VM_GetObjectMonitorUsage
>>>>>>> VM op.
>>>>>>> - return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
>>>>>>> - }
>>>>>>> - HandleMark hm;
>>>>>>> + assert(owning_thread != NULL, "owning JavaThread must not
>>>>>>> be NULL");
>>>>>>> Handle th(current_thread, owning_thread->threadObj());
>>>>>>> ret.owner = (jthread)jni_reference(calling_thread, th);
>>>>>>>
>>>>>>> ```
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>> On 2020/06/18 0:42, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> This fix is not enough.
>>>>>>>> The function JvmtiEnvBase::get_object_monitor_usage works in two
>>>>>>>> modes: in VMop and non-VMop.
>>>>>>>> The non-VMop mode has to be removed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/17/20 02:18, Yasumasa Suenaga wrote:
>>>>>>>>> (Change subject for RFR)
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I filed it to JBS and upload a webrev for it.
>>>>>>>>> Could you review it?
>>>>>>>>>
>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
>>>>>>>>> webrev:
>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
>>>>>>>>>
>>>>>>>>> This change has passed tests on submit repo.
>>>>>>>>> Also I tested it with serviceability/jvmti and
>>>>>>>>> vmTestbase/nsk/jvmti on Linux x64.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Yasumasa
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2020/06/17 14:37, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Yes. It seems we have a consensus.
>>>>>>>>>> Thank you for taking care about it.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/16/20 18:34, David Holmes wrote:
>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>
>>>>>>>>>>> Go for it! :)
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
>>>>>>>>>>>> On 2020/06/17 8:47, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>>>> Hi Dan, David and Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 6/16/20 07:39, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>> On 6/15/20 9:28 PM, David Holmes wrote:
>>>>>>>>>>>>>>> On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>> On 6/15/20 7:19 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>> On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>> On 6/15/20 6:14 PM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
>>>>>>>>>>>>>>>>>>>> On 6/15/20 3:26 AM, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 2020/06/15 14:15, David Holmes wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I wonder why
>>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnvBase::get_object_monitor_usage()
>>>>>>>>>>>>>>>>>>>>>>>> (implementation of GetObjectMonitorUsage()) does
>>>>>>>>>>>>>>>>>>>>>>>> not perform at safepoint.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage will use a safepoint if the
>>>>>>>>>>>>>>>>>>>>>>> target is not suspended:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> jvmtiError
>>>>>>>>>>>>>>>>>>>>>>> JvmtiEnv::GetObjectMonitorUsage(jobject object,
>>>>>>>>>>>>>>>>>>>>>>> jvmtiMonitorUsage* info_ptr) {
>>>>>>>>>>>>>>>>>>>>>>> JavaThread* calling_thread =
>>>>>>>>>>>>>>>>>>>>>>> JavaThread::current();
>>>>>>>>>>>>>>>>>>>>>>> jvmtiError err =
>>>>>>>>>>>>>>>>>>>>>>> get_object_monitor_usage(calling_thread, object,
>>>>>>>>>>>>>>>>>>>>>>> info_ptr);
>>>>>>>>>>>>>>>>>>>>>>> if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
>>>>>>>>>>>>>>>>>>>>>>> // Some of the critical threads were not
>>>>>>>>>>>>>>>>>>>>>>> suspended. go to a safepoint and try again
>>>>>>>>>>>>>>>>>>>>>>> VM_GetObjectMonitorUsage op(this, calling_thread,
>>>>>>>>>>>>>>>>>>>>>>> object, info_ptr);
>>>>>>>>>>>>>>>>>>>>>>> VMThread::execute(&op);
>>>>>>>>>>>>>>>>>>>>>>> err = op.result();
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>> return err;
>>>>>>>>>>>>>>>>>>>>>>> } /* end GetObject */
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I saw this code, so I guess there are some cases
>>>>>>>>>>>>>>>>>>>>>> when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not
>>>>>>>>>>>>>>>>>>>>>> returned from get_object_monitor_usage().
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Monitor owner would be acquired from monitor
>>>>>>>>>>>>>>>>>>>>>>>> object at first [1], but it would perform
>>>>>>>>>>>>>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>>>>>>>>>>>>> If owner thread is not suspended, the owner
>>>>>>>>>>>>>>>>>>>>>>>> might be changed to others in subsequent code.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> For example, the owner might release the monitor
>>>>>>>>>>>>>>>>>>>>>>>> before [2].
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The expectation is that when we find an owner
>>>>>>>>>>>>>>>>>>>>>>> thread it is either suspended or not. If it is
>>>>>>>>>>>>>>>>>>>>>>> suspended then it cannot release the monitor. If
>>>>>>>>>>>>>>>>>>>>>>> it is not suspended we detect that and redo the
>>>>>>>>>>>>>>>>>>>>>>> whole query at a safepoint.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think the owner thread might resume
>>>>>>>>>>>>>>>>>>>>>> unfortunately after suspending check.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes you are right. I was thinking resuming also
>>>>>>>>>>>>>>>>>>>>> required a safepoint but it only requires the
>>>>>>>>>>>>>>>>>>>>> Threads_lock. So yes the code is wrong.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Which code is wrong?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, a rogue resume can happen when the
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() caller
>>>>>>>>>>>>>>>>>>>> has started the process of gathering the information
>>>>>>>>>>>>>>>>>>>> while not at a
>>>>>>>>>>>>>>>>>>>> safepoint. Thus the information returned by
>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>>>> might be stale, but that's a bug in the agent code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The code tries to make sure that it either collects
>>>>>>>>>>>>>>>>>>> data about a monitor owned by a thread that is
>>>>>>>>>>>>>>>>>>> suspended, or else it collects that data at a
>>>>>>>>>>>>>>>>>>> safepoint. But the owning thread can be resumed just
>>>>>>>>>>>>>>>>>>> after the code determined it was suspended. The
>>>>>>>>>>>>>>>>>>> monitor can then be released and the information
>>>>>>>>>>>>>>>>>>> gathered not only stale but potentially completely
>>>>>>>>>>>>>>>>>>> wrong as it could now be owned by a different thread
>>>>>>>>>>>>>>>>>>> and will report that thread's entry count.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If the agent is not using SuspendThread(), then as
>>>>>>>>>>>>>>>>>> soon as
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() returns to the caller the
>>>>>>>>>>>>>>>>>> information
>>>>>>>>>>>>>>>>>> can be stale. In fact as soon as the implementation
>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>> from the safepoint that gathered the info, the target
>>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>>> could have moved on.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That isn't the issue. That the info is stale is fine.
>>>>>>>>>>>>>>>>> But the expectation is that the information was
>>>>>>>>>>>>>>>>> actually an accurate snapshot of the state of the
>>>>>>>>>>>>>>>>> monitor at some point in time. The current code does
>>>>>>>>>>>>>>>>> not ensure that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please explain. I clearly don't understand why you think
>>>>>>>>>>>>>>>> the info
>>>>>>>>>>>>>>>> returned isn't "an accurate snapshot of the state of the
>>>>>>>>>>>>>>>> monitor
>>>>>>>>>>>>>>>> at some point in time".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Because it may not be a "snapshot" at all. There is no
>>>>>>>>>>>>>>> atomicity**. The reported owner thread may not own it any
>>>>>>>>>>>>>>> longer when the entry count is read, so straight away you
>>>>>>>>>>>>>>> may have the wrong entry count information. The set of
>>>>>>>>>>>>>>> threads trying to acquire the monitor, or wait on the
>>>>>>>>>>>>>>> monitor can change in unexpected ways. It would be
>>>>>>>>>>>>>>> possible for instance to report the same thread as being
>>>>>>>>>>>>>>> the owner, being blocked trying to enter the monitor, and
>>>>>>>>>>>>>>> being in the wait-set of the monitor - apparently all at
>>>>>>>>>>>>>>> the same time!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ** even if the owner is suspended we don't have complete
>>>>>>>>>>>>>>> atomicity because threads can join the set of threads
>>>>>>>>>>>>>>> trying to enter the monitor (unless they are all suspended).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consider the case when the monitor's owner is _not_
>>>>>>>>>>>>>> suspended:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - GetObjectMonitorUsage() uses a safepoint to gather the
>>>>>>>>>>>>>> info about
>>>>>>>>>>>>>> the object's monitor. Since we're at a safepoint, the
>>>>>>>>>>>>>> info that
>>>>>>>>>>>>>> we are gathering cannot change until we return from
>>>>>>>>>>>>>> the safepoint.
>>>>>>>>>>>>>> It is a snapshot and a valid one at that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consider the case when the monitor's owner is suspended:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - GetObjectMonitorUsage() will gather info about the
>>>>>>>>>>>>>> object's
>>>>>>>>>>>>>> monitor while _not_ at a safepoint. Assuming that no
>>>>>>>>>>>>>> other
>>>>>>>>>>>>>> thread is suspended, then entry_count can change because
>>>>>>>>>>>>>> another thread can block on entry while we are gathering
>>>>>>>>>>>>>> info. waiter_count and waiters can change if a thread was
>>>>>>>>>>>>>> in a timed wait that has timed out and now that thread is
>>>>>>>>>>>>>> blocked on re-entry. I don't think that
>>>>>>>>>>>>>> notify_waiter_count
>>>>>>>>>>>>>> and notify_waiters can change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So in this case, the owner info and notify info is
>>>>>>>>>>>>>> stable,
>>>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consider the case when the monitor is not owned:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - GetObjectMonitorUsage() will start to gather info
>>>>>>>>>>>>>> about the
>>>>>>>>>>>>>> object's monitor while _not_ at a safepoint. If it
>>>>>>>>>>>>>> finds a
>>>>>>>>>>>>>> thread on the entry queue that is not suspended, then
>>>>>>>>>>>>>> it will
>>>>>>>>>>>>>> bail out and redo the info gather at a safepoint. I just
>>>>>>>>>>>>>> noticed that it doesn't check for suspension for the
>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>> on the waiters list so a timed Object.wait() call can
>>>>>>>>>>>>>> cause
>>>>>>>>>>>>>> some confusion here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So in this case, the owner info is not stable if a thread
>>>>>>>>>>>>>> comes out of a timed wait and reenters the monitor. This
>>>>>>>>>>>>>> case is no different than if a "barger" thread comes in
>>>>>>>>>>>>>> after the NULL owner field is observed and enters the
>>>>>>>>>>>>>> monitor. We'll return that there is no owner, a list of
>>>>>>>>>>>>>> suspended pending entry thread and a list of waiting
>>>>>>>>>>>>>> threads. The reality is that the object's monitor is
>>>>>>>>>>>>>> owned by the "barger" that completely bypassed the entry
>>>>>>>>>>>>>> queue by virtue of seeing the NULL owner field at exactly
>>>>>>>>>>>>>> the right time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the owner field is only stable when we have an owner. If
>>>>>>>>>>>>>> that owner is not suspended, then the other fields are also
>>>>>>>>>>>>>> stable because we gathered the info at a safepoint. If the
>>>>>>>>>>>>>> owner is suspended, then the owner and notify info is stable,
>>>>>>>>>>>>>> but the entry_count and waiter info is not stable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we have a NULL owner field, then the info is only stable
>>>>>>>>>>>>>> if you have a non-suspended thread on the entry list. Ouch!
>>>>>>>>>>>>>> That's deterministic, but not without some work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Okay so only when we gather the info at a safepoint is all
>>>>>>>>>>>>>> of it a valid and stable snapshot. Unfortunately, we only
>>>>>>>>>>>>>> do that at a safepoint when the owner thread is not suspended
>>>>>>>>>>>>>> or if owner == NULL and one of the entry threads is not
>>>>>>>>>>>>>> suspended. If either of those conditions is not true, then
>>>>>>>>>>>>>> the different pieces of info is unstable to varying degrees.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for this claim:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It would be possible for instance to report the same thread
>>>>>>>>>>>>>>> as being the owner, being blocked trying to enter the
>>>>>>>>>>>>>>> monitor,
>>>>>>>>>>>>>>> and being in the wait-set of the monitor - apparently all at
>>>>>>>>>>>>>>> the same time!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can't figure out a way to make that scenario work. If the
>>>>>>>>>>>>>> thread is seen as the owner and is not suspended, then we
>>>>>>>>>>>>>> gather info at a safepoint. If it is suspended, then it can't
>>>>>>>>>>>>>> then be seen as on the entry queue or on the wait queue since
>>>>>>>>>>>>>> it is suspended. If it is seen on the entry queue and is not
>>>>>>>>>>>>>> suspended, then we gather info at a safepoint. If it is
>>>>>>>>>>>>>> suspended on the entry queue, then it can't be seen on the
>>>>>>>>>>>>>> wait queue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the info instability of this API is bad, but it's not
>>>>>>>>>>>>>> quite that bad. :-) (That is a small mercy.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Handshaking is not going to make this situation any better
>>>>>>>>>>>>>> for GetObjectMonitorUsage(). If the monitor is owned and we
>>>>>>>>>>>>>> handshake with the owner, the stability or instability of
>>>>>>>>>>>>>> the other fields remains the same as when SuspendThread is
>>>>>>>>>>>>>> used. Handshaking with all threads won't make the data as
>>>>>>>>>>>>>> stable as when at a safepoint because individual threads
>>>>>>>>>>>>>> can resume execution after doing their handshake so there
>>>>>>>>>>>>>> will still be field instability.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Short version: GetObjectMonitorUsage() should only gather
>>>>>>>>>>>>>> data at a safepoint. Yes, I've changed my mind.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>> The advantages are:
>>>>>>>>>>>>> - the result is stable
>>>>>>>>>>>>> - the implementation can be simplified
>>>>>>>>>>>>>
>>>>>>>>>>>>> Performance impact is not very clear but should not be that
>>>>>>>>>>>>> big as suspending all the threads has some overhead too.
>>>>>>>>>>>>> I'm not sure if using handshakes can make performance better.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, may I file it to JBS and fix it?
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Serguei
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The only way to make sure you don't have stale
>>>>>>>>>>>>>>>>>> information is
>>>>>>>>>>>>>>>>>> to use SuspendThread(), but it's not required. Perhaps
>>>>>>>>>>>>>>>>>> the doc
>>>>>>>>>>>>>>>>>> should have more clear about the possibility of
>>>>>>>>>>>>>>>>>> returning stale
>>>>>>>>>>>>>>>>>> info. That's a question for Robert F.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage says nothing about thread's
>>>>>>>>>>>>>>>>>>> being suspended so I can't see how this could be
>>>>>>>>>>>>>>>>>>> construed as an agent bug.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In your scenario above, you mention that the target
>>>>>>>>>>>>>>>>>> thread was
>>>>>>>>>>>>>>>>>> suspended, GetObjectMonitorUsage() was called while
>>>>>>>>>>>>>>>>>> the target
>>>>>>>>>>>>>>>>>> was suspended, and then the target thread was resumed
>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() checked for suspension, but
>>>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() was able to gather the info.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> All three of those calls: SuspendThread(),
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage()
>>>>>>>>>>>>>>>>>> and ResumeThread() are made by the agent and the agent
>>>>>>>>>>>>>>>>>> should not
>>>>>>>>>>>>>>>>>> resume the target thread while also calling
>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage().
>>>>>>>>>>>>>>>>>> The calls were allowed to be made out of order so
>>>>>>>>>>>>>>>>>> agent bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Perhaps. I was thinking more generally about an
>>>>>>>>>>>>>>>>> independent resume, but you're right that doesn't
>>>>>>>>>>>>>>>>> really make a lot of sense. But when the spec says
>>>>>>>>>>>>>>>>> nothing about suspension ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And it is intentional that suspension is not required.
>>>>>>>>>>>>>>>> JVM/DI and JVM/PI
>>>>>>>>>>>>>>>> used to require suspension for these kinds of
>>>>>>>>>>>>>>>> get-the-info APIs. JVM/TI
>>>>>>>>>>>>>>>> intentionally was designed to not require suspension.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As I've said before, we could add a note about the data
>>>>>>>>>>>>>>>> being potentially
>>>>>>>>>>>>>>>> stale unless SuspendThread is used. I think of it like
>>>>>>>>>>>>>>>> stat(2). You can
>>>>>>>>>>>>>>>> fetch the file's info, but there's no guarantee that the
>>>>>>>>>>>>>>>> info is current
>>>>>>>>>>>>>>>> by the time you process what you got back. Is it too
>>>>>>>>>>>>>>>> much motherhood to
>>>>>>>>>>>>>>>> state that the data might be stale? I could go either
>>>>>>>>>>>>>>>> way...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Using a handshake on the owner thread will allow this
>>>>>>>>>>>>>>>>>>> to be fixed in the future without forcing/using any
>>>>>>>>>>>>>>>>>>> safepoints.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have to think about that which is why I'm avoiding
>>>>>>>>>>>>>>>>>> talking about
>>>>>>>>>>>>>>>>>> handshakes in this thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Effectively the handshake can "suspend" the thread
>>>>>>>>>>>>>>>>> whilst the monitor is queried. In effect the operation
>>>>>>>>>>>>>>>>> would create a per-thread safepoint.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I "know" that, but I still need time to think about it
>>>>>>>>>>>>>>>> and probably
>>>>>>>>>>>>>>>> see the code to see if there are holes...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Semantically it is no different to the code actually
>>>>>>>>>>>>>>>>> suspending the owner thread, but it can't actually do
>>>>>>>>>>>>>>>>> that because suspends/resume don't nest.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yeah... we used have a suspend count back when we
>>>>>>>>>>>>>>>> tracked internal and
>>>>>>>>>>>>>>>> external suspends separately. That was a nightmare...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Dan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> JavaThread::is_ext_suspend_completed() is used to
>>>>>>>>>>>>>>>>>>>>>> check thread state, it returns `true` when the
>>>>>>>>>>>>>>>>>>>>>> thread is sleeping [3], or when it performs in
>>>>>>>>>>>>>>>>>>>>>> native [4].
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sure but if the thread is actually suspended it
>>>>>>>>>>>>>>>>>>>>> can't continue execution in the VM or in Java code.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This appears to be an optimisation for the
>>>>>>>>>>>>>>>>>>>>>>> assumed common case where threads are first
>>>>>>>>>>>>>>>>>>>>>>> suspended and then the monitors are queried.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I agree with this, but I could find out it from
>>>>>>>>>>>>>>>>>>>>>> JVMTI spec - it just says "Get information about
>>>>>>>>>>>>>>>>>>>>>> the object's monitor."
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes it was just an implementation optimisation,
>>>>>>>>>>>>>>>>>>>>> nothing to do with the spec.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() might return incorrect
>>>>>>>>>>>>>>>>>>>>>> information in some case.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It starts with finding owner thread, but the owner
>>>>>>>>>>>>>>>>>>>>>> might be just before wakeup.
>>>>>>>>>>>>>>>>>>>>>> So I think it is more safe if
>>>>>>>>>>>>>>>>>>>>>> GetObjectMonitorUsage() is called at safepoint in
>>>>>>>>>>>>>>>>>>>>>> any case.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Except we're moving away from safepoints to using
>>>>>>>>>>>>>>>>>>>>> Handshakes, so this particular operation will
>>>>>>>>>>>>>>>>>>>>> require that the apparent owner is Handshake-safe
>>>>>>>>>>>>>>>>>>>>> (by entering a handshake with it) before querying
>>>>>>>>>>>>>>>>>>>>> the monitor. This would still be preferable I think
>>>>>>>>>>>>>>>>>>>>> to always using a safepoint for the entire operation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> However there is still a potential bug as the
>>>>>>>>>>>>>>>>>>>>>>> thread reported as the owner may not be suspended
>>>>>>>>>>>>>>>>>>>>>>> at the time we first see it, and may release the
>>>>>>>>>>>>>>>>>>>>>>> monitor, but then it may get suspended before we
>>>>>>>>>>>>>>>>>>>>>>> call:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> owning_thread =
>>>>>>>>>>>>>>>>>>>>>>> Threads::owning_thread_from_monitor_owner(tlh.list(),
>>>>>>>>>>>>>>>>>>>>>>> owner);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> and so we think it is still the monitor owner and
>>>>>>>>>>>>>>>>>>>>>>> proceed to query the monitor information in a
>>>>>>>>>>>>>>>>>>>>>>> racy way. This can't happen when suspension
>>>>>>>>>>>>>>>>>>>>>>> itself requires a safepoint as the current thread
>>>>>>>>>>>>>>>>>>>>>>> won't go to that safepoint during this code.
>>>>>>>>>>>>>>>>>>>>>>> However, if suspension is implemented via a
>>>>>>>>>>>>>>>>>>>>>>> direct handshake with the target thread then we
>>>>>>>>>>>>>>>>>>>>>>> have a problem.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>
More information about the serviceability-dev
mailing list