RFR(XS) 8036823: Stack trace sometimes shows 'locked' instead of 'waiting to lock'

Tue May 20 13:11:31 UTC 2014

Thanks Dan and David.

At this point, I would like withdraw this code review request.

-Zhengyu

On 5/20/2014 5:24 AM, David Holmes wrote:
>
>
> On 20/05/2014 3:28 PM, David Holmes wrote:
>> Hi Dan,
>>
>> On 20/05/2014 3:04 PM, Daniel D. Daugherty wrote:
>>> David,
>>>
>>> I'll have to politely disagree...
>>
>> That's why this code is so much fun ;-)
>>
>> You are of course correct. I misread this part of the inflate() code:
>>
>>            m->set_owner(mark->locker());
>>
>> to be setting the thread id as the owner, but in fact it is the
>> BasicLock address as you note.
>>
>> As I flagged with Zhengyu in email, and as per the July 2012 discussion
>> Kris Mok referred back to, I was wondering why this code did not do the
>> "obvious" thing - and indeed why no one suggested checking the owner
>> directly when I made the check_pending_monitor change back in 2006. Now
>> I know. :)
>>
>> However I don't think we need to be concerned about BiasedLocking
>> because that goes away as soon as we inflate - and we have inflated. So
>> _owner is either a Thread ID or a BasicLock address.
>>
>> But even so isn't the check:
>>
>> mark->monitor()->owner() != thread()
>>
>> perfectly valid because a BasicLock address will never match a thread
>> pointer?
>
> Ah but the thread being examined may in fact be the owner that has 
> this monitor stack-locked - which Dan also pointed out in his email. 
> So we do need to do additional checking for the BasicLock case.
>
> David
>
>> Thanks,
>> David
>>
>>> Here's the code in ObjectMonitor::exit() that handles the case
>>> when a thread finds itself exiting a monitor that was stack
>>> locked when it entered, but is now inflated:
>>>
>>> src/share/vm/runtime/objectMonitor.cpp:
>>>
>>>     912  void ATTR ObjectMonitor::exit(bool not_suspended, TRAPS) {
>>>     913     Thread * Self = THREAD ;
>>>     914     if (THREAD != _owner) {
>>>     915       if (THREAD->is_lock_owned((address) _owner)) {
>>>     916         // Transmute _owner from a BasicLock pointer to a 
>>> Thread
>>> address
>>> .
>>>     917         // We don't need to hold _mutex for this transition.
>>>     918         // Non-null to Non-null is safe as long as all readers
>>> can
>>>     919         // tolerate either flavor.
>>>     920         assert (_recursions == 0, "invariant") ;
>>>     921         _owner = THREAD ;
>>>     922         _recursions = 0 ;
>>>     923         OwnerIsThread = 1 ;
>>>     924       } else {
>>>     925         // NOTE: we need to handle unbalanced monitor 
>>> enter/exit
>>>     926         // in native code by throwing an exception.
>>>     927         // TODO: Throw an IllegalMonitorStateException ?
>>>     928         TEVENT (Exit - Throw IMSX) ;
>>>     929         assert(false, "Non-balanced monitor enter/exit!");
>>>     930         if (false) {
>>>     931 THROW(vmSymbols::java_lang_IllegalMonitorStateException());
>>>     932         }
>>>     933         return;
>>>     934       }
>>>     935     }
>>>
>>> so that _owner field when we did the stack dump still contained
>>> the BaseLock pointer instead of the JavaThread pointer...
>>>
>>> Dan
>>>
>>>
>>> On 5/19/14 8:09 PM, David Holmes wrote:
>>>> Dan,
>>>>
>>>> My understanding of the existing code and hence the modified code is
>>>> that we have already established that we are dealing with an inflated
>>>> monitor and hence the owner really must be the owning thread - not a
>>>> stack address for a BasicLock. Isn't that what mark->has_monitor()
>>>> establishes?
>>>>
>>>> Note that we can not have reached a safepoint inflation is in 
>>>> progress.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 20/05/2014 9:47 AM, Daniel D. Daugherty wrote:
>>>>> On 5/19/14 7:58 AM, Zhengyu Gu wrote:
>>>>>> This is a simple fix for incorrect lock state.
>>>>>>
>>>>>> The timing on setting thread's pending monitor can result stack 
>>>>>> trace
>>>>>> dump reporting incorrect lock state. The solution is to check the
>>>>>> monitor's owner, if the owner is other than the current thread, then
>>>>>> the monitor, is or is in process of being, set the pending 
>>>>>> monitor of
>>>>>> current thread.
>>>>>>
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8036823
>>>>>> Webrev: http://cr.openjdk.java.net/~zgu/8036823/webrev.00/
>>>>>
>>>>> src/share/vm/runtime/vframe.cpp
>>>>>      First off, I concur that the existing code has a problem.
>>>>>      However, I think the proposed solution has at least one
>>>>>      different problem.
>>>>>
>>>>>      Here's the original code:
>>>>>
>>>>>   183   // Print out all monitors that we have locked or are trying
>>>>> to lock
>>>>>   184   GrowableArray<MonitorInfo*>* mons = monitors();
>>>>>   185   if (!mons->is_empty()) {
>>>>>   186     bool found_first_monitor = false;
>>>>>   187     for (int index = (mons->length()-1); index >= 0; index--) {
>>>>>   188       MonitorInfo* monitor = mons->at(index);
>>>>>   189       if (monitor->eliminated() && is_compiled_frame()) { //
>>>>> Eliminated in compiled code
>>>>>   190         if (monitor->owner_is_scalar_replaced()) {
>>>>>   191           Klass* k =
>>>>> java_lang_Class::as_Klass(monitor->owner_klass());
>>>>>   192           st->print("\t- eliminated <owner is scalar 
>>>>> replaced> (a
>>>>> %s)", k->external_name());
>>>>>   193         } else {
>>>>>   194           oop obj = monitor->owner();
>>>>>   195           if (obj != NULL) {
>>>>>   196             print_locked_object_class_name(st, obj,
>>>>> "eliminated");
>>>>>   197           }
>>>>>   198         }
>>>>>   199         continue;
>>>>>   200       }
>>>>>   201       if (monitor->owner() != NULL) {
>>>>>   202
>>>>>   203         // First, assume we have the monitor locked. If we
>>>>> haven't
>>>>> found an
>>>>>   204         // owned monitor before and this is the first frame, 
>>>>> then
>>>>> we need to
>>>>>   205         // see if we have completed the lock or we are blocked
>>>>> trying to
>>>>>   206         // acquire it - we can only be blocked if the 
>>>>> monitor is
>>>>> inflated
>>>>>   207
>>>>>   208         const char *lock_state = "locked"; // assume we have 
>>>>> the
>>>>> monitor locked
>>>>>   209         if (!found_first_monitor && frame_count == 0) {
>>>>>   210           markOop mark = monitor->owner()->mark();
>>>>>   211           if (mark->has_monitor() &&
>>>>>   212               mark->monitor() ==
>>>>> thread()->current_pending_monitor()) {
>>>>>   213             lock_state = "waiting to lock";
>>>>>   214           }
>>>>>   215         }
>>>>>   216
>>>>>   217         found_first_monitor = true;
>>>>>   218         print_locked_object_class_name(st, monitor->owner(),
>>>>> lock_state);
>>>>>   219       }
>>>>>   220     }
>>>>>   221   }
>>>>>
>>>>>      The algorithm is pretty basic (no pun intended):
>>>>>
>>>>>      - for each monitor (BasicLock) in the Java frame
>>>>>        - if the monitor has been eliminated and this is
>>>>>          a compiled frame
>>>>>          - do compiler specific handling
>>>>>          - continue
>>>>>        - if the monitor has a owner
>>>>>          - set lock_state = "locked"
>>>>>          - if this is the first monitor and the top most frame
>>>>>            - set mark = associated Object's mark
>>>>>            - if the mark has an ObjectMonitor and
>>>>>                 the Java thread is pending on that monitor
>>>>>               - set lock_state = "waiting to lock"
>>>>>        - set found_first_monitor flag
>>>>>        - print info about current monitor including
>>>>>          the lock_state
>>>>>
>>>>>      A BasicLock is a Java monitor from the Java frame's point of
>>>>>      view. It might be implemented as a stack lock, a biased lock
>>>>>      or an inflated lock (ObjectMonitor).
>>>>>
>>>>>      The problem that I see with the algorithm is that it is
>>>>>      traversing a collection of BasicLocks and then asking
>>>>>      lock implementation specific questions in a an effort to
>>>>>      determine if the BasicLock is owned by the Java thread
>>>>>      whose frames we are traversing.
>>>>>
>>>>>      We already know that this code:
>>>>>
>>>>>          mark->monitor() == thread()->current_pending_monitor()
>>>>>
>>>>>      is specific to inflated monitors (ObjectMonitor) and it
>>>>>      has a flaw because the target thread can be stopped at
>>>>>      a safepoint before the _current_pending_monitor field
>>>>>      is set.
>>>>>
>>>>>      The proposed code:
>>>>>
>>>>>          mark->monitor()->owner() != thread()
>>>>>
>>>>>      compares the owner of the inflated monitor to the Java
>>>>>      thread whose frames we are traversing. The problem is
>>>>>      that the ObjectMonitor::_owner field is not always a
>>>>>      pointer to the JavaThread. Right after a stack lock
>>>>>      is inflated, the ObjectMonitor::_owner field is set
>>>>>      to the mark->locker() stack address; see:
>>>>>
>>>>>      src/share/vm/runtime/synchronizer.cpp:
>>>>>      line 1283: m->set_owner(mark->locker());
>>>>>
>>>>>      So the proposed code will report "waiting to lock" for a
>>>>>      Java monitor where the monitor is just transitioning
>>>>>      from a stack lock to an inflated lock. Of course, the
>>>>>      race is even tighter in this case. The target Java thread
>>>>>      (T1) has to have just acquired the uninflated monitor
>>>>>      and still be in the same frame in which that monitor was
>>>>>      acquired at the same time that another thread (T2) lost
>>>>>      the race to acquire the monitor. So T2 has to have just
>>>>>      inflated the monitor, T1 has still be in the same frame
>>>>>      in which it acquired the monitor and someone has to ask
>>>>>      for a thread dump. Pretty cool.
>>>>>
>>>>>      Another thought has occurred to me. T1 owns the monitor,
>>>>>      T2 is contending for the monitor and is starting to inflate
>>>>>      it, we request a thread dump. Is it possible to see the
>>>>>      BasicLock in T2's Java frame, see that the Object does not
>>>>>      yet have an inflated monitor (!mark->has_monitor()) and
>>>>>      report that T2 has the monitor "locked"?
>>>>>
>>>>>      This last scenario would again result in more than one
>>>>>      thread being reported as having the same monitor locked.
>>>>>
>>>>>      To sum up:
>>>>>
>>>>>      I think both the current code and the proposed code have
>>>>>      issues and we need to look at this a different way. We
>>>>>      probably something like:
>>>>>
>>>>>          MonitorInfo::is_locked_by_thread(JavaThread jt)
>>>>>
>>>>>      that encapsulates and isolates the logic needed to
>>>>>      determine if JavaThread 'jt' owns the BasicLock. It
>>>>>      will need to know how to check:
>>>>>
>>>>>      - stack lock ownership (see JavaThread::is_lock_owned())
>>>>>      - biased lock ownership (don't know how to do this one)
>>>>>      - inflated lock ownership (_owner field == jt)
>>>>>
>>>>>      The stack lock ownership check needs to be done before
>>>>>      the inflated lock ownership check. That's because the
>>>>>      _owner field in the inflated lock can be the
>>>>>      mark->locker() stack address when it is initially
>>>>>      inflated.
>>>>>
>>>>>      I couldn't figure out the biased locking ownership
>>>>>      check quickly so I don't know where it fits in the
>>>>>      checking ownership pecking order.
>>>>>
>>>>>
>>>>> Sorry this turned out to be not so simple... Welcome to
>>>>> the joy of locking...
>>>>>
>>>>> Dan
>>>>>
>>>>> P.S.
>>>>> Of course, now that we've discovered at least one
>>>>> misuse of current_pending_monitor(), we should probably
>>>>> audit the other uses of it also...
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Zhengyu
>>>>>
>>>