RFR(XS) 8036823: Stack trace sometimes shows 'locked' instead of 'waiting to lock'

Mon May 19 23:47:14 UTC 2014

On 5/19/14 7:58 AM, Zhengyu Gu wrote:
> This is a simple fix for incorrect lock state.
>
> The timing on setting thread's pending monitor can result stack trace 
> dump reporting incorrect lock state. The solution is to check the 
> monitor's owner, if the owner is other than the current thread, then 
> the monitor, is or is in process of being, set the pending monitor of 
> current thread.
>
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8036823
> Webrev: http://cr.openjdk.java.net/~zgu/8036823/webrev.00/

src/share/vm/runtime/vframe.cpp
     First off, I concur that the existing code has a problem.
     However, I think the proposed solution has at least one
     different problem.

     Here's the original code:

  183   // Print out all monitors that we have locked or are trying to lock
  184   GrowableArray<MonitorInfo*>* mons = monitors();
  185   if (!mons->is_empty()) {
  186     bool found_first_monitor = false;
  187     for (int index = (mons->length()-1); index >= 0; index--) {
  188       MonitorInfo* monitor = mons->at(index);
  189       if (monitor->eliminated() && is_compiled_frame()) { // 
Eliminated in compiled code
  190         if (monitor->owner_is_scalar_replaced()) {
  191           Klass* k = 
java_lang_Class::as_Klass(monitor->owner_klass());
  192           st->print("\t- eliminated <owner is scalar replaced> (a 
%s)", k->external_name());
  193         } else {
  194           oop obj = monitor->owner();
  195           if (obj != NULL) {
  196             print_locked_object_class_name(st, obj, "eliminated");
  197           }
  198         }
  199         continue;
  200       }
  201       if (monitor->owner() != NULL) {
  202
  203         // First, assume we have the monitor locked. If we haven't 
found an
  204         // owned monitor before and this is the first frame, then 
we need to
  205         // see if we have completed the lock or we are blocked 
trying to
  206         // acquire it - we can only be blocked if the monitor is 
inflated
  207
  208         const char *lock_state = "locked"; // assume we have the 
monitor locked
  209         if (!found_first_monitor && frame_count == 0) {
  210           markOop mark = monitor->owner()->mark();
  211           if (mark->has_monitor() &&
  212               mark->monitor() == 
thread()->current_pending_monitor()) {
  213             lock_state = "waiting to lock";
  214           }
  215         }
  216
  217         found_first_monitor = true;
  218         print_locked_object_class_name(st, monitor->owner(), 
lock_state);
  219       }
  220     }
  221   }

     The algorithm is pretty basic (no pun intended):

     - for each monitor (BasicLock) in the Java frame
       - if the monitor has been eliminated and this is
         a compiled frame
         - do compiler specific handling
         - continue
       - if the monitor has a owner
         - set lock_state = "locked"
         - if this is the first monitor and the top most frame
           - set mark = associated Object's mark
           - if the mark has an ObjectMonitor and
                the Java thread is pending on that monitor
              - set lock_state = "waiting to lock"
       - set found_first_monitor flag
       - print info about current monitor including
         the lock_state

     A BasicLock is a Java monitor from the Java frame's point of
     view. It might be implemented as a stack lock, a biased lock
     or an inflated lock (ObjectMonitor).

     The problem that I see with the algorithm is that it is
     traversing a collection of BasicLocks and then asking
     lock implementation specific questions in a an effort to
     determine if the BasicLock is owned by the Java thread
     whose frames we are traversing.

     We already know that this code:

         mark->monitor() == thread()->current_pending_monitor()

     is specific to inflated monitors (ObjectMonitor) and it
     has a flaw because the target thread can be stopped at
     a safepoint before the _current_pending_monitor field
     is set.

     The proposed code:

         mark->monitor()->owner() != thread()

     compares the owner of the inflated monitor to the Java
     thread whose frames we are traversing. The problem is
     that the ObjectMonitor::_owner field is not always a
     pointer to the JavaThread. Right after a stack lock
     is inflated, the ObjectMonitor::_owner field is set
     to the mark->locker() stack address; see:

     src/share/vm/runtime/synchronizer.cpp:
     line 1283: m->set_owner(mark->locker());

     So the proposed code will report "waiting to lock" for a
     Java monitor where the monitor is just transitioning
     from a stack lock to an inflated lock. Of course, the
     race is even tighter in this case. The target Java thread
     (T1) has to have just acquired the uninflated monitor
     and still be in the same frame in which that monitor was
     acquired at the same time that another thread (T2) lost
     the race to acquire the monitor. So T2 has to have just
     inflated the monitor, T1 has still be in the same frame
     in which it acquired the monitor and someone has to ask
     for a thread dump. Pretty cool.

     Another thought has occurred to me. T1 owns the monitor,
     T2 is contending for the monitor and is starting to inflate
     it, we request a thread dump. Is it possible to see the
     BasicLock in T2's Java frame, see that the Object does not
     yet have an inflated monitor (!mark->has_monitor()) and
     report that T2 has the monitor "locked"?

     This last scenario would again result in more than one
     thread being reported as having the same monitor locked.

     To sum up:

     I think both the current code and the proposed code have
     issues and we need to look at this a different way. We
     probably something like:

         MonitorInfo::is_locked_by_thread(JavaThread jt)

     that encapsulates and isolates the logic needed to
     determine if JavaThread 'jt' owns the BasicLock. It
     will need to know how to check:

     - stack lock ownership (see JavaThread::is_lock_owned())
     - biased lock ownership (don't know how to do this one)
     - inflated lock ownership (_owner field == jt)

     The stack lock ownership check needs to be done before
     the inflated lock ownership check. That's because the
     _owner field in the inflated lock can be the
     mark->locker() stack address when it is initially
     inflated.

     I couldn't figure out the biased locking ownership
     check quickly so I don't know where it fits in the
     checking ownership pecking order.

Sorry this turned out to be not so simple... Welcome to
the joy of locking...

Dan

P.S.
Of course, now that we've discovered at least one
misuse of current_pending_monitor(), we should probably
audit the other uses of it also...

>
> Thanks,
>
> -Zhengyu