RFR(XXS): 8077392 and 8131715 - fix for hang in Contended Locking "fast enter" bucket

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Mar 22 20:41:16 UTC 2016


Greetings,

I have fixes for the following two bugs:

     JDK-8077392 Stream fork/join tasks occasionally fail to complete
     https://bugs.openjdk.java.net/browse/JDK-8077392

     JDK-8131715 backout the fix for JDK-8079359 when JDK-8077392 is fixed
     https://bugs.openjdk.java.net/browse/JDK-8131715

Both fixes are very, very small and will be bundled together in the
same changeset for obvious reasons.

Here is the webrev URL:

http://cr.openjdk.java.net/~dcubed/8077392_8131715-webrev/0-jdk9-hs-rt/

While the fix for JDK-8077392 is a simple 1-liner, the explanation of
the race is much, much longer. I've attached the detailed evaluation
to this RFR; it is a copy of the same note that I added to
https://bugs.openjdk.java.net/browse/JDK-8077392, but the attached
copy has all the indentation white space intact. I don't know why
JBS likes to reformat the notes, but it does... :-(

Testing:

- the original failing test is running in a parallel stress config
   on my Solaris X64 server; just under 23 hours and just under
   3000 iterations without a failure in either instance; I'm planning
   to let the stress run go for at least 72 hours.
- RT/SVC nightly equivalent (in progress)

As always, comments, suggestions and/or questions are welcome.

Dan
-------------- next part --------------
This bug has finally been chased to ground. As expected, the bug
is a race condition that is only present in certain configurations.
This note is an attempt to describe the race and the conditions
under which the race can occur.

This race is due to a bug in ObjectSynchronizer::quick_enter()
which is a C2 function that was added by the "fast enter" bucket
for the Contended Locking project. See:

    JDK-8061553 Contended Locking fast enter bucket

so this bug is only present in configurations that use the Server
VM (C2); configurations that use the Client VM (C1) will not
observe this bug. Secondarily, Biased Locking must be enabled
for the race condition to manifest. By default, Biased Locking is
only enabled once the VM has been up for 4 seconds, so any hang
with a VM uptime of less than 4 seconds is not likely to be
due to this bug. Lastly,
there must be contention on the Java Monitor in question so there
must be two or more threads using the Java Monitor that has been
observed as "stranded".

Here are the above conditions in checklist form, with a few
additional conditions:

- Server Compiler/C2 is in use
- Biased Locking enabled (VM uptime >= 4 seconds)
- Java Monitor contention
- Without special options, this hang should only be observed in
  JDK9-B53 -> JDK9-B63; JDK-8061553 was promoted in
  JDK9-B53 and the fix to disable it (JDK-8079359) was
  promoted in JDK9-B64.
- So if your hang occurred before JDK9-B53 or in JDK9-B64
  or later, then this bug is not likely the cause.

If you think you have a hang that is caused by this bug, then
use the following diagnostic options:

    -XX:+UnlockExperimentalVMOptions
    -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1

The 'VerifyMatch=1' portion of the above diagnostic options will cause output
like the following when you've run into this bug:

INFO: unexpected locked object: - locked <0xfffffd7be95defe0> (a java.util.stream.Nodes$CollectorTask$OfDouble)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (synchronizer.cpp:2203), pid=19281, tid=95
# fatal error: exiting JavaThread=0x0000000004278800 unexpectedly owns ObjectMonitor=0x00000000016f2000
#
# JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_03_18_18_43-b00)

The diagnostic output above shows:

- the unexpected locked object (0xfffffd7be95defe0)
- the object's type (java.util.stream.Nodes$CollectorTask$OfDouble)
- the thread that owns the lock (0x0000000004278800)
- the ObjectMonitor (0x00000000016f2000)

Please note that mis-behaving programs that use JNI locking can also run
into this diagnostic trap so I recommend careful use of these diagnostic
options.

Gory Code Details:

##
## JavaThread1 (JT1) - Part 1
##
The first JavaThread (JT1) in the race is executing this code
(when -XX:-UseOptoBiasInlining is specified):

src/cpu/x86/vm/macroAssembler_x86.cpp:

int MacroAssembler::biased_locking_enter(Register lock_reg,
                                         Register obj_reg,
                                         Register swap_reg,
                                         Register tmp_reg,
                                         bool swap_reg_contains_mark,
                                         Label& done,
                                         Label* slow_case,
                                         BiasedLockingCounters* counters) {
<snip>

  movptr(tmp_reg, swap_reg);
  andptr(tmp_reg, markOopDesc::biased_lock_mask_in_place);
  cmpptr(tmp_reg, markOopDesc::biased_lock_pattern);
  jcc(Assembler::notEqual, cas_label);
  // The bias pattern is present in the object's header. Need to check
  // whether the bias owner and the epoch are both still current.

<JT1 has just made it past the above check so the object's header>
<has the bias pattern in it>

Note: When UseOptoBiasInlining is enabled (the default),
biased_locking_enter() is not used and the C2 ideal graph version of
the algorithm is used; for this note, -XX:-UseOptoBiasInlining is used
because it's easier to explain biased_locking_enter()'s assembly code
than the C2 ideal graph code. See PhaseMacroExpand::expand_lock_node()
for the C2 ideal graph code.


##
## JavaThread2 (JT2) - Part 1
##
The second JavaThread (JT2) is inflating the JavaMonitor associated
with this object so it is here (for example):

src/share/vm/runtime/synchronizer.cpp:

void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
<snip>

  lock->set_displaced_header(markOopDesc::unused_mark());
  ObjectSynchronizer::inflate(THREAD,
                              obj(),
                              inflate_cause_monitor_enter)->enter(THREAD);

Note: Don't be confused by the call to
"lock->set_displaced_header(markOopDesc::unused_mark())" above;
that's the BasicLock in JT2's context.

JT2 has finished the inflation part using the Biased Locking
"CASE: neutral" code:

src/share/vm/runtime/synchronizer.cpp

ObjectMonitor* ObjectSynchronizer::inflate(Thread * Self,
                                           oop object,
                                           const InflateCause cause) {
<snip>

    // CASE: neutral
    // TODO-FIXME: for entry we currently inflate and then try to CAS _owner.
    // If we know we're inflating for entry it's better to inflate by swinging a
    // pre-locked objectMonitor pointer into the object header.   A successful
    // CAS inflates the object *and* confers ownership to the inflating thread.
    // In the current implementation we use a 2-step mechanism where we CAS()
    // to inflate and then CAS() again to try to swing _owner from NULL to Self.
    // An inflateTry() method that we could call from fast_enter() and slow_enter()
    // would be useful.

    assert(mark->is_neutral(), "invariant");
    ObjectMonitor * m = omAlloc(Self);
    // prepare m for installation - set monitor to initial state
    m->Recycle();
    m->set_header(mark);
    m->set_owner(NULL);

and is now racing with JT1 for ownership of the Java Monitor in
ObjectMonitor::enter(). For our failure mode, JT2 loses the race
to JT1.


##
## JavaThread1 (JT1) - Part 2
##

src/cpu/x86/vm/macroAssembler_x86.cpp:

int MacroAssembler::biased_locking_enter(Register lock_reg,
<snip>

  andptr(tmp_reg, markOopDesc::biased_lock_mask_in_place);
  cmpptr(tmp_reg, markOopDesc::biased_lock_pattern);
  jcc(Assembler::notEqual, cas_label);
  // The bias pattern is present in the object's header. Need to check
  // whether the bias owner and the epoch are both still current.

<Once JT1 has made it past the 'cmpptr' above, the race with JT2's>
<inflation of the Java Monitor starts.>

  if (swap_reg_contains_mark) {
    null_check_offset = offset();
  }
  load_prototype_header(tmp_reg, obj_reg);
  orptr(tmp_reg, r15_thread);
  xorptr(tmp_reg, swap_reg);
  Register header_reg = tmp_reg;
  andptr(header_reg, ~((int) markOopDesc::age_mask_in_place));
  if (counters != NULL) {
    cond_inc32(Assembler::zero,
               ExternalAddress((address) counters->biased_lock_entry_count_addr()));
  }
  jcc(Assembler::equal, done);

<The above code block is for checking if the object is biased towards>
<the current thread. Since this is the first time this thread has locked>
<this object it cannot be biased towards this thread so this block does>
<not bail to the "done" label (with success).>

  // At this point we know that the header has the bias pattern and
  // that we are not the bias owner in the current epoch. We need to
  // figure out more details about the state of the header in order to
  // know what operations can be legally performed on the object's
  // header.

  // If the low three bits in the xor result aren't clear, that means
  // the prototype header is no longer biased and we have to revoke
  // the bias on this object.
  testptr(header_reg, markOopDesc::biased_lock_mask_in_place);
  jccb(Assembler::notZero, try_revoke_bias);

  // Biasing is still enabled for this data type. See whether the
  // epoch of the current bias is still valid, meaning that the epoch
  // bits of the mark word are equal to the epoch bits of the
  // prototype header. (Note that the prototype header's epoch bits
  // only change at a safepoint.) If not, attempt to rebias the object
  // toward the current thread. Note that we must be absolutely sure
  // that the current epoch is invalid in order to do this because
  // otherwise the manipulations it performs on the mark word are
  // illegal.
  testptr(header_reg, markOopDesc::epoch_mask_in_place);
  jccb(Assembler::notZero, try_rebias);

<The above two checks handle revoking an existing bias or>
<rebiasing the object towards the current thread. In this>
<scenario the object is anonymously biased (the bias pattern>
<is set but there is no bias owner), so neither block applies.>

  // The epoch of the current bias is still valid but we know nothing
  // about the owner; it might be set or it might be clear. Try to
  // acquire the bias of the object using an atomic operation. If this
  // fails we will go in to the runtime to revoke the object's bias.
  // Note that we first construct the presumed unbiased header so we
  // don't accidentally blow away another thread's valid bias.
  NOT_LP64( movptr(swap_reg, saved_mark_addr); )
  andptr(swap_reg,
         markOopDesc::biased_lock_mask_in_place | markOopDesc::age_mask_in_place | markOopDesc::epoch_mask_in_place);
#ifdef _LP64
  movptr(tmp_reg, swap_reg);
  orptr(tmp_reg, r15_thread);
#else
  get_thread(tmp_reg);
  orptr(tmp_reg, swap_reg);
#endif
  if (os::is_MP()) {
    lock();
  }
  cmpxchgptr(tmp_reg, mark_addr); // compare tmp_reg and swap_reg
  // If the biasing toward our thread failed, this means that
  // another thread succeeded in biasing it toward itself and we
  // need to revoke that bias. The revocation will occur in the
  // interpreter runtime in the slow case.
  if (counters != NULL) {
    cond_inc32(Assembler::zero,
               ExternalAddress((address) counters->anonymously_biased_lock_entry_count_addr()));
  }
  if (slow_case != NULL) {
    jcc(Assembler::notZero, *slow_case);
  }
  jmp(done);

<The above block is where the inflation race with JT2 comes into play.>
<We (JT1) are trying to bias the object towards ourself with the>
<above cmpxchgptr(), but JT2 has updated the object's header>
<to refer to an ObjectMonitor, so the tmp_reg value that we are>
<comparing against the object's header will no longer match.>
<We fail the cmpxchgptr() and jump to the "done" label in the>
<caller of biased_locking_enter() which is fast_lock().>

<fast_lock() returns to its caller with the ZFlag set to zero, which>
<leads to a call to complete_monitor_enter_C(), which in turn calls>
<ObjectSynchronizer::quick_enter(), where we run into our bug.>

src/share/vm/runtime/synchronizer.cpp:

bool ObjectSynchronizer::quick_enter(oop obj, Thread * Self,
                                     BasicLock * Lock) {
<snip>

<JT1 has reached quick_enter() as a last ditch attempt to>
<enter the JavaMonitor without doing expensive stuff like>
<thread state transitions...>

  if (mark->has_monitor()) {
<snip>

<This check is true because JT2 inflated the Java Monitor>
<while JT1 was in biased_locking_enter()...>

    if (owner == Self) {
<snip>

<This check is NOT true because the Java Monitor was>
<inflated by JT2, and in that case the owner field is NOT set>
<to another JavaThread pointer. Inflation can happen due to>
<operations other than Java Monitor enter, so it's not>
<appropriate for inflation to set the _owner field.>

    if (owner == NULL &&
        Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) {

<Our thread (JT1) sees the owner field as NULL and we are able>
<to make ourselves the owner via the cmpxchg_ptr() call. This is all>
<good, but the code path that we are on didn't change the BasicLock's>
<_displaced_header field value to something other than NULL. This>
<means that our JavaThread's FIRST TIME entry of this Java Monitor>
<will be seen as a recursive enter, which means that when we exit>
<the Java Monitor we will treat the exit as a no-op. Oops!>

      assert(m->_recursions == 0, "invariant");
      assert(m->_owner == Self, "invariant");
      return true;
    }

To recap:

- JT1 calls C2 fast_lock() which calls biased_locking_enter()
- JT2 inflates the Java Monitor
- JT1 bails out of biased_locking_enter() after making it past
  the first check, which results in an early bail from fast_lock()
- JT1 makes a slow path call to complete_monitor_enter_C()
- JT1 makes a last ditch call to quick_enter() before doing
  the real slow path work

The early bail code path in biased_locking_enter() and
fast_lock() results in the BasicLock's _displaced_header field
value remaining NULL which marks this entry as recursive.

If JT2's inflation had happened a little earlier, then JT1 would
have taken the first bail point in biased_locking_enter() which
would have resulted in a regular fast_lock() code path which
does initialize the BasicLock's _displaced_header field.

The fix for this problem is a 1-liner:

--- a/src/share/vm/runtime/synchronizer.cpp
+++ b/src/share/vm/runtime/synchronizer.cpp
@@ -229,6 +229,9 @@ bool ObjectSynchronizer::quick_enter(oop

     if (owner == NULL &&
         Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) {
+      // Make the displaced header non-NULL so this BasicLock is
+      // not seen as recursive.
+      Lock->set_displaced_header(markOopDesc::unused_mark());
       assert(m->_recursions == 0, "invariant");
       assert(m->_owner == Self, "invariant");
       return true;

So when quick_enter() succeeds at its last ditch optimization,
it needs to mark the BasicLock's _displaced_header field with
a non-zero value (like the other lock-grabbing code paths do).

