RFR(S) 8077392 inspired thread dump improvements, comment, additions, new diagnostics (8130448)

Mon Jul 6 06:43:35 UTC 2015

Hi Dan,

Metacomment: Could I suggest that in a RFR your subject line has the 
form <bugid>: <synopsis>, rather than <synopsis>(bugid) - especially 
when the synopsis actually starts with a bug id :) Thanks.

Mostly okay but some major concerns around the can of worms that 
JavaThreadExitReleasesMonitors opens up.

On 4/07/2015 10:14 AM, Daniel D. Daugherty wrote:
> Greetings,
>
> The hunt for the following bug:
>
>      JDK-8077392 Stream fork/join tasks occasionally fail to complete
>
> and David C's work on the following bug:
>
>      JDK-8069412 Locks need better debug-printing support
>
> have inspired additional thread dump improvements, comment additions
> to some Java monitor code and some new diagnostic options.

src/cpu/x86/vm/macroAssembler_x86.cpp

1821        // update _owner from from BasicLock to thread

Typo: from from

2091        // the placeholder value. If that didn't work, then another
2092        // grabbed the lock so we're done (and exit was a success).

another -> another thread ?

2201       // If that didn't work, then another grabbed the lock
2202       // so we're done (and exit was a success).

another -> another thread ?

---

src/share/vm/oops/markOop.cpp

   40       st->print("NULL");  // this should not happen

Shouldn't that be an assert or guarantee in that case? Or at a minimum 
print something like: "NULL (but you should never see this!)"

  42       BasicLock * bl = (BasicLock *) mon->owner();

What information has told us this is a BasicLock* not a Thread* ? But as 
all you do is print bl why not just p2i(mon->owner()) ?

---

src/share/vm/runtime/globals.hpp

JavaThreadExitReleasesMonitors has me perplexed. The JNI DetachThread 
case seems to be a concession to badly written code that fails to 
release locks on error paths. The equivalent for a Java thread would 
again involve lock acquisition via JNI. But I don't recall ever seeing a 
bug report or RFE related to this. What seems to have motivated this new 
flag is the potential for a broken VM to actually leave the monitor 
locked (e.g. because compiled code failed to do the unlock on all 
paths). To address that I'd be inclined to simply have a guarantee in 
the Java thread exit path, rather than a flag to do the clean up and 
another flag to control a guarantee. I don't think two new command-line 
options (has this gone through CCC yet?) are justified.

Also note that allowing for JVM induced missing unlocks is something 
that can affect JNI attached threads as well as regular Java threads. 
I'll take this up in discussing thread.cpp below.

That aside, the comment, which was modelled on the JNI DetachThread 
variant, does not really work in this case:

2801           "JavaThread exit() releases monitors owned by thread") 
          \

JavaThread is not a programming concept but an internal JVM abstraction. 
I would suggest something like:

"Java thread termination releases monitors unexpectedly still owned by 
thread"

---

src/share/vm/runtime/objectMonitor.cpp
src/share/vm/runtime/objectMonitor.hpp

No comments

---

src/share/vm/runtime/synchronizer.cpp

If we find the unexpected locked monitor it would be useful to see more 
Java level information about the Thread (in the non-detaching case) and 
the object that is locked.

---

  src/share/vm/runtime/thread.cpp

As noted previously JNI attached threads can also be affected by JVM 
induced locking bugs. So the comment block:

1804   // 6282335 JNI DetachCurrentThread spec states that all Java monitors
1805   // held by this thread must be released.  A detach operation must 
only
1806   // get here if there are no Java frames on the stack.  Therefore, any
1807   // owned monitors at this point MUST be JNI-acquired monitors 
which are
1808   // pre-inflated and in the monitor cache.

is not strictly correct as it is presuming a correctly operating JVM. 
Further the logic now applied:

1816   if ((exit_type == jni_detach && JNIDetachReleasesMonitors) ||
1817       JavaThreadExitReleasesMonitors) {

is also not "correct" because if we have JavaThreadExitReleasesMonitors 
true while JNIDetachReleasesMonitors is false, we will in fact release 
any JNI acquired monitors because we can't make the distinction. Given 
we can't make the distinction I don't see any way for these separate 
flags to actually work well together. As I said above I'm more inclined 
to just add a guarantee to the non-detach case, than add two new flags. 
(Even though this means there won't be detection for such bugs when JNI 
attached threads encounter them - including the main thread :( ).

As I said this has me perplexed. :(

---

src/share/vm/runtime/vframe.cpp

170       // It would be nice to distinguish between "waiting on" and
171       // "waited on". Currently, "waiting on" here with a
172       // java.lang.Thread.State == "WAITING (on object monitor)"
173       // earlier in the output means that the monitor has not yet been
174       // notified and java.lang.Thread.State == "BLOCKED (on object
175       // monitor)" earlier in the output means that the monitor has
176       // been notified and the thread is blocked on reentry.

That's a very long sentence that can be quite difficult to parse and 
could probably be reworded to describe how the output should be 
interpreted eg:

// If earlier in the output we reported java.lang.Thread.State ==
// "WAITING (on object monitor)" and now we report "waited on", then
// we are still waiting for notification [or timeout?]. Otherwise if
// we earlier reported java.lang.Thread.State == "BLOCKED (on object
// monitor)", then we are actually waiting to reacquire the monitor
// lock. At this level we can't distinguish the two cases to report
// "waited on" rather than "waiting on" for the second case.

I wonder if we can prod deeper into VM state to try and make the 
distinction?

184       } else {
185         st->print_cr("\t- %s <no locals available>", wait_state);

The concept of "locals" is confusing here. Isn't the "local" in this 
case just the object that we have waited upon? In which case we should 
say "no object reference available". Though I would have thought/hoped 
there must be a way to find the object reference even in a compiled frame?

195   // Print out all monitors that we have locked, are trying to lock
196   // or are trying to relock after a wait().

that makes it sound like there are three cases when really a "relock" is 
just a specific case of lock. I would suggest:

    // Print out all monitors that we have locked, or are trying to
    // lock, including re-locking when returning from a wait().

252             lock_state = "waiting to relock";

I prefer "waiting to re-lock in wait()", if that isn't too long. 
Otherwise "relock" needs to be a concept people are familiar with.

260         if (VerboseStackTrace && mark != NULL) {
261           st->print("\t- lockbits=");
262           mark->print_on(st);

This really doesn't seem to warrant a new diagnostic flag, regardless of 
whether we think applying -XX:+Verbose is a good fit or not.

Thanks,
David
-------

> This work is being tracked by the following bug ID:
>
>      JDK-8130448 8077392 inspired thread dump improvements, comment
>                  additions, new diagnostics
>      https://bugs.openjdk.java.net/browse/JDK-8130448
>
> Here is the webrev URL:
>
> http://cr.openjdk.java.net/~dcubed/8130448-webrev/0-jdk9-hs-rt/
>
> Testing:
>
> - RBT vm.quick batches (in process)
> - JPRT test jobs
>
> Thanks, in advance, for any comments, questions or suggestions.
>
> Gory details about the changes are below...
>
> Dan
>
> 8130448 summary of changes:
>
> src/cpu/x86/vm/macroAssembler_x86.cpp
>    - comment additions for the assembly code
>
> src/share/vm/oops/markOop.cpp
>    - has_monitor() has to be checked before is_locked() since
>      the has_monitor() bits are a superset of the is_locked() bits
>    - code style fixes
>
> src/share/vm/runtime/globals.hpp
>    - add VerboseStackTrace diagnostic option
>    - add GuaranteeOnMonitorMismatch diagnostic option
>    - add JavaThreadExitReleasesMonitors product option;
>      this is the Java equivalent of JNIDetachReleasesMonitors
>
> src/share/vm/runtime/objectMonitor.cpp
>    - delete unused ObjectMonitor::try_enter()
>    - fix assert wording
>
> src/share/vm/runtime/objectMonitor.hpp
>    - delete unused ObjectMonitor::try_enter()
>
> src/share/vm/runtime/synchronizer.cpp
>    - add GuaranteeOnMonitorMismatch support with some
>      diagnostic output
>
> src/share/vm/runtime/thread.cpp
>    - add JavaThreadExitReleasesMonitors support
>
> src/share/vm/runtime/vframe.cpp
>    - clarify existing comments
>    - add comments to clarify what "waiting on" means
>    - add line to show when we are "waiting on", but
>      there are no locals available (previously we were
>      "waiting on" silently)
>    - add support for "waiting to relock" which can happen
>      for any of the monitors and not just the top monitor
>    - switch mark->print_on() verbose output to new
>      VerboseStackTrace switch; thread dumps are not enabled
>      with a specific switch so the '-XX:+Verbose' model of
>      being a modifier for another option does not fit