RFR(S) 8130448: thread dump improvements, comment, additions, new diagnostics inspired by 8077392

David Holmes david.holmes at oracle.com
Tue Jul 7 07:11:38 UTC 2015


Hi Dan,

Top posting as I'm lazy and running late :)

Okay ... I'm still uneasy about adding two additional flags for the 
"release monitors on exit" situation. I hear what you are saying about 
the cost of the monitor check - I had thought we kept easy access to the 
set of monitors owned by a thread but it seems not, so trawling the 
global monitor list is indeed an expensive operation. :( So okay but I 
don't like it :)

I also take your point about viewing the new flag as a superset of the 
JNI-detach case. That's fair enough, but of course that aspect needs to 
be documented.

I was left unclear about your position on the VerboseStackTrace. I agree 
the new output was likely put under a flag to avoid disturbing existing 
output formats. But I don't see that it warrants its own flag - to me 
using Verbose okay (though the fact it is a develop flag is limiting).

Thanks,
David

On 7/07/2015 6:47 AM, Daniel D. Daugherty wrote:
> On 7/6/15 12:43 AM, David Holmes wrote:
>> Hi Dan,
>>
>> Metacomment: Could I suggest that in a RFR your subject line has the
>> form <bugid>: <synopsis>, rather than <synopsis>(bugid) - especially
>> when the synopsis actually starts with a bug id :) Thanks.
>
> Yup that bug synopsis is/was a mess:
>
> Changed the synopsis line to :
>
>      thread dump improvements, comment additions, new diagnostics
> inspired by 8077392
>
> Also updated the e-mail thread subject line to your suggested format.
> Will try to remember to switch to that style in the future.
>
>
>> Mostly okay but some major concerns around the can of worms that
>> JavaThreadExitReleasesMonitors opens up.
>
> This is Java monitors so "can of worms" is familiar! :-)
>
>
>>
>> On 4/07/2015 10:14 AM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> The hunt for the following bug:
>>>
>>>      JDK-8077392 Stream fork/join tasks occasionally fail to complete
>>>
>>> and David C's work on the following bug:
>>>
>>>      JDK-8069412 Locks need better debug-printing support
>>>
>>> have inspired additional thread dump improvements, comment additions
>>> to some Java monitor code and some new diagnostic options.
>>
>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>
>> 1821        // update _owner from from BasicLock to thread
>>
>> Typo: from from
>
> Will fix.
>
>
>> 2091        // the placeholder value. If that didn't work, then another
>> 2092        // grabbed the lock so we're done (and exit was a success).
>>
>> another -> another thread ?
>
> Yes, "another thread" is better. Will fix.
>
>
>> 2201       // If that didn't work, then another grabbed the lock
>> 2202       // so we're done (and exit was a success).
>>
>> another -> another thread ?
>
> Yes, "another thread" is better. Will fix.
>
>
>> ---
>>
>> src/share/vm/oops/markOop.cpp
>>
>>   40       st->print("NULL");  // this should not happen
>>
>> Shouldn't that be an assert or guarantee in that case? Or at a minimum
>> print something like: "NULL (but you should never see this!)"
>
> This is a relocated line from David C's change:
>
> old 49:         st->print("monitor=NULL");
>
> I simply added a comment to it. I don't think I like the idea of the
> thread dump code assert()'ing or guarantee()'ing because there might
> be other useful info that would have been printed after the assert()
> or guarantee() failure.
>
> I do like the idea of changing the message to:
>
> 40:       st->print("NULL (this should never be seen!)");
>
>
>> 42       BasicLock * bl = (BasicLock *) mon->owner();
>>
>> What information has told us this is a BasicLock* not a Thread* ? But
>> as all you do is print bl why not just p2i(mon->owner()) ?
>
> Again, this is a relocated line from David C's change:
>
> old 51:         BasicLock * bl = (BasicLock *) mon->owner();
>
> I'm going to guess that David C had some other code in there before
> that needed/assumed that the field is a "BasicLock *", but that code
> didn't make it into his final version.
>
> I will drop the 'bl' local and update the p2i() call to use
> mon->owner() directly.
>
>
>> ---
>>
>> src/share/vm/runtime/globals.hpp
>>
>> JavaThreadExitReleasesMonitors has me perplexed. The JNI DetachThread
>> case seems to be a concession to badly written code that fails to
>> release locks on error paths.
>
> I don't think I would call the JNIDetachReleasesMonitors case a concession
> to badly written JNI code. 6282335 is very clear that this code was added
> to meet the JNI spec. The JNI spec words might be considered a
> concession...
>
> http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#DetachCurrentThread
>
>
>  > DetachCurrentThread
>  >
>  > jint DetachCurrentThread(JavaVM *vm);
>  >
>  > Detaches the current thread from a Java VM. All Java monitors
>  > held by this thread are released. All Java threads waiting for
>  > this thread to die are notified.
>  >
>  > The main thread can be detached from the VM.
>
>
>
>> The equivalent for a Java thread would again involve lock acquisition
>> via JNI. But I don't recall ever seeing a bug report or RFE related to
>> this. What seems to have motivated this new flag is the potential for
>> a broken VM to actually leave the monitor locked (e.g. because
>> compiled code failed to do the unlock on all paths).
>
> 8077392 and bugs like it are exactly the reason for adding these
> two options.
>
> JavaThreadExitReleasesMonitors is added to provide a work around
> until the underlying bug or bugs are fixed. This option is targeted
> at folks that are chasing their own bugs and don't care about this
> one.
>
> GuaranteeOnMonitorMismatch is added to help hunt 8077392 specifically
> and to sanity check for similar issues generally. I can see using the
> two options in combination with existing test suites as a stress test
> for the Java monitor subsystem.
>
>
>> To address that I'd be inclined to simply have a guarantee in the Java
>> thread exit path, rather than a flag to do the clean up and another
>> flag to control a guarantee.
>
> The check for an unexited Java monitor is not cheap.
>
> The existing check under the JNIDetachReleasesMonitors flag is
> positioned to only affect threads using JNI DetachCurrentThread()
> and that cost is justified as part of meeting the JNI spec.
>
> Adding the unexited Java monitor check to all exiting Java
> threads should not be done lightly due to the cost. If the
> VM is functioning correctly, then why impose this cost on
> all exiting Java threads?
>
>
>> I don't think two new command-line options (has this gone through CCC
>> yet?) are justified.
>
> No, this has not gone through CCC yet. My preference is to get
> code review feedback (like this) before submitting a CCC.
>
> I suppose I could have simply added more sub-options to the
> existing Java monitor subsystem knobs and switches, but it
> seemed to make sense to model JavaThreadExitReleasesMonitors
> on the existing JNIDetachReleasesMonitors.
>
>
>> Also note that allowing for JVM induced missing unlocks is something
>> that can affect JNI attached threads as well as regular Java threads.
>> I'll take this up in discussing thread.cpp below.
>>
>> That aside, the comment, which was modelled on the JNI DetachThread
>> variant, does not really work in this case:
>>
>> 2801           "JavaThread exit() releases monitors owned by thread")
>>          \
>>
>> JavaThread is not a programming concept but an internal JVM
>> abstraction. I would suggest something like:
>>
>> "Java thread termination releases monitors unexpectedly still owned by
>> thread"
>
> I'll switch to your description text if we end up keeping the
> new options.
>
>
>> ---
>>
>> src/share/vm/runtime/objectMonitor.cpp
>> src/share/vm/runtime/objectMonitor.hpp
>>
>> No comments
>>
>> ---
>>
>> src/share/vm/runtime/synchronizer.cpp
>>
>> If we find the unexpected locked monitor it would be useful to see
>> more Java level information about the Thread (in the non-detaching
>> case) and the object that is locked.
>
> I can probably steal some code from the thread dump side to
> give us more info about the Object associated with the Java
> monitor...
>
>
>> ---
>>
>>  src/share/vm/runtime/thread.cpp
>>
>> As noted previously JNI attached threads can also be affected by JVM
>> induced locking bugs. So the comment block:
>>
>> 1804   // 6282335 JNI DetachCurrentThread spec states that all Java
>> monitors
>> 1805   // held by this thread must be released.  A detach operation
>> must only
>> 1806   // get here if there are no Java frames on the stack.
>> Therefore, any
>> 1807   // owned monitors at this point MUST be JNI-acquired monitors
>> which are
>> 1808   // pre-inflated and in the monitor cache.
>>
>> is not strictly correct as it is presuming a correctly operating JVM.
>
> I actually disagree with the last two sentences in that comment block,
> but I didn't have a good rewrite in mind. I'll think about it more.
>
>
>
>> Further the logic now applied:
>>
>> 1816   if ((exit_type == jni_detach && JNIDetachReleasesMonitors) ||
>> 1817       JavaThreadExitReleasesMonitors) {
>>
>> is also not "correct" because if we have
>> JavaThreadExitReleasesMonitors true while JNIDetachReleasesMonitors is
>> false, we will in fact release any JNI acquired monitors because we
>> can't make the distinction.
>
> I don't think JNIDetachReleasesMonitors only has to deal with JNI-acquired
> monitors and I don't think that JavaThreadExitReleasesMonitors only has to
> deal with Java code acquired monitors. I see these options as applying to
> "how you got to the Java thread exit path" and not applying to "what needs
> to be cleaned up".
>
> I see JNIDetachReleasesMonitors as a way of cleaning up unexited Java
> monitors when we're executing the JNI DetachCurrentThread() code path.
> I don't really care why the Java monitors are unexited... and I think
> the spec wording is clear that all unexited Java monitors are cleaned
> up... not just JNI-acquired monitors.
>
> I see JavaThreadExitReleasesMonitors as a way of cleaning up unexited
> Java monitors in any code path where we are exiting the Java thread. I see
> JavaThreadExitReleasesMonitors as a superset of JNIDetachReleasesMonitors.
>
>
>> Given we can't make the distinction I don't see any way for these
>> separate flags to actually work well together.
>
> I think they work well together when JavaThreadExitReleasesMonitors
> is seen as a superset of JNIDetachReleasesMonitors. Of course, that
> hinges on my opinion that JNIDetachReleasesMonitors applies to any
> unexited Java monitor and not just JNI-acquired Java monitors.
>
>
>> As I said above I'm more inclined to just add a guarantee to the
>> non-detach case, than add two new flags. (Even though this means there
>> won't be detection for such bugs when JNI attached threads encounter
>> them - including the main thread :( ).
>
> I really don't want to add this check to all Java thread exit
> paths since it is expensive.
>
>
>> As I said this has me perplexed. :(
>
> Hopefully, I've explained it better now.
>
>
>> ---
>>
>> src/share/vm/runtime/vframe.cpp
>>
>> 170       // It would be nice to distinguish between "waiting on" and
>> 171       // "waited on". Currently, "waiting on" here with a
>> 172       // java.lang.Thread.State == "WAITING (on object monitor)"
>> 173       // earlier in the output means that the monitor has not yet
>> been
>> 174       // notified and java.lang.Thread.State == "BLOCKED (on object
>> 175       // monitor)" earlier in the output means that the monitor has
>> 176       // been notified and the thread is blocked on reentry.
>>
>> That's a very long sentence that can be quite difficult to parse and
>> could probably be reworded to describe how the output should be
>> interpreted eg:
>>
>> // If earlier in the output we reported java.lang.Thread.State ==
>> // "WAITING (on object monitor)" and now we report "waited on", then
>> // we are still waiting for notification [or timeout?]. Otherwise if
>> // we earlier reported java.lang.Thread.State == "BLOCKED (on object
>> // monitor)", then we are actually waiting to reacquire the monitor
>> // lock. At this level we can't distinguish the two cases to report
>> // "waited on" rather than "waiting on" for the second case.
>
> I'll take a look at rewriting the comment and may just take your
> version entirely.
>
>
>> I wonder if we can prod deeper into VM state to try and make the
>> distinction?
>
> I played around with doing just that, but it seems like the M&M
> code that fetches similar state probes the state stored in the
> java.lang.Thread object. I didn't want to go there.
>
>
>>
>> 184       } else {
>> 185         st->print_cr("\t- %s <no locals available>", wait_state);
>>
>> The concept of "locals" is confusing here. Isn't the "local" in this
>> case just the object that we have waited upon? In which case we should
>> say "no object reference available".
>
> I got the phrase "locals" from the compiler code that was trying to
> decode the object reference. I can change to "no object reference
> available".
> I definitely want to say _something_ here. The existing code is just
> silent.
>
>
>> Though I would have thought/hoped there must be a way to find the
>> object reference even in a compiled frame?
>
> Yes, if you deopt... I didn't want to go there.
>
>
>> 195   // Print out all monitors that we have locked, are trying to lock
>> 196   // or are trying to relock after a wait().
>>
>> that makes it sound like there are three cases when really a "relock"
>> is just a specific case of lock. I would suggest:
>>
>>    // Print out all monitors that we have locked, or are trying to
>>    // lock, including re-locking when returning from a wait().
>
> Making it more clear that the "re-locking" is coming from "wait()"
> is an improvement. I'll fix this.
>
>
>> 252             lock_state = "waiting to relock";
>>
>> I prefer "waiting to re-lock in wait()", if that isn't too long.
>> Otherwise "relock" needs to be a concept people are familiar with.
>
> I like "waiting to re-lock in wait()".
>
>
>> 260         if (VerboseStackTrace && mark != NULL) {
>> 261           st->print("\t- lockbits=");
>> 262           mark->print_on(st);
>>
>> This really doesn't seem to warrant a new diagnostic flag, regardless
>> of whether we think applying -XX:+Verbose is a good fit or not.
>
> I suspect that David C didn't want to change the default output format
> and neither do I. We have some tests that try to parse out specific
> things from thread dumps to verify certain bug fixes and we don't want
> to break those. I'm thinking of the "duplicate owner" bug fix that we
> fixed in thread dumps, but I can't pull up the bug ID (yet)...
>
> Thanks for the thorough review. I have to say that I wasn't expecting
> much comment on this review at all. :-)
>
>
> Dan
>
>
>>
>> Thanks,
>> David
>> -------
>>
>>> This work is being tracked by the following bug ID:
>>>
>>>      JDK-8130448 8077392 inspired thread dump improvements, comment
>>>                  additions, new diagnostics
>>>      https://bugs.openjdk.java.net/browse/JDK-8130448
>>>
>>> Here is the webrev URL:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8130448-webrev/0-jdk9-hs-rt/
>>>
>>> Testing:
>>>
>>> - RBT vm.quick batches (in process)
>>> - JPRT test jobs
>>>
>>> Thanks, in advance, for any comments, questions or suggestions.
>>>
>>> Gory details about the changes are below...
>>>
>>> Dan
>>>
>>> 8130448 summary of changes:
>>>
>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>    - comment additions for the assembly code
>>>
>>> src/share/vm/oops/markOop.cpp
>>>    - has_monitor() has to be checked before is_locked() since
>>>      the has_monitor() bits are a superset of the is_locked() bits
>>>    - code style fixes
>>>
>>> src/share/vm/runtime/globals.hpp
>>>    - add VerboseStackTrace diagnostic option
>>>    - add GuaranteeOnMonitorMismatch diagnostic option
>>>    - add JavaThreadExitReleasesMonitors product option;
>>>      this is the Java equivalent of JNIDetachReleasesMonitors
>>>
>>> src/share/vm/runtime/objectMonitor.cpp
>>>    - delete unused ObjectMonitor::try_enter()
>>>    - fix assert wording
>>>
>>> src/share/vm/runtime/objectMonitor.hpp
>>>    - delete unused ObjectMonitor::try_enter()
>>>
>>> src/share/vm/runtime/synchronizer.cpp
>>>    - add GuaranteeOnMonitorMismatch support with some
>>>      diagnostic output
>>>
>>> src/share/vm/runtime/thread.cpp
>>>    - add JavaThreadExitReleasesMonitors support
>>>
>>> src/share/vm/runtime/vframe.cpp
>>>    - clarify existing comments
>>>    - add comments to clarify what "waiting on" means
>>>    - add line to show when we are "waiting on", but
>>>      there are no locals available (previously we were
>>>      "waiting on" silently)
>>>    - add support for "waiting to relock" which can happen
>>>      for any of the monitors and not just the top monitor
>>>    - switch mark->print_on() verbose output to new
>>>      VerboseStackTrace switch; thread dumps are not enabled
>>>      with a specific switch so the '-XX:+Verbose' model of
>>>      being a modifier for another option does not fit
>


More information about the hotspot-runtime-dev mailing list