RFR(S) 8130448: thread dump improvements, comment, additions, new diagnostics inspired by 8077392
Daniel D. Daugherty
daniel.daugherty at oracle.com
Mon Jul 6 20:47:41 UTC 2015
On 7/6/15 12:43 AM, David Holmes wrote:
> Hi Dan,
>
> Metacomment: Could I suggest that in a RFR your subject line has the
> form <bugid>: <synopsis>, rather than <synopsis>(bugid) - especially
> when the synopsis actually starts with a bug id :) Thanks.
Yup that bug synopsis is/was a mess:
Changed the synopsis line to :
thread dump improvements, comment additions, new diagnostics
inspired by 8077392
Also updated the e-mail thread subject line to your suggested format.
Will try to remember to switch to that style in the future.
> Mostly okay but some major concerns around the can of worms that
> JavaThreadExitReleasesMonitors opens up.
This is Java monitors so "can of worms" is familiar! :-)
>
> On 4/07/2015 10:14 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> The hunt for the following bug:
>>
>> JDK-8077392 Stream fork/join tasks occasionally fail to complete
>>
>> and David C's work on the following bug:
>>
>> JDK-8069412 Locks need better debug-printing support
>>
>> have inspired additional thread dump improvements, comment additions
>> to some Java monitor code and some new diagnostic options.
>
> src/cpu/x86/vm/macroAssembler_x86.cpp
>
> 1821 // update _owner from from BasicLock to thread
>
> Typo: from from
Will fix.
> 2091 // the placeholder value. If that didn't work, then another
> 2092 // grabbed the lock so we're done (and exit was a success).
>
> another -> another thread ?
Yes, "another thread" is better. Will fix.
> 2201 // If that didn't work, then another grabbed the lock
> 2202 // so we're done (and exit was a success).
>
> another -> another thread ?
Yes, "another thread" is better. Will fix.
> ---
>
> src/share/vm/oops/markOop.cpp
>
> 40 st->print("NULL"); // this should not happen
>
> Shouldn't that be an assert or guarantee in that case? Or at a minimum
> print something like: "NULL (but you should never see this!)"
This is a relocated line from David C's change:
old 49: st->print("monitor=NULL");
I simply added a comment to it. I don't think I like the idea of the
thread dump code assert()'ing or guarantee()'ing because there might
be other useful info that would have been printed after the assert()
or guarantee() failure.
I do like the idea of changing the message to:
40: st->print("NULL (this should never be seen!)");
> 42 BasicLock * bl = (BasicLock *) mon->owner();
>
> What information has told us this is a BasicLock* not a Thread* ? But
> as all you do is print bl why not just p2i(mon->owner()) ?
Again, this is a relocated line from David C's change:
old 51: BasicLock * bl = (BasicLock *) mon->owner();
I'm going to guess that David C had some other code in there before
that needed/assumed that the field is a "BasicLock *", but that code
didn't make it into his final version.
I will drop the 'bl' local and update the p2i() call to use
mon->owner() directly.
> ---
>
> src/share/vm/runtime/globals.hpp
>
> JavaThreadExitReleasesMonitors has me perplexed. The JNI DetachThread
> case seems to be a concession to badly written code that fails to
> release locks on error paths.
I don't think I would call the JNIDetachReleasesMonitors case a concession
to badly written JNI code. 6282335 is very clear that this code was added
to meet the JNI spec. The JNI spec words might be considered a concession...
http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#DetachCurrentThread
> DetachCurrentThread
>
> jint DetachCurrentThread(JavaVM *vm);
>
> Detaches the current thread from a Java VM. All Java monitors
> held by this thread are released. All Java threads waiting for
> this thread to die are notified.
>
> The main thread can be detached from the VM.
> The equivalent for a Java thread would again involve lock acquisition
> via JNI. But I don't recall ever seeing a bug report or RFE related to
> this. What seems to have motivated this new flag is the potential for
> a broken VM to actually leave the monitor locked (e.g. because
> compiled code failed to do the unlock on all paths).
8077392 and bugs like it are exactly the reason for adding these
two options.
JavaThreadExitReleasesMonitors is added to provide a work around
until the underlying bug or bugs are fixed. This option is targeted
at folks that are chasing their own bugs and don't care about this
one.
GuaranteeOnMonitorMismatch is added to help hunt 8077392 specifically
and to sanity check for similar issues generally. I can see using the
two options in combination with existing test suites as a stress test
for the Java monitor subsystem.
> To address that I'd be inclined to simply have a guarantee in the Java
> thread exit path, rather than a flag to do the clean up and another
> flag to control a guarantee.
The check for an unexited Java monitor is not cheap.
The existing check under the JNIDetachReleasesMonitors flag is
positioned to only affect threads using JNI DetachCurrentThread()
and that cost is justified as part of meeting the JNI spec.
Adding the unexited Java monitor check to all exiting Java
threads should not be done lightly due to the cost. If the
VM is functioning correctly, then why impose this cost on
all exiting Java threads?
> I don't think two new command-line options (has this gone through CCC
> yet?) are justified.
No, this has not gone through CCC yet. My preference is to get
code review feedback (like this) before submitting a CCC.
I suppose I could have simply added more sub-options to the
existing Java monitor subsystem knobs and switches, but it
seemed to make sense to model JavaThreadExitReleasesMonitors
on the existing JNIDetachReleasesMonitors.
> Also note that allowing for JVM induced missing unlocks is something
> that can affect JNI attached threads as well as regular Java threads.
> I'll take this up in discussing thread.cpp below.
>
> That aside, the comment, which was modelled on the JNI DetachThread
> variant, does not really work in this case:
>
> 2801 "JavaThread exit() releases monitors owned by thread")
> \
>
> JavaThread is not a programming concept but an internal JVM
> abstraction. I would suggest something like:
>
> "Java thread termination releases monitors unexpectedly still owned by
> thread"
I'll switch to your description text if we end up keeping the
new options.
> ---
>
> src/share/vm/runtime/objectMonitor.cpp
> src/share/vm/runtime/objectMonitor.hpp
>
> No comments
>
> ---
>
> src/share/vm/runtime/synchronizer.cpp
>
> If we find the unexpected locked monitor it would be useful to see
> more Java level information about the Thread (in the non-detaching
> case) and the object that is locked.
I can probably steal some code from the thread dump side to
give us more info about the Object associated with the Java
monitor...
> ---
>
> src/share/vm/runtime/thread.cpp
>
> As noted previously JNI attached threads can also be affected by JVM
> induced locking bugs. So the comment block:
>
> 1804 // 6282335 JNI DetachCurrentThread spec states that all Java
> monitors
> 1805 // held by this thread must be released. A detach operation
> must only
> 1806 // get here if there are no Java frames on the stack.
> Therefore, any
> 1807 // owned monitors at this point MUST be JNI-acquired monitors
> which are
> 1808 // pre-inflated and in the monitor cache.
>
> is not strictly correct as it is presuming a correctly operating JVM.
I actually disagree with the last two sentences in that comment block,
but I didn't have a good rewrite in mind. I'll think about it more.
> Further the logic now applied:
>
> 1816 if ((exit_type == jni_detach && JNIDetachReleasesMonitors) ||
> 1817 JavaThreadExitReleasesMonitors) {
>
> is also not "correct" because if we have
> JavaThreadExitReleasesMonitors true while JNIDetachReleasesMonitors is
> false, we will in fact release any JNI acquired monitors because we
> can't make the distinction.
I don't think JNIDetachReleasesMonitors only has to deal with JNI-acquired
monitors and I don't think that JavaThreadExitReleasesMonitors only has to
deal with Java code acquired monitors. I see these options as applying to
"how you got to the Java thread exit path" and not applying to "what needs
to be cleaned up".
I see JNIDetachReleasesMonitors as a way of cleaning up unexited Java
monitors when we're executing the JNI DetachCurrentThread() code path.
I don't really care why the Java monitors are unexited... and I think
the spec wording is clear that all unexited Java monitors are cleaned
up... not just JNI-acquired monitors.
I see JavaThreadExitReleasesMonitors as a way of cleaning up unexited
Java monitors in any code path where we are exiting the Java thread. I see
JavaThreadExitReleasesMonitors as a superset of JNIDetachReleasesMonitors.
> Given we can't make the distinction I don't see any way for these
> separate flags to actually work well together.
I think they work well together when JavaThreadExitReleasesMonitors
is seen as a superset of JNIDetachReleasesMonitors. Of course, that
hinges on my opinion that JNIDetachReleasesMonitors applies to any
unexited Java monitor and not just JNI-acquired Java monitors.
> As I said above I'm more inclined to just add a guarantee to the
> non-detach case, than add two new flags. (Even though this means there
> won't be detection for such bugs when JNI attached threads encounter
> them - including the main thread :( ).
I really don't want to add this check to all Java thread exit
paths since it is expensive.
> As I said this has me perplexed. :(
Hopefully, I've explained it better now.
> ---
>
> src/share/vm/runtime/vframe.cpp
>
> 170 // It would be nice to distinguish between "waiting on" and
> 171 // "waited on". Currently, "waiting on" here with a
> 172 // java.lang.Thread.State == "WAITING (on object monitor)"
> 173 // earlier in the output means that the monitor has not yet
> been
> 174 // notified and java.lang.Thread.State == "BLOCKED (on object
> 175 // monitor)" earlier in the output means that the monitor has
> 176 // been notified and the thread is blocked on reentry.
>
> That's a very long sentence that can be quite difficult to parse and
> could probably be reworded to describe how the output should be
> interpreted eg:
>
> // If earlier in the output we reported java.lang.Thread.State ==
> // "WAITING (on object monitor)" and now we report "waited on", then
> // we are still waiting for notification [or timeout?]. Otherwise if
> // we earlier reported java.lang.Thread.State == "BLOCKED (on object
> // monitor)", then we are actually waiting to reacquire the monitor
> // lock. At this level we can't distinguish the two cases to report
> // "waited on" rather than "waiting on" for the second case.
I'll take a look at rewriting the comment and may just take your
version entirely.
> I wonder if we can prod deeper into VM state to try and make the
> distinction?
I played around with doing just that, but it seems like the M&M
code that fetches similar state probes the state stored in the
java.lang.Thread object. I didn't want to go there.
>
> 184 } else {
> 185 st->print_cr("\t- %s <no locals available>", wait_state);
>
> The concept of "locals" is confusing here. Isn't the "local" in this
> case just the object that we have waited upon? In which case we should
> say "no object reference available".
I got the phrase "locals" from the compiler code that was trying to
decode the object reference. I can change to "no object reference
available".
I definitely want to say _something_ here. The existing code is just silent.
> Though I would have thought/hoped there must be a way to find the
> object reference even in a compiled frame?
Yes, if you deopt... I didn't want to go there.
> 195 // Print out all monitors that we have locked, are trying to lock
> 196 // or are trying to relock after a wait().
>
> that makes it sound like there are three cases when really a "relock"
> is just a specific case of lock. I would suggest:
>
> // Print out all monitors that we have locked, or are trying to
> // lock, including re-locking when returning from a wait().
Making it more clear that the "re-locking" is coming from "wait()"
is an improvement. I'll fix this.
> 252 lock_state = "waiting to relock";
>
> I prefer "waiting to re-lock in wait()", if that isn't too long.
> Otherwise "relock" needs to be a concept people are familiar with.
I like "waiting to re-lock in wait()".
> 260 if (VerboseStackTrace && mark != NULL) {
> 261 st->print("\t- lockbits=");
> 262 mark->print_on(st);
>
> This really doesn't seem to warrant a new diagnostic flag, regardless
> of whether we think applying -XX:+Verbose is a good fit or not.
I suspect that David C didn't want to change the default output format
and neither do I. We have some tests that try to parse out specific
things from thread dumps to verify certain bug fixes and we don't want
to break those. I'm thinking of the "duplicate owner" bug fix that we
fixed in thread dumps, but I can't pull up the bug ID (yet)...
Thanks for the thorough review. I have to say that I wasn't expecting
much comment on this review at all. :-)
Dan
>
> Thanks,
> David
> -------
>
>> This work is being tracked by the following bug ID:
>>
>> JDK-8130448 8077392 inspired thread dump improvements, comment
>> additions, new diagnostics
>> https://bugs.openjdk.java.net/browse/JDK-8130448
>>
>> Here is the webrev URL:
>>
>> http://cr.openjdk.java.net/~dcubed/8130448-webrev/0-jdk9-hs-rt/
>>
>> Testing:
>>
>> - RBT vm.quick batches (in process)
>> - JPRT test jobs
>>
>> Thanks, in advance, for any comments, questions or suggestions.
>>
>> Gory details about the changes are below...
>>
>> Dan
>>
>> 8130448 summary of changes:
>>
>> src/cpu/x86/vm/macroAssembler_x86.cpp
>> - comment additions for the assembly code
>>
>> src/share/vm/oops/markOop.cpp
>> - has_monitor() has to be checked before is_locked() since
>> the has_monitor() bits are a superset of the is_locked() bits
>> - code style fixes
>>
>> src/share/vm/runtime/globals.hpp
>> - add VerboseStackTrace diagnostic option
>> - add GuaranteeOnMonitorMismatch diagnostic option
>> - add JavaThreadExitReleasesMonitors product option;
>> this is the Java equivalent of JNIDetachReleasesMonitors
>>
>> src/share/vm/runtime/objectMonitor.cpp
>> - delete unused ObjectMonitor::try_enter()
>> - fix assert wording
>>
>> src/share/vm/runtime/objectMonitor.hpp
>> - delete unused ObjectMonitor::try_enter()
>>
>> src/share/vm/runtime/synchronizer.cpp
>> - add GuaranteeOnMonitorMismatch support with some
>> diagnostic output
>>
>> src/share/vm/runtime/thread.cpp
>> - add JavaThreadExitReleasesMonitors support
>>
>> src/share/vm/runtime/vframe.cpp
>> - clarify existing comments
>> - add comments to clarify what "waiting on" means
>> - add line to show when we are "waiting on", but
>> there are no locals available (previously we were
>> "waiting on" silently)
>> - add support for "waiting to relock" which can happen
>> for any of the monitors and not just the top monitor
>> - switch mark->print_on() verbose output to new
>> VerboseStackTrace switch; thread dumps are not enabled
>> with a specific switch so the '-XX:+Verbose' model of
>> being a modifier for another option does not fit
More information about the hotspot-runtime-dev
mailing list