RFR(S) 8130448: thread dump improvements, comment, additions, new diagnostics inspired by 8077392
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Jul 9 13:00:25 UTC 2015
On 7/8/15 11:13 PM, David Holmes wrote:
> On 8/07/2015 3:28 AM, Daniel D. Daugherty wrote:
>> On 7/7/15 1:11 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Top posting as I'm lazy and running late :)
>>
>> No problem with the top posting. It makes your remaining
>> concerns earlier to see and address.
>>
>>
>>> Okay ... I'm still uneasy about adding two additional flags for the
>>> "release monitors on exit" situation. I hear what you are saying about
>>> the cost of the monitor check - I had thought we kept easy access to
>>> the set of monitors owned by a thread but it seems not, so trawling
>>> the global monitor list is indeed an expensive operation. :( So okay
>>> but I don't like it :)
>>
>> Since JavaThreadExitReleasesMonitors and GuaranteeOnMonitorMismatch
>> are targeted at finding/working around bugs in the Java Monitor
>> implementation, I'm going to take a look at moving the flag support
>> down into Java Monitor subsystem's implementation specific options.
>> That will get the options out of the global space and they won't
>> have to be counted against the "cap and trade" limit :-)
>>
>>
>>> I also take your point about viewing the new flag as a superset of the
>>> JNI-detach case. That's fair enough, but of course that aspect needs
>>> to be documented.
>>
>> I'm starting to wonder why we have the 'JNIDetachReleasesMonitors'
>> option at all. Is there a good reason to be able to turn off this
>> piece of code?
>
> My recollection is that hotspot did not implement that part of the
> spec. Simply switching the behaviour with no way to restore the
> original behaviour could break existing apps. Hence a flag, on by
> default.
That's very likely the case. I'll poke around in my historical
archive and see what shakes out.
> Of course by now apps should be spec compliant themselves so this
> seems a flag that should itself be deprecated in 9 and targetted for
> removal in 10. (That will help with the 'conservation of flags'
> problem :) ).
Sounds like a good RFE.
Dan
>
>>
>>
>>> I was left unclear about your position on the VerboseStackTrace. I
>>> agree the new output was likely put under a flag to avoid disturbing
>>> existing output formats. But I don't see that it warrants its own flag
>>> - to me using Verbose okay (though the fact it is a develop flag is
>>> limiting).
>>
>> Thanks for reminding me that '-XX:+Verbose' is a develop flag.
>> That's another reason for not using that flag. Combined with
>> the fact that '-XX:+Verbose' is considered a modifier to other
>> flags and thread dumps aren't enabled/triggered by a flag means
>> (to me anyway) that '-XX:+Verbose' is still not a good match.
>>
>> However, I have a vague memory that the Java Monitor subsystem
>> has its own "verbose" flag. I'll take a look at switching to
>> that...
>
> Okay
>
>> Will be working on the next round...
>
> Thanks,
> David
>
>> Dan
>>
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 7/07/2015 6:47 AM, Daniel D. Daugherty wrote:
>>>> On 7/6/15 12:43 AM, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>> Metacomment: Could I suggest that in a RFR your subject line has the
>>>>> form <bugid>: <synopsis>, rather than <synopsis>(bugid) - especially
>>>>> when the synopsis actually starts with a bug id :) Thanks.
>>>>
>>>> Yup that bug synopsis is/was a mess:
>>>>
>>>> Changed the synopsis line to :
>>>>
>>>> thread dump improvements, comment additions, new diagnostics
>>>> inspired by 8077392
>>>>
>>>> Also updated the e-mail thread subject line to your suggested format.
>>>> Will try to remember to switch to that style in the future.
>>>>
>>>>
>>>>> Mostly okay but some major concerns around the can of worms that
>>>>> JavaThreadExitReleasesMonitors opens up.
>>>>
>>>> This is Java monitors so "can of worms" is familiar! :-)
>>>>
>>>>
>>>>>
>>>>> On 4/07/2015 10:14 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> The hunt for the following bug:
>>>>>>
>>>>>> JDK-8077392 Stream fork/join tasks occasionally fail to
>>>>>> complete
>>>>>>
>>>>>> and David C's work on the following bug:
>>>>>>
>>>>>> JDK-8069412 Locks need better debug-printing support
>>>>>>
>>>>>> have inspired additional thread dump improvements, comment additions
>>>>>> to some Java monitor code and some new diagnostic options.
>>>>>
>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>>>
>>>>> 1821 // update _owner from from BasicLock to thread
>>>>>
>>>>> Typo: from from
>>>>
>>>> Will fix.
>>>>
>>>>
>>>>> 2091 // the placeholder value. If that didn't work, then
>>>>> another
>>>>> 2092 // grabbed the lock so we're done (and exit was a
>>>>> success).
>>>>>
>>>>> another -> another thread ?
>>>>
>>>> Yes, "another thread" is better. Will fix.
>>>>
>>>>
>>>>> 2201 // If that didn't work, then another grabbed the lock
>>>>> 2202 // so we're done (and exit was a success).
>>>>>
>>>>> another -> another thread ?
>>>>
>>>> Yes, "another thread" is better. Will fix.
>>>>
>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/oops/markOop.cpp
>>>>>
>>>>> 40 st->print("NULL"); // this should not happen
>>>>>
>>>>> Shouldn't that be an assert or guarantee in that case? Or at a
>>>>> minimum
>>>>> print something like: "NULL (but you should never see this!)"
>>>>
>>>> This is a relocated line from David C's change:
>>>>
>>>> old 49: st->print("monitor=NULL");
>>>>
>>>> I simply added a comment to it. I don't think I like the idea of the
>>>> thread dump code assert()'ing or guarantee()'ing because there might
>>>> be other useful info that would have been printed after the assert()
>>>> or guarantee() failure.
>>>>
>>>> I do like the idea of changing the message to:
>>>>
>>>> 40: st->print("NULL (this should never be seen!)");
>>>>
>>>>
>>>>> 42 BasicLock * bl = (BasicLock *) mon->owner();
>>>>>
>>>>> What information has told us this is a BasicLock* not a Thread* ? But
>>>>> as all you do is print bl why not just p2i(mon->owner()) ?
>>>>
>>>> Again, this is a relocated line from David C's change:
>>>>
>>>> old 51: BasicLock * bl = (BasicLock *) mon->owner();
>>>>
>>>> I'm going to guess that David C had some other code in there before
>>>> that needed/assumed that the field is a "BasicLock *", but that code
>>>> didn't make it into his final version.
>>>>
>>>> I will drop the 'bl' local and update the p2i() call to use
>>>> mon->owner() directly.
>>>>
>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/globals.hpp
>>>>>
>>>>> JavaThreadExitReleasesMonitors has me perplexed. The JNI DetachThread
>>>>> case seems to be a concession to badly written code that fails to
>>>>> release locks on error paths.
>>>>
>>>> I don't think I would call the JNIDetachReleasesMonitors case a
>>>> concession
>>>> to badly written JNI code. 6282335 is very clear that this code was
>>>> added
>>>> to meet the JNI spec. The JNI spec words might be considered a
>>>> concession...
>>>>
>>>> http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#DetachCurrentThread
>>>>
>>>>
>>>>
>>>>
>>>> > DetachCurrentThread
>>>> >
>>>> > jint DetachCurrentThread(JavaVM *vm);
>>>> >
>>>> > Detaches the current thread from a Java VM. All Java monitors
>>>> > held by this thread are released. All Java threads waiting for
>>>> > this thread to die are notified.
>>>> >
>>>> > The main thread can be detached from the VM.
>>>>
>>>>
>>>>
>>>>> The equivalent for a Java thread would again involve lock acquisition
>>>>> via JNI. But I don't recall ever seeing a bug report or RFE
>>>>> related to
>>>>> this. What seems to have motivated this new flag is the potential for
>>>>> a broken VM to actually leave the monitor locked (e.g. because
>>>>> compiled code failed to do the unlock on all paths).
>>>>
>>>> 8077392 and bugs like it are exactly the reason for adding these
>>>> two options.
>>>>
>>>> JavaThreadExitReleasesMonitors is added to provide a work around
>>>> until the underlying bug or bugs are fixed. This option is targeted
>>>> at folks that are chasing their own bugs and don't care about this
>>>> one.
>>>>
>>>> GuaranteeOnMonitorMismatch is added to help hunt 8077392 specifically
>>>> and to sanity check for similar issues generally. I can see using the
>>>> two options in combination with existing test suites as a stress test
>>>> for the Java monitor subsystem.
>>>>
>>>>
>>>>> To address that I'd be inclined to simply have a guarantee in the
>>>>> Java
>>>>> thread exit path, rather than a flag to do the clean up and another
>>>>> flag to control a guarantee.
>>>>
>>>> The check for an unexited Java monitor is not cheap.
>>>>
>>>> The existing check under the JNIDetachReleasesMonitors flag is
>>>> positioned to only affect threads using JNI DetachCurrentThread()
>>>> and that cost is justified as part of meeting the JNI spec.
>>>>
>>>> Adding the unexited Java monitor check to all exiting Java
>>>> threads should not be done lightly due to the cost. If the
>>>> VM is functioning correctly, then why impose this cost on
>>>> all exiting Java threads?
>>>>
>>>>
>>>>> I don't think two new command-line options (has this gone through CCC
>>>>> yet?) are justified.
>>>>
>>>> No, this has not gone through CCC yet. My preference is to get
>>>> code review feedback (like this) before submitting a CCC.
>>>>
>>>> I suppose I could have simply added more sub-options to the
>>>> existing Java monitor subsystem knobs and switches, but it
>>>> seemed to make sense to model JavaThreadExitReleasesMonitors
>>>> on the existing JNIDetachReleasesMonitors.
>>>>
>>>>
>>>>> Also note that allowing for JVM induced missing unlocks is something
>>>>> that can affect JNI attached threads as well as regular Java threads.
>>>>> I'll take this up in discussing thread.cpp below.
>>>>>
>>>>> That aside, the comment, which was modelled on the JNI DetachThread
>>>>> variant, does not really work in this case:
>>>>>
>>>>> 2801 "JavaThread exit() releases monitors owned by thread")
>>>>> \
>>>>>
>>>>> JavaThread is not a programming concept but an internal JVM
>>>>> abstraction. I would suggest something like:
>>>>>
>>>>> "Java thread termination releases monitors unexpectedly still
>>>>> owned by
>>>>> thread"
>>>>
>>>> I'll switch to your description text if we end up keeping the
>>>> new options.
>>>>
>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/objectMonitor.cpp
>>>>> src/share/vm/runtime/objectMonitor.hpp
>>>>>
>>>>> No comments
>>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/synchronizer.cpp
>>>>>
>>>>> If we find the unexpected locked monitor it would be useful to see
>>>>> more Java level information about the Thread (in the non-detaching
>>>>> case) and the object that is locked.
>>>>
>>>> I can probably steal some code from the thread dump side to
>>>> give us more info about the Object associated with the Java
>>>> monitor...
>>>>
>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/thread.cpp
>>>>>
>>>>> As noted previously JNI attached threads can also be affected by JVM
>>>>> induced locking bugs. So the comment block:
>>>>>
>>>>> 1804 // 6282335 JNI DetachCurrentThread spec states that all Java
>>>>> monitors
>>>>> 1805 // held by this thread must be released. A detach operation
>>>>> must only
>>>>> 1806 // get here if there are no Java frames on the stack.
>>>>> Therefore, any
>>>>> 1807 // owned monitors at this point MUST be JNI-acquired monitors
>>>>> which are
>>>>> 1808 // pre-inflated and in the monitor cache.
>>>>>
>>>>> is not strictly correct as it is presuming a correctly operating JVM.
>>>>
>>>> I actually disagree with the last two sentences in that comment block,
>>>> but I didn't have a good rewrite in mind. I'll think about it more.
>>>>
>>>>
>>>>
>>>>> Further the logic now applied:
>>>>>
>>>>> 1816 if ((exit_type == jni_detach && JNIDetachReleasesMonitors) ||
>>>>> 1817 JavaThreadExitReleasesMonitors) {
>>>>>
>>>>> is also not "correct" because if we have
>>>>> JavaThreadExitReleasesMonitors true while
>>>>> JNIDetachReleasesMonitors is
>>>>> false, we will in fact release any JNI acquired monitors because we
>>>>> can't make the distinction.
>>>>
>>>> I don't think JNIDetachReleasesMonitors only has to deal with
>>>> JNI-acquired
>>>> monitors and I don't think that JavaThreadExitReleasesMonitors only
>>>> has to
>>>> deal with Java code acquired monitors. I see these options as
>>>> applying to
>>>> "how you got to the Java thread exit path" and not applying to "what
>>>> needs
>>>> to be cleaned up".
>>>>
>>>> I see JNIDetachReleasesMonitors as a way of cleaning up unexited Java
>>>> monitors when we're executing the JNI DetachCurrentThread() code path.
>>>> I don't really care why the Java monitors are unexited... and I think
>>>> the spec wording is clear that all unexited Java monitors are cleaned
>>>> up... not just JNI-acquired monitors.
>>>>
>>>> I see JavaThreadExitReleasesMonitors as a way of cleaning up unexited
>>>> Java monitors in any code path where we are exiting the Java thread.
>>>> I see
>>>> JavaThreadExitReleasesMonitors as a superset of
>>>> JNIDetachReleasesMonitors.
>>>>
>>>>
>>>>> Given we can't make the distinction I don't see any way for these
>>>>> separate flags to actually work well together.
>>>>
>>>> I think they work well together when JavaThreadExitReleasesMonitors
>>>> is seen as a superset of JNIDetachReleasesMonitors. Of course, that
>>>> hinges on my opinion that JNIDetachReleasesMonitors applies to any
>>>> unexited Java monitor and not just JNI-acquired Java monitors.
>>>>
>>>>
>>>>> As I said above I'm more inclined to just add a guarantee to the
>>>>> non-detach case, than add two new flags. (Even though this means
>>>>> there
>>>>> won't be detection for such bugs when JNI attached threads encounter
>>>>> them - including the main thread :( ).
>>>>
>>>> I really don't want to add this check to all Java thread exit
>>>> paths since it is expensive.
>>>>
>>>>
>>>>> As I said this has me perplexed. :(
>>>>
>>>> Hopefully, I've explained it better now.
>>>>
>>>>
>>>>> ---
>>>>>
>>>>> src/share/vm/runtime/vframe.cpp
>>>>>
>>>>> 170 // It would be nice to distinguish between "waiting on" and
>>>>> 171 // "waited on". Currently, "waiting on" here with a
>>>>> 172 // java.lang.Thread.State == "WAITING (on object monitor)"
>>>>> 173 // earlier in the output means that the monitor has not yet
>>>>> been
>>>>> 174 // notified and java.lang.Thread.State == "BLOCKED (on
>>>>> object
>>>>> 175 // monitor)" earlier in the output means that the
>>>>> monitor has
>>>>> 176 // been notified and the thread is blocked on reentry.
>>>>>
>>>>> That's a very long sentence that can be quite difficult to parse and
>>>>> could probably be reworded to describe how the output should be
>>>>> interpreted eg:
>>>>>
>>>>> // If earlier in the output we reported java.lang.Thread.State ==
>>>>> // "WAITING (on object monitor)" and now we report "waited on", then
>>>>> // we are still waiting for notification [or timeout?]. Otherwise if
>>>>> // we earlier reported java.lang.Thread.State == "BLOCKED (on object
>>>>> // monitor)", then we are actually waiting to reacquire the monitor
>>>>> // lock. At this level we can't distinguish the two cases to report
>>>>> // "waited on" rather than "waiting on" for the second case.
>>>>
>>>> I'll take a look at rewriting the comment and may just take your
>>>> version entirely.
>>>>
>>>>
>>>>> I wonder if we can prod deeper into VM state to try and make the
>>>>> distinction?
>>>>
>>>> I played around with doing just that, but it seems like the M&M
>>>> code that fetches similar state probes the state stored in the
>>>> java.lang.Thread object. I didn't want to go there.
>>>>
>>>>
>>>>>
>>>>> 184 } else {
>>>>> 185 st->print_cr("\t- %s <no locals available>", wait_state);
>>>>>
>>>>> The concept of "locals" is confusing here. Isn't the "local" in this
>>>>> case just the object that we have waited upon? In which case we
>>>>> should
>>>>> say "no object reference available".
>>>>
>>>> I got the phrase "locals" from the compiler code that was trying to
>>>> decode the object reference. I can change to "no object reference
>>>> available".
>>>> I definitely want to say _something_ here. The existing code is just
>>>> silent.
>>>>
>>>>
>>>>> Though I would have thought/hoped there must be a way to find the
>>>>> object reference even in a compiled frame?
>>>>
>>>> Yes, if you deopt... I didn't want to go there.
>>>>
>>>>
>>>>> 195 // Print out all monitors that we have locked, are trying to
>>>>> lock
>>>>> 196 // or are trying to relock after a wait().
>>>>>
>>>>> that makes it sound like there are three cases when really a "relock"
>>>>> is just a specific case of lock. I would suggest:
>>>>>
>>>>> // Print out all monitors that we have locked, or are trying to
>>>>> // lock, including re-locking when returning from a wait().
>>>>
>>>> Making it more clear that the "re-locking" is coming from "wait()"
>>>> is an improvement. I'll fix this.
>>>>
>>>>
>>>>> 252 lock_state = "waiting to relock";
>>>>>
>>>>> I prefer "waiting to re-lock in wait()", if that isn't too long.
>>>>> Otherwise "relock" needs to be a concept people are familiar with.
>>>>
>>>> I like "waiting to re-lock in wait()".
>>>>
>>>>
>>>>> 260 if (VerboseStackTrace && mark != NULL) {
>>>>> 261 st->print("\t- lockbits=");
>>>>> 262 mark->print_on(st);
>>>>>
>>>>> This really doesn't seem to warrant a new diagnostic flag, regardless
>>>>> of whether we think applying -XX:+Verbose is a good fit or not.
>>>>
>>>> I suspect that David C didn't want to change the default output format
>>>> and neither do I. We have some tests that try to parse out specific
>>>> things from thread dumps to verify certain bug fixes and we don't want
>>>> to break those. I'm thinking of the "duplicate owner" bug fix that we
>>>> fixed in thread dumps, but I can't pull up the bug ID (yet)...
>>>>
>>>> Thanks for the thorough review. I have to say that I wasn't expecting
>>>> much comment on this review at all. :-)
>>>>
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -------
>>>>>
>>>>>> This work is being tracked by the following bug ID:
>>>>>>
>>>>>> JDK-8130448 8077392 inspired thread dump improvements, comment
>>>>>> additions, new diagnostics
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8130448
>>>>>>
>>>>>> Here is the webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8130448-webrev/0-jdk9-hs-rt/
>>>>>>
>>>>>> Testing:
>>>>>>
>>>>>> - RBT vm.quick batches (in process)
>>>>>> - JPRT test jobs
>>>>>>
>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>
>>>>>> Gory details about the changes are below...
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> 8130448 summary of changes:
>>>>>>
>>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>>>> - comment additions for the assembly code
>>>>>>
>>>>>> src/share/vm/oops/markOop.cpp
>>>>>> - has_monitor() has to be checked before is_locked() since
>>>>>> the has_monitor() bits are a superset of the is_locked() bits
>>>>>> - code style fixes
>>>>>>
>>>>>> src/share/vm/runtime/globals.hpp
>>>>>> - add VerboseStackTrace diagnostic option
>>>>>> - add GuaranteeOnMonitorMismatch diagnostic option
>>>>>> - add JavaThreadExitReleasesMonitors product option;
>>>>>> this is the Java equivalent of JNIDetachReleasesMonitors
>>>>>>
>>>>>> src/share/vm/runtime/objectMonitor.cpp
>>>>>> - delete unused ObjectMonitor::try_enter()
>>>>>> - fix assert wording
>>>>>>
>>>>>> src/share/vm/runtime/objectMonitor.hpp
>>>>>> - delete unused ObjectMonitor::try_enter()
>>>>>>
>>>>>> src/share/vm/runtime/synchronizer.cpp
>>>>>> - add GuaranteeOnMonitorMismatch support with some
>>>>>> diagnostic output
>>>>>>
>>>>>> src/share/vm/runtime/thread.cpp
>>>>>> - add JavaThreadExitReleasesMonitors support
>>>>>>
>>>>>> src/share/vm/runtime/vframe.cpp
>>>>>> - clarify existing comments
>>>>>> - add comments to clarify what "waiting on" means
>>>>>> - add line to show when we are "waiting on", but
>>>>>> there are no locals available (previously we were
>>>>>> "waiting on" silently)
>>>>>> - add support for "waiting to relock" which can happen
>>>>>> for any of the monitors and not just the top monitor
>>>>>> - switch mark->print_on() verbose output to new
>>>>>> VerboseStackTrace switch; thread dumps are not enabled
>>>>>> with a specific switch so the '-XX:+Verbose' model of
>>>>>> being a modifier for another option does not fit
>>>>
>>
More information about the hotspot-runtime-dev
mailing list