RFR(S) 8130448: thread dump improvements, comment, additions, new diagnostics inspired by 8077392

Mon Jul 13 06:58:18 UTC 2015

On 9/07/2015 11:00 PM, Daniel D. Daugherty wrote:
> On 7/8/15 11:13 PM, David Holmes wrote:
>> On 8/07/2015 3:28 AM, Daniel D. Daugherty wrote:
>>> On 7/7/15 1:11 AM, David Holmes wrote:
>>>> Hi Dan,
>>>>
>>>> Top posting as I'm lazy and running late :)
>>>
>>> No problem with the top posting. It makes your remaining
>>> concerns earlier to see and address.
>>>
>>>
>>>> Okay ... I'm still uneasy about adding two additional flags for the
>>>> "release monitors on exit" situation. I hear what you are saying about
>>>> the cost of the monitor check - I had thought we kept easy access to
>>>> the set of monitors owned by a thread but it seems not, so trawling
>>>> the global monitor list is indeed an expensive operation. :( So okay
>>>> but I don't like it :)
>>>
>>> Since JavaThreadExitReleasesMonitors and GuaranteeOnMonitorMismatch
>>> are targeted at finding/working around bugs in the Java Monitor
>>> implementation, I'm going to take a look at moving the flag support
>>> down into Java Monitor subsystem's implementation specific options.
>>> That will get the options out of the global space and they won't
>>> have to be counted against the "cap and trade" limit :-)
>>>
>>>
>>>> I also take your point about viewing the new flag as a superset of the
>>>> JNI-detach case. That's fair enough, but of course that aspect needs
>>>> to be documented.
>>>
>>> I'm starting to wonder why we have the 'JNIDetachReleasesMonitors'
>>> option at all. Is there a good reason to be able to turn off this
>>> piece of code?
>>
>> My recollection is that hotspot did not implement that part of the
>> spec. Simply switching the behaviour with no way to restore the
>> original behaviour could break existing apps. Hence a flag, on by
>> default.
>
> That's very likely the case. I'll poke around in my historical
> archive and see what shakes out.

https://bugs.openjdk.java.net/browse/JDK-6336479

>
>> Of course by now apps should be spec compliant themselves so this
>> seems a flag that should itself be deprecated in 9 and targetted for
>> removal in 10. (That will help with the 'conservation of flags'
>> problem :) ).
>
> Sounds like a good RFE.

Filed: https://bugs.openjdk.java.net/browse/JDK-8131045

Cheers,
David


> Dan
>
>
>>
>>>
>>>
>>>> I was left unclear about your position on the VerboseStackTrace. I
>>>> agree the new output was likely put under a flag to avoid disturbing
>>>> existing output formats. But I don't see that it warrants its own flag
>>>> - to me using Verbose okay (though the fact it is a develop flag is
>>>> limiting).
>>>
>>> Thanks for reminding me that '-XX:+Verbose' is a develop flag.
>>> That's another reason for not using that flag. Combined with
>>> the fact that '-XX:+Verbose' is considered a modifier to other
>>> flags and thread dumps aren't enabled/triggered by a flag means
>>> (to me anyway) that '-XX:+Verbose' is still not a good match.
>>>
>>> However, I have a vague memory that the Java Monitor subsystem
>>> has its own "verbose" flag. I'll take a look at switching to
>>> that...
>>
>> Okay
>>
>>> Will be working on the next round...
>>
>> Thanks,
>> David
>>
>>> Dan
>>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 7/07/2015 6:47 AM, Daniel D. Daugherty wrote:
>>>>> On 7/6/15 12:43 AM, David Holmes wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> Metacomment: Could I suggest that in a RFR your subject line has the
>>>>>> form <bugid>: <synopsis>, rather than <synopsis>(bugid) - especially
>>>>>> when the synopsis actually starts with a bug id :) Thanks.
>>>>>
>>>>> Yup that bug synopsis is/was a mess:
>>>>>
>>>>> Changed the synopsis line to :
>>>>>
>>>>>      thread dump improvements, comment additions, new diagnostics
>>>>> inspired by 8077392
>>>>>
>>>>> Also updated the e-mail thread subject line to your suggested format.
>>>>> Will try to remember to switch to that style in the future.
>>>>>
>>>>>
>>>>>> Mostly okay but some major concerns around the can of worms that
>>>>>> JavaThreadExitReleasesMonitors opens up.
>>>>>
>>>>> This is Java monitors so "can of worms" is familiar! :-)
>>>>>
>>>>>
>>>>>>
>>>>>> On 4/07/2015 10:14 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> The hunt for the following bug:
>>>>>>>
>>>>>>>      JDK-8077392 Stream fork/join tasks occasionally fail to
>>>>>>> complete
>>>>>>>
>>>>>>> and David C's work on the following bug:
>>>>>>>
>>>>>>>      JDK-8069412 Locks need better debug-printing support
>>>>>>>
>>>>>>> have inspired additional thread dump improvements, comment additions
>>>>>>> to some Java monitor code and some new diagnostic options.
>>>>>>
>>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>>>>
>>>>>> 1821        // update _owner from from BasicLock to thread
>>>>>>
>>>>>> Typo: from from
>>>>>
>>>>> Will fix.
>>>>>
>>>>>
>>>>>> 2091        // the placeholder value. If that didn't work, then
>>>>>> another
>>>>>> 2092        // grabbed the lock so we're done (and exit was a
>>>>>> success).
>>>>>>
>>>>>> another -> another thread ?
>>>>>
>>>>> Yes, "another thread" is better. Will fix.
>>>>>
>>>>>
>>>>>> 2201       // If that didn't work, then another grabbed the lock
>>>>>> 2202       // so we're done (and exit was a success).
>>>>>>
>>>>>> another -> another thread ?
>>>>>
>>>>> Yes, "another thread" is better. Will fix.
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/oops/markOop.cpp
>>>>>>
>>>>>>   40       st->print("NULL");  // this should not happen
>>>>>>
>>>>>> Shouldn't that be an assert or guarantee in that case? Or at a
>>>>>> minimum
>>>>>> print something like: "NULL (but you should never see this!)"
>>>>>
>>>>> This is a relocated line from David C's change:
>>>>>
>>>>> old 49:         st->print("monitor=NULL");
>>>>>
>>>>> I simply added a comment to it. I don't think I like the idea of the
>>>>> thread dump code assert()'ing or guarantee()'ing because there might
>>>>> be other useful info that would have been printed after the assert()
>>>>> or guarantee() failure.
>>>>>
>>>>> I do like the idea of changing the message to:
>>>>>
>>>>> 40:       st->print("NULL (this should never be seen!)");
>>>>>
>>>>>
>>>>>> 42       BasicLock * bl = (BasicLock *) mon->owner();
>>>>>>
>>>>>> What information has told us this is a BasicLock* not a Thread* ? But
>>>>>> as all you do is print bl why not just p2i(mon->owner()) ?
>>>>>
>>>>> Again, this is a relocated line from David C's change:
>>>>>
>>>>> old 51:         BasicLock * bl = (BasicLock *) mon->owner();
>>>>>
>>>>> I'm going to guess that David C had some other code in there before
>>>>> that needed/assumed that the field is a "BasicLock *", but that code
>>>>> didn't make it into his final version.
>>>>>
>>>>> I will drop the 'bl' local and update the p2i() call to use
>>>>> mon->owner() directly.
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/runtime/globals.hpp
>>>>>>
>>>>>> JavaThreadExitReleasesMonitors has me perplexed. The JNI DetachThread
>>>>>> case seems to be a concession to badly written code that fails to
>>>>>> release locks on error paths.
>>>>>
>>>>> I don't think I would call the JNIDetachReleasesMonitors case a
>>>>> concession
>>>>> to badly written JNI code. 6282335 is very clear that this code was
>>>>> added
>>>>> to meet the JNI spec. The JNI spec words might be considered a
>>>>> concession...
>>>>>
>>>>> http://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#DetachCurrentThread
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  > DetachCurrentThread
>>>>>  >
>>>>>  > jint DetachCurrentThread(JavaVM *vm);
>>>>>  >
>>>>>  > Detaches the current thread from a Java VM. All Java monitors
>>>>>  > held by this thread are released. All Java threads waiting for
>>>>>  > this thread to die are notified.
>>>>>  >
>>>>>  > The main thread can be detached from the VM.
>>>>>
>>>>>
>>>>>
>>>>>> The equivalent for a Java thread would again involve lock acquisition
>>>>>> via JNI. But I don't recall ever seeing a bug report or RFE
>>>>>> related to
>>>>>> this. What seems to have motivated this new flag is the potential for
>>>>>> a broken VM to actually leave the monitor locked (e.g. because
>>>>>> compiled code failed to do the unlock on all paths).
>>>>>
>>>>> 8077392 and bugs like it are exactly the reason for adding these
>>>>> two options.
>>>>>
>>>>> JavaThreadExitReleasesMonitors is added to provide a work around
>>>>> until the underlying bug or bugs are fixed. This option is targeted
>>>>> at folks that are chasing their own bugs and don't care about this
>>>>> one.
>>>>>
>>>>> GuaranteeOnMonitorMismatch is added to help hunt 8077392 specifically
>>>>> and to sanity check for similar issues generally. I can see using the
>>>>> two options in combination with existing test suites as a stress test
>>>>> for the Java monitor subsystem.
>>>>>
>>>>>
>>>>>> To address that I'd be inclined to simply have a guarantee in the
>>>>>> Java
>>>>>> thread exit path, rather than a flag to do the clean up and another
>>>>>> flag to control a guarantee.
>>>>>
>>>>> The check for an unexited Java monitor is not cheap.
>>>>>
>>>>> The existing check under the JNIDetachReleasesMonitors flag is
>>>>> positioned to only affect threads using JNI DetachCurrentThread()
>>>>> and that cost is justified as part of meeting the JNI spec.
>>>>>
>>>>> Adding the unexited Java monitor check to all exiting Java
>>>>> threads should not be done lightly due to the cost. If the
>>>>> VM is functioning correctly, then why impose this cost on
>>>>> all exiting Java threads?
>>>>>
>>>>>
>>>>>> I don't think two new command-line options (has this gone through CCC
>>>>>> yet?) are justified.
>>>>>
>>>>> No, this has not gone through CCC yet. My preference is to get
>>>>> code review feedback (like this) before submitting a CCC.
>>>>>
>>>>> I suppose I could have simply added more sub-options to the
>>>>> existing Java monitor subsystem knobs and switches, but it
>>>>> seemed to make sense to model JavaThreadExitReleasesMonitors
>>>>> on the existing JNIDetachReleasesMonitors.
>>>>>
>>>>>
>>>>>> Also note that allowing for JVM induced missing unlocks is something
>>>>>> that can affect JNI attached threads as well as regular Java threads.
>>>>>> I'll take this up in discussing thread.cpp below.
>>>>>>
>>>>>> That aside, the comment, which was modelled on the JNI DetachThread
>>>>>> variant, does not really work in this case:
>>>>>>
>>>>>> 2801           "JavaThread exit() releases monitors owned by thread")
>>>>>>          \
>>>>>>
>>>>>> JavaThread is not a programming concept but an internal JVM
>>>>>> abstraction. I would suggest something like:
>>>>>>
>>>>>> "Java thread termination releases monitors unexpectedly still
>>>>>> owned by
>>>>>> thread"
>>>>>
>>>>> I'll switch to your description text if we end up keeping the
>>>>> new options.
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/runtime/objectMonitor.cpp
>>>>>> src/share/vm/runtime/objectMonitor.hpp
>>>>>>
>>>>>> No comments
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/runtime/synchronizer.cpp
>>>>>>
>>>>>> If we find the unexpected locked monitor it would be useful to see
>>>>>> more Java level information about the Thread (in the non-detaching
>>>>>> case) and the object that is locked.
>>>>>
>>>>> I can probably steal some code from the thread dump side to
>>>>> give us more info about the Object associated with the Java
>>>>> monitor...
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>>  src/share/vm/runtime/thread.cpp
>>>>>>
>>>>>> As noted previously JNI attached threads can also be affected by JVM
>>>>>> induced locking bugs. So the comment block:
>>>>>>
>>>>>> 1804   // 6282335 JNI DetachCurrentThread spec states that all Java
>>>>>> monitors
>>>>>> 1805   // held by this thread must be released.  A detach operation
>>>>>> must only
>>>>>> 1806   // get here if there are no Java frames on the stack.
>>>>>> Therefore, any
>>>>>> 1807   // owned monitors at this point MUST be JNI-acquired monitors
>>>>>> which are
>>>>>> 1808   // pre-inflated and in the monitor cache.
>>>>>>
>>>>>> is not strictly correct as it is presuming a correctly operating JVM.
>>>>>
>>>>> I actually disagree with the last two sentences in that comment block,
>>>>> but I didn't have a good rewrite in mind. I'll think about it more.
>>>>>
>>>>>
>>>>>
>>>>>> Further the logic now applied:
>>>>>>
>>>>>> 1816   if ((exit_type == jni_detach && JNIDetachReleasesMonitors) ||
>>>>>> 1817       JavaThreadExitReleasesMonitors) {
>>>>>>
>>>>>> is also not "correct" because if we have
>>>>>> JavaThreadExitReleasesMonitors true while
>>>>>> JNIDetachReleasesMonitors is
>>>>>> false, we will in fact release any JNI acquired monitors because we
>>>>>> can't make the distinction.
>>>>>
>>>>> I don't think JNIDetachReleasesMonitors only has to deal with
>>>>> JNI-acquired
>>>>> monitors and I don't think that JavaThreadExitReleasesMonitors only
>>>>> has to
>>>>> deal with Java code acquired monitors. I see these options as
>>>>> applying to
>>>>> "how you got to the Java thread exit path" and not applying to "what
>>>>> needs
>>>>> to be cleaned up".
>>>>>
>>>>> I see JNIDetachReleasesMonitors as a way of cleaning up unexited Java
>>>>> monitors when we're executing the JNI DetachCurrentThread() code path.
>>>>> I don't really care why the Java monitors are unexited... and I think
>>>>> the spec wording is clear that all unexited Java monitors are cleaned
>>>>> up... not just JNI-acquired monitors.
>>>>>
>>>>> I see JavaThreadExitReleasesMonitors as a way of cleaning up unexited
>>>>> Java monitors in any code path where we are exiting the Java thread.
>>>>> I see
>>>>> JavaThreadExitReleasesMonitors as a superset of
>>>>> JNIDetachReleasesMonitors.
>>>>>
>>>>>
>>>>>> Given we can't make the distinction I don't see any way for these
>>>>>> separate flags to actually work well together.
>>>>>
>>>>> I think they work well together when JavaThreadExitReleasesMonitors
>>>>> is seen as a superset of JNIDetachReleasesMonitors. Of course, that
>>>>> hinges on my opinion that JNIDetachReleasesMonitors applies to any
>>>>> unexited Java monitor and not just JNI-acquired Java monitors.
>>>>>
>>>>>
>>>>>> As I said above I'm more inclined to just add a guarantee to the
>>>>>> non-detach case, than add two new flags. (Even though this means
>>>>>> there
>>>>>> won't be detection for such bugs when JNI attached threads encounter
>>>>>> them - including the main thread :( ).
>>>>>
>>>>> I really don't want to add this check to all Java thread exit
>>>>> paths since it is expensive.
>>>>>
>>>>>
>>>>>> As I said this has me perplexed. :(
>>>>>
>>>>> Hopefully, I've explained it better now.
>>>>>
>>>>>
>>>>>> ---
>>>>>>
>>>>>> src/share/vm/runtime/vframe.cpp
>>>>>>
>>>>>> 170       // It would be nice to distinguish between "waiting on" and
>>>>>> 171       // "waited on". Currently, "waiting on" here with a
>>>>>> 172       // java.lang.Thread.State == "WAITING (on object monitor)"
>>>>>> 173       // earlier in the output means that the monitor has not yet
>>>>>> been
>>>>>> 174       // notified and java.lang.Thread.State == "BLOCKED (on
>>>>>> object
>>>>>> 175       // monitor)" earlier in the output means that the
>>>>>> monitor has
>>>>>> 176       // been notified and the thread is blocked on reentry.
>>>>>>
>>>>>> That's a very long sentence that can be quite difficult to parse and
>>>>>> could probably be reworded to describe how the output should be
>>>>>> interpreted eg:
>>>>>>
>>>>>> // If earlier in the output we reported java.lang.Thread.State ==
>>>>>> // "WAITING (on object monitor)" and now we report "waited on", then
>>>>>> // we are still waiting for notification [or timeout?]. Otherwise if
>>>>>> // we earlier reported java.lang.Thread.State == "BLOCKED (on object
>>>>>> // monitor)", then we are actually waiting to reacquire the monitor
>>>>>> // lock. At this level we can't distinguish the two cases to report
>>>>>> // "waited on" rather than "waiting on" for the second case.
>>>>>
>>>>> I'll take a look at rewriting the comment and may just take your
>>>>> version entirely.
>>>>>
>>>>>
>>>>>> I wonder if we can prod deeper into VM state to try and make the
>>>>>> distinction?
>>>>>
>>>>> I played around with doing just that, but it seems like the M&M
>>>>> code that fetches similar state probes the state stored in the
>>>>> java.lang.Thread object. I didn't want to go there.
>>>>>
>>>>>
>>>>>>
>>>>>> 184       } else {
>>>>>> 185         st->print_cr("\t- %s <no locals available>", wait_state);
>>>>>>
>>>>>> The concept of "locals" is confusing here. Isn't the "local" in this
>>>>>> case just the object that we have waited upon? In which case we
>>>>>> should
>>>>>> say "no object reference available".
>>>>>
>>>>> I got the phrase "locals" from the compiler code that was trying to
>>>>> decode the object reference. I can change to "no object reference
>>>>> available".
>>>>> I definitely want to say _something_ here. The existing code is just
>>>>> silent.
>>>>>
>>>>>
>>>>>> Though I would have thought/hoped there must be a way to find the
>>>>>> object reference even in a compiled frame?
>>>>>
>>>>> Yes, if you deopt... I didn't want to go there.
>>>>>
>>>>>
>>>>>> 195   // Print out all monitors that we have locked, are trying to
>>>>>> lock
>>>>>> 196   // or are trying to relock after a wait().
>>>>>>
>>>>>> that makes it sound like there are three cases when really a "relock"
>>>>>> is just a specific case of lock. I would suggest:
>>>>>>
>>>>>>    // Print out all monitors that we have locked, or are trying to
>>>>>>    // lock, including re-locking when returning from a wait().
>>>>>
>>>>> Making it more clear that the "re-locking" is coming from "wait()"
>>>>> is an improvement. I'll fix this.
>>>>>
>>>>>
>>>>>> 252             lock_state = "waiting to relock";
>>>>>>
>>>>>> I prefer "waiting to re-lock in wait()", if that isn't too long.
>>>>>> Otherwise "relock" needs to be a concept people are familiar with.
>>>>>
>>>>> I like "waiting to re-lock in wait()".
>>>>>
>>>>>
>>>>>> 260         if (VerboseStackTrace && mark != NULL) {
>>>>>> 261           st->print("\t- lockbits=");
>>>>>> 262           mark->print_on(st);
>>>>>>
>>>>>> This really doesn't seem to warrant a new diagnostic flag, regardless
>>>>>> of whether we think applying -XX:+Verbose is a good fit or not.
>>>>>
>>>>> I suspect that David C didn't want to change the default output format
>>>>> and neither do I. We have some tests that try to parse out specific
>>>>> things from thread dumps to verify certain bug fixes and we don't want
>>>>> to break those. I'm thinking of the "duplicate owner" bug fix that we
>>>>> fixed in thread dumps, but I can't pull up the bug ID (yet)...
>>>>>
>>>>> Thanks for the thorough review. I have to say that I wasn't expecting
>>>>> much comment on this review at all. :-)
>>>>>
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -------
>>>>>>
>>>>>>> This work is being tracked by the following bug ID:
>>>>>>>
>>>>>>>      JDK-8130448 8077392 inspired thread dump improvements, comment
>>>>>>>                  additions, new diagnostics
>>>>>>>      https://bugs.openjdk.java.net/browse/JDK-8130448
>>>>>>>
>>>>>>> Here is the webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8130448-webrev/0-jdk9-hs-rt/
>>>>>>>
>>>>>>> Testing:
>>>>>>>
>>>>>>> - RBT vm.quick batches (in process)
>>>>>>> - JPRT test jobs
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, questions or suggestions.
>>>>>>>
>>>>>>> Gory details about the changes are below...
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> 8130448 summary of changes:
>>>>>>>
>>>>>>> src/cpu/x86/vm/macroAssembler_x86.cpp
>>>>>>>    - comment additions for the assembly code
>>>>>>>
>>>>>>> src/share/vm/oops/markOop.cpp
>>>>>>>    - has_monitor() has to be checked before is_locked() since
>>>>>>>      the has_monitor() bits are a superset of the is_locked() bits
>>>>>>>    - code style fixes
>>>>>>>
>>>>>>> src/share/vm/runtime/globals.hpp
>>>>>>>    - add VerboseStackTrace diagnostic option
>>>>>>>    - add GuaranteeOnMonitorMismatch diagnostic option
>>>>>>>    - add JavaThreadExitReleasesMonitors product option;
>>>>>>>      this is the Java equivalent of JNIDetachReleasesMonitors
>>>>>>>
>>>>>>> src/share/vm/runtime/objectMonitor.cpp
>>>>>>>    - delete unused ObjectMonitor::try_enter()
>>>>>>>    - fix assert wording
>>>>>>>
>>>>>>> src/share/vm/runtime/objectMonitor.hpp
>>>>>>>    - delete unused ObjectMonitor::try_enter()
>>>>>>>
>>>>>>> src/share/vm/runtime/synchronizer.cpp
>>>>>>>    - add GuaranteeOnMonitorMismatch support with some
>>>>>>>      diagnostic output
>>>>>>>
>>>>>>> src/share/vm/runtime/thread.cpp
>>>>>>>    - add JavaThreadExitReleasesMonitors support
>>>>>>>
>>>>>>> src/share/vm/runtime/vframe.cpp
>>>>>>>    - clarify existing comments
>>>>>>>    - add comments to clarify what "waiting on" means
>>>>>>>    - add line to show when we are "waiting on", but
>>>>>>>      there are no locals available (previously we were
>>>>>>>      "waiting on" silently)
>>>>>>>    - add support for "waiting to relock" which can happen
>>>>>>>      for any of the monitors and not just the top monitor
>>>>>>>    - switch mark->print_on() verbose output to new
>>>>>>>      VerboseStackTrace switch; thread dumps are not enabled
>>>>>>>      with a specific switch so the '-XX:+Verbose' model of
>>>>>>>      being a modifier for another option does not fit
>>>>>
>>>
>