RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)

Fri Nov 13 13:40:12 UTC 2015

On 13/11/2015 10:08 PM, Frederic Parain wrote:
>
>
> On 13/11/2015 05:52, David Holmes wrote:
>> Hi Coleen,
>>
>> On 13/11/2015 7:41 AM, Coleen Phillimore wrote:
>>>
>>> Hi,  I should learn my lesson that Frederic, Bertrand and David are
>>> generally right.
>>>
>>> My restructuring went afoul of the STEP framework, and so I have made
>>> VMError::print_vm_info() report the safe things that VMError::report()
>>> reports.  Hopefully, this will address the concern about safety.   It
>>> also allows us to remove the lock and static buffer in the VM.info
>>> function.   The duplication isn't as bad as I thought and is a better
>>> solution than piggy backing from VMError::report().
>>>
>>> Please review:
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>
>> This looks good to me. A few small queries:
>>
>> 888   if (Universe::is_fully_initialized()) {
>>
>> I don't think it is possible to run a Dcmd before the VM is fully
>> initialized.
>
> It is not possible to invoke a Dcmd from outside of the VM (using the
> attachAPI or the PlatformMBeanServer) before the VM is fully initialized.
> However it is still possible to invoke a Dcmd directly from VM code.

Not sure what that means. Where/why would VM code execute a Dcmd on 
itself ??

Cheers,
David

> The Dcmd framework is initialized in Management::init(), which is called
> before universe_post_init() is invoked. I don't know exactly the
> dependencies
> of Universe::heap()->print_on_error(), but the test looks reasonable.
>
> Fred
>
>>
>> 889     Universe::heap()->print_on_error(st);
>>
>> Is this safe? I'm not sure what it does - print heap memory regions ??
>>
>>  891     st->print_cr("Polling page: " INTPTR_FORMAT,
>> p2i(os::get_polling_page()));
>>
>> Is this useful ??
>>
>> Thanks,
>> David
>>
>>> I tested with jcmd <pid> VM.info concurrently and with an inserted crash
>>> during VM.info call to make sure the original hs_err_pid report comes
>>> out okay.
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>>>>
>>>>
>>>> On 11/10/15 7:12 PM, David Holmes wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> I still do not like the intertwining of the info and error code, but
>>>>> I won't say more on that.
>>>>
>>>> Yes, I understand the approach has good arguments on both sides. I'm
>>>> trying to find a compromise that can move this feature forward.
>>>>>
>>>>> However you may now have a concurrency problem:
>>>>>
>>>>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>>>>  843 void VMError::report_info(outputStream* st) {
>>>>>  844
>>>>>  845   // don't allocate large buffer on stack
>>>>>  846   static char buf[O_BUFLEN];
>>>>>
>>>>> Can you have concurrent info requests in flight? For the info case
>>>>> the buffer could be allocated on the stack I think, or even
>>>>> dynamically. (Aside: will this be executed in the service thread?)
>>>>
>>>> There is a lock added to protect the VM.info dcmd, so there would be
>>>> only sequential access to the static buffer.  I don't think this is
>>>> executed in the Service Thread.  I see no evidence of this anyway.
>>>>
>>>>>
>>>>> report_process could possibly be refactored further as there's a lot
>>>>> of initial stuff that won't apply to the info case:
>>>>>  - we won't print all threads
>>>>>  - the VM state will always be "not at safepoint"
>>>>>  - The Universe must be initialized (not clear at what point during
>>>>> shutdown a dcmd can still come in?)
>>>>>  - we don't print owned locks
>>>>
>>>> Yes, I can do this.  It actually improves how this looks.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>
>>>>> That's all from me.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've done some refactoring and cleanup on this which the refactoring
>>>>>> enabled.   Again, it's important that the VM.info output be mostly
>>>>>> the
>>>>>> same order as the hs_err_pid* report file so that sustaining can
>>>>>> easily
>>>>>> match and find output that they are looking for.   It's also
>>>>>> important
>>>>>> not to copy this code because there may be new information added
>>>>>> that we
>>>>>> also want in the VM.info output.  If any future information that we
>>>>>> want
>>>>>> to print seems unsafe, the vm_info_cmd can be used to exclude it
>>>>>> or it
>>>>>> can be printed only in the error reporting function. Rejecting the
>>>>>> entire change because of the theoretical unsafe addition of code
>>>>>> isn't
>>>>>> the right thing to do, although yeah, I've argued the other way in
>>>>>> the
>>>>>> past.  This is something that is helpful to sustaining and our
>>>>>> licensees.
>>>>>>
>>>>>> In this patch, the functions that VM.info calls are safe while the
>>>>>> JVM
>>>>>> is running, but the stepping is required because at crash time, they
>>>>>> might not be.  Although I've never seen the error reporting crash in
>>>>>> these places.
>>>>>>
>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>
>>>>>> This refactoring has been run through RBT.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>>>>> Also note that some of the crash information is not interesting for
>>>>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>>>>> registers/stack data for any specific thread. If people want thread
>>>>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>>>>> system state.
>>>>>>>
>>>>>>> I'm a little surprised that thread info is not of interest, but
>>>>>>> as you
>>>>>>> say that is available by other means  - which was part of my earlier
>>>>>>> point about there already being safe means to get some of the info.
>>>>>>>
>>>>>>> If this is really only about the small subset of safe "system
>>>>>>> information" that VMError reports, then I see even less
>>>>>>> motivation to
>>>>>>> piggy-back VM-info onto VMerror and entangle the two. If each
>>>>>>> block of
>>>>>>> information can be considered a single "step" in the error case then
>>>>>>> we can simply factor out the actual printing parts and share
>>>>>>> them. If
>>>>>>> the error case wants fine-grained steps then we can't do that.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>> Kind Regards
>>>>>>>> /Mattis
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Coleen Phillimore
>>>>>>>> Sent: den 3 november 2015 16:25
>>>>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>>>>> hs_err print-out
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've done a bit of refactoring with this change and have tested it
>>>>>>>> with and without intentional crashes to the VM.info reporting.
>>>>>>>> If it
>>>>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>>>>> where it comes from in the hs_err file.  And the hs_err_pid file is
>>>>>>>> correctly produced.  If we can make the jcmd come with a warning
>>>>>>>> that
>>>>>>>> it *could* potentially crash the vm, I think this feature is a good
>>>>>>>> addition for sustaining.
>>>>>>>>
>>>>>>>> Without crashing, VM.info gives a lot of information that we
>>>>>>>> normally
>>>>>>>> give with the hs_err file.  The reason for the careful STEP'ping in
>>>>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>>>>> count on anything.  The VM.info command is designed to be run on a
>>>>>>>> moderately stable JVM so is less likely to crash on the things that
>>>>>>>> we're printing out.
>>>>>>>>
>>>>>>>> I think we should fix the Threads walking code that Fred pointed
>>>>>>>> out
>>>>>>>> to take the Threads_lock, but I don't think we should *copy*
>>>>>>>> code for
>>>>>>>> VM.info or copy print_on_error functions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>>>>> Thanks Thomas,
>>>>>>>>>>
>>>>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>>>>> if we
>>>>>>>>>> are not careful.
>>>>>>>>>>
>>>>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious
>>>>>>>>>> source of
>>>>>>>>>> potential crashes if you take into account the fact that the
>>>>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>>>>
>>>>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>>>>
>>>>>>>>> I'm not sure of why we do not print the threads when there is
>>>>>>>>> notion
>>>>>>>>> of current thread (and I would wonder whether we should print
>>>>>>>>> them for
>>>>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>>>>> crash.
>>>>>>>>>
>>>>>>>>> However, I still do not like it from a design point of view for
>>>>>>>>> the
>>>>>>>>> other reasons stated in my e-mail.
>>>>>>>>>
>>>>>>>>> Bertrand.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>>>>> what
>>>>>>>>>> it calls) has been designed.
>>>>>>>>>>
>>>>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient unless
>>>>>>>>>> you
>>>>>>>>>> pass that argument to all layers. This is the biggest design
>>>>>>>>>> issue.
>>>>>>>>>> For instance all calls to print_on_error could have this
>>>>>>>>>> additional
>>>>>>>>>> argument so that the different components can adapt their
>>>>>>>>>> behavior
>>>>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>>>>> case,
>>>>>>>>>> but the same if true for all methods called by VMError::report. I
>>>>>>>>>> would go as far as adding a comment in all print_on_error methods
>>>>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>>>>> locks
>>>>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>>>>> VMError:report is supposed to skip.
>>>>>>>>>> The current design, where the various components may not be aware
>>>>>>>>>> that they are called for two different purposes, is not robust
>>>>>>>>>> enough.
>>>>>>>>>>
>>>>>>>>>> Note also that being 'robust enough for the "crash" case' is
>>>>>>>>>> not a
>>>>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>>>>> of the
>>>>>>>>>> info STEP crashes the VM instead of just bringing us to the next
>>>>>>>>>> error reporting STEP. Now, as stated before, I could agree if we
>>>>>>>>>> use
>>>>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Bertrand
>>>>>>>>>>
>>>>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>>>>
>>>>>>>>>>> However, there are some things to keep in mind when doing this:
>>>>>>>>>>>
>>>>>>>>>>> Error handling is different between the "info-only" and "crash"
>>>>>>>>>>> case. In
>>>>>>>>>>> "info-only" case, VMError::report() is called without enclosing
>>>>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>>>>> handler
>>>>>>>>>>> intact, not with the secondary signal handler. So, normal signal
>>>>>>>>>>> handling works, which is good. But it also means that any error
>>>>>>>>>>> inside one of the error reporting STEPs will bring down the VM,
>>>>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>>>>> ("Error
>>>>>>>>>>> occurred during error reporting").
>>>>>>>>>>>
>>>>>>>>>>> The problem is that a number of error reporting steps are a bit
>>>>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>>>>> happen
>>>>>>>>>>> is that the STEP will be interrupted. These risky steps
>>>>>>>>>>> should not
>>>>>>>>>>> be executed in the "only-info" case.
>>>>>>>>>>>
>>>>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>>>>> "_vm_info_cmd"
>>>>>>>>>>> and conditionally excludes dangerous or pointless error
>>>>>>>>>>> reporting
>>>>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>>>>
>>>>>>>>>>> As far as I can see, the other concerns concern only the "crash"
>>>>>>>>>>> case and simply do not apply to the "info-only" case. If an
>>>>>>>>>>> error
>>>>>>>>>>> reporting step is robust enough for the "crash" case (low stack
>>>>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it
>>>>>>>>>>> should be
>>>>>>>>>>> safe enough for the "info" case.
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards, Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>>>>> <mattis.castegren at oracle.com
>>>>>>>>>>> <mailto:mattis.castegren at oracle.com>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>      Hi
>>>>>>>>>>>
>>>>>>>>>>>      Of course, when I wrote "Review and push this fix" I meant
>>>>>>>>>>> doing a
>>>>>>>>>>>      thorough technical review. We do not want VM.info to be
>>>>>>>>>>> unsafe, and
>>>>>>>>>>>      I will let David reply to the technical questions that has
>>>>>>>>>>> been
>>>>>>>>>>>      raised. I know he did consider some of these issues, but
>>>>>>>>>>> last
>>>>>>>>>>> week
>>>>>>>>>>>      he was on Java one, and this week he is out. I will let him
>>>>>>>>>>> comment
>>>>>>>>>>>      once he gets back.
>>>>>>>>>>>
>>>>>>>>>>>      Kind Regards
>>>>>>>>>>>      /Mattis
>>>>>>>>>>>
>>>>>>>>>>>      -----Original Message-----
>>>>>>>>>>>      From: Bertrand Delsart
>>>>>>>>>>>      Sent: den 2 november 2015 10:13
>>>>>>>>>>>      To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>>      hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>      Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>>>>> to get
>>>>>>>>>>>      hs_err print-out
>>>>>>>>>>>
>>>>>>>>>>>      On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>>>>       > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>>>>       >> Hi
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> I agree that there may be things we could add to
>>>>>>>>>>> VM.info
>>>>>>>>>>> that we
>>>>>>>>>>>       >> couldn't add in an hs_err file due to the fact that we
>>>>>>>>>>> are
>>>>>>>>>>> crashing.
>>>>>>>>>>>       >> However, I really do believe that we should use the
>>>>>>>>>>> same
>>>>>>>>>>> code.
>>>>>>>>>>> The
>>>>>>>>>>>       >> goal of the hs_err file is exactly the same as the
>>>>>>>>>>> goal of
>>>>>>>>>>> VM.info,
>>>>>>>>>>>       >> to print as much information about the system as
>>>>>>>>>>> possible so
>>>>>>>>>>> that
>>>>>>>>>>>       >> when we look at an hs_err file or the output from
>>>>>>>>>>> VM.info,
>>>>>>>>>>> we get as
>>>>>>>>>>>       >> much information about the system as possible. That way
>>>>>>>>>>> we can
>>>>>>>>>>>      reduce
>>>>>>>>>>>       >> the need to go back and forth with the customer to ask
>>>>>>>>>>> for more
>>>>>>>>>>>      information.
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> If we enumerate what we need in VM.info, the list
>>>>>>>>>>> would look
>>>>>>>>>>> very
>>>>>>>>>>>       >> much like what we already do in VMError, and
>>>>>>>>>>> therefore I
>>>>>>>>>>> think we
>>>>>>>>>>>       >> should use the same code. If we find something that we
>>>>>>>>>>> would
>>>>>>>>>>> want,
>>>>>>>>>>>       >> that is not currently printed in hs_err files, I
>>>>>>>>>>> think we
>>>>>>>>>>> have
>>>>>>>>>>>      two options:
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> 1) Add it to hs_err files as well. If it is
>>>>>>>>>>> interesting when
>>>>>>>>>>>       >> gathering customer information before a crash, it
>>>>>>>>>>> probably is
>>>>>>>>>>>       >> interesting after a crash
>>>>>>>>>>>       >> 2) If it for some reason cannot be captured when
>>>>>>>>>>> crashing,
>>>>>>>>>>> add it to
>>>>>>>>>>>       >> the code with the condition that we don't print it
>>>>>>>>>>> during
>>>>>>>>>>> crash
>>>>>>>>>>>      time.
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> I would therefore suggest the following plan of action
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> 1) Review and push this fix to see if there are any
>>>>>>>>>>> technical issues
>>>>>>>>>>>       >> with regards to thread safety, etc. I know David has
>>>>>>>>>>> had
>>>>>>>>>>> this in
>>>>>>>>>>>      mind
>>>>>>>>>>>       >> when doing the change. After that, we have a command
>>>>>>>>>>> that
>>>>>>>>>>>      support can
>>>>>>>>>>>       >> use.
>>>>>>>>>>>       >> 2) Think about if there is any information missing,
>>>>>>>>>>> both
>>>>>>>>>>> from hs_err
>>>>>>>>>>>       >> files and from VM.info. Here, we may want to ask
>>>>>>>>>>> support
>>>>>>>>>>> about if
>>>>>>>>>>>       >> there is any additional information they need to ask
>>>>>>>>>>> for
>>>>>>>>>>> when they
>>>>>>>>>>>       >> get this data. If possible, this information should be
>>>>>>>>>>> added
>>>>>>>>>>> to both
>>>>>>>>>>>       >> hs_err files and VM.info, we don't want support to
>>>>>>>>>>> have to
>>>>>>>>>>> ask for
>>>>>>>>>>>       >> VM.info if we already have an hs_err file. Of
>>>>>>>>>>> course, if
>>>>>>>>>>> there is
>>>>>>>>>>>       >> anything Dev would want to see when analyzing
>>>>>>>>>>> crashes in
>>>>>>>>>>> testing, we
>>>>>>>>>>>       >> should add that too.
>>>>>>>>>>>       >
>>>>>>>>>>>       > I have no issue with VMError and VM.Info using the same
>>>>>>>>>>> code
>>>>>>>>>>> (where
>>>>>>>>>>>       > appropriate), but I think it wrong to try and implement
>>>>>>>>>>> VM.Info
>>>>>>>>>>>       > directly using the actual VMError mechanism. It is
>>>>>>>>>>> crude and
>>>>>>>>>>> makes it
>>>>>>>>>>>       > difficult to make independent changes to either
>>>>>>>>>>> facility.
>>>>>>>>>>>       >
>>>>>>>>>>>       > Sorry.
>>>>>>>>>>>       >
>>>>>>>>>>>       > David
>>>>>>>>>>>
>>>>>>>>>>>      Furthermore, we cannot just "Review and push this fix to
>>>>>>>>>>> see if
>>>>>>>>>>>      there are any technical issues". Two reviewers have already
>>>>>>>>>>> stated
>>>>>>>>>>>      that there are issues and these issues must be addressed
>>>>>>>>>>> before
>>>>>>>>>>> pushing.
>>>>>>>>>>>
>>>>>>>>>>>      The answer cannot just be that we need this feature, it
>>>>>>>>>>> must
>>>>>>>>>>> address
>>>>>>>>>>>      the technical issues that were reported.
>>>>>>>>>>>
>>>>>>>>>>>      One possible answer is that we all agree that VMInfo
>>>>>>>>>>> would be
>>>>>>>>>>> unsafe.
>>>>>>>>>>>      The method would have to be renamed UnsafeVMInfo and it
>>>>>>>>>>> must be
>>>>>>>>>>>      clear that its 'impact' would be very risky, potentially
>>>>>>>>>>> crashing
>>>>>>>>>>>      the JVM.
>>>>>>>>>>>
>>>>>>>>>>>      If this is sufficient for your needs then we could agree
>>>>>>>>>>> with
>>>>>>>>>>> it. My
>>>>>>>>>>>      only additional requirement would be to clearly mark
>>>>>>>>>>> that the
>>>>>>>>>>> method
>>>>>>>>>>>      has been called so that we do not waste time on
>>>>>>>>>>> non-issues if
>>>>>>>>>>>      calling VMInfo leads to a crash (or, even worse but less
>>>>>>>>>>> likely, to
>>>>>>>>>>>      a silent data corruption that may lead to problems later).
>>>>>>>>>>> The bare
>>>>>>>>>>>      minimum would be to double check that a crash during an
>>>>>>>>>>> execution of
>>>>>>>>>>>      VM.info is clearly identifiable (this is worth double
>>>>>>>>>>> checking
>>>>>>>>>>>      because of the use of static buffers for the error
>>>>>>>>>>> reporting,
>>>>>>>>>>> see
>>>>>>>>>>>      below).
>>>>>>>>>>>
>>>>>>>>>>>      If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>>>>> wonder
>>>>>>>>>>>      whether a safe version of it is still a "small
>>>>>>>>>>> enhancement".
>>>>>>>>>>> A JEP
>>>>>>>>>>>      may be required.
>>>>>>>>>>>
>>>>>>>>>>>      VMError::report and the methods it calls (print_on_error)
>>>>>>>>>>> have been
>>>>>>>>>>>      written with assumptions clearly stated in the comments
>>>>>>>>>>> above
>>>>>>>>>>>      VMError::report (see below the two most important ones).
>>>>>>>>>>> HotSpot
>>>>>>>>>>>      error reporting strategy differed from JRockit. This
>>>>>>>>>>> explains
>>>>>>>>>>> why
>>>>>>>>>>>      the JRockit approach cannot be applied as is.
>>>>>>>>>>>
>>>>>>>>>>>      I agree with David that you can try leveraging some of the
>>>>>>>>>>> logic
>>>>>>>>>>>      from VMError::report. However, to be sure there is no
>>>>>>>>>>>      misunderstanding, let's be clearer about David's "(where
>>>>>>>>>>>      appropriate)": we would for instance have to review all the
>>>>>>>>>>> called
>>>>>>>>>>>      methods, check which ones purposefully ignore locking
>>>>>>>>>>> (potentially
>>>>>>>>>>>      crashing) and provide alternate code for the unsafe ones
>>>>>>>>>>> (which may
>>>>>>>>>>>      not be trivial in some cases). As a simple example, a
>>>>>>>>>>> method
>>>>>>>>>>> called
>>>>>>>>>>>      "print_on_error" should not be called when we are not in an
>>>>>>>>>>>      erroneous condition. However, in some components of the
>>>>>>>>>>> JVM, the
>>>>>>>>>>> low
>>>>>>>>>>>      level implementations of 'print_on_error' and
>>>>>>>>>>> 'print_info_safely'
>>>>>>>>>>>      could share the same code.
>>>>>>>>>>>
>>>>>>>>>>>      Regards,
>>>>>>>>>>>
>>>>>>>>>>>      Bertrand.
>>>>>>>>>>>
>>>>>>>>>>>      Here are the important comments about VMError::report
>>>>>>>>>>> assumptions:
>>>>>>>>>>>
>>>>>>>>>>>      // In general, a hang in error handler is much worse than a
>>>>>>>>>>> crash or
>>>>>>>>>>>      internal
>>>>>>>>>>>      // error, as it's harder to recover from a hang.
>>>>>>>>>>> Deadlock can
>>>>>>>>>>> happen
>>>>>>>>>>>      if we
>>>>>>>>>>>      // try to grab a lock that is already owned by current
>>>>>>>>>>> thread,
>>>>>>>>>>> or if the
>>>>>>>>>>>      // owner is blocked forever (e.g. in
>>>>>>>>>>> os::infinite_sleep()). If
>>>>>>>>>>>      possible, the
>>>>>>>>>>>      // error handler and all the functions it called should
>>>>>>>>>>> avoid
>>>>>>>>>>>      grabbing any
>>>>>>>>>>>      // lock. An important thing to notice is that memory
>>>>>>>>>>> allocation
>>>>>>>>>>> needs a
>>>>>>>>>>>      lock.
>>>>>>>>>>>      //
>>>>>>>>>>>      // We should avoid using large stack allocated buffers.
>>>>>>>>>>> Many
>>>>>>>>>>> errors
>>>>>>>>>>>      happen
>>>>>>>>>>>      // when stack space is already low. Making things even
>>>>>>>>>>> worse is
>>>>>>>>>>> that
>>>>>>>>>>>      there
>>>>>>>>>>>      // could be nested report_and_die() calls on stack (see
>>>>>>>>>>> above).
>>>>>>>>>>> Only one
>>>>>>>>>>>      // thread can report error, so large buffers are statically
>>>>>>>>>>> allocated in
>>>>>>>>>>>      data
>>>>>>>>>>>      // segment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>       >
>>>>>>>>>>>       >
>>>>>>>>>>>       >> Kind Regards
>>>>>>>>>>>       >> /Mattis
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> -----Original Message-----
>>>>>>>>>>>       >> From: David Holmes
>>>>>>>>>>>       >> Sent: den 29 oktober 2015 23:20
>>>>>>>>>>>       >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>>       >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>       >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>> VM.info to
>>>>>>>>>>> get
>>>>>>>>>>>       >> hs_err print-out
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>>>>       >>> Some background: We have this command in JRockit. The
>>>>>>>>>>>      information you
>>>>>>>>>>>       >>> gather when you crash to give a good summary of what
>>>>>>>>>>> system
>>>>>>>>>>> you run
>>>>>>>>>>>       >>> on is pretty much exactly the information you need to
>>>>>>>>>>> get a
>>>>>>>>>>> good
>>>>>>>>>>>       >>> summary on a system that has not crashed. The JRockit
>>>>>>>>>>> command is
>>>>>>>>>>>       >>> extremely useful for support, and saves a lot of work
>>>>>>>>>>> going
>>>>>>>>>>>      back and
>>>>>>>>>>>       >>> forth asking about system information.
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>> Also, if we write something new in the hs_err file,
>>>>>>>>>>> like if
>>>>>>>>>>>      there has
>>>>>>>>>>>       >>> been any out of memory errors, we often would want
>>>>>>>>>>> the same
>>>>>>>>>>>       >>> information in the VM.info output. From my
>>>>>>>>>>> experience in
>>>>>>>>>>>       >>> Sustaining/Support, I can't think of any
>>>>>>>>>>> information you
>>>>>>>>>>> would want
>>>>>>>>>>>       >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>>>>> file
>>>>>>>>>>> and the
>>>>>>>>>>>       >>> other way around, apart from details about the crash
>>>>>>>>>>> (obviously).
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>> I don't see a reason to exactly enumerate what
>>>>>>>>>>> information
>>>>>>>>>>> VM.info
>>>>>>>>>>>       >>> should provide. From a sustaining/support
>>>>>>>>>>> perspective, we
>>>>>>>>>>> want a
>>>>>>>>>>>       >>> one-stop command to gather as much useful
>>>>>>>>>>> information as
>>>>>>>>>>> possible,
>>>>>>>>>>>       >>> which is the same idea we have for the hs_err file.
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> The reason to enumerate what is required is to see
>>>>>>>>>>> where
>>>>>>>>>>> that
>>>>>>>>>>>       >> information already exists and can be collected
>>>>>>>>>>> from. The
>>>>>>>>>>> VMError
>>>>>>>>>>>       >> report has to be very careful about what it does and
>>>>>>>>>>> how it
>>>>>>>>>>> does it
>>>>>>>>>>>       >> because of the fact we may have crashed and the general
>>>>>>>>>>> process
>>>>>>>>>>>      state
>>>>>>>>>>>       >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>>>>> information is
>>>>>>>>>>>       >> required from the available sources in whatever
>>>>>>>>>>> order or
>>>>>>>>>>> format
>>>>>>>>>>>      desired.
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> I have no problem with this command, just how it has
>>>>>>>>>>> been
>>>>>>>>>>>      proposed to
>>>>>>>>>>>       >> implement it.
>>>>>>>>>>>       >>
>>>>>>>>>>>       >> David
>>>>>>>>>>>       >>
>>>>>>>>>>>       >>> Kind Regards
>>>>>>>>>>>       >>> /Mattis
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>> -----Original Message-----
>>>>>>>>>>>       >>> From: David Holmes
>>>>>>>>>>>       >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>>>>       >>> To: Coleen Phillimore;
>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>       >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>> VM.info
>>>>>>>>>>> to get
>>>>>>>>>>>       >>> hs_err print-out
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>>>>       >>>>
>>>>>>>>>>>       >>>>
>>>>>>>>>>>       >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>>>>       >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>       >>>>>> So I actually disagree.  I don't think there should
>>>>>>>>>>> be an
>>>>>>>>>>>       >>>>>> additional separate mechanism to get the same
>>>>>>>>>>> information
>>>>>>>>>>>      that we
>>>>>>>>>>>       >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>>>>> feature like
>>>>>>>>>>>       >>>>>> this for a long time.
>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>       >>>>>> I pre-reviewed this change and I thought it looked
>>>>>>>>>>> good in
>>>>>>>>>>>      general.
>>>>>>>>>>>       >>>>>> I didn't see the thread iteration problem that Fred
>>>>>>>>>>> refers to
>>>>>>>>>>>       >>>>>> below, but I think the individual problems can be
>>>>>>>>>>> fixed.
>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>       >>>>>> The last thing I want is this code to be copied
>>>>>>>>>>> somewhere else.
>>>>>>>>>>>       >>>>>
>>>>>>>>>>>       >>>>> Factored as needed not copied. VMError is not an
>>>>>>>>>>> "info"
>>>>>>>>>>> reporting
>>>>>>>>>>>       >>>>> mechanism.
>>>>>>>>>>>       >>>>
>>>>>>>>>>>       >>>> If you look at the things that are reported in each
>>>>>>>>>>> "STEP",
>>>>>>>>>>>      there's a
>>>>>>>>>>>       >>>> small amount of code and the order is important.
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>> The order is less important in an "info" request I
>>>>>>>>>>> would
>>>>>>>>>>> think.
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>>> I'd like the vm info to use the same order and report
>>>>>>>>>>> what it
>>>>>>>>>>>      can do
>>>>>>>>>>>       >>>> safely.   Refactoring 5 lines of code into functions
>>>>>>>>>>> doesn't
>>>>>>>>>>> make
>>>>>>>>>>>       >>>> sense.
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>> I need to consider exactly what it is the "info"
>>>>>>>>>>> needs to
>>>>>>>>>>> report in
>>>>>>>>>>>       >>> more detail. There are existing facilities (system
>>>>>>>>>>> properties,
>>>>>>>>>>>       >>> management
>>>>>>>>>>>       >>> APIs) for various bits of runtime information, which
>>>>>>>>>>> VMError can't
>>>>>>>>>>>       >>> utilize but a Dcmd can.
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>> David
>>>>>>>>>>>       >>>
>>>>>>>>>>>       >>>> Coleen
>>>>>>>>>>>       >>>>
>>>>>>>>>>>       >>>>>
>>>>>>>>>>>       >>>>> David
>>>>>>>>>>>       >>>>>
>>>>>>>>>>>       >>>>>> Coleen
>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>       >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>>>>       >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>>>>> should
>>>>>>>>>>> not be
>>>>>>>>>>>       >>>>>>> piggy-backed onto VMError handling for the reasons
>>>>>>>>>>> stated
>>>>>>>>>>>      (and the
>>>>>>>>>>>       >>>>>>> error handling logic is complicated enough as it
>>>>>>>>>>> is!). For
>>>>>>>>>>>      things
>>>>>>>>>>>       >>>>>>> like thread lists there are already safe
>>>>>>>>>>> management
>>>>>>>>>>>      functions that
>>>>>>>>>>>       >>>>>>> can/should be used.
>>>>>>>>>>>       >>>>>>>
>>>>>>>>>>>       >>>>>>> Thanks,
>>>>>>>>>>>       >>>>>>> David H.
>>>>>>>>>>>       >>>>>>>
>>>>>>>>>>>       >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>>>>       >>>>>>>> Hi David,
>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>       >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>>>>> concerned
>>>>>>>>>>> with the
>>>>>>>>>>>       >>>>>>>> fact that the diagnostic command is calling
>>>>>>>>>>> VMError::report().
>>>>>>>>>>>       >>>>>>>> This method has been implemented to be executed
>>>>>>>>>>> in the
>>>>>>>>>>>      particular
>>>>>>>>>>>       >>>>>>>> context of fatal errors, and its usage while the
>>>>>>>>>>> VM is
>>>>>>>>>>> running
>>>>>>>>>>>       >>>>>>>> normally seems dangerous.
>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>       >>>>>>>> For instance, VMError::report() consciously
>>>>>>>>>>> avoids
>>>>>>>>>>>      grabbing locks
>>>>>>>>>>>       >>>>>>>> because of the risk of deadlock during the error
>>>>>>>>>>> reporting.
>>>>>>>>>>>       >>>>>>>> The consequence is that some data structures are
>>>>>>>>>>> browsed in an
>>>>>>>>>>>       >>>>>>>> unsafe way. One example: VMError::report() calls
>>>>>>>>>>>       >>>>>>>> Threads::print_on_error() which iterates over the
>>>>>>>>>>> thread list
>>>>>>>>>>>       >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>       >>>>>>>> The implementation of the diagnostic command
>>>>>>>>>>> seems
>>>>>>>>>>> also to
>>>>>>>>>>>       >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>>>>> VMError::report()
>>>>>>>>>>>       >>>>>>>> method. Have you considered implementing a new
>>>>>>>>>>> MT-safe
>>>>>>>>>>>      reporting
>>>>>>>>>>>       >>>>>>>> method rather than trying to modify the special
>>>>>>>>>>>      VMError::report()
>>>>>>>>>>>       >>>>>>>> methods? (Note that some code factorization
>>>>>>>>>>> between
>>>>>>>>>>>       >>>>>>>> VMError::report() and this new method should be
>>>>>>>>>>> possible).
>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>       >>>>>>>> Thanks,
>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>       >>>>>>>> Fred
>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>       >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>>>>       >>>>>>>>> Hi!
>>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>>       >>>>>>>>> Please review my change for this small
>>>>>>>>>>> enhancement.
>>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>>       >>>>>>>>> bug:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>>>>       >>>>>>>>> webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>>       >>>>>>>>> Cheers,
>>>>>>>>>>>       >>>>>>>>> -Buck
>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>       >>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>      --
>>>>>>>>>>>      Bertrand Delsart, Grenoble Engineering
>>>>>>>>>>> Center
>>>>>>>>>>>      Oracle,         180 av. de l'Europe, ZIRST de
>>>>>>>>>>> Montbonnot
>>>>>>>>>>>      38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>>>>      bertrand.delsart at oracle.com
>>>>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>>>>                Phone : +33 4 76 18 81 23
>>>>>>>>>>>
>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>
>>>>>>>>>>>      NOTICE: This email message is for the sole use of the
>>>>>>>>>>> intended
>>>>>>>>>>>      recipient(s) and may contain confidential and privileged
>>>>>>>>>>>      information. Any unauthorized review, use, disclosure or
>>>>>>>>>>>      distribution is prohibited. If you are not the intended
>>>>>>>>>>> recipient,
>>>>>>>>>>>      please contact the sender by reply email and destroy all
>>>>>>>>>>> copies of
>>>>>>>>>>>      the original message.
>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>