RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)

Fri Nov 13 12:08:51 UTC 2015

On 13/11/2015 05:52, David Holmes wrote:
> Hi Coleen,
>
> On 13/11/2015 7:41 AM, Coleen Phillimore wrote:
>>
>> Hi,  I should learn my lesson that Frederic, Bertrand and David are
>> generally right.
>>
>> My restructuring went afoul of the STEP framework, and so I have made
>> VMError::print_vm_info() report the safe things that VMError::report()
>> reports.  Hopefully, this will address the concern about safety.   It
>> also allows us to remove the lock and static buffer in the VM.info
>> function.   The duplication isn't as bad as I thought and is a better
>> solution than piggy backing from VMError::report().
>>
>> Please review:
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>
> This looks good to me. A few small queries:
>
> 888   if (Universe::is_fully_initialized()) {
>
> I don't think it is possible to run a Dcmd before the VM is fully 
> initialized.

It is not possible to invoke a Dcmd from outside of the VM (using the
attachAPI or the PlatformMBeanServer) before the VM is fully initialized.
However it is still possible to invoke a Dcmd directly from VM code.
The Dcmd framework is initialized in Management::init(), which is called
before universe_post_init() is invoked. I don't know exactly the 
dependencies
of Universe::heap()->print_on_error(), but the test looks reasonable.

Fred

>
> 889     Universe::heap()->print_on_error(st);
>
> Is this safe? I'm not sure what it does - print heap memory regions ??
>
>  891     st->print_cr("Polling page: " INTPTR_FORMAT, 
> p2i(os::get_polling_page()));
>
> Is this useful ??
>
> Thanks,
> David
>
>> I tested with jcmd <pid> VM.info concurrently and with an inserted crash
>> during VM.info call to make sure the original hs_err_pid report comes
>> out okay.
>>
>> Thanks,
>> Coleen
>>
>> On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>>>
>>>
>>> On 11/10/15 7:12 PM, David Holmes wrote:
>>>> Hi Coleen,
>>>>
>>>> I still do not like the intertwining of the info and error code, but
>>>> I won't say more on that.
>>>
>>> Yes, I understand the approach has good arguments on both sides. I'm
>>> trying to find a compromise that can move this feature forward.
>>>>
>>>> However you may now have a concurrency problem:
>>>>
>>>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>>>  843 void VMError::report_info(outputStream* st) {
>>>>  844
>>>>  845   // don't allocate large buffer on stack
>>>>  846   static char buf[O_BUFLEN];
>>>>
>>>> Can you have concurrent info requests in flight? For the info case
>>>> the buffer could be allocated on the stack I think, or even
>>>> dynamically. (Aside: will this be executed in the service thread?)
>>>
>>> There is a lock added to protect the VM.info dcmd, so there would be
>>> only sequential access to the static buffer.  I don't think this is
>>> executed in the Service Thread.  I see no evidence of this anyway.
>>>
>>>>
>>>> report_process could possibly be refactored further as there's a lot
>>>> of initial stuff that won't apply to the info case:
>>>>  - we won't print all threads
>>>>  - the VM state will always be "not at safepoint"
>>>>  - The Universe must be initialized (not clear at what point during
>>>> shutdown a dcmd can still come in?)
>>>>  - we don't print owned locks
>>>
>>> Yes, I can do this.  It actually improves how this looks.
>>>
>>> Thanks,
>>> Coleen
>>>>
>>>> That's all from me.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've done some refactoring and cleanup on this which the refactoring
>>>>> enabled.   Again, it's important that the VM.info output be mostly 
>>>>> the
>>>>> same order as the hs_err_pid* report file so that sustaining can 
>>>>> easily
>>>>> match and find output that they are looking for.   It's also 
>>>>> important
>>>>> not to copy this code because there may be new information added
>>>>> that we
>>>>> also want in the VM.info output.  If any future information that we
>>>>> want
>>>>> to print seems unsafe, the vm_info_cmd can be used to exclude it 
>>>>> or it
>>>>> can be printed only in the error reporting function. Rejecting the
>>>>> entire change because of the theoretical unsafe addition of code 
>>>>> isn't
>>>>> the right thing to do, although yeah, I've argued the other way in 
>>>>> the
>>>>> past.  This is something that is helpful to sustaining and our
>>>>> licensees.
>>>>>
>>>>> In this patch, the functions that VM.info calls are safe while the 
>>>>> JVM
>>>>> is running, but the stepping is required because at crash time, they
>>>>> might not be.  Although I've never seen the error reporting crash in
>>>>> these places.
>>>>>
>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>
>>>>> This refactoring has been run through RBT.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>>>> Also note that some of the crash information is not interesting for
>>>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>>>> registers/stack data for any specific thread. If people want thread
>>>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>>>> system state.
>>>>>>
>>>>>> I'm a little surprised that thread info is not of interest, but 
>>>>>> as you
>>>>>> say that is available by other means  - which was part of my earlier
>>>>>> point about there already being safe means to get some of the info.
>>>>>>
>>>>>> If this is really only about the small subset of safe "system
>>>>>> information" that VMError reports, then I see even less 
>>>>>> motivation to
>>>>>> piggy-back VM-info onto VMerror and entangle the two. If each 
>>>>>> block of
>>>>>> information can be considered a single "step" in the error case then
>>>>>> we can simply factor out the actual printing parts and share 
>>>>>> them. If
>>>>>> the error case wants fine-grained steps then we can't do that.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> Kind Regards
>>>>>>> /Mattis
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Coleen Phillimore
>>>>>>> Sent: den 3 november 2015 16:25
>>>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>>>> hs_err print-out
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've done a bit of refactoring with this change and have tested it
>>>>>>> with and without intentional crashes to the VM.info reporting.  
>>>>>>> If it
>>>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>>>> where it comes from in the hs_err file.  And the hs_err_pid file is
>>>>>>> correctly produced.  If we can make the jcmd come with a warning 
>>>>>>> that
>>>>>>> it *could* potentially crash the vm, I think this feature is a good
>>>>>>> addition for sustaining.
>>>>>>>
>>>>>>> Without crashing, VM.info gives a lot of information that we 
>>>>>>> normally
>>>>>>> give with the hs_err file.  The reason for the careful STEP'ping in
>>>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>>>> count on anything.  The VM.info command is designed to be run on a
>>>>>>> moderately stable JVM so is less likely to crash on the things that
>>>>>>> we're printing out.
>>>>>>>
>>>>>>> I think we should fix the Threads walking code that Fred pointed 
>>>>>>> out
>>>>>>> to take the Threads_lock, but I don't think we should *copy* 
>>>>>>> code for
>>>>>>> VM.info or copy print_on_error functions.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>>>> Thanks Thomas,
>>>>>>>>>
>>>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>>>> if we
>>>>>>>>> are not careful.
>>>>>>>>>
>>>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious 
>>>>>>>>> source of
>>>>>>>>> potential crashes if you take into account the fact that the
>>>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>>>
>>>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>>>
>>>>>>>> I'm not sure of why we do not print the threads when there is 
>>>>>>>> notion
>>>>>>>> of current thread (and I would wonder whether we should print
>>>>>>>> them for
>>>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>>>> crash.
>>>>>>>>
>>>>>>>> However, I still do not like it from a design point of view for 
>>>>>>>> the
>>>>>>>> other reasons stated in my e-mail.
>>>>>>>>
>>>>>>>> Bertrand.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>>>> what
>>>>>>>>> it calls) has been designed.
>>>>>>>>>
>>>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient unless 
>>>>>>>>> you
>>>>>>>>> pass that argument to all layers. This is the biggest design 
>>>>>>>>> issue.
>>>>>>>>> For instance all calls to print_on_error could have this 
>>>>>>>>> additional
>>>>>>>>> argument so that the different components can adapt their 
>>>>>>>>> behavior
>>>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>>>> case,
>>>>>>>>> but the same if true for all methods called by VMError::report. I
>>>>>>>>> would go as far as adding a comment in all print_on_error methods
>>>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>>>> locks
>>>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>>>> VMError:report is supposed to skip.
>>>>>>>>> The current design, where the various components may not be aware
>>>>>>>>> that they are called for two different purposes, is not robust
>>>>>>>>> enough.
>>>>>>>>>
>>>>>>>>> Note also that being 'robust enough for the "crash" case' is 
>>>>>>>>> not a
>>>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>>>> of the
>>>>>>>>> info STEP crashes the VM instead of just bringing us to the next
>>>>>>>>> error reporting STEP. Now, as stated before, I could agree if we
>>>>>>>>> use
>>>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Bertrand
>>>>>>>>>
>>>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>>>
>>>>>>>>>> However, there are some things to keep in mind when doing this:
>>>>>>>>>>
>>>>>>>>>> Error handling is different between the "info-only" and "crash"
>>>>>>>>>> case. In
>>>>>>>>>> "info-only" case, VMError::report() is called without enclosing
>>>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>>>> handler
>>>>>>>>>> intact, not with the secondary signal handler. So, normal signal
>>>>>>>>>> handling works, which is good. But it also means that any error
>>>>>>>>>> inside one of the error reporting STEPs will bring down the VM,
>>>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>>>> ("Error
>>>>>>>>>> occurred during error reporting").
>>>>>>>>>>
>>>>>>>>>> The problem is that a number of error reporting steps are a bit
>>>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>>>> happen
>>>>>>>>>> is that the STEP will be interrupted. These risky steps 
>>>>>>>>>> should not
>>>>>>>>>> be executed in the "only-info" case.
>>>>>>>>>>
>>>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>>>> "_vm_info_cmd"
>>>>>>>>>> and conditionally excludes dangerous or pointless error 
>>>>>>>>>> reporting
>>>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>>>
>>>>>>>>>> As far as I can see, the other concerns concern only the "crash"
>>>>>>>>>> case and simply do not apply to the "info-only" case. If an 
>>>>>>>>>> error
>>>>>>>>>> reporting step is robust enough for the "crash" case (low stack
>>>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it 
>>>>>>>>>> should be
>>>>>>>>>> safe enough for the "info" case.
>>>>>>>>>>
>>>>>>>>>> Kind Regards, Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>>>> <mattis.castegren at oracle.com 
>>>>>>>>>> <mailto:mattis.castegren at oracle.com>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>      Hi
>>>>>>>>>>
>>>>>>>>>>      Of course, when I wrote "Review and push this fix" I meant
>>>>>>>>>> doing a
>>>>>>>>>>      thorough technical review. We do not want VM.info to be
>>>>>>>>>> unsafe, and
>>>>>>>>>>      I will let David reply to the technical questions that has
>>>>>>>>>> been
>>>>>>>>>>      raised. I know he did consider some of these issues, but 
>>>>>>>>>> last
>>>>>>>>>> week
>>>>>>>>>>      he was on Java one, and this week he is out. I will let him
>>>>>>>>>> comment
>>>>>>>>>>      once he gets back.
>>>>>>>>>>
>>>>>>>>>>      Kind Regards
>>>>>>>>>>      /Mattis
>>>>>>>>>>
>>>>>>>>>>      -----Original Message-----
>>>>>>>>>>      From: Bertrand Delsart
>>>>>>>>>>      Sent: den 2 november 2015 10:13
>>>>>>>>>>      To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>      hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>      Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>>>> to get
>>>>>>>>>>      hs_err print-out
>>>>>>>>>>
>>>>>>>>>>      On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>>>       > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>>>       >> Hi
>>>>>>>>>>       >>
>>>>>>>>>>       >> I agree that there may be things we could add to 
>>>>>>>>>> VM.info
>>>>>>>>>> that we
>>>>>>>>>>       >> couldn't add in an hs_err file due to the fact that we
>>>>>>>>>> are
>>>>>>>>>> crashing.
>>>>>>>>>>       >> However, I really do believe that we should use the 
>>>>>>>>>> same
>>>>>>>>>> code.
>>>>>>>>>> The
>>>>>>>>>>       >> goal of the hs_err file is exactly the same as the
>>>>>>>>>> goal of
>>>>>>>>>> VM.info,
>>>>>>>>>>       >> to print as much information about the system as
>>>>>>>>>> possible so
>>>>>>>>>> that
>>>>>>>>>>       >> when we look at an hs_err file or the output from
>>>>>>>>>> VM.info,
>>>>>>>>>> we get as
>>>>>>>>>>       >> much information about the system as possible. That way
>>>>>>>>>> we can
>>>>>>>>>>      reduce
>>>>>>>>>>       >> the need to go back and forth with the customer to ask
>>>>>>>>>> for more
>>>>>>>>>>      information.
>>>>>>>>>>       >>
>>>>>>>>>>       >> If we enumerate what we need in VM.info, the list
>>>>>>>>>> would look
>>>>>>>>>> very
>>>>>>>>>>       >> much like what we already do in VMError, and 
>>>>>>>>>> therefore I
>>>>>>>>>> think we
>>>>>>>>>>       >> should use the same code. If we find something that we
>>>>>>>>>> would
>>>>>>>>>> want,
>>>>>>>>>>       >> that is not currently printed in hs_err files, I 
>>>>>>>>>> think we
>>>>>>>>>> have
>>>>>>>>>>      two options:
>>>>>>>>>>       >>
>>>>>>>>>>       >> 1) Add it to hs_err files as well. If it is
>>>>>>>>>> interesting when
>>>>>>>>>>       >> gathering customer information before a crash, it
>>>>>>>>>> probably is
>>>>>>>>>>       >> interesting after a crash
>>>>>>>>>>       >> 2) If it for some reason cannot be captured when
>>>>>>>>>> crashing,
>>>>>>>>>> add it to
>>>>>>>>>>       >> the code with the condition that we don't print it 
>>>>>>>>>> during
>>>>>>>>>> crash
>>>>>>>>>>      time.
>>>>>>>>>>       >>
>>>>>>>>>>       >> I would therefore suggest the following plan of action
>>>>>>>>>>       >>
>>>>>>>>>>       >> 1) Review and push this fix to see if there are any
>>>>>>>>>> technical issues
>>>>>>>>>>       >> with regards to thread safety, etc. I know David has 
>>>>>>>>>> had
>>>>>>>>>> this in
>>>>>>>>>>      mind
>>>>>>>>>>       >> when doing the change. After that, we have a command 
>>>>>>>>>> that
>>>>>>>>>>      support can
>>>>>>>>>>       >> use.
>>>>>>>>>>       >> 2) Think about if there is any information missing, 
>>>>>>>>>> both
>>>>>>>>>> from hs_err
>>>>>>>>>>       >> files and from VM.info. Here, we may want to ask 
>>>>>>>>>> support
>>>>>>>>>> about if
>>>>>>>>>>       >> there is any additional information they need to ask 
>>>>>>>>>> for
>>>>>>>>>> when they
>>>>>>>>>>       >> get this data. If possible, this information should be
>>>>>>>>>> added
>>>>>>>>>> to both
>>>>>>>>>>       >> hs_err files and VM.info, we don't want support to
>>>>>>>>>> have to
>>>>>>>>>> ask for
>>>>>>>>>>       >> VM.info if we already have an hs_err file. Of 
>>>>>>>>>> course, if
>>>>>>>>>> there is
>>>>>>>>>>       >> anything Dev would want to see when analyzing 
>>>>>>>>>> crashes in
>>>>>>>>>> testing, we
>>>>>>>>>>       >> should add that too.
>>>>>>>>>>       >
>>>>>>>>>>       > I have no issue with VMError and VM.Info using the same
>>>>>>>>>> code
>>>>>>>>>> (where
>>>>>>>>>>       > appropriate), but I think it wrong to try and implement
>>>>>>>>>> VM.Info
>>>>>>>>>>       > directly using the actual VMError mechanism. It is
>>>>>>>>>> crude and
>>>>>>>>>> makes it
>>>>>>>>>>       > difficult to make independent changes to either 
>>>>>>>>>> facility.
>>>>>>>>>>       >
>>>>>>>>>>       > Sorry.
>>>>>>>>>>       >
>>>>>>>>>>       > David
>>>>>>>>>>
>>>>>>>>>>      Furthermore, we cannot just "Review and push this fix to
>>>>>>>>>> see if
>>>>>>>>>>      there are any technical issues". Two reviewers have already
>>>>>>>>>> stated
>>>>>>>>>>      that there are issues and these issues must be addressed
>>>>>>>>>> before
>>>>>>>>>> pushing.
>>>>>>>>>>
>>>>>>>>>>      The answer cannot just be that we need this feature, it 
>>>>>>>>>> must
>>>>>>>>>> address
>>>>>>>>>>      the technical issues that were reported.
>>>>>>>>>>
>>>>>>>>>>      One possible answer is that we all agree that VMInfo 
>>>>>>>>>> would be
>>>>>>>>>> unsafe.
>>>>>>>>>>      The method would have to be renamed UnsafeVMInfo and it
>>>>>>>>>> must be
>>>>>>>>>>      clear that its 'impact' would be very risky, potentially
>>>>>>>>>> crashing
>>>>>>>>>>      the JVM.
>>>>>>>>>>
>>>>>>>>>>      If this is sufficient for your needs then we could agree 
>>>>>>>>>> with
>>>>>>>>>> it. My
>>>>>>>>>>      only additional requirement would be to clearly mark 
>>>>>>>>>> that the
>>>>>>>>>> method
>>>>>>>>>>      has been called so that we do not waste time on 
>>>>>>>>>> non-issues if
>>>>>>>>>>      calling VMInfo leads to a crash (or, even worse but less
>>>>>>>>>> likely, to
>>>>>>>>>>      a silent data corruption that may lead to problems later).
>>>>>>>>>> The bare
>>>>>>>>>>      minimum would be to double check that a crash during an
>>>>>>>>>> execution of
>>>>>>>>>>      VM.info is clearly identifiable (this is worth double
>>>>>>>>>> checking
>>>>>>>>>>      because of the use of static buffers for the error 
>>>>>>>>>> reporting,
>>>>>>>>>> see
>>>>>>>>>>      below).
>>>>>>>>>>
>>>>>>>>>>      If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>>>> wonder
>>>>>>>>>>      whether a safe version of it is still a "small 
>>>>>>>>>> enhancement".
>>>>>>>>>> A JEP
>>>>>>>>>>      may be required.
>>>>>>>>>>
>>>>>>>>>>      VMError::report and the methods it calls (print_on_error)
>>>>>>>>>> have been
>>>>>>>>>>      written with assumptions clearly stated in the comments 
>>>>>>>>>> above
>>>>>>>>>>      VMError::report (see below the two most important ones).
>>>>>>>>>> HotSpot
>>>>>>>>>>      error reporting strategy differed from JRockit. This 
>>>>>>>>>> explains
>>>>>>>>>> why
>>>>>>>>>>      the JRockit approach cannot be applied as is.
>>>>>>>>>>
>>>>>>>>>>      I agree with David that you can try leveraging some of the
>>>>>>>>>> logic
>>>>>>>>>>      from VMError::report. However, to be sure there is no
>>>>>>>>>>      misunderstanding, let's be clearer about David's "(where
>>>>>>>>>>      appropriate)": we would for instance have to review all the
>>>>>>>>>> called
>>>>>>>>>>      methods, check which ones purposefully ignore locking
>>>>>>>>>> (potentially
>>>>>>>>>>      crashing) and provide alternate code for the unsafe ones
>>>>>>>>>> (which may
>>>>>>>>>>      not be trivial in some cases). As a simple example, a 
>>>>>>>>>> method
>>>>>>>>>> called
>>>>>>>>>>      "print_on_error" should not be called when we are not in an
>>>>>>>>>>      erroneous condition. However, in some components of the
>>>>>>>>>> JVM, the
>>>>>>>>>> low
>>>>>>>>>>      level implementations of 'print_on_error' and
>>>>>>>>>> 'print_info_safely'
>>>>>>>>>>      could share the same code.
>>>>>>>>>>
>>>>>>>>>>      Regards,
>>>>>>>>>>
>>>>>>>>>>      Bertrand.
>>>>>>>>>>
>>>>>>>>>>      Here are the important comments about VMError::report
>>>>>>>>>> assumptions:
>>>>>>>>>>
>>>>>>>>>>      // In general, a hang in error handler is much worse than a
>>>>>>>>>> crash or
>>>>>>>>>>      internal
>>>>>>>>>>      // error, as it's harder to recover from a hang. 
>>>>>>>>>> Deadlock can
>>>>>>>>>> happen
>>>>>>>>>>      if we
>>>>>>>>>>      // try to grab a lock that is already owned by current
>>>>>>>>>> thread,
>>>>>>>>>> or if the
>>>>>>>>>>      // owner is blocked forever (e.g. in
>>>>>>>>>> os::infinite_sleep()). If
>>>>>>>>>>      possible, the
>>>>>>>>>>      // error handler and all the functions it called should 
>>>>>>>>>> avoid
>>>>>>>>>>      grabbing any
>>>>>>>>>>      // lock. An important thing to notice is that memory
>>>>>>>>>> allocation
>>>>>>>>>> needs a
>>>>>>>>>>      lock.
>>>>>>>>>>      //
>>>>>>>>>>      // We should avoid using large stack allocated buffers. 
>>>>>>>>>> Many
>>>>>>>>>> errors
>>>>>>>>>>      happen
>>>>>>>>>>      // when stack space is already low. Making things even
>>>>>>>>>> worse is
>>>>>>>>>> that
>>>>>>>>>>      there
>>>>>>>>>>      // could be nested report_and_die() calls on stack (see
>>>>>>>>>> above).
>>>>>>>>>> Only one
>>>>>>>>>>      // thread can report error, so large buffers are statically
>>>>>>>>>> allocated in
>>>>>>>>>>      data
>>>>>>>>>>      // segment.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>       >
>>>>>>>>>>       >
>>>>>>>>>>       >> Kind Regards
>>>>>>>>>>       >> /Mattis
>>>>>>>>>>       >>
>>>>>>>>>>       >> -----Original Message-----
>>>>>>>>>>       >> From: David Holmes
>>>>>>>>>>       >> Sent: den 29 oktober 2015 23:20
>>>>>>>>>>       >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>       >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>       >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>> VM.info to
>>>>>>>>>> get
>>>>>>>>>>       >> hs_err print-out
>>>>>>>>>>       >>
>>>>>>>>>>       >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>>>       >>> Some background: We have this command in JRockit. The
>>>>>>>>>>      information you
>>>>>>>>>>       >>> gather when you crash to give a good summary of what
>>>>>>>>>> system
>>>>>>>>>> you run
>>>>>>>>>>       >>> on is pretty much exactly the information you need to
>>>>>>>>>> get a
>>>>>>>>>> good
>>>>>>>>>>       >>> summary on a system that has not crashed. The JRockit
>>>>>>>>>> command is
>>>>>>>>>>       >>> extremely useful for support, and saves a lot of work
>>>>>>>>>> going
>>>>>>>>>>      back and
>>>>>>>>>>       >>> forth asking about system information.
>>>>>>>>>>       >>>
>>>>>>>>>>       >>> Also, if we write something new in the hs_err file,
>>>>>>>>>> like if
>>>>>>>>>>      there has
>>>>>>>>>>       >>> been any out of memory errors, we often would want
>>>>>>>>>> the same
>>>>>>>>>>       >>> information in the VM.info output. From my 
>>>>>>>>>> experience in
>>>>>>>>>>       >>> Sustaining/Support, I can't think of any 
>>>>>>>>>> information you
>>>>>>>>>> would want
>>>>>>>>>>       >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>>>> file
>>>>>>>>>> and the
>>>>>>>>>>       >>> other way around, apart from details about the crash
>>>>>>>>>> (obviously).
>>>>>>>>>>       >>>
>>>>>>>>>>       >>> I don't see a reason to exactly enumerate what
>>>>>>>>>> information
>>>>>>>>>> VM.info
>>>>>>>>>>       >>> should provide. From a sustaining/support
>>>>>>>>>> perspective, we
>>>>>>>>>> want a
>>>>>>>>>>       >>> one-stop command to gather as much useful 
>>>>>>>>>> information as
>>>>>>>>>> possible,
>>>>>>>>>>       >>> which is the same idea we have for the hs_err file.
>>>>>>>>>>       >>
>>>>>>>>>>       >> The reason to enumerate what is required is to see 
>>>>>>>>>> where
>>>>>>>>>> that
>>>>>>>>>>       >> information already exists and can be collected 
>>>>>>>>>> from. The
>>>>>>>>>> VMError
>>>>>>>>>>       >> report has to be very careful about what it does and
>>>>>>>>>> how it
>>>>>>>>>> does it
>>>>>>>>>>       >> because of the fact we may have crashed and the general
>>>>>>>>>> process
>>>>>>>>>>      state
>>>>>>>>>>       >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>>>> information is
>>>>>>>>>>       >> required from the available sources in whatever 
>>>>>>>>>> order or
>>>>>>>>>> format
>>>>>>>>>>      desired.
>>>>>>>>>>       >>
>>>>>>>>>>       >> I have no problem with this command, just how it has 
>>>>>>>>>> been
>>>>>>>>>>      proposed to
>>>>>>>>>>       >> implement it.
>>>>>>>>>>       >>
>>>>>>>>>>       >> David
>>>>>>>>>>       >>
>>>>>>>>>>       >>> Kind Regards
>>>>>>>>>>       >>> /Mattis
>>>>>>>>>>       >>>
>>>>>>>>>>       >>> -----Original Message-----
>>>>>>>>>>       >>> From: David Holmes
>>>>>>>>>>       >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>>>       >>> To: Coleen Phillimore;
>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>       >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>> VM.info
>>>>>>>>>> to get
>>>>>>>>>>       >>> hs_err print-out
>>>>>>>>>>       >>>
>>>>>>>>>>       >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>>>       >>>>
>>>>>>>>>>       >>>>
>>>>>>>>>>       >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>>>       >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>>>       >>>>>>
>>>>>>>>>>       >>>>>> So I actually disagree.  I don't think there should
>>>>>>>>>> be an
>>>>>>>>>>       >>>>>> additional separate mechanism to get the same
>>>>>>>>>> information
>>>>>>>>>>      that we
>>>>>>>>>>       >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>>>> feature like
>>>>>>>>>>       >>>>>> this for a long time.
>>>>>>>>>>       >>>>>>
>>>>>>>>>>       >>>>>> I pre-reviewed this change and I thought it looked
>>>>>>>>>> good in
>>>>>>>>>>      general.
>>>>>>>>>>       >>>>>> I didn't see the thread iteration problem that Fred
>>>>>>>>>> refers to
>>>>>>>>>>       >>>>>> below, but I think the individual problems can be
>>>>>>>>>> fixed.
>>>>>>>>>>       >>>>>>
>>>>>>>>>>       >>>>>> The last thing I want is this code to be copied
>>>>>>>>>> somewhere else.
>>>>>>>>>>       >>>>>
>>>>>>>>>>       >>>>> Factored as needed not copied. VMError is not an
>>>>>>>>>> "info"
>>>>>>>>>> reporting
>>>>>>>>>>       >>>>> mechanism.
>>>>>>>>>>       >>>>
>>>>>>>>>>       >>>> If you look at the things that are reported in each
>>>>>>>>>> "STEP",
>>>>>>>>>>      there's a
>>>>>>>>>>       >>>> small amount of code and the order is important.
>>>>>>>>>>       >>>
>>>>>>>>>>       >>> The order is less important in an "info" request I 
>>>>>>>>>> would
>>>>>>>>>> think.
>>>>>>>>>>       >>>
>>>>>>>>>>       >>>> I'd like the vm info to use the same order and report
>>>>>>>>>> what it
>>>>>>>>>>      can do
>>>>>>>>>>       >>>> safely.   Refactoring 5 lines of code into functions
>>>>>>>>>> doesn't
>>>>>>>>>> make
>>>>>>>>>>       >>>> sense.
>>>>>>>>>>       >>>
>>>>>>>>>>       >>> I need to consider exactly what it is the "info"
>>>>>>>>>> needs to
>>>>>>>>>> report in
>>>>>>>>>>       >>> more detail. There are existing facilities (system
>>>>>>>>>> properties,
>>>>>>>>>>       >>> management
>>>>>>>>>>       >>> APIs) for various bits of runtime information, which
>>>>>>>>>> VMError can't
>>>>>>>>>>       >>> utilize but a Dcmd can.
>>>>>>>>>>       >>>
>>>>>>>>>>       >>> David
>>>>>>>>>>       >>>
>>>>>>>>>>       >>>> Coleen
>>>>>>>>>>       >>>>
>>>>>>>>>>       >>>>>
>>>>>>>>>>       >>>>> David
>>>>>>>>>>       >>>>>
>>>>>>>>>>       >>>>>> Coleen
>>>>>>>>>>       >>>>>>
>>>>>>>>>>       >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>>>       >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>>>> should
>>>>>>>>>> not be
>>>>>>>>>>       >>>>>>> piggy-backed onto VMError handling for the reasons
>>>>>>>>>> stated
>>>>>>>>>>      (and the
>>>>>>>>>>       >>>>>>> error handling logic is complicated enough as it
>>>>>>>>>> is!). For
>>>>>>>>>>      things
>>>>>>>>>>       >>>>>>> like thread lists there are already safe 
>>>>>>>>>> management
>>>>>>>>>>      functions that
>>>>>>>>>>       >>>>>>> can/should be used.
>>>>>>>>>>       >>>>>>>
>>>>>>>>>>       >>>>>>> Thanks,
>>>>>>>>>>       >>>>>>> David H.
>>>>>>>>>>       >>>>>>>
>>>>>>>>>>       >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>>>       >>>>>>>> Hi David,
>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>       >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>>>> concerned
>>>>>>>>>> with the
>>>>>>>>>>       >>>>>>>> fact that the diagnostic command is calling
>>>>>>>>>> VMError::report().
>>>>>>>>>>       >>>>>>>> This method has been implemented to be executed
>>>>>>>>>> in the
>>>>>>>>>>      particular
>>>>>>>>>>       >>>>>>>> context of fatal errors, and its usage while the
>>>>>>>>>> VM is
>>>>>>>>>> running
>>>>>>>>>>       >>>>>>>> normally seems dangerous.
>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>       >>>>>>>> For instance, VMError::report() consciously 
>>>>>>>>>> avoids
>>>>>>>>>>      grabbing locks
>>>>>>>>>>       >>>>>>>> because of the risk of deadlock during the error
>>>>>>>>>> reporting.
>>>>>>>>>>       >>>>>>>> The consequence is that some data structures are
>>>>>>>>>> browsed in an
>>>>>>>>>>       >>>>>>>> unsafe way. One example: VMError::report() calls
>>>>>>>>>>       >>>>>>>> Threads::print_on_error() which iterates over the
>>>>>>>>>> thread list
>>>>>>>>>>       >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>       >>>>>>>> The implementation of the diagnostic command 
>>>>>>>>>> seems
>>>>>>>>>> also to
>>>>>>>>>>       >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>>>> VMError::report()
>>>>>>>>>>       >>>>>>>> method. Have you considered implementing a new
>>>>>>>>>> MT-safe
>>>>>>>>>>      reporting
>>>>>>>>>>       >>>>>>>> method rather than trying to modify the special
>>>>>>>>>>      VMError::report()
>>>>>>>>>>       >>>>>>>> methods? (Note that some code factorization 
>>>>>>>>>> between
>>>>>>>>>>       >>>>>>>> VMError::report() and this new method should be
>>>>>>>>>> possible).
>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>       >>>>>>>> Thanks,
>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>       >>>>>>>> Fred
>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>       >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>>>       >>>>>>>>> Hi!
>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>       >>>>>>>>> Please review my change for this small
>>>>>>>>>> enhancement.
>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>       >>>>>>>>> bug:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>>>       >>>>>>>>> webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>       >>>>>>>>> Cheers,
>>>>>>>>>>       >>>>>>>>> -Buck
>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>       >>>>>>
>>>>>>>>>>       >>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>      --
>>>>>>>>>>      Bertrand Delsart, Grenoble Engineering
>>>>>>>>>> Center
>>>>>>>>>>      Oracle,         180 av. de l'Europe, ZIRST de
>>>>>>>>>> Montbonnot
>>>>>>>>>>      38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>>>      bertrand.delsart at oracle.com
>>>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>>>                Phone : +33 4 76 18 81 23
>>>>>>>>>>
>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>>>>>>>>
>>>>>>>>>>      NOTICE: This email message is for the sole use of the
>>>>>>>>>> intended
>>>>>>>>>>      recipient(s) and may contain confidential and privileged
>>>>>>>>>>      information. Any unauthorized review, use, disclosure or
>>>>>>>>>>      distribution is prohibited. If you are not the intended
>>>>>>>>>> recipient,
>>>>>>>>>>      please contact the sender by reply email and destroy all
>>>>>>>>>> copies of
>>>>>>>>>>      the original message.
>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>

-- 
Frederic Parain - Oracle
Grenoble Engineering Center - France
Phone: +33 4 76 18 81 17
Email: Frederic.Parain at oracle.com