RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)

David Holmes david.holmes at oracle.com
Fri Nov 13 04:52:11 UTC 2015


Hi Coleen,

On 13/11/2015 7:41 AM, Coleen Phillimore wrote:
>
> Hi,  I should learn my lesson that Frederic, Bertrand and David are
> generally right.
>
> My restructuring went afoul of the STEP framework, and so I have made
> VMError::print_vm_info() report the safe things that VMError::report()
> reports.  Hopefully, this will address the concern about safety.   It
> also allows us to remove the lock and static buffer in the VM.info
> function.   The duplication isn't as bad as I thought and is a better
> solution than piggy backing from VMError::report().
>
> Please review:
>
> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
> bug link https://bugs.openjdk.java.net/browse/JDK-8027429

This looks good to me. A few small queries:

888   if (Universe::is_fully_initialized()) {

I don't think it is possible to run a Dcmd before the VM is fully 
initialized.

889     Universe::heap()->print_on_error(st);

Is this safe? I'm not sure what it does - print heap memory regions ??

  891     st->print_cr("Polling page: " INTPTR_FORMAT, 
p2i(os::get_polling_page()));

Is this useful ??

Thanks,
David

> I tested with jcmd <pid> VM.info concurrently and with an inserted crash
> during VM.info call to make sure the original hs_err_pid report comes
> out okay.
>
> Thanks,
> Coleen
>
> On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>>
>>
>> On 11/10/15 7:12 PM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> I still do not like the intertwining of the info and error code, but
>>> I won't say more on that.
>>
>> Yes, I understand the approach has good arguments on both sides. I'm
>> trying to find a compromise that can move this feature forward.
>>>
>>> However you may now have a concurrency problem:
>>>
>>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>>  843 void VMError::report_info(outputStream* st) {
>>>  844
>>>  845   // don't allocate large buffer on stack
>>>  846   static char buf[O_BUFLEN];
>>>
>>> Can you have concurrent info requests in flight? For the info case
>>> the buffer could be allocated on the stack I think, or even
>>> dynamically. (Aside: will this be executed in the service thread?)
>>
>> There is a lock added to protect the VM.info dcmd, so there would be
>> only sequential access to the static buffer.  I don't think this is
>> executed in the Service Thread.  I see no evidence of this anyway.
>>
>>>
>>> report_process could possibly be refactored further as there's a lot
>>> of initial stuff that won't apply to the info case:
>>>  - we won't print all threads
>>>  - the VM state will always be "not at safepoint"
>>>  - The Universe must be initialized (not clear at what point during
>>> shutdown a dcmd can still come in?)
>>>  - we don't print owned locks
>>
>> Yes, I can do this.  It actually improves how this looks.
>>
>> Thanks,
>> Coleen
>>>
>>> That's all from me.
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've done some refactoring and cleanup on this which the refactoring
>>>> enabled.   Again, it's important that the VM.info output be mostly the
>>>> same order as the hs_err_pid* report file so that sustaining can easily
>>>> match and find output that they are looking for.   It's also important
>>>> not to copy this code because there may be new information added
>>>> that we
>>>> also want in the VM.info output.  If any future information that we
>>>> want
>>>> to print seems unsafe, the vm_info_cmd can be used to exclude it or it
>>>> can be printed only in the error reporting function. Rejecting the
>>>> entire change because of the theoretical unsafe addition of code isn't
>>>> the right thing to do, although yeah, I've argued the other way in the
>>>> past.  This is something that is helpful to sustaining and our
>>>> licensees.
>>>>
>>>> In this patch, the functions that VM.info calls are safe while the JVM
>>>> is running, but the stepping is required because at crash time, they
>>>> might not be.  Although I've never seen the error reporting crash in
>>>> these places.
>>>>
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>
>>>> This refactoring has been run through RBT.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>>> Also note that some of the crash information is not interesting for
>>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>>> registers/stack data for any specific thread. If people want thread
>>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>>> system state.
>>>>>
>>>>> I'm a little surprised that thread info is not of interest, but as you
>>>>> say that is available by other means  - which was part of my earlier
>>>>> point about there already being safe means to get some of the info.
>>>>>
>>>>> If this is really only about the small subset of safe "system
>>>>> information" that VMError reports, then I see even less motivation to
>>>>> piggy-back VM-info onto VMerror and entangle the two. If each block of
>>>>> information can be considered a single "step" in the error case then
>>>>> we can simply factor out the actual printing parts and share them. If
>>>>> the error case wants fine-grained steps then we can't do that.
>>>>>
>>>>> David
>>>>>
>>>>>> Kind Regards
>>>>>> /Mattis
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Coleen Phillimore
>>>>>> Sent: den 3 november 2015 16:25
>>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>>> hs_err print-out
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've done a bit of refactoring with this change and have tested it
>>>>>> with and without intentional crashes to the VM.info reporting.  If it
>>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>>> where it comes from in the hs_err file.  And the hs_err_pid file is
>>>>>> correctly produced.  If we can make the jcmd come with a warning that
>>>>>> it *could* potentially crash the vm, I think this feature is a good
>>>>>> addition for sustaining.
>>>>>>
>>>>>> Without crashing, VM.info gives a lot of information that we normally
>>>>>> give with the hs_err file.  The reason for the careful STEP'ping in
>>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>>> count on anything.  The VM.info command is designed to be run on a
>>>>>> moderately stable JVM so is less likely to crash on the things that
>>>>>> we're printing out.
>>>>>>
>>>>>> I think we should fix the Threads walking code that Fred pointed out
>>>>>> to take the Threads_lock, but I don't think we should *copy* code for
>>>>>> VM.info or copy print_on_error functions.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>>> Thanks Thomas,
>>>>>>>>
>>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>>> if we
>>>>>>>> are not careful.
>>>>>>>>
>>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious source of
>>>>>>>> potential crashes if you take into account the fact that the
>>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>>
>>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>>
>>>>>>> I'm not sure of why we do not print the threads when there is notion
>>>>>>> of current thread (and I would wonder whether we should print
>>>>>>> them for
>>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>>> crash.
>>>>>>>
>>>>>>> However, I still do not like it from a design point of view for the
>>>>>>> other reasons stated in my e-mail.
>>>>>>>
>>>>>>> Bertrand.
>>>>>>>
>>>>>>>>
>>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>>> what
>>>>>>>> it calls) has been designed.
>>>>>>>>
>>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient unless you
>>>>>>>> pass that argument to all layers. This is the biggest design issue.
>>>>>>>> For instance all calls to print_on_error could have this additional
>>>>>>>> argument so that the different components can adapt their behavior
>>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>>> case,
>>>>>>>> but the same if true for all methods called by VMError::report. I
>>>>>>>> would go as far as adding a comment in all print_on_error methods
>>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>>> locks
>>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>>> VMError:report is supposed to skip.
>>>>>>>> The current design, where the various components may not be aware
>>>>>>>> that they are called for two different purposes, is not robust
>>>>>>>> enough.
>>>>>>>>
>>>>>>>> Note also that being 'robust enough for the "crash" case' is not a
>>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>>> of the
>>>>>>>> info STEP crashes the VM instead of just bringing us to the next
>>>>>>>> error reporting STEP. Now, as stated before, I could agree if we
>>>>>>>> use
>>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Bertrand
>>>>>>>>
>>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>>
>>>>>>>>> However, there are some things to keep in mind when doing this:
>>>>>>>>>
>>>>>>>>> Error handling is different between the "info-only" and "crash"
>>>>>>>>> case. In
>>>>>>>>> "info-only" case, VMError::report() is called without enclosing
>>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>>> handler
>>>>>>>>> intact, not with the secondary signal handler. So, normal signal
>>>>>>>>> handling works, which is good. But it also means that any error
>>>>>>>>> inside one of the error reporting STEPs will bring down the VM,
>>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>>> ("Error
>>>>>>>>> occurred during error reporting").
>>>>>>>>>
>>>>>>>>> The problem is that a number of error reporting steps are a bit
>>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>>> happen
>>>>>>>>> is that the STEP will be interrupted. These risky steps should not
>>>>>>>>> be executed in the "only-info" case.
>>>>>>>>>
>>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>>> "_vm_info_cmd"
>>>>>>>>> and conditionally excludes dangerous or pointless error reporting
>>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>>
>>>>>>>>> As far as I can see, the other concerns concern only the "crash"
>>>>>>>>> case and simply do not apply to the "info-only" case. If an error
>>>>>>>>> reporting step is robust enough for the "crash" case (low stack
>>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it should be
>>>>>>>>> safe enough for the "info" case.
>>>>>>>>>
>>>>>>>>> Kind Regards, Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>>> <mattis.castegren at oracle.com <mailto:mattis.castegren at oracle.com>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>      Hi
>>>>>>>>>
>>>>>>>>>      Of course, when I wrote "Review and push this fix" I meant
>>>>>>>>> doing a
>>>>>>>>>      thorough technical review. We do not want VM.info to be
>>>>>>>>> unsafe, and
>>>>>>>>>      I will let David reply to the technical questions that has
>>>>>>>>> been
>>>>>>>>>      raised. I know he did consider some of these issues, but last
>>>>>>>>> week
>>>>>>>>>      he was on Java one, and this week he is out. I will let him
>>>>>>>>> comment
>>>>>>>>>      once he gets back.
>>>>>>>>>
>>>>>>>>>      Kind Regards
>>>>>>>>>      /Mattis
>>>>>>>>>
>>>>>>>>>      -----Original Message-----
>>>>>>>>>      From: Bertrand Delsart
>>>>>>>>>      Sent: den 2 november 2015 10:13
>>>>>>>>>      To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>>      hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>      Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>>> to get
>>>>>>>>>      hs_err print-out
>>>>>>>>>
>>>>>>>>>      On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>>       > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>>       >> Hi
>>>>>>>>>       >>
>>>>>>>>>       >> I agree that there may be things we could add to VM.info
>>>>>>>>> that we
>>>>>>>>>       >> couldn't add in an hs_err file due to the fact that we
>>>>>>>>> are
>>>>>>>>> crashing.
>>>>>>>>>       >> However, I really do believe that we should use the same
>>>>>>>>> code.
>>>>>>>>> The
>>>>>>>>>       >> goal of the hs_err file is exactly the same as the
>>>>>>>>> goal of
>>>>>>>>> VM.info,
>>>>>>>>>       >> to print as much information about the system as
>>>>>>>>> possible so
>>>>>>>>> that
>>>>>>>>>       >> when we look at an hs_err file or the output from
>>>>>>>>> VM.info,
>>>>>>>>> we get as
>>>>>>>>>       >> much information about the system as possible. That way
>>>>>>>>> we can
>>>>>>>>>      reduce
>>>>>>>>>       >> the need to go back and forth with the customer to ask
>>>>>>>>> for more
>>>>>>>>>      information.
>>>>>>>>>       >>
>>>>>>>>>       >> If we enumerate what we need in VM.info, the list
>>>>>>>>> would look
>>>>>>>>> very
>>>>>>>>>       >> much like what we already do in VMError, and therefore I
>>>>>>>>> think we
>>>>>>>>>       >> should use the same code. If we find something that we
>>>>>>>>> would
>>>>>>>>> want,
>>>>>>>>>       >> that is not currently printed in hs_err files, I think we
>>>>>>>>> have
>>>>>>>>>      two options:
>>>>>>>>>       >>
>>>>>>>>>       >> 1) Add it to hs_err files as well. If it is
>>>>>>>>> interesting when
>>>>>>>>>       >> gathering customer information before a crash, it
>>>>>>>>> probably is
>>>>>>>>>       >> interesting after a crash
>>>>>>>>>       >> 2) If it for some reason cannot be captured when
>>>>>>>>> crashing,
>>>>>>>>> add it to
>>>>>>>>>       >> the code with the condition that we don't print it during
>>>>>>>>> crash
>>>>>>>>>      time.
>>>>>>>>>       >>
>>>>>>>>>       >> I would therefore suggest the following plan of action
>>>>>>>>>       >>
>>>>>>>>>       >> 1) Review and push this fix to see if there are any
>>>>>>>>> technical issues
>>>>>>>>>       >> with regards to thread safety, etc. I know David has had
>>>>>>>>> this in
>>>>>>>>>      mind
>>>>>>>>>       >> when doing the change. After that, we have a command that
>>>>>>>>>      support can
>>>>>>>>>       >> use.
>>>>>>>>>       >> 2) Think about if there is any information missing, both
>>>>>>>>> from hs_err
>>>>>>>>>       >> files and from VM.info. Here, we may want to ask support
>>>>>>>>> about if
>>>>>>>>>       >> there is any additional information they need to ask for
>>>>>>>>> when they
>>>>>>>>>       >> get this data. If possible, this information should be
>>>>>>>>> added
>>>>>>>>> to both
>>>>>>>>>       >> hs_err files and VM.info, we don't want support to
>>>>>>>>> have to
>>>>>>>>> ask for
>>>>>>>>>       >> VM.info if we already have an hs_err file. Of course, if
>>>>>>>>> there is
>>>>>>>>>       >> anything Dev would want to see when analyzing crashes in
>>>>>>>>> testing, we
>>>>>>>>>       >> should add that too.
>>>>>>>>>       >
>>>>>>>>>       > I have no issue with VMError and VM.Info using the same
>>>>>>>>> code
>>>>>>>>> (where
>>>>>>>>>       > appropriate), but I think it wrong to try and implement
>>>>>>>>> VM.Info
>>>>>>>>>       > directly using the actual VMError mechanism. It is
>>>>>>>>> crude and
>>>>>>>>> makes it
>>>>>>>>>       > difficult to make independent changes to either facility.
>>>>>>>>>       >
>>>>>>>>>       > Sorry.
>>>>>>>>>       >
>>>>>>>>>       > David
>>>>>>>>>
>>>>>>>>>      Furthermore, we cannot just "Review and push this fix to
>>>>>>>>> see if
>>>>>>>>>      there are any technical issues". Two reviewers have already
>>>>>>>>> stated
>>>>>>>>>      that there are issues and these issues must be addressed
>>>>>>>>> before
>>>>>>>>> pushing.
>>>>>>>>>
>>>>>>>>>      The answer cannot just be that we need this feature, it must
>>>>>>>>> address
>>>>>>>>>      the technical issues that were reported.
>>>>>>>>>
>>>>>>>>>      One possible answer is that we all agree that VMInfo would be
>>>>>>>>> unsafe.
>>>>>>>>>      The method would have to be renamed UnsafeVMInfo and it
>>>>>>>>> must be
>>>>>>>>>      clear that its 'impact' would be very risky, potentially
>>>>>>>>> crashing
>>>>>>>>>      the JVM.
>>>>>>>>>
>>>>>>>>>      If this is sufficient for your needs then we could agree with
>>>>>>>>> it. My
>>>>>>>>>      only additional requirement would be to clearly mark that the
>>>>>>>>> method
>>>>>>>>>      has been called so that we do not waste time on non-issues if
>>>>>>>>>      calling VMInfo leads to a crash (or, even worse but less
>>>>>>>>> likely, to
>>>>>>>>>      a silent data corruption that may lead to problems later).
>>>>>>>>> The bare
>>>>>>>>>      minimum would be to double check that a crash during an
>>>>>>>>> execution of
>>>>>>>>>      VM.info is clearly identifiable (this is worth double
>>>>>>>>> checking
>>>>>>>>>      because of the use of static buffers for the error reporting,
>>>>>>>>> see
>>>>>>>>>      below).
>>>>>>>>>
>>>>>>>>>      If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>>> wonder
>>>>>>>>>      whether a safe version of it is still a "small enhancement".
>>>>>>>>> A JEP
>>>>>>>>>      may be required.
>>>>>>>>>
>>>>>>>>>      VMError::report and the methods it calls (print_on_error)
>>>>>>>>> have been
>>>>>>>>>      written with assumptions clearly stated in the comments above
>>>>>>>>>      VMError::report (see below the two most important ones).
>>>>>>>>> HotSpot
>>>>>>>>>      error reporting strategy differed from JRockit. This explains
>>>>>>>>> why
>>>>>>>>>      the JRockit approach cannot be applied as is.
>>>>>>>>>
>>>>>>>>>      I agree with David that you can try leveraging some of the
>>>>>>>>> logic
>>>>>>>>>      from VMError::report. However, to be sure there is no
>>>>>>>>>      misunderstanding, let's be clearer about David's "(where
>>>>>>>>>      appropriate)": we would for instance have to review all the
>>>>>>>>> called
>>>>>>>>>      methods, check which ones purposefully ignore locking
>>>>>>>>> (potentially
>>>>>>>>>      crashing) and provide alternate code for the unsafe ones
>>>>>>>>> (which may
>>>>>>>>>      not be trivial in some cases). As a simple example, a method
>>>>>>>>> called
>>>>>>>>>      "print_on_error" should not be called when we are not in an
>>>>>>>>>      erroneous condition. However, in some components of the
>>>>>>>>> JVM, the
>>>>>>>>> low
>>>>>>>>>      level implementations of 'print_on_error' and
>>>>>>>>> 'print_info_safely'
>>>>>>>>>      could share the same code.
>>>>>>>>>
>>>>>>>>>      Regards,
>>>>>>>>>
>>>>>>>>>      Bertrand.
>>>>>>>>>
>>>>>>>>>      Here are the important comments about VMError::report
>>>>>>>>> assumptions:
>>>>>>>>>
>>>>>>>>>      // In general, a hang in error handler is much worse than a
>>>>>>>>> crash or
>>>>>>>>>      internal
>>>>>>>>>      // error, as it's harder to recover from a hang. Deadlock can
>>>>>>>>> happen
>>>>>>>>>      if we
>>>>>>>>>      // try to grab a lock that is already owned by current
>>>>>>>>> thread,
>>>>>>>>> or if the
>>>>>>>>>      // owner is blocked forever (e.g. in
>>>>>>>>> os::infinite_sleep()). If
>>>>>>>>>      possible, the
>>>>>>>>>      // error handler and all the functions it called should avoid
>>>>>>>>>      grabbing any
>>>>>>>>>      // lock. An important thing to notice is that memory
>>>>>>>>> allocation
>>>>>>>>> needs a
>>>>>>>>>      lock.
>>>>>>>>>      //
>>>>>>>>>      // We should avoid using large stack allocated buffers. Many
>>>>>>>>> errors
>>>>>>>>>      happen
>>>>>>>>>      // when stack space is already low. Making things even
>>>>>>>>> worse is
>>>>>>>>> that
>>>>>>>>>      there
>>>>>>>>>      // could be nested report_and_die() calls on stack (see
>>>>>>>>> above).
>>>>>>>>> Only one
>>>>>>>>>      // thread can report error, so large buffers are statically
>>>>>>>>> allocated in
>>>>>>>>>      data
>>>>>>>>>      // segment.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       >
>>>>>>>>>       >
>>>>>>>>>       >> Kind Regards
>>>>>>>>>       >> /Mattis
>>>>>>>>>       >>
>>>>>>>>>       >> -----Original Message-----
>>>>>>>>>       >> From: David Holmes
>>>>>>>>>       >> Sent: den 29 oktober 2015 23:20
>>>>>>>>>       >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>>       >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>       >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>> VM.info to
>>>>>>>>> get
>>>>>>>>>       >> hs_err print-out
>>>>>>>>>       >>
>>>>>>>>>       >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>>       >>> Some background: We have this command in JRockit. The
>>>>>>>>>      information you
>>>>>>>>>       >>> gather when you crash to give a good summary of what
>>>>>>>>> system
>>>>>>>>> you run
>>>>>>>>>       >>> on is pretty much exactly the information you need to
>>>>>>>>> get a
>>>>>>>>> good
>>>>>>>>>       >>> summary on a system that has not crashed. The JRockit
>>>>>>>>> command is
>>>>>>>>>       >>> extremely useful for support, and saves a lot of work
>>>>>>>>> going
>>>>>>>>>      back and
>>>>>>>>>       >>> forth asking about system information.
>>>>>>>>>       >>>
>>>>>>>>>       >>> Also, if we write something new in the hs_err file,
>>>>>>>>> like if
>>>>>>>>>      there has
>>>>>>>>>       >>> been any out of memory errors, we often would want
>>>>>>>>> the same
>>>>>>>>>       >>> information in the VM.info output. From my experience in
>>>>>>>>>       >>> Sustaining/Support, I can't think of any information you
>>>>>>>>> would want
>>>>>>>>>       >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>>> file
>>>>>>>>> and the
>>>>>>>>>       >>> other way around, apart from details about the crash
>>>>>>>>> (obviously).
>>>>>>>>>       >>>
>>>>>>>>>       >>> I don't see a reason to exactly enumerate what
>>>>>>>>> information
>>>>>>>>> VM.info
>>>>>>>>>       >>> should provide. From a sustaining/support
>>>>>>>>> perspective, we
>>>>>>>>> want a
>>>>>>>>>       >>> one-stop command to gather as much useful information as
>>>>>>>>> possible,
>>>>>>>>>       >>> which is the same idea we have for the hs_err file.
>>>>>>>>>       >>
>>>>>>>>>       >> The reason to enumerate what is required is to see where
>>>>>>>>> that
>>>>>>>>>       >> information already exists and can be collected from. The
>>>>>>>>> VMError
>>>>>>>>>       >> report has to be very careful about what it does and
>>>>>>>>> how it
>>>>>>>>> does it
>>>>>>>>>       >> because of the fact we may have crashed and the general
>>>>>>>>> process
>>>>>>>>>      state
>>>>>>>>>       >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>>> information is
>>>>>>>>>       >> required from the available sources in whatever order or
>>>>>>>>> format
>>>>>>>>>      desired.
>>>>>>>>>       >>
>>>>>>>>>       >> I have no problem with this command, just how it has been
>>>>>>>>>      proposed to
>>>>>>>>>       >> implement it.
>>>>>>>>>       >>
>>>>>>>>>       >> David
>>>>>>>>>       >>
>>>>>>>>>       >>> Kind Regards
>>>>>>>>>       >>> /Mattis
>>>>>>>>>       >>>
>>>>>>>>>       >>> -----Original Message-----
>>>>>>>>>       >>> From: David Holmes
>>>>>>>>>       >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>>       >>> To: Coleen Phillimore;
>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>       >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>> VM.info
>>>>>>>>> to get
>>>>>>>>>       >>> hs_err print-out
>>>>>>>>>       >>>
>>>>>>>>>       >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>>       >>>>
>>>>>>>>>       >>>>
>>>>>>>>>       >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>>       >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>>       >>>>>>
>>>>>>>>>       >>>>>> So I actually disagree.  I don't think there should
>>>>>>>>> be an
>>>>>>>>>       >>>>>> additional separate mechanism to get the same
>>>>>>>>> information
>>>>>>>>>      that we
>>>>>>>>>       >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>>> feature like
>>>>>>>>>       >>>>>> this for a long time.
>>>>>>>>>       >>>>>>
>>>>>>>>>       >>>>>> I pre-reviewed this change and I thought it looked
>>>>>>>>> good in
>>>>>>>>>      general.
>>>>>>>>>       >>>>>> I didn't see the thread iteration problem that Fred
>>>>>>>>> refers to
>>>>>>>>>       >>>>>> below, but I think the individual problems can be
>>>>>>>>> fixed.
>>>>>>>>>       >>>>>>
>>>>>>>>>       >>>>>> The last thing I want is this code to be copied
>>>>>>>>> somewhere else.
>>>>>>>>>       >>>>>
>>>>>>>>>       >>>>> Factored as needed not copied. VMError is not an
>>>>>>>>> "info"
>>>>>>>>> reporting
>>>>>>>>>       >>>>> mechanism.
>>>>>>>>>       >>>>
>>>>>>>>>       >>>> If you look at the things that are reported in each
>>>>>>>>> "STEP",
>>>>>>>>>      there's a
>>>>>>>>>       >>>> small amount of code and the order is important.
>>>>>>>>>       >>>
>>>>>>>>>       >>> The order is less important in an "info" request I would
>>>>>>>>> think.
>>>>>>>>>       >>>
>>>>>>>>>       >>>> I'd like the vm info to use the same order and report
>>>>>>>>> what it
>>>>>>>>>      can do
>>>>>>>>>       >>>> safely.   Refactoring 5 lines of code into functions
>>>>>>>>> doesn't
>>>>>>>>> make
>>>>>>>>>       >>>> sense.
>>>>>>>>>       >>>
>>>>>>>>>       >>> I need to consider exactly what it is the "info"
>>>>>>>>> needs to
>>>>>>>>> report in
>>>>>>>>>       >>> more detail. There are existing facilities (system
>>>>>>>>> properties,
>>>>>>>>>       >>> management
>>>>>>>>>       >>> APIs) for various bits of runtime information, which
>>>>>>>>> VMError can't
>>>>>>>>>       >>> utilize but a Dcmd can.
>>>>>>>>>       >>>
>>>>>>>>>       >>> David
>>>>>>>>>       >>>
>>>>>>>>>       >>>> Coleen
>>>>>>>>>       >>>>
>>>>>>>>>       >>>>>
>>>>>>>>>       >>>>> David
>>>>>>>>>       >>>>>
>>>>>>>>>       >>>>>> Coleen
>>>>>>>>>       >>>>>>
>>>>>>>>>       >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>>       >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>>> should
>>>>>>>>> not be
>>>>>>>>>       >>>>>>> piggy-backed onto VMError handling for the reasons
>>>>>>>>> stated
>>>>>>>>>      (and the
>>>>>>>>>       >>>>>>> error handling logic is complicated enough as it
>>>>>>>>> is!). For
>>>>>>>>>      things
>>>>>>>>>       >>>>>>> like thread lists there are already safe management
>>>>>>>>>      functions that
>>>>>>>>>       >>>>>>> can/should be used.
>>>>>>>>>       >>>>>>>
>>>>>>>>>       >>>>>>> Thanks,
>>>>>>>>>       >>>>>>> David H.
>>>>>>>>>       >>>>>>>
>>>>>>>>>       >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>>       >>>>>>>> Hi David,
>>>>>>>>>       >>>>>>>>
>>>>>>>>>       >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>>> concerned
>>>>>>>>> with the
>>>>>>>>>       >>>>>>>> fact that the diagnostic command is calling
>>>>>>>>> VMError::report().
>>>>>>>>>       >>>>>>>> This method has been implemented to be executed
>>>>>>>>> in the
>>>>>>>>>      particular
>>>>>>>>>       >>>>>>>> context of fatal errors, and its usage while the
>>>>>>>>> VM is
>>>>>>>>> running
>>>>>>>>>       >>>>>>>> normally seems dangerous.
>>>>>>>>>       >>>>>>>>
>>>>>>>>>       >>>>>>>> For instance, VMError::report() consciously avoids
>>>>>>>>>      grabbing locks
>>>>>>>>>       >>>>>>>> because of the risk of deadlock during the error
>>>>>>>>> reporting.
>>>>>>>>>       >>>>>>>> The consequence is that some data structures are
>>>>>>>>> browsed in an
>>>>>>>>>       >>>>>>>> unsafe way. One example: VMError::report() calls
>>>>>>>>>       >>>>>>>> Threads::print_on_error() which iterates over the
>>>>>>>>> thread list
>>>>>>>>>       >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>>       >>>>>>>>
>>>>>>>>>       >>>>>>>> The implementation of the diagnostic command seems
>>>>>>>>> also to
>>>>>>>>>       >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>>> VMError::report()
>>>>>>>>>       >>>>>>>> method. Have you considered implementing a new
>>>>>>>>> MT-safe
>>>>>>>>>      reporting
>>>>>>>>>       >>>>>>>> method rather than trying to modify the special
>>>>>>>>>      VMError::report()
>>>>>>>>>       >>>>>>>> methods? (Note that some code factorization between
>>>>>>>>>       >>>>>>>> VMError::report() and this new method should be
>>>>>>>>> possible).
>>>>>>>>>       >>>>>>>>
>>>>>>>>>       >>>>>>>> Thanks,
>>>>>>>>>       >>>>>>>>
>>>>>>>>>       >>>>>>>> Fred
>>>>>>>>>       >>>>>>>>
>>>>>>>>>       >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>>       >>>>>>>>> Hi!
>>>>>>>>>       >>>>>>>>>
>>>>>>>>>       >>>>>>>>> Please review my change for this small
>>>>>>>>> enhancement.
>>>>>>>>>       >>>>>>>>>
>>>>>>>>>       >>>>>>>>> bug:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>>       >>>>>>>>> webrev:
>>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>>       >>>>>>>>>
>>>>>>>>>       >>>>>>>>> Cheers,
>>>>>>>>>       >>>>>>>>> -Buck
>>>>>>>>>       >>>>>>>>
>>>>>>>>>       >>>>>>
>>>>>>>>>       >>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>      --
>>>>>>>>>      Bertrand Delsart,                     Grenoble Engineering
>>>>>>>>> Center
>>>>>>>>>      Oracle,         180 av. de l'Europe, ZIRST de
>>>>>>>>> Montbonnot
>>>>>>>>>      38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>>      bertrand.delsart at oracle.com
>>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>>                Phone : +33 4 76 18 81 23
>>>>>>>>>
>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>      NOTICE: This email message is for the sole use of the
>>>>>>>>> intended
>>>>>>>>>      recipient(s) and may contain confidential and privileged
>>>>>>>>>      information. Any unauthorized review, use, disclosure or
>>>>>>>>>      distribution is prohibited. If you are not the intended
>>>>>>>>> recipient,
>>>>>>>>>      please contact the sender by reply email and destroy all
>>>>>>>>> copies of
>>>>>>>>>      the original message.
>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>


More information about the hotspot-runtime-dev mailing list