RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)
Coleen Phillimore
coleen.phillimore at oracle.com
Fri Nov 13 15:12:05 UTC 2015
Thanks Frederic, for the comment. Did you also review the code?
Coleen
On 11/13/15 7:08 AM, Frederic Parain wrote:
>
>
> On 13/11/2015 05:52, David Holmes wrote:
>> Hi Coleen,
>>
>> On 13/11/2015 7:41 AM, Coleen Phillimore wrote:
>>>
>>> Hi, I should learn my lesson that Frederic, Bertrand and David are
>>> generally right.
>>>
>>> My restructuring went afoul of the STEP framework, and so I have made
>>> VMError::print_vm_info() report the safe things that VMError::report()
>>> reports. Hopefully, this will address the concern about safety. It
>>> also allows us to remove the lock and static buffer in the VM.info
>>> function. The duplication isn't as bad as I thought and is a better
>>> solution than piggy backing from VMError::report().
>>>
>>> Please review:
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>
>> This looks good to me. A few small queries:
>>
>> 888 if (Universe::is_fully_initialized()) {
>>
>> I don't think it is possible to run a Dcmd before the VM is fully
>> initialized.
>
> It is not possible to invoke a Dcmd from outside of the VM (using the
> attachAPI or the PlatformMBeanServer) before the VM is fully initialized.
> However it is still possible to invoke a Dcmd directly from VM code.
> The Dcmd framework is initialized in Management::init(), which is called
> before universe_post_init() is invoked. I don't know exactly the
> dependencies
> of Universe::heap()->print_on_error(), but the test looks reasonable.
>
> Fred
>
>>
>> 889 Universe::heap()->print_on_error(st);
>>
>> Is this safe? I'm not sure what it does - print heap memory regions ??
>>
>> 891 st->print_cr("Polling page: " INTPTR_FORMAT,
>> p2i(os::get_polling_page()));
>>
>> Is this useful ??
>>
>> Thanks,
>> David
>>
>>> I tested with jcmd <pid> VM.info concurrently and with an inserted
>>> crash
>>> during VM.info call to make sure the original hs_err_pid report comes
>>> out okay.
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>>>>
>>>>
>>>> On 11/10/15 7:12 PM, David Holmes wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> I still do not like the intertwining of the info and error code, but
>>>>> I won't say more on that.
>>>>
>>>> Yes, I understand the approach has good arguments on both sides. I'm
>>>> trying to find a compromise that can move this feature forward.
>>>>>
>>>>> However you may now have a concurrency problem:
>>>>>
>>>>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>>>> 843 void VMError::report_info(outputStream* st) {
>>>>> 844
>>>>> 845 // don't allocate large buffer on stack
>>>>> 846 static char buf[O_BUFLEN];
>>>>>
>>>>> Can you have concurrent info requests in flight? For the info case
>>>>> the buffer could be allocated on the stack I think, or even
>>>>> dynamically. (Aside: will this be executed in the service thread?)
>>>>
>>>> There is a lock added to protect the VM.info dcmd, so there would be
>>>> only sequential access to the static buffer. I don't think this is
>>>> executed in the Service Thread. I see no evidence of this anyway.
>>>>
>>>>>
>>>>> report_process could possibly be refactored further as there's a lot
>>>>> of initial stuff that won't apply to the info case:
>>>>> - we won't print all threads
>>>>> - the VM state will always be "not at safepoint"
>>>>> - The Universe must be initialized (not clear at what point during
>>>>> shutdown a dcmd can still come in?)
>>>>> - we don't print owned locks
>>>>
>>>> Yes, I can do this. It actually improves how this looks.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>
>>>>> That's all from me.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've done some refactoring and cleanup on this which the refactoring
>>>>>> enabled. Again, it's important that the VM.info output be
>>>>>> mostly the
>>>>>> same order as the hs_err_pid* report file so that sustaining can
>>>>>> easily
>>>>>> match and find output that they are looking for. It's also
>>>>>> important
>>>>>> not to copy this code because there may be new information added
>>>>>> that we
>>>>>> also want in the VM.info output. If any future information that we
>>>>>> want
>>>>>> to print seems unsafe, the vm_info_cmd can be used to exclude it
>>>>>> or it
>>>>>> can be printed only in the error reporting function. Rejecting the
>>>>>> entire change because of the theoretical unsafe addition of code
>>>>>> isn't
>>>>>> the right thing to do, although yeah, I've argued the other way
>>>>>> in the
>>>>>> past. This is something that is helpful to sustaining and our
>>>>>> licensees.
>>>>>>
>>>>>> In this patch, the functions that VM.info calls are safe while
>>>>>> the JVM
>>>>>> is running, but the stepping is required because at crash time, they
>>>>>> might not be. Although I've never seen the error reporting crash in
>>>>>> these places.
>>>>>>
>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>
>>>>>> This refactoring has been run through RBT.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>>>>> Also note that some of the crash information is not interesting
>>>>>>>> for
>>>>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>>>>> registers/stack data for any specific thread. If people want
>>>>>>>> thread
>>>>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>>>>> system state.
>>>>>>>
>>>>>>> I'm a little surprised that thread info is not of interest, but
>>>>>>> as you
>>>>>>> say that is available by other means - which was part of my
>>>>>>> earlier
>>>>>>> point about there already being safe means to get some of the info.
>>>>>>>
>>>>>>> If this is really only about the small subset of safe "system
>>>>>>> information" that VMError reports, then I see even less
>>>>>>> motivation to
>>>>>>> piggy-back VM-info onto VMerror and entangle the two. If each
>>>>>>> block of
>>>>>>> information can be considered a single "step" in the error case
>>>>>>> then
>>>>>>> we can simply factor out the actual printing parts and share
>>>>>>> them. If
>>>>>>> the error case wants fine-grained steps then we can't do that.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>> Kind Regards
>>>>>>>> /Mattis
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Coleen Phillimore
>>>>>>>> Sent: den 3 november 2015 16:25
>>>>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>>>>> hs_err print-out
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've done a bit of refactoring with this change and have tested it
>>>>>>>> with and without intentional crashes to the VM.info reporting.
>>>>>>>> If it
>>>>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>>>>> where it comes from in the hs_err file. And the hs_err_pid
>>>>>>>> file is
>>>>>>>> correctly produced. If we can make the jcmd come with a
>>>>>>>> warning that
>>>>>>>> it *could* potentially crash the vm, I think this feature is a
>>>>>>>> good
>>>>>>>> addition for sustaining.
>>>>>>>>
>>>>>>>> Without crashing, VM.info gives a lot of information that we
>>>>>>>> normally
>>>>>>>> give with the hs_err file. The reason for the careful
>>>>>>>> STEP'ping in
>>>>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>>>>> count on anything. The VM.info command is designed to be run on a
>>>>>>>> moderately stable JVM so is less likely to crash on the things
>>>>>>>> that
>>>>>>>> we're printing out.
>>>>>>>>
>>>>>>>> I think we should fix the Threads walking code that Fred
>>>>>>>> pointed out
>>>>>>>> to take the Threads_lock, but I don't think we should *copy*
>>>>>>>> code for
>>>>>>>> VM.info or copy print_on_error functions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>>>>> Thanks Thomas,
>>>>>>>>>>
>>>>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>>>>> if we
>>>>>>>>>> are not careful.
>>>>>>>>>>
>>>>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious
>>>>>>>>>> source of
>>>>>>>>>> potential crashes if you take into account the fact that the
>>>>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>>>>
>>>>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>>>>
>>>>>>>>> I'm not sure of why we do not print the threads when there is
>>>>>>>>> notion
>>>>>>>>> of current thread (and I would wonder whether we should print
>>>>>>>>> them for
>>>>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>>>>> crash.
>>>>>>>>>
>>>>>>>>> However, I still do not like it from a design point of view
>>>>>>>>> for the
>>>>>>>>> other reasons stated in my e-mail.
>>>>>>>>>
>>>>>>>>> Bertrand.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>>>>> what
>>>>>>>>>> it calls) has been designed.
>>>>>>>>>>
>>>>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient
>>>>>>>>>> unless you
>>>>>>>>>> pass that argument to all layers. This is the biggest design
>>>>>>>>>> issue.
>>>>>>>>>> For instance all calls to print_on_error could have this
>>>>>>>>>> additional
>>>>>>>>>> argument so that the different components can adapt their
>>>>>>>>>> behavior
>>>>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>>>>> case,
>>>>>>>>>> but the same if true for all methods called by
>>>>>>>>>> VMError::report. I
>>>>>>>>>> would go as far as adding a comment in all print_on_error
>>>>>>>>>> methods
>>>>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>>>>> locks
>>>>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>>>>> VMError:report is supposed to skip.
>>>>>>>>>> The current design, where the various components may not be
>>>>>>>>>> aware
>>>>>>>>>> that they are called for two different purposes, is not robust
>>>>>>>>>> enough.
>>>>>>>>>>
>>>>>>>>>> Note also that being 'robust enough for the "crash" case' is
>>>>>>>>>> not a
>>>>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>>>>> of the
>>>>>>>>>> info STEP crashes the VM instead of just bringing us to the next
>>>>>>>>>> error reporting STEP. Now, as stated before, I could agree if we
>>>>>>>>>> use
>>>>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Bertrand
>>>>>>>>>>
>>>>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>>>>
>>>>>>>>>>> However, there are some things to keep in mind when doing this:
>>>>>>>>>>>
>>>>>>>>>>> Error handling is different between the "info-only" and "crash"
>>>>>>>>>>> case. In
>>>>>>>>>>> "info-only" case, VMError::report() is called without enclosing
>>>>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>>>>> handler
>>>>>>>>>>> intact, not with the secondary signal handler. So, normal
>>>>>>>>>>> signal
>>>>>>>>>>> handling works, which is good. But it also means that any error
>>>>>>>>>>> inside one of the error reporting STEPs will bring down the VM,
>>>>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>>>>> ("Error
>>>>>>>>>>> occurred during error reporting").
>>>>>>>>>>>
>>>>>>>>>>> The problem is that a number of error reporting steps are a bit
>>>>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>>>>> happen
>>>>>>>>>>> is that the STEP will be interrupted. These risky steps
>>>>>>>>>>> should not
>>>>>>>>>>> be executed in the "only-info" case.
>>>>>>>>>>>
>>>>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>>>>> "_vm_info_cmd"
>>>>>>>>>>> and conditionally excludes dangerous or pointless error
>>>>>>>>>>> reporting
>>>>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>>>>
>>>>>>>>>>> As far as I can see, the other concerns concern only the
>>>>>>>>>>> "crash"
>>>>>>>>>>> case and simply do not apply to the "info-only" case. If an
>>>>>>>>>>> error
>>>>>>>>>>> reporting step is robust enough for the "crash" case (low stack
>>>>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it
>>>>>>>>>>> should be
>>>>>>>>>>> safe enough for the "info" case.
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards, Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>>>>> <mattis.castegren at oracle.com
>>>>>>>>>>> <mailto:mattis.castegren at oracle.com>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> Of course, when I wrote "Review and push this fix" I meant
>>>>>>>>>>> doing a
>>>>>>>>>>> thorough technical review. We do not want VM.info to be
>>>>>>>>>>> unsafe, and
>>>>>>>>>>> I will let David reply to the technical questions that has
>>>>>>>>>>> been
>>>>>>>>>>> raised. I know he did consider some of these issues,
>>>>>>>>>>> but last
>>>>>>>>>>> week
>>>>>>>>>>> he was on Java one, and this week he is out. I will let
>>>>>>>>>>> him
>>>>>>>>>>> comment
>>>>>>>>>>> once he gets back.
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards
>>>>>>>>>>> /Mattis
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Bertrand Delsart
>>>>>>>>>>> Sent: den 2 november 2015 10:13
>>>>>>>>>>> To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>>>>> to get
>>>>>>>>>>> hs_err print-out
>>>>>>>>>>>
>>>>>>>>>>> On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>>>> > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>>>> >> Hi
>>>>>>>>>>> >>
>>>>>>>>>>> >> I agree that there may be things we could add to
>>>>>>>>>>> VM.info
>>>>>>>>>>> that we
>>>>>>>>>>> >> couldn't add in an hs_err file due to the fact that we
>>>>>>>>>>> are
>>>>>>>>>>> crashing.
>>>>>>>>>>> >> However, I really do believe that we should use the
>>>>>>>>>>> same
>>>>>>>>>>> code.
>>>>>>>>>>> The
>>>>>>>>>>> >> goal of the hs_err file is exactly the same as the
>>>>>>>>>>> goal of
>>>>>>>>>>> VM.info,
>>>>>>>>>>> >> to print as much information about the system as
>>>>>>>>>>> possible so
>>>>>>>>>>> that
>>>>>>>>>>> >> when we look at an hs_err file or the output from
>>>>>>>>>>> VM.info,
>>>>>>>>>>> we get as
>>>>>>>>>>> >> much information about the system as possible. That
>>>>>>>>>>> way
>>>>>>>>>>> we can
>>>>>>>>>>> reduce
>>>>>>>>>>> >> the need to go back and forth with the customer to ask
>>>>>>>>>>> for more
>>>>>>>>>>> information.
>>>>>>>>>>> >>
>>>>>>>>>>> >> If we enumerate what we need in VM.info, the list
>>>>>>>>>>> would look
>>>>>>>>>>> very
>>>>>>>>>>> >> much like what we already do in VMError, and
>>>>>>>>>>> therefore I
>>>>>>>>>>> think we
>>>>>>>>>>> >> should use the same code. If we find something that we
>>>>>>>>>>> would
>>>>>>>>>>> want,
>>>>>>>>>>> >> that is not currently printed in hs_err files, I
>>>>>>>>>>> think we
>>>>>>>>>>> have
>>>>>>>>>>> two options:
>>>>>>>>>>> >>
>>>>>>>>>>> >> 1) Add it to hs_err files as well. If it is
>>>>>>>>>>> interesting when
>>>>>>>>>>> >> gathering customer information before a crash, it
>>>>>>>>>>> probably is
>>>>>>>>>>> >> interesting after a crash
>>>>>>>>>>> >> 2) If it for some reason cannot be captured when
>>>>>>>>>>> crashing,
>>>>>>>>>>> add it to
>>>>>>>>>>> >> the code with the condition that we don't print it
>>>>>>>>>>> during
>>>>>>>>>>> crash
>>>>>>>>>>> time.
>>>>>>>>>>> >>
>>>>>>>>>>> >> I would therefore suggest the following plan of action
>>>>>>>>>>> >>
>>>>>>>>>>> >> 1) Review and push this fix to see if there are any
>>>>>>>>>>> technical issues
>>>>>>>>>>> >> with regards to thread safety, etc. I know David
>>>>>>>>>>> has had
>>>>>>>>>>> this in
>>>>>>>>>>> mind
>>>>>>>>>>> >> when doing the change. After that, we have a
>>>>>>>>>>> command that
>>>>>>>>>>> support can
>>>>>>>>>>> >> use.
>>>>>>>>>>> >> 2) Think about if there is any information missing,
>>>>>>>>>>> both
>>>>>>>>>>> from hs_err
>>>>>>>>>>> >> files and from VM.info. Here, we may want to ask
>>>>>>>>>>> support
>>>>>>>>>>> about if
>>>>>>>>>>> >> there is any additional information they need to
>>>>>>>>>>> ask for
>>>>>>>>>>> when they
>>>>>>>>>>> >> get this data. If possible, this information should be
>>>>>>>>>>> added
>>>>>>>>>>> to both
>>>>>>>>>>> >> hs_err files and VM.info, we don't want support to
>>>>>>>>>>> have to
>>>>>>>>>>> ask for
>>>>>>>>>>> >> VM.info if we already have an hs_err file. Of
>>>>>>>>>>> course, if
>>>>>>>>>>> there is
>>>>>>>>>>> >> anything Dev would want to see when analyzing
>>>>>>>>>>> crashes in
>>>>>>>>>>> testing, we
>>>>>>>>>>> >> should add that too.
>>>>>>>>>>> >
>>>>>>>>>>> > I have no issue with VMError and VM.Info using the same
>>>>>>>>>>> code
>>>>>>>>>>> (where
>>>>>>>>>>> > appropriate), but I think it wrong to try and implement
>>>>>>>>>>> VM.Info
>>>>>>>>>>> > directly using the actual VMError mechanism. It is
>>>>>>>>>>> crude and
>>>>>>>>>>> makes it
>>>>>>>>>>> > difficult to make independent changes to either
>>>>>>>>>>> facility.
>>>>>>>>>>> >
>>>>>>>>>>> > Sorry.
>>>>>>>>>>> >
>>>>>>>>>>> > David
>>>>>>>>>>>
>>>>>>>>>>> Furthermore, we cannot just "Review and push this fix to
>>>>>>>>>>> see if
>>>>>>>>>>> there are any technical issues". Two reviewers have
>>>>>>>>>>> already
>>>>>>>>>>> stated
>>>>>>>>>>> that there are issues and these issues must be addressed
>>>>>>>>>>> before
>>>>>>>>>>> pushing.
>>>>>>>>>>>
>>>>>>>>>>> The answer cannot just be that we need this feature, it
>>>>>>>>>>> must
>>>>>>>>>>> address
>>>>>>>>>>> the technical issues that were reported.
>>>>>>>>>>>
>>>>>>>>>>> One possible answer is that we all agree that VMInfo
>>>>>>>>>>> would be
>>>>>>>>>>> unsafe.
>>>>>>>>>>> The method would have to be renamed UnsafeVMInfo and it
>>>>>>>>>>> must be
>>>>>>>>>>> clear that its 'impact' would be very risky, potentially
>>>>>>>>>>> crashing
>>>>>>>>>>> the JVM.
>>>>>>>>>>>
>>>>>>>>>>> If this is sufficient for your needs then we could
>>>>>>>>>>> agree with
>>>>>>>>>>> it. My
>>>>>>>>>>> only additional requirement would be to clearly mark
>>>>>>>>>>> that the
>>>>>>>>>>> method
>>>>>>>>>>> has been called so that we do not waste time on
>>>>>>>>>>> non-issues if
>>>>>>>>>>> calling VMInfo leads to a crash (or, even worse but less
>>>>>>>>>>> likely, to
>>>>>>>>>>> a silent data corruption that may lead to problems later).
>>>>>>>>>>> The bare
>>>>>>>>>>> minimum would be to double check that a crash during an
>>>>>>>>>>> execution of
>>>>>>>>>>> VM.info is clearly identifiable (this is worth double
>>>>>>>>>>> checking
>>>>>>>>>>> because of the use of static buffers for the error
>>>>>>>>>>> reporting,
>>>>>>>>>>> see
>>>>>>>>>>> below).
>>>>>>>>>>>
>>>>>>>>>>> If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>>>>> wonder
>>>>>>>>>>> whether a safe version of it is still a "small
>>>>>>>>>>> enhancement".
>>>>>>>>>>> A JEP
>>>>>>>>>>> may be required.
>>>>>>>>>>>
>>>>>>>>>>> VMError::report and the methods it calls (print_on_error)
>>>>>>>>>>> have been
>>>>>>>>>>> written with assumptions clearly stated in the comments
>>>>>>>>>>> above
>>>>>>>>>>> VMError::report (see below the two most important ones).
>>>>>>>>>>> HotSpot
>>>>>>>>>>> error reporting strategy differed from JRockit. This
>>>>>>>>>>> explains
>>>>>>>>>>> why
>>>>>>>>>>> the JRockit approach cannot be applied as is.
>>>>>>>>>>>
>>>>>>>>>>> I agree with David that you can try leveraging some of the
>>>>>>>>>>> logic
>>>>>>>>>>> from VMError::report. However, to be sure there is no
>>>>>>>>>>> misunderstanding, let's be clearer about David's "(where
>>>>>>>>>>> appropriate)": we would for instance have to review all
>>>>>>>>>>> the
>>>>>>>>>>> called
>>>>>>>>>>> methods, check which ones purposefully ignore locking
>>>>>>>>>>> (potentially
>>>>>>>>>>> crashing) and provide alternate code for the unsafe ones
>>>>>>>>>>> (which may
>>>>>>>>>>> not be trivial in some cases). As a simple example, a
>>>>>>>>>>> method
>>>>>>>>>>> called
>>>>>>>>>>> "print_on_error" should not be called when we are not
>>>>>>>>>>> in an
>>>>>>>>>>> erroneous condition. However, in some components of the
>>>>>>>>>>> JVM, the
>>>>>>>>>>> low
>>>>>>>>>>> level implementations of 'print_on_error' and
>>>>>>>>>>> 'print_info_safely'
>>>>>>>>>>> could share the same code.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Bertrand.
>>>>>>>>>>>
>>>>>>>>>>> Here are the important comments about VMError::report
>>>>>>>>>>> assumptions:
>>>>>>>>>>>
>>>>>>>>>>> // In general, a hang in error handler is much worse
>>>>>>>>>>> than a
>>>>>>>>>>> crash or
>>>>>>>>>>> internal
>>>>>>>>>>> // error, as it's harder to recover from a hang.
>>>>>>>>>>> Deadlock can
>>>>>>>>>>> happen
>>>>>>>>>>> if we
>>>>>>>>>>> // try to grab a lock that is already owned by current
>>>>>>>>>>> thread,
>>>>>>>>>>> or if the
>>>>>>>>>>> // owner is blocked forever (e.g. in
>>>>>>>>>>> os::infinite_sleep()). If
>>>>>>>>>>> possible, the
>>>>>>>>>>> // error handler and all the functions it called should
>>>>>>>>>>> avoid
>>>>>>>>>>> grabbing any
>>>>>>>>>>> // lock. An important thing to notice is that memory
>>>>>>>>>>> allocation
>>>>>>>>>>> needs a
>>>>>>>>>>> lock.
>>>>>>>>>>> //
>>>>>>>>>>> // We should avoid using large stack allocated buffers.
>>>>>>>>>>> Many
>>>>>>>>>>> errors
>>>>>>>>>>> happen
>>>>>>>>>>> // when stack space is already low. Making things even
>>>>>>>>>>> worse is
>>>>>>>>>>> that
>>>>>>>>>>> there
>>>>>>>>>>> // could be nested report_and_die() calls on stack (see
>>>>>>>>>>> above).
>>>>>>>>>>> Only one
>>>>>>>>>>> // thread can report error, so large buffers are
>>>>>>>>>>> statically
>>>>>>>>>>> allocated in
>>>>>>>>>>> data
>>>>>>>>>>> // segment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >> Kind Regards
>>>>>>>>>>> >> /Mattis
>>>>>>>>>>> >>
>>>>>>>>>>> >> -----Original Message-----
>>>>>>>>>>> >> From: David Holmes
>>>>>>>>>>> >> Sent: den 29 oktober 2015 23:20
>>>>>>>>>>> >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>> >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>> >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>> VM.info to
>>>>>>>>>>> get
>>>>>>>>>>> >> hs_err print-out
>>>>>>>>>>> >>
>>>>>>>>>>> >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>>>> >>> Some background: We have this command in JRockit. The
>>>>>>>>>>> information you
>>>>>>>>>>> >>> gather when you crash to give a good summary of what
>>>>>>>>>>> system
>>>>>>>>>>> you run
>>>>>>>>>>> >>> on is pretty much exactly the information you need to
>>>>>>>>>>> get a
>>>>>>>>>>> good
>>>>>>>>>>> >>> summary on a system that has not crashed. The JRockit
>>>>>>>>>>> command is
>>>>>>>>>>> >>> extremely useful for support, and saves a lot of work
>>>>>>>>>>> going
>>>>>>>>>>> back and
>>>>>>>>>>> >>> forth asking about system information.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Also, if we write something new in the hs_err file,
>>>>>>>>>>> like if
>>>>>>>>>>> there has
>>>>>>>>>>> >>> been any out of memory errors, we often would want
>>>>>>>>>>> the same
>>>>>>>>>>> >>> information in the VM.info output. From my
>>>>>>>>>>> experience in
>>>>>>>>>>> >>> Sustaining/Support, I can't think of any
>>>>>>>>>>> information you
>>>>>>>>>>> would want
>>>>>>>>>>> >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>>>>> file
>>>>>>>>>>> and the
>>>>>>>>>>> >>> other way around, apart from details about the crash
>>>>>>>>>>> (obviously).
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I don't see a reason to exactly enumerate what
>>>>>>>>>>> information
>>>>>>>>>>> VM.info
>>>>>>>>>>> >>> should provide. From a sustaining/support
>>>>>>>>>>> perspective, we
>>>>>>>>>>> want a
>>>>>>>>>>> >>> one-stop command to gather as much useful
>>>>>>>>>>> information as
>>>>>>>>>>> possible,
>>>>>>>>>>> >>> which is the same idea we have for the hs_err file.
>>>>>>>>>>> >>
>>>>>>>>>>> >> The reason to enumerate what is required is to see
>>>>>>>>>>> where
>>>>>>>>>>> that
>>>>>>>>>>> >> information already exists and can be collected
>>>>>>>>>>> from. The
>>>>>>>>>>> VMError
>>>>>>>>>>> >> report has to be very careful about what it does and
>>>>>>>>>>> how it
>>>>>>>>>>> does it
>>>>>>>>>>> >> because of the fact we may have crashed and the
>>>>>>>>>>> general
>>>>>>>>>>> process
>>>>>>>>>>> state
>>>>>>>>>>> >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>>>>> information is
>>>>>>>>>>> >> required from the available sources in whatever
>>>>>>>>>>> order or
>>>>>>>>>>> format
>>>>>>>>>>> desired.
>>>>>>>>>>> >>
>>>>>>>>>>> >> I have no problem with this command, just how it
>>>>>>>>>>> has been
>>>>>>>>>>> proposed to
>>>>>>>>>>> >> implement it.
>>>>>>>>>>> >>
>>>>>>>>>>> >> David
>>>>>>>>>>> >>
>>>>>>>>>>> >>> Kind Regards
>>>>>>>>>>> >>> /Mattis
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> -----Original Message-----
>>>>>>>>>>> >>> From: David Holmes
>>>>>>>>>>> >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>>>> >>> To: Coleen Phillimore;
>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>> >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>> VM.info
>>>>>>>>>>> to get
>>>>>>>>>>> >>> hs_err print-out
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>>>> >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> So I actually disagree. I don't think there
>>>>>>>>>>> should
>>>>>>>>>>> be an
>>>>>>>>>>> >>>>>> additional separate mechanism to get the same
>>>>>>>>>>> information
>>>>>>>>>>> that we
>>>>>>>>>>> >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>>>>> feature like
>>>>>>>>>>> >>>>>> this for a long time.
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> I pre-reviewed this change and I thought it looked
>>>>>>>>>>> good in
>>>>>>>>>>> general.
>>>>>>>>>>> >>>>>> I didn't see the thread iteration problem that
>>>>>>>>>>> Fred
>>>>>>>>>>> refers to
>>>>>>>>>>> >>>>>> below, but I think the individual problems can be
>>>>>>>>>>> fixed.
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> The last thing I want is this code to be copied
>>>>>>>>>>> somewhere else.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Factored as needed not copied. VMError is not an
>>>>>>>>>>> "info"
>>>>>>>>>>> reporting
>>>>>>>>>>> >>>>> mechanism.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> If you look at the things that are reported in each
>>>>>>>>>>> "STEP",
>>>>>>>>>>> there's a
>>>>>>>>>>> >>>> small amount of code and the order is important.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> The order is less important in an "info" request I
>>>>>>>>>>> would
>>>>>>>>>>> think.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>> I'd like the vm info to use the same order and
>>>>>>>>>>> report
>>>>>>>>>>> what it
>>>>>>>>>>> can do
>>>>>>>>>>> >>>> safely. Refactoring 5 lines of code into functions
>>>>>>>>>>> doesn't
>>>>>>>>>>> make
>>>>>>>>>>> >>>> sense.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I need to consider exactly what it is the "info"
>>>>>>>>>>> needs to
>>>>>>>>>>> report in
>>>>>>>>>>> >>> more detail. There are existing facilities (system
>>>>>>>>>>> properties,
>>>>>>>>>>> >>> management
>>>>>>>>>>> >>> APIs) for various bits of runtime information, which
>>>>>>>>>>> VMError can't
>>>>>>>>>>> >>> utilize but a Dcmd can.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> David
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>> Coleen
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> David
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>> Coleen
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>>>> >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>>>>> should
>>>>>>>>>>> not be
>>>>>>>>>>> >>>>>>> piggy-backed onto VMError handling for the
>>>>>>>>>>> reasons
>>>>>>>>>>> stated
>>>>>>>>>>> (and the
>>>>>>>>>>> >>>>>>> error handling logic is complicated enough as it
>>>>>>>>>>> is!). For
>>>>>>>>>>> things
>>>>>>>>>>> >>>>>>> like thread lists there are already safe
>>>>>>>>>>> management
>>>>>>>>>>> functions that
>>>>>>>>>>> >>>>>>> can/should be used.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>>> >>>>>>> David H.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>>>> >>>>>>>> Hi David,
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>>>>> concerned
>>>>>>>>>>> with the
>>>>>>>>>>> >>>>>>>> fact that the diagnostic command is calling
>>>>>>>>>>> VMError::report().
>>>>>>>>>>> >>>>>>>> This method has been implemented to be executed
>>>>>>>>>>> in the
>>>>>>>>>>> particular
>>>>>>>>>>> >>>>>>>> context of fatal errors, and its usage while the
>>>>>>>>>>> VM is
>>>>>>>>>>> running
>>>>>>>>>>> >>>>>>>> normally seems dangerous.
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> For instance, VMError::report() consciously
>>>>>>>>>>> avoids
>>>>>>>>>>> grabbing locks
>>>>>>>>>>> >>>>>>>> because of the risk of deadlock during the error
>>>>>>>>>>> reporting.
>>>>>>>>>>> >>>>>>>> The consequence is that some data structures are
>>>>>>>>>>> browsed in an
>>>>>>>>>>> >>>>>>>> unsafe way. One example: VMError::report() calls
>>>>>>>>>>> >>>>>>>> Threads::print_on_error() which iterates over
>>>>>>>>>>> the
>>>>>>>>>>> thread list
>>>>>>>>>>> >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> The implementation of the diagnostic command
>>>>>>>>>>> seems
>>>>>>>>>>> also to
>>>>>>>>>>> >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>>>>> VMError::report()
>>>>>>>>>>> >>>>>>>> method. Have you considered implementing a new
>>>>>>>>>>> MT-safe
>>>>>>>>>>> reporting
>>>>>>>>>>> >>>>>>>> method rather than trying to modify the special
>>>>>>>>>>> VMError::report()
>>>>>>>>>>> >>>>>>>> methods? (Note that some code factorization
>>>>>>>>>>> between
>>>>>>>>>>> >>>>>>>> VMError::report() and this new method should be
>>>>>>>>>>> possible).
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Fred
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>>>> >>>>>>>>> Hi!
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> Please review my change for this small
>>>>>>>>>>> enhancement.
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> bug:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>>>> >>>>>>>>> webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> Cheers,
>>>>>>>>>>> >>>>>>>>> -Buck
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Bertrand Delsart, Grenoble Engineering
>>>>>>>>>>> Center
>>>>>>>>>>> Oracle, 180 av. de l'Europe, ZIRST de
>>>>>>>>>>> Montbonnot
>>>>>>>>>>> 38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>>>> bertrand.delsart at oracle.com
>>>>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>>>> Phone : +33 4 76 18 81 23
>>>>>>>>>>>
>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>
>>>>>>>>>>> NOTICE: This email message is for the sole use of the
>>>>>>>>>>> intended
>>>>>>>>>>> recipient(s) and may contain confidential and privileged
>>>>>>>>>>> information. Any unauthorized review, use, disclosure or
>>>>>>>>>>> distribution is prohibited. If you are not the intended
>>>>>>>>>>> recipient,
>>>>>>>>>>> please contact the sender by reply email and destroy all
>>>>>>>>>>> copies of
>>>>>>>>>>> the original message.
>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list