RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)
Coleen Phillimore
coleen.phillimore at oracle.com
Fri Nov 13 15:11:35 UTC 2015
David, Thank you for looking at this again and our offline discussions.
On 11/12/15 11:52 PM, David Holmes wrote:
> Hi Coleen,
>
> On 13/11/2015 7:41 AM, Coleen Phillimore wrote:
>>
>> Hi, I should learn my lesson that Frederic, Bertrand and David are
>> generally right.
>>
>> My restructuring went afoul of the STEP framework, and so I have made
>> VMError::print_vm_info() report the safe things that VMError::report()
>> reports. Hopefully, this will address the concern about safety. It
>> also allows us to remove the lock and static buffer in the VM.info
>> function. The duplication isn't as bad as I thought and is a better
>> solution than piggy backing from VMError::report().
>>
>> Please review:
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>
> This looks good to me. A few small queries:
>
> 888 if (Universe::is_fully_initialized()) {
>
> I don't think it is possible to run a Dcmd before the VM is fully
> initialized.
in the interest of keeping semi-duplicated code looking the similar, and
what Frederic says, I'd like to keep the test there. It's not hurting
anything.
>
> 889 Universe::heap()->print_on_error(st);
>
> Is this safe? I'm not sure what it does - print heap memory regions ??
yes, it appears to be safe.
>
> 891 st->print_cr("Polling page: " INTPTR_FORMAT,
> p2i(os::get_polling_page()));
>
> Is this useful ??
Sure, why not? Somebody added it to the hs_err_pid file for some reason.
Thanks!
Coleen
>
> Thanks,
> David
>
>> I tested with jcmd <pid> VM.info concurrently and with an inserted crash
>> during VM.info call to make sure the original hs_err_pid report comes
>> out okay.
>>
>> Thanks,
>> Coleen
>>
>> On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>>>
>>>
>>> On 11/10/15 7:12 PM, David Holmes wrote:
>>>> Hi Coleen,
>>>>
>>>> I still do not like the intertwining of the info and error code, but
>>>> I won't say more on that.
>>>
>>> Yes, I understand the approach has good arguments on both sides. I'm
>>> trying to find a compromise that can move this feature forward.
>>>>
>>>> However you may now have a concurrency problem:
>>>>
>>>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>>> 843 void VMError::report_info(outputStream* st) {
>>>> 844
>>>> 845 // don't allocate large buffer on stack
>>>> 846 static char buf[O_BUFLEN];
>>>>
>>>> Can you have concurrent info requests in flight? For the info case
>>>> the buffer could be allocated on the stack I think, or even
>>>> dynamically. (Aside: will this be executed in the service thread?)
>>>
>>> There is a lock added to protect the VM.info dcmd, so there would be
>>> only sequential access to the static buffer. I don't think this is
>>> executed in the Service Thread. I see no evidence of this anyway.
>>>
>>>>
>>>> report_process could possibly be refactored further as there's a lot
>>>> of initial stuff that won't apply to the info case:
>>>> - we won't print all threads
>>>> - the VM state will always be "not at safepoint"
>>>> - The Universe must be initialized (not clear at what point during
>>>> shutdown a dcmd can still come in?)
>>>> - we don't print owned locks
>>>
>>> Yes, I can do this. It actually improves how this looks.
>>>
>>> Thanks,
>>> Coleen
>>>>
>>>> That's all from me.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've done some refactoring and cleanup on this which the refactoring
>>>>> enabled. Again, it's important that the VM.info output be mostly
>>>>> the
>>>>> same order as the hs_err_pid* report file so that sustaining can
>>>>> easily
>>>>> match and find output that they are looking for. It's also
>>>>> important
>>>>> not to copy this code because there may be new information added
>>>>> that we
>>>>> also want in the VM.info output. If any future information that we
>>>>> want
>>>>> to print seems unsafe, the vm_info_cmd can be used to exclude it
>>>>> or it
>>>>> can be printed only in the error reporting function. Rejecting the
>>>>> entire change because of the theoretical unsafe addition of code
>>>>> isn't
>>>>> the right thing to do, although yeah, I've argued the other way in
>>>>> the
>>>>> past. This is something that is helpful to sustaining and our
>>>>> licensees.
>>>>>
>>>>> In this patch, the functions that VM.info calls are safe while the
>>>>> JVM
>>>>> is running, but the stepping is required because at crash time, they
>>>>> might not be. Although I've never seen the error reporting crash in
>>>>> these places.
>>>>>
>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>
>>>>> This refactoring has been run through RBT.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>>>> Also note that some of the crash information is not interesting for
>>>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>>>> registers/stack data for any specific thread. If people want thread
>>>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>>>> system state.
>>>>>>
>>>>>> I'm a little surprised that thread info is not of interest, but
>>>>>> as you
>>>>>> say that is available by other means - which was part of my earlier
>>>>>> point about there already being safe means to get some of the info.
>>>>>>
>>>>>> If this is really only about the small subset of safe "system
>>>>>> information" that VMError reports, then I see even less
>>>>>> motivation to
>>>>>> piggy-back VM-info onto VMerror and entangle the two. If each
>>>>>> block of
>>>>>> information can be considered a single "step" in the error case then
>>>>>> we can simply factor out the actual printing parts and share
>>>>>> them. If
>>>>>> the error case wants fine-grained steps then we can't do that.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> Kind Regards
>>>>>>> /Mattis
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Coleen Phillimore
>>>>>>> Sent: den 3 november 2015 16:25
>>>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>>>> hs_err print-out
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've done a bit of refactoring with this change and have tested it
>>>>>>> with and without intentional crashes to the VM.info reporting.
>>>>>>> If it
>>>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>>>> where it comes from in the hs_err file. And the hs_err_pid file is
>>>>>>> correctly produced. If we can make the jcmd come with a warning
>>>>>>> that
>>>>>>> it *could* potentially crash the vm, I think this feature is a good
>>>>>>> addition for sustaining.
>>>>>>>
>>>>>>> Without crashing, VM.info gives a lot of information that we
>>>>>>> normally
>>>>>>> give with the hs_err file. The reason for the careful STEP'ping in
>>>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>>>> count on anything. The VM.info command is designed to be run on a
>>>>>>> moderately stable JVM so is less likely to crash on the things that
>>>>>>> we're printing out.
>>>>>>>
>>>>>>> I think we should fix the Threads walking code that Fred pointed
>>>>>>> out
>>>>>>> to take the Threads_lock, but I don't think we should *copy*
>>>>>>> code for
>>>>>>> VM.info or copy print_on_error functions.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>>>> Thanks Thomas,
>>>>>>>>>
>>>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>>>> if we
>>>>>>>>> are not careful.
>>>>>>>>>
>>>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious
>>>>>>>>> source of
>>>>>>>>> potential crashes if you take into account the fact that the
>>>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>>>
>>>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>>>
>>>>>>>> I'm not sure of why we do not print the threads when there is
>>>>>>>> notion
>>>>>>>> of current thread (and I would wonder whether we should print
>>>>>>>> them for
>>>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>>>> crash.
>>>>>>>>
>>>>>>>> However, I still do not like it from a design point of view for
>>>>>>>> the
>>>>>>>> other reasons stated in my e-mail.
>>>>>>>>
>>>>>>>> Bertrand.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>>>> what
>>>>>>>>> it calls) has been designed.
>>>>>>>>>
>>>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient unless
>>>>>>>>> you
>>>>>>>>> pass that argument to all layers. This is the biggest design
>>>>>>>>> issue.
>>>>>>>>> For instance all calls to print_on_error could have this
>>>>>>>>> additional
>>>>>>>>> argument so that the different components can adapt their
>>>>>>>>> behavior
>>>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>>>> case,
>>>>>>>>> but the same if true for all methods called by VMError::report. I
>>>>>>>>> would go as far as adding a comment in all print_on_error methods
>>>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>>>> locks
>>>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>>>> VMError:report is supposed to skip.
>>>>>>>>> The current design, where the various components may not be aware
>>>>>>>>> that they are called for two different purposes, is not robust
>>>>>>>>> enough.
>>>>>>>>>
>>>>>>>>> Note also that being 'robust enough for the "crash" case' is
>>>>>>>>> not a
>>>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>>>> of the
>>>>>>>>> info STEP crashes the VM instead of just bringing us to the next
>>>>>>>>> error reporting STEP. Now, as stated before, I could agree if we
>>>>>>>>> use
>>>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Bertrand
>>>>>>>>>
>>>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>>>
>>>>>>>>>> However, there are some things to keep in mind when doing this:
>>>>>>>>>>
>>>>>>>>>> Error handling is different between the "info-only" and "crash"
>>>>>>>>>> case. In
>>>>>>>>>> "info-only" case, VMError::report() is called without enclosing
>>>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>>>> handler
>>>>>>>>>> intact, not with the secondary signal handler. So, normal signal
>>>>>>>>>> handling works, which is good. But it also means that any error
>>>>>>>>>> inside one of the error reporting STEPs will bring down the VM,
>>>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>>>> ("Error
>>>>>>>>>> occurred during error reporting").
>>>>>>>>>>
>>>>>>>>>> The problem is that a number of error reporting steps are a bit
>>>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>>>> happen
>>>>>>>>>> is that the STEP will be interrupted. These risky steps
>>>>>>>>>> should not
>>>>>>>>>> be executed in the "only-info" case.
>>>>>>>>>>
>>>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>>>> "_vm_info_cmd"
>>>>>>>>>> and conditionally excludes dangerous or pointless error
>>>>>>>>>> reporting
>>>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>>>
>>>>>>>>>> As far as I can see, the other concerns concern only the "crash"
>>>>>>>>>> case and simply do not apply to the "info-only" case. If an
>>>>>>>>>> error
>>>>>>>>>> reporting step is robust enough for the "crash" case (low stack
>>>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it
>>>>>>>>>> should be
>>>>>>>>>> safe enough for the "info" case.
>>>>>>>>>>
>>>>>>>>>> Kind Regards, Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>>>> <mattis.castegren at oracle.com
>>>>>>>>>> <mailto:mattis.castegren at oracle.com>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> Of course, when I wrote "Review and push this fix" I meant
>>>>>>>>>> doing a
>>>>>>>>>> thorough technical review. We do not want VM.info to be
>>>>>>>>>> unsafe, and
>>>>>>>>>> I will let David reply to the technical questions that has
>>>>>>>>>> been
>>>>>>>>>> raised. I know he did consider some of these issues, but
>>>>>>>>>> last
>>>>>>>>>> week
>>>>>>>>>> he was on Java one, and this week he is out. I will let him
>>>>>>>>>> comment
>>>>>>>>>> once he gets back.
>>>>>>>>>>
>>>>>>>>>> Kind Regards
>>>>>>>>>> /Mattis
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Bertrand Delsart
>>>>>>>>>> Sent: den 2 november 2015 10:13
>>>>>>>>>> To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>>>> to get
>>>>>>>>>> hs_err print-out
>>>>>>>>>>
>>>>>>>>>> On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>>> > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>>> >> Hi
>>>>>>>>>> >>
>>>>>>>>>> >> I agree that there may be things we could add to
>>>>>>>>>> VM.info
>>>>>>>>>> that we
>>>>>>>>>> >> couldn't add in an hs_err file due to the fact that we
>>>>>>>>>> are
>>>>>>>>>> crashing.
>>>>>>>>>> >> However, I really do believe that we should use the
>>>>>>>>>> same
>>>>>>>>>> code.
>>>>>>>>>> The
>>>>>>>>>> >> goal of the hs_err file is exactly the same as the
>>>>>>>>>> goal of
>>>>>>>>>> VM.info,
>>>>>>>>>> >> to print as much information about the system as
>>>>>>>>>> possible so
>>>>>>>>>> that
>>>>>>>>>> >> when we look at an hs_err file or the output from
>>>>>>>>>> VM.info,
>>>>>>>>>> we get as
>>>>>>>>>> >> much information about the system as possible. That way
>>>>>>>>>> we can
>>>>>>>>>> reduce
>>>>>>>>>> >> the need to go back and forth with the customer to ask
>>>>>>>>>> for more
>>>>>>>>>> information.
>>>>>>>>>> >>
>>>>>>>>>> >> If we enumerate what we need in VM.info, the list
>>>>>>>>>> would look
>>>>>>>>>> very
>>>>>>>>>> >> much like what we already do in VMError, and
>>>>>>>>>> therefore I
>>>>>>>>>> think we
>>>>>>>>>> >> should use the same code. If we find something that we
>>>>>>>>>> would
>>>>>>>>>> want,
>>>>>>>>>> >> that is not currently printed in hs_err files, I
>>>>>>>>>> think we
>>>>>>>>>> have
>>>>>>>>>> two options:
>>>>>>>>>> >>
>>>>>>>>>> >> 1) Add it to hs_err files as well. If it is
>>>>>>>>>> interesting when
>>>>>>>>>> >> gathering customer information before a crash, it
>>>>>>>>>> probably is
>>>>>>>>>> >> interesting after a crash
>>>>>>>>>> >> 2) If it for some reason cannot be captured when
>>>>>>>>>> crashing,
>>>>>>>>>> add it to
>>>>>>>>>> >> the code with the condition that we don't print it
>>>>>>>>>> during
>>>>>>>>>> crash
>>>>>>>>>> time.
>>>>>>>>>> >>
>>>>>>>>>> >> I would therefore suggest the following plan of action
>>>>>>>>>> >>
>>>>>>>>>> >> 1) Review and push this fix to see if there are any
>>>>>>>>>> technical issues
>>>>>>>>>> >> with regards to thread safety, etc. I know David has
>>>>>>>>>> had
>>>>>>>>>> this in
>>>>>>>>>> mind
>>>>>>>>>> >> when doing the change. After that, we have a command
>>>>>>>>>> that
>>>>>>>>>> support can
>>>>>>>>>> >> use.
>>>>>>>>>> >> 2) Think about if there is any information missing,
>>>>>>>>>> both
>>>>>>>>>> from hs_err
>>>>>>>>>> >> files and from VM.info. Here, we may want to ask
>>>>>>>>>> support
>>>>>>>>>> about if
>>>>>>>>>> >> there is any additional information they need to ask
>>>>>>>>>> for
>>>>>>>>>> when they
>>>>>>>>>> >> get this data. If possible, this information should be
>>>>>>>>>> added
>>>>>>>>>> to both
>>>>>>>>>> >> hs_err files and VM.info, we don't want support to
>>>>>>>>>> have to
>>>>>>>>>> ask for
>>>>>>>>>> >> VM.info if we already have an hs_err file. Of
>>>>>>>>>> course, if
>>>>>>>>>> there is
>>>>>>>>>> >> anything Dev would want to see when analyzing
>>>>>>>>>> crashes in
>>>>>>>>>> testing, we
>>>>>>>>>> >> should add that too.
>>>>>>>>>> >
>>>>>>>>>> > I have no issue with VMError and VM.Info using the same
>>>>>>>>>> code
>>>>>>>>>> (where
>>>>>>>>>> > appropriate), but I think it wrong to try and implement
>>>>>>>>>> VM.Info
>>>>>>>>>> > directly using the actual VMError mechanism. It is
>>>>>>>>>> crude and
>>>>>>>>>> makes it
>>>>>>>>>> > difficult to make independent changes to either
>>>>>>>>>> facility.
>>>>>>>>>> >
>>>>>>>>>> > Sorry.
>>>>>>>>>> >
>>>>>>>>>> > David
>>>>>>>>>>
>>>>>>>>>> Furthermore, we cannot just "Review and push this fix to
>>>>>>>>>> see if
>>>>>>>>>> there are any technical issues". Two reviewers have already
>>>>>>>>>> stated
>>>>>>>>>> that there are issues and these issues must be addressed
>>>>>>>>>> before
>>>>>>>>>> pushing.
>>>>>>>>>>
>>>>>>>>>> The answer cannot just be that we need this feature, it
>>>>>>>>>> must
>>>>>>>>>> address
>>>>>>>>>> the technical issues that were reported.
>>>>>>>>>>
>>>>>>>>>> One possible answer is that we all agree that VMInfo
>>>>>>>>>> would be
>>>>>>>>>> unsafe.
>>>>>>>>>> The method would have to be renamed UnsafeVMInfo and it
>>>>>>>>>> must be
>>>>>>>>>> clear that its 'impact' would be very risky, potentially
>>>>>>>>>> crashing
>>>>>>>>>> the JVM.
>>>>>>>>>>
>>>>>>>>>> If this is sufficient for your needs then we could agree
>>>>>>>>>> with
>>>>>>>>>> it. My
>>>>>>>>>> only additional requirement would be to clearly mark
>>>>>>>>>> that the
>>>>>>>>>> method
>>>>>>>>>> has been called so that we do not waste time on
>>>>>>>>>> non-issues if
>>>>>>>>>> calling VMInfo leads to a crash (or, even worse but less
>>>>>>>>>> likely, to
>>>>>>>>>> a silent data corruption that may lead to problems later).
>>>>>>>>>> The bare
>>>>>>>>>> minimum would be to double check that a crash during an
>>>>>>>>>> execution of
>>>>>>>>>> VM.info is clearly identifiable (this is worth double
>>>>>>>>>> checking
>>>>>>>>>> because of the use of static buffers for the error
>>>>>>>>>> reporting,
>>>>>>>>>> see
>>>>>>>>>> below).
>>>>>>>>>>
>>>>>>>>>> If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>>>> wonder
>>>>>>>>>> whether a safe version of it is still a "small
>>>>>>>>>> enhancement".
>>>>>>>>>> A JEP
>>>>>>>>>> may be required.
>>>>>>>>>>
>>>>>>>>>> VMError::report and the methods it calls (print_on_error)
>>>>>>>>>> have been
>>>>>>>>>> written with assumptions clearly stated in the comments
>>>>>>>>>> above
>>>>>>>>>> VMError::report (see below the two most important ones).
>>>>>>>>>> HotSpot
>>>>>>>>>> error reporting strategy differed from JRockit. This
>>>>>>>>>> explains
>>>>>>>>>> why
>>>>>>>>>> the JRockit approach cannot be applied as is.
>>>>>>>>>>
>>>>>>>>>> I agree with David that you can try leveraging some of the
>>>>>>>>>> logic
>>>>>>>>>> from VMError::report. However, to be sure there is no
>>>>>>>>>> misunderstanding, let's be clearer about David's "(where
>>>>>>>>>> appropriate)": we would for instance have to review all the
>>>>>>>>>> called
>>>>>>>>>> methods, check which ones purposefully ignore locking
>>>>>>>>>> (potentially
>>>>>>>>>> crashing) and provide alternate code for the unsafe ones
>>>>>>>>>> (which may
>>>>>>>>>> not be trivial in some cases). As a simple example, a
>>>>>>>>>> method
>>>>>>>>>> called
>>>>>>>>>> "print_on_error" should not be called when we are not in an
>>>>>>>>>> erroneous condition. However, in some components of the
>>>>>>>>>> JVM, the
>>>>>>>>>> low
>>>>>>>>>> level implementations of 'print_on_error' and
>>>>>>>>>> 'print_info_safely'
>>>>>>>>>> could share the same code.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Bertrand.
>>>>>>>>>>
>>>>>>>>>> Here are the important comments about VMError::report
>>>>>>>>>> assumptions:
>>>>>>>>>>
>>>>>>>>>> // In general, a hang in error handler is much worse than a
>>>>>>>>>> crash or
>>>>>>>>>> internal
>>>>>>>>>> // error, as it's harder to recover from a hang.
>>>>>>>>>> Deadlock can
>>>>>>>>>> happen
>>>>>>>>>> if we
>>>>>>>>>> // try to grab a lock that is already owned by current
>>>>>>>>>> thread,
>>>>>>>>>> or if the
>>>>>>>>>> // owner is blocked forever (e.g. in
>>>>>>>>>> os::infinite_sleep()). If
>>>>>>>>>> possible, the
>>>>>>>>>> // error handler and all the functions it called should
>>>>>>>>>> avoid
>>>>>>>>>> grabbing any
>>>>>>>>>> // lock. An important thing to notice is that memory
>>>>>>>>>> allocation
>>>>>>>>>> needs a
>>>>>>>>>> lock.
>>>>>>>>>> //
>>>>>>>>>> // We should avoid using large stack allocated buffers.
>>>>>>>>>> Many
>>>>>>>>>> errors
>>>>>>>>>> happen
>>>>>>>>>> // when stack space is already low. Making things even
>>>>>>>>>> worse is
>>>>>>>>>> that
>>>>>>>>>> there
>>>>>>>>>> // could be nested report_and_die() calls on stack (see
>>>>>>>>>> above).
>>>>>>>>>> Only one
>>>>>>>>>> // thread can report error, so large buffers are statically
>>>>>>>>>> allocated in
>>>>>>>>>> data
>>>>>>>>>> // segment.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >> Kind Regards
>>>>>>>>>> >> /Mattis
>>>>>>>>>> >>
>>>>>>>>>> >> -----Original Message-----
>>>>>>>>>> >> From: David Holmes
>>>>>>>>>> >> Sent: den 29 oktober 2015 23:20
>>>>>>>>>> >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>>> >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>> >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>> VM.info to
>>>>>>>>>> get
>>>>>>>>>> >> hs_err print-out
>>>>>>>>>> >>
>>>>>>>>>> >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>>> >>> Some background: We have this command in JRockit. The
>>>>>>>>>> information you
>>>>>>>>>> >>> gather when you crash to give a good summary of what
>>>>>>>>>> system
>>>>>>>>>> you run
>>>>>>>>>> >>> on is pretty much exactly the information you need to
>>>>>>>>>> get a
>>>>>>>>>> good
>>>>>>>>>> >>> summary on a system that has not crashed. The JRockit
>>>>>>>>>> command is
>>>>>>>>>> >>> extremely useful for support, and saves a lot of work
>>>>>>>>>> going
>>>>>>>>>> back and
>>>>>>>>>> >>> forth asking about system information.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Also, if we write something new in the hs_err file,
>>>>>>>>>> like if
>>>>>>>>>> there has
>>>>>>>>>> >>> been any out of memory errors, we often would want
>>>>>>>>>> the same
>>>>>>>>>> >>> information in the VM.info output. From my
>>>>>>>>>> experience in
>>>>>>>>>> >>> Sustaining/Support, I can't think of any
>>>>>>>>>> information you
>>>>>>>>>> would want
>>>>>>>>>> >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>>>> file
>>>>>>>>>> and the
>>>>>>>>>> >>> other way around, apart from details about the crash
>>>>>>>>>> (obviously).
>>>>>>>>>> >>>
>>>>>>>>>> >>> I don't see a reason to exactly enumerate what
>>>>>>>>>> information
>>>>>>>>>> VM.info
>>>>>>>>>> >>> should provide. From a sustaining/support
>>>>>>>>>> perspective, we
>>>>>>>>>> want a
>>>>>>>>>> >>> one-stop command to gather as much useful
>>>>>>>>>> information as
>>>>>>>>>> possible,
>>>>>>>>>> >>> which is the same idea we have for the hs_err file.
>>>>>>>>>> >>
>>>>>>>>>> >> The reason to enumerate what is required is to see
>>>>>>>>>> where
>>>>>>>>>> that
>>>>>>>>>> >> information already exists and can be collected
>>>>>>>>>> from. The
>>>>>>>>>> VMError
>>>>>>>>>> >> report has to be very careful about what it does and
>>>>>>>>>> how it
>>>>>>>>>> does it
>>>>>>>>>> >> because of the fact we may have crashed and the general
>>>>>>>>>> process
>>>>>>>>>> state
>>>>>>>>>> >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>>>> information is
>>>>>>>>>> >> required from the available sources in whatever
>>>>>>>>>> order or
>>>>>>>>>> format
>>>>>>>>>> desired.
>>>>>>>>>> >>
>>>>>>>>>> >> I have no problem with this command, just how it has
>>>>>>>>>> been
>>>>>>>>>> proposed to
>>>>>>>>>> >> implement it.
>>>>>>>>>> >>
>>>>>>>>>> >> David
>>>>>>>>>> >>
>>>>>>>>>> >>> Kind Regards
>>>>>>>>>> >>> /Mattis
>>>>>>>>>> >>>
>>>>>>>>>> >>> -----Original Message-----
>>>>>>>>>> >>> From: David Holmes
>>>>>>>>>> >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>>> >>> To: Coleen Phillimore;
>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>> >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>> VM.info
>>>>>>>>>> to get
>>>>>>>>>> >>> hs_err print-out
>>>>>>>>>> >>>
>>>>>>>>>> >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>>> >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> So I actually disagree. I don't think there should
>>>>>>>>>> be an
>>>>>>>>>> >>>>>> additional separate mechanism to get the same
>>>>>>>>>> information
>>>>>>>>>> that we
>>>>>>>>>> >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>>>> feature like
>>>>>>>>>> >>>>>> this for a long time.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> I pre-reviewed this change and I thought it looked
>>>>>>>>>> good in
>>>>>>>>>> general.
>>>>>>>>>> >>>>>> I didn't see the thread iteration problem that Fred
>>>>>>>>>> refers to
>>>>>>>>>> >>>>>> below, but I think the individual problems can be
>>>>>>>>>> fixed.
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> The last thing I want is this code to be copied
>>>>>>>>>> somewhere else.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Factored as needed not copied. VMError is not an
>>>>>>>>>> "info"
>>>>>>>>>> reporting
>>>>>>>>>> >>>>> mechanism.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> If you look at the things that are reported in each
>>>>>>>>>> "STEP",
>>>>>>>>>> there's a
>>>>>>>>>> >>>> small amount of code and the order is important.
>>>>>>>>>> >>>
>>>>>>>>>> >>> The order is less important in an "info" request I
>>>>>>>>>> would
>>>>>>>>>> think.
>>>>>>>>>> >>>
>>>>>>>>>> >>>> I'd like the vm info to use the same order and report
>>>>>>>>>> what it
>>>>>>>>>> can do
>>>>>>>>>> >>>> safely. Refactoring 5 lines of code into functions
>>>>>>>>>> doesn't
>>>>>>>>>> make
>>>>>>>>>> >>>> sense.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I need to consider exactly what it is the "info"
>>>>>>>>>> needs to
>>>>>>>>>> report in
>>>>>>>>>> >>> more detail. There are existing facilities (system
>>>>>>>>>> properties,
>>>>>>>>>> >>> management
>>>>>>>>>> >>> APIs) for various bits of runtime information, which
>>>>>>>>>> VMError can't
>>>>>>>>>> >>> utilize but a Dcmd can.
>>>>>>>>>> >>>
>>>>>>>>>> >>> David
>>>>>>>>>> >>>
>>>>>>>>>> >>>> Coleen
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> David
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>> Coleen
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>>> >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>>>> should
>>>>>>>>>> not be
>>>>>>>>>> >>>>>>> piggy-backed onto VMError handling for the reasons
>>>>>>>>>> stated
>>>>>>>>>> (and the
>>>>>>>>>> >>>>>>> error handling logic is complicated enough as it
>>>>>>>>>> is!). For
>>>>>>>>>> things
>>>>>>>>>> >>>>>>> like thread lists there are already safe
>>>>>>>>>> management
>>>>>>>>>> functions that
>>>>>>>>>> >>>>>>> can/should be used.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>> >>>>>>> David H.
>>>>>>>>>> >>>>>>>
>>>>>>>>>> >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>>> >>>>>>>> Hi David,
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>>>> concerned
>>>>>>>>>> with the
>>>>>>>>>> >>>>>>>> fact that the diagnostic command is calling
>>>>>>>>>> VMError::report().
>>>>>>>>>> >>>>>>>> This method has been implemented to be executed
>>>>>>>>>> in the
>>>>>>>>>> particular
>>>>>>>>>> >>>>>>>> context of fatal errors, and its usage while the
>>>>>>>>>> VM is
>>>>>>>>>> running
>>>>>>>>>> >>>>>>>> normally seems dangerous.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> For instance, VMError::report() consciously
>>>>>>>>>> avoids
>>>>>>>>>> grabbing locks
>>>>>>>>>> >>>>>>>> because of the risk of deadlock during the error
>>>>>>>>>> reporting.
>>>>>>>>>> >>>>>>>> The consequence is that some data structures are
>>>>>>>>>> browsed in an
>>>>>>>>>> >>>>>>>> unsafe way. One example: VMError::report() calls
>>>>>>>>>> >>>>>>>> Threads::print_on_error() which iterates over the
>>>>>>>>>> thread list
>>>>>>>>>> >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> The implementation of the diagnostic command
>>>>>>>>>> seems
>>>>>>>>>> also to
>>>>>>>>>> >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>>>> VMError::report()
>>>>>>>>>> >>>>>>>> method. Have you considered implementing a new
>>>>>>>>>> MT-safe
>>>>>>>>>> reporting
>>>>>>>>>> >>>>>>>> method rather than trying to modify the special
>>>>>>>>>> VMError::report()
>>>>>>>>>> >>>>>>>> methods? (Note that some code factorization
>>>>>>>>>> between
>>>>>>>>>> >>>>>>>> VMError::report() and this new method should be
>>>>>>>>>> possible).
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> Fred
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>>> >>>>>>>>> Hi!
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> Please review my change for this small
>>>>>>>>>> enhancement.
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> bug:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>>> >>>>>>>>> webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>>> >>>>>>>>>
>>>>>>>>>> >>>>>>>>> Cheers,
>>>>>>>>>> >>>>>>>>> -Buck
>>>>>>>>>> >>>>>>>>
>>>>>>>>>> >>>>>>
>>>>>>>>>> >>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Bertrand Delsart, Grenoble Engineering
>>>>>>>>>> Center
>>>>>>>>>> Oracle, 180 av. de l'Europe, ZIRST de
>>>>>>>>>> Montbonnot
>>>>>>>>>> 38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>>> bertrand.delsart at oracle.com
>>>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>>> Phone : +33 4 76 18 81 23
>>>>>>>>>>
>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>
>>>>>>>>>> NOTICE: This email message is for the sole use of the
>>>>>>>>>> intended
>>>>>>>>>> recipient(s) and may contain confidential and privileged
>>>>>>>>>> information. Any unauthorized review, use, disclosure or
>>>>>>>>>> distribution is prohibited. If you are not the intended
>>>>>>>>>> recipient,
>>>>>>>>>> please contact the sender by reply email and destroy all
>>>>>>>>>> copies of
>>>>>>>>>> the original message.
>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
More information about the hotspot-runtime-dev
mailing list