RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)
Coleen Phillimore
coleen.phillimore at oracle.com
Fri Nov 13 19:19:58 UTC 2015
Thank you, Harold!
Coleen
On 11/13/15 1:13 PM, harold seigel wrote:
> Hi Coleen,
>
> Your latest webrev looks good.
>
> Harold
>
> On 11/13/2015 10:11 AM, Coleen Phillimore wrote:
>>
>> David, Thank you for looking at this again and our offline discussions.
>>
>> On 11/12/15 11:52 PM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> On 13/11/2015 7:41 AM, Coleen Phillimore wrote:
>>>>
>>>> Hi, I should learn my lesson that Frederic, Bertrand and David are
>>>> generally right.
>>>>
>>>> My restructuring went afoul of the STEP framework, and so I have made
>>>> VMError::print_vm_info() report the safe things that VMError::report()
>>>> reports. Hopefully, this will address the concern about safety. It
>>>> also allows us to remove the lock and static buffer in the VM.info
>>>> function. The duplication isn't as bad as I thought and is a better
>>>> solution than piggy backing from VMError::report().
>>>>
>>>> Please review:
>>>>
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>
>>> This looks good to me. A few small queries:
>>>
>>> 888 if (Universe::is_fully_initialized()) {
>>>
>>> I don't think it is possible to run a Dcmd before the VM is fully
>>> initialized.
>>
>> in the interest of keeping semi-duplicated code looking the similar,
>> and what Frederic says, I'd like to keep the test there. It's not
>> hurting anything.
>>>
>>> 889 Universe::heap()->print_on_error(st);
>>>
>>> Is this safe? I'm not sure what it does - print heap memory regions ??
>>
>> yes, it appears to be safe.
>>>
>>> 891 st->print_cr("Polling page: " INTPTR_FORMAT,
>>> p2i(os::get_polling_page()));
>>>
>>> Is this useful ??
>>
>> Sure, why not? Somebody added it to the hs_err_pid file for some
>> reason.
>>
>> Thanks!
>> Coleen
>>>
>>> Thanks,
>>> David
>>>
>>>> I tested with jcmd <pid> VM.info concurrently and with an inserted
>>>> crash
>>>> during VM.info call to make sure the original hs_err_pid report comes
>>>> out okay.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>>>>>
>>>>>
>>>>> On 11/10/15 7:12 PM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> I still do not like the intertwining of the info and error code, but
>>>>>> I won't say more on that.
>>>>>
>>>>> Yes, I understand the approach has good arguments on both sides. I'm
>>>>> trying to find a compromise that can move this feature forward.
>>>>>>
>>>>>> However you may now have a concurrency problem:
>>>>>>
>>>>>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>>>>> 843 void VMError::report_info(outputStream* st) {
>>>>>> 844
>>>>>> 845 // don't allocate large buffer on stack
>>>>>> 846 static char buf[O_BUFLEN];
>>>>>>
>>>>>> Can you have concurrent info requests in flight? For the info case
>>>>>> the buffer could be allocated on the stack I think, or even
>>>>>> dynamically. (Aside: will this be executed in the service thread?)
>>>>>
>>>>> There is a lock added to protect the VM.info dcmd, so there would be
>>>>> only sequential access to the static buffer. I don't think this is
>>>>> executed in the Service Thread. I see no evidence of this anyway.
>>>>>
>>>>>>
>>>>>> report_process could possibly be refactored further as there's a lot
>>>>>> of initial stuff that won't apply to the info case:
>>>>>> - we won't print all threads
>>>>>> - the VM state will always be "not at safepoint"
>>>>>> - The Universe must be initialized (not clear at what point during
>>>>>> shutdown a dcmd can still come in?)
>>>>>> - we don't print owned locks
>>>>>
>>>>> Yes, I can do this. It actually improves how this looks.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>>
>>>>>> That's all from me.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've done some refactoring and cleanup on this which the
>>>>>>> refactoring
>>>>>>> enabled. Again, it's important that the VM.info output be
>>>>>>> mostly the
>>>>>>> same order as the hs_err_pid* report file so that sustaining can
>>>>>>> easily
>>>>>>> match and find output that they are looking for. It's also
>>>>>>> important
>>>>>>> not to copy this code because there may be new information added
>>>>>>> that we
>>>>>>> also want in the VM.info output. If any future information that we
>>>>>>> want
>>>>>>> to print seems unsafe, the vm_info_cmd can be used to exclude it
>>>>>>> or it
>>>>>>> can be printed only in the error reporting function. Rejecting the
>>>>>>> entire change because of the theoretical unsafe addition of code
>>>>>>> isn't
>>>>>>> the right thing to do, although yeah, I've argued the other way
>>>>>>> in the
>>>>>>> past. This is something that is helpful to sustaining and our
>>>>>>> licensees.
>>>>>>>
>>>>>>> In this patch, the functions that VM.info calls are safe while
>>>>>>> the JVM
>>>>>>> is running, but the stepping is required because at crash time,
>>>>>>> they
>>>>>>> might not be. Although I've never seen the error reporting
>>>>>>> crash in
>>>>>>> these places.
>>>>>>>
>>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>
>>>>>>> This refactoring has been run through RBT.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>>>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>>>>>> Also note that some of the crash information is not
>>>>>>>>> interesting for
>>>>>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>>>>>> registers/stack data for any specific thread. If people want
>>>>>>>>> thread
>>>>>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>>>>>> system state.
>>>>>>>>
>>>>>>>> I'm a little surprised that thread info is not of interest, but
>>>>>>>> as you
>>>>>>>> say that is available by other means - which was part of my
>>>>>>>> earlier
>>>>>>>> point about there already being safe means to get some of the
>>>>>>>> info.
>>>>>>>>
>>>>>>>> If this is really only about the small subset of safe "system
>>>>>>>> information" that VMError reports, then I see even less
>>>>>>>> motivation to
>>>>>>>> piggy-back VM-info onto VMerror and entangle the two. If each
>>>>>>>> block of
>>>>>>>> information can be considered a single "step" in the error case
>>>>>>>> then
>>>>>>>> we can simply factor out the actual printing parts and share
>>>>>>>> them. If
>>>>>>>> the error case wants fine-grained steps then we can't do that.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Kind Regards
>>>>>>>>> /Mattis
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Coleen Phillimore
>>>>>>>>> Sent: den 3 november 2015 16:25
>>>>>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>>>>>> hs_err print-out
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've done a bit of refactoring with this change and have
>>>>>>>>> tested it
>>>>>>>>> with and without intentional crashes to the VM.info
>>>>>>>>> reporting. If it
>>>>>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>>>>>> where it comes from in the hs_err file. And the hs_err_pid
>>>>>>>>> file is
>>>>>>>>> correctly produced. If we can make the jcmd come with a
>>>>>>>>> warning that
>>>>>>>>> it *could* potentially crash the vm, I think this feature is a
>>>>>>>>> good
>>>>>>>>> addition for sustaining.
>>>>>>>>>
>>>>>>>>> Without crashing, VM.info gives a lot of information that we
>>>>>>>>> normally
>>>>>>>>> give with the hs_err file. The reason for the careful
>>>>>>>>> STEP'ping in
>>>>>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>>>>>> count on anything. The VM.info command is designed to be run
>>>>>>>>> on a
>>>>>>>>> moderately stable JVM so is less likely to crash on the things
>>>>>>>>> that
>>>>>>>>> we're printing out.
>>>>>>>>>
>>>>>>>>> I think we should fix the Threads walking code that Fred
>>>>>>>>> pointed out
>>>>>>>>> to take the Threads_lock, but I don't think we should *copy*
>>>>>>>>> code for
>>>>>>>>> VM.info or copy print_on_error functions.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>>>>>> Thanks Thomas,
>>>>>>>>>>>
>>>>>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>>>>>> if we
>>>>>>>>>>> are not careful.
>>>>>>>>>>>
>>>>>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious
>>>>>>>>>>> source of
>>>>>>>>>>> potential crashes if you take into account the fact that the
>>>>>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>>>>>
>>>>>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>>>>>
>>>>>>>>>> I'm not sure of why we do not print the threads when there is
>>>>>>>>>> notion
>>>>>>>>>> of current thread (and I would wonder whether we should print
>>>>>>>>>> them for
>>>>>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>>>>>> crash.
>>>>>>>>>>
>>>>>>>>>> However, I still do not like it from a design point of view
>>>>>>>>>> for the
>>>>>>>>>> other reasons stated in my e-mail.
>>>>>>>>>>
>>>>>>>>>> Bertrand.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>>>>>> what
>>>>>>>>>>> it calls) has been designed.
>>>>>>>>>>>
>>>>>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient
>>>>>>>>>>> unless you
>>>>>>>>>>> pass that argument to all layers. This is the biggest design
>>>>>>>>>>> issue.
>>>>>>>>>>> For instance all calls to print_on_error could have this
>>>>>>>>>>> additional
>>>>>>>>>>> argument so that the different components can adapt their
>>>>>>>>>>> behavior
>>>>>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>>>>>> case,
>>>>>>>>>>> but the same if true for all methods called by
>>>>>>>>>>> VMError::report. I
>>>>>>>>>>> would go as far as adding a comment in all print_on_error
>>>>>>>>>>> methods
>>>>>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>>>>>> locks
>>>>>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>>>>>> VMError:report is supposed to skip.
>>>>>>>>>>> The current design, where the various components may not be
>>>>>>>>>>> aware
>>>>>>>>>>> that they are called for two different purposes, is not robust
>>>>>>>>>>> enough.
>>>>>>>>>>>
>>>>>>>>>>> Note also that being 'robust enough for the "crash" case' is
>>>>>>>>>>> not a
>>>>>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>>>>>> of the
>>>>>>>>>>> info STEP crashes the VM instead of just bringing us to the
>>>>>>>>>>> next
>>>>>>>>>>> error reporting STEP. Now, as stated before, I could agree
>>>>>>>>>>> if we
>>>>>>>>>>> use
>>>>>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Bertrand
>>>>>>>>>>>
>>>>>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>>>>>
>>>>>>>>>>>> However, there are some things to keep in mind when doing
>>>>>>>>>>>> this:
>>>>>>>>>>>>
>>>>>>>>>>>> Error handling is different between the "info-only" and
>>>>>>>>>>>> "crash"
>>>>>>>>>>>> case. In
>>>>>>>>>>>> "info-only" case, VMError::report() is called without
>>>>>>>>>>>> enclosing
>>>>>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>>>>>> handler
>>>>>>>>>>>> intact, not with the secondary signal handler. So, normal
>>>>>>>>>>>> signal
>>>>>>>>>>>> handling works, which is good. But it also means that any
>>>>>>>>>>>> error
>>>>>>>>>>>> inside one of the error reporting STEPs will bring down the
>>>>>>>>>>>> VM,
>>>>>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>>>>>> ("Error
>>>>>>>>>>>> occurred during error reporting").
>>>>>>>>>>>>
>>>>>>>>>>>> The problem is that a number of error reporting steps are a
>>>>>>>>>>>> bit
>>>>>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>>>>>> happen
>>>>>>>>>>>> is that the STEP will be interrupted. These risky steps
>>>>>>>>>>>> should not
>>>>>>>>>>>> be executed in the "only-info" case.
>>>>>>>>>>>>
>>>>>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>>>>>> "_vm_info_cmd"
>>>>>>>>>>>> and conditionally excludes dangerous or pointless error
>>>>>>>>>>>> reporting
>>>>>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I can see, the other concerns concern only the
>>>>>>>>>>>> "crash"
>>>>>>>>>>>> case and simply do not apply to the "info-only" case. If an
>>>>>>>>>>>> error
>>>>>>>>>>>> reporting step is robust enough for the "crash" case (low
>>>>>>>>>>>> stack
>>>>>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it
>>>>>>>>>>>> should be
>>>>>>>>>>>> safe enough for the "info" case.
>>>>>>>>>>>>
>>>>>>>>>>>> Kind Regards, Thomas
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>>>>>> <mattis.castegren at oracle.com
>>>>>>>>>>>> <mailto:mattis.castegren at oracle.com>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>> Of course, when I wrote "Review and push this fix" I
>>>>>>>>>>>> meant
>>>>>>>>>>>> doing a
>>>>>>>>>>>> thorough technical review. We do not want VM.info to be
>>>>>>>>>>>> unsafe, and
>>>>>>>>>>>> I will let David reply to the technical questions that
>>>>>>>>>>>> has
>>>>>>>>>>>> been
>>>>>>>>>>>> raised. I know he did consider some of these issues,
>>>>>>>>>>>> but last
>>>>>>>>>>>> week
>>>>>>>>>>>> he was on Java one, and this week he is out. I will
>>>>>>>>>>>> let him
>>>>>>>>>>>> comment
>>>>>>>>>>>> once he gets back.
>>>>>>>>>>>>
>>>>>>>>>>>> Kind Regards
>>>>>>>>>>>> /Mattis
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Bertrand Delsart
>>>>>>>>>>>> Sent: den 2 november 2015 10:13
>>>>>>>>>>>> To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>>>>>> to get
>>>>>>>>>>>> hs_err print-out
>>>>>>>>>>>>
>>>>>>>>>>>> On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>>>>> > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>>>>> >> Hi
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I agree that there may be things we could add to
>>>>>>>>>>>> VM.info
>>>>>>>>>>>> that we
>>>>>>>>>>>> >> couldn't add in an hs_err file due to the fact
>>>>>>>>>>>> that we
>>>>>>>>>>>> are
>>>>>>>>>>>> crashing.
>>>>>>>>>>>> >> However, I really do believe that we should use
>>>>>>>>>>>> the same
>>>>>>>>>>>> code.
>>>>>>>>>>>> The
>>>>>>>>>>>> >> goal of the hs_err file is exactly the same as the
>>>>>>>>>>>> goal of
>>>>>>>>>>>> VM.info,
>>>>>>>>>>>> >> to print as much information about the system as
>>>>>>>>>>>> possible so
>>>>>>>>>>>> that
>>>>>>>>>>>> >> when we look at an hs_err file or the output from
>>>>>>>>>>>> VM.info,
>>>>>>>>>>>> we get as
>>>>>>>>>>>> >> much information about the system as possible.
>>>>>>>>>>>> That way
>>>>>>>>>>>> we can
>>>>>>>>>>>> reduce
>>>>>>>>>>>> >> the need to go back and forth with the customer to
>>>>>>>>>>>> ask
>>>>>>>>>>>> for more
>>>>>>>>>>>> information.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> If we enumerate what we need in VM.info, the list
>>>>>>>>>>>> would look
>>>>>>>>>>>> very
>>>>>>>>>>>> >> much like what we already do in VMError, and
>>>>>>>>>>>> therefore I
>>>>>>>>>>>> think we
>>>>>>>>>>>> >> should use the same code. If we find something
>>>>>>>>>>>> that we
>>>>>>>>>>>> would
>>>>>>>>>>>> want,
>>>>>>>>>>>> >> that is not currently printed in hs_err files, I
>>>>>>>>>>>> think we
>>>>>>>>>>>> have
>>>>>>>>>>>> two options:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> 1) Add it to hs_err files as well. If it is
>>>>>>>>>>>> interesting when
>>>>>>>>>>>> >> gathering customer information before a crash, it
>>>>>>>>>>>> probably is
>>>>>>>>>>>> >> interesting after a crash
>>>>>>>>>>>> >> 2) If it for some reason cannot be captured when
>>>>>>>>>>>> crashing,
>>>>>>>>>>>> add it to
>>>>>>>>>>>> >> the code with the condition that we don't print it
>>>>>>>>>>>> during
>>>>>>>>>>>> crash
>>>>>>>>>>>> time.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I would therefore suggest the following plan of
>>>>>>>>>>>> action
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> 1) Review and push this fix to see if there are any
>>>>>>>>>>>> technical issues
>>>>>>>>>>>> >> with regards to thread safety, etc. I know David
>>>>>>>>>>>> has had
>>>>>>>>>>>> this in
>>>>>>>>>>>> mind
>>>>>>>>>>>> >> when doing the change. After that, we have a
>>>>>>>>>>>> command that
>>>>>>>>>>>> support can
>>>>>>>>>>>> >> use.
>>>>>>>>>>>> >> 2) Think about if there is any information
>>>>>>>>>>>> missing, both
>>>>>>>>>>>> from hs_err
>>>>>>>>>>>> >> files and from VM.info. Here, we may want to ask
>>>>>>>>>>>> support
>>>>>>>>>>>> about if
>>>>>>>>>>>> >> there is any additional information they need to
>>>>>>>>>>>> ask for
>>>>>>>>>>>> when they
>>>>>>>>>>>> >> get this data. If possible, this information
>>>>>>>>>>>> should be
>>>>>>>>>>>> added
>>>>>>>>>>>> to both
>>>>>>>>>>>> >> hs_err files and VM.info, we don't want support to
>>>>>>>>>>>> have to
>>>>>>>>>>>> ask for
>>>>>>>>>>>> >> VM.info if we already have an hs_err file. Of
>>>>>>>>>>>> course, if
>>>>>>>>>>>> there is
>>>>>>>>>>>> >> anything Dev would want to see when analyzing
>>>>>>>>>>>> crashes in
>>>>>>>>>>>> testing, we
>>>>>>>>>>>> >> should add that too.
>>>>>>>>>>>> >
>>>>>>>>>>>> > I have no issue with VMError and VM.Info using the
>>>>>>>>>>>> same
>>>>>>>>>>>> code
>>>>>>>>>>>> (where
>>>>>>>>>>>> > appropriate), but I think it wrong to try and
>>>>>>>>>>>> implement
>>>>>>>>>>>> VM.Info
>>>>>>>>>>>> > directly using the actual VMError mechanism. It is
>>>>>>>>>>>> crude and
>>>>>>>>>>>> makes it
>>>>>>>>>>>> > difficult to make independent changes to either
>>>>>>>>>>>> facility.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Sorry.
>>>>>>>>>>>> >
>>>>>>>>>>>> > David
>>>>>>>>>>>>
>>>>>>>>>>>> Furthermore, we cannot just "Review and push this fix to
>>>>>>>>>>>> see if
>>>>>>>>>>>> there are any technical issues". Two reviewers have
>>>>>>>>>>>> already
>>>>>>>>>>>> stated
>>>>>>>>>>>> that there are issues and these issues must be addressed
>>>>>>>>>>>> before
>>>>>>>>>>>> pushing.
>>>>>>>>>>>>
>>>>>>>>>>>> The answer cannot just be that we need this feature,
>>>>>>>>>>>> it must
>>>>>>>>>>>> address
>>>>>>>>>>>> the technical issues that were reported.
>>>>>>>>>>>>
>>>>>>>>>>>> One possible answer is that we all agree that VMInfo
>>>>>>>>>>>> would be
>>>>>>>>>>>> unsafe.
>>>>>>>>>>>> The method would have to be renamed UnsafeVMInfo and it
>>>>>>>>>>>> must be
>>>>>>>>>>>> clear that its 'impact' would be very risky, potentially
>>>>>>>>>>>> crashing
>>>>>>>>>>>> the JVM.
>>>>>>>>>>>>
>>>>>>>>>>>> If this is sufficient for your needs then we could
>>>>>>>>>>>> agree with
>>>>>>>>>>>> it. My
>>>>>>>>>>>> only additional requirement would be to clearly mark
>>>>>>>>>>>> that the
>>>>>>>>>>>> method
>>>>>>>>>>>> has been called so that we do not waste time on
>>>>>>>>>>>> non-issues if
>>>>>>>>>>>> calling VMInfo leads to a crash (or, even worse but less
>>>>>>>>>>>> likely, to
>>>>>>>>>>>> a silent data corruption that may lead to problems
>>>>>>>>>>>> later).
>>>>>>>>>>>> The bare
>>>>>>>>>>>> minimum would be to double check that a crash during an
>>>>>>>>>>>> execution of
>>>>>>>>>>>> VM.info is clearly identifiable (this is worth double
>>>>>>>>>>>> checking
>>>>>>>>>>>> because of the use of static buffers for the error
>>>>>>>>>>>> reporting,
>>>>>>>>>>>> see
>>>>>>>>>>>> below).
>>>>>>>>>>>>
>>>>>>>>>>>> If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>>>>>> wonder
>>>>>>>>>>>> whether a safe version of it is still a "small
>>>>>>>>>>>> enhancement".
>>>>>>>>>>>> A JEP
>>>>>>>>>>>> may be required.
>>>>>>>>>>>>
>>>>>>>>>>>> VMError::report and the methods it calls (print_on_error)
>>>>>>>>>>>> have been
>>>>>>>>>>>> written with assumptions clearly stated in the
>>>>>>>>>>>> comments above
>>>>>>>>>>>> VMError::report (see below the two most important ones).
>>>>>>>>>>>> HotSpot
>>>>>>>>>>>> error reporting strategy differed from JRockit. This
>>>>>>>>>>>> explains
>>>>>>>>>>>> why
>>>>>>>>>>>> the JRockit approach cannot be applied as is.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with David that you can try leveraging some of
>>>>>>>>>>>> the
>>>>>>>>>>>> logic
>>>>>>>>>>>> from VMError::report. However, to be sure there is no
>>>>>>>>>>>> misunderstanding, let's be clearer about David's "(where
>>>>>>>>>>>> appropriate)": we would for instance have to review
>>>>>>>>>>>> all the
>>>>>>>>>>>> called
>>>>>>>>>>>> methods, check which ones purposefully ignore locking
>>>>>>>>>>>> (potentially
>>>>>>>>>>>> crashing) and provide alternate code for the unsafe ones
>>>>>>>>>>>> (which may
>>>>>>>>>>>> not be trivial in some cases). As a simple example, a
>>>>>>>>>>>> method
>>>>>>>>>>>> called
>>>>>>>>>>>> "print_on_error" should not be called when we are not
>>>>>>>>>>>> in an
>>>>>>>>>>>> erroneous condition. However, in some components of the
>>>>>>>>>>>> JVM, the
>>>>>>>>>>>> low
>>>>>>>>>>>> level implementations of 'print_on_error' and
>>>>>>>>>>>> 'print_info_safely'
>>>>>>>>>>>> could share the same code.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Bertrand.
>>>>>>>>>>>>
>>>>>>>>>>>> Here are the important comments about VMError::report
>>>>>>>>>>>> assumptions:
>>>>>>>>>>>>
>>>>>>>>>>>> // In general, a hang in error handler is much worse
>>>>>>>>>>>> than a
>>>>>>>>>>>> crash or
>>>>>>>>>>>> internal
>>>>>>>>>>>> // error, as it's harder to recover from a hang.
>>>>>>>>>>>> Deadlock can
>>>>>>>>>>>> happen
>>>>>>>>>>>> if we
>>>>>>>>>>>> // try to grab a lock that is already owned by current
>>>>>>>>>>>> thread,
>>>>>>>>>>>> or if the
>>>>>>>>>>>> // owner is blocked forever (e.g. in
>>>>>>>>>>>> os::infinite_sleep()). If
>>>>>>>>>>>> possible, the
>>>>>>>>>>>> // error handler and all the functions it called
>>>>>>>>>>>> should avoid
>>>>>>>>>>>> grabbing any
>>>>>>>>>>>> // lock. An important thing to notice is that memory
>>>>>>>>>>>> allocation
>>>>>>>>>>>> needs a
>>>>>>>>>>>> lock.
>>>>>>>>>>>> //
>>>>>>>>>>>> // We should avoid using large stack allocated
>>>>>>>>>>>> buffers. Many
>>>>>>>>>>>> errors
>>>>>>>>>>>> happen
>>>>>>>>>>>> // when stack space is already low. Making things even
>>>>>>>>>>>> worse is
>>>>>>>>>>>> that
>>>>>>>>>>>> there
>>>>>>>>>>>> // could be nested report_and_die() calls on stack (see
>>>>>>>>>>>> above).
>>>>>>>>>>>> Only one
>>>>>>>>>>>> // thread can report error, so large buffers are
>>>>>>>>>>>> statically
>>>>>>>>>>>> allocated in
>>>>>>>>>>>> data
>>>>>>>>>>>> // segment.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >> Kind Regards
>>>>>>>>>>>> >> /Mattis
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> -----Original Message-----
>>>>>>>>>>>> >> From: David Holmes
>>>>>>>>>>>> >> Sent: den 29 oktober 2015 23:20
>>>>>>>>>>>> >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>>> >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>> >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>>> VM.info to
>>>>>>>>>>>> get
>>>>>>>>>>>> >> hs_err print-out
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>>>>> >>> Some background: We have this command in JRockit.
>>>>>>>>>>>> The
>>>>>>>>>>>> information you
>>>>>>>>>>>> >>> gather when you crash to give a good summary of what
>>>>>>>>>>>> system
>>>>>>>>>>>> you run
>>>>>>>>>>>> >>> on is pretty much exactly the information you
>>>>>>>>>>>> need to
>>>>>>>>>>>> get a
>>>>>>>>>>>> good
>>>>>>>>>>>> >>> summary on a system that has not crashed. The
>>>>>>>>>>>> JRockit
>>>>>>>>>>>> command is
>>>>>>>>>>>> >>> extremely useful for support, and saves a lot of
>>>>>>>>>>>> work
>>>>>>>>>>>> going
>>>>>>>>>>>> back and
>>>>>>>>>>>> >>> forth asking about system information.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Also, if we write something new in the hs_err file,
>>>>>>>>>>>> like if
>>>>>>>>>>>> there has
>>>>>>>>>>>> >>> been any out of memory errors, we often would want
>>>>>>>>>>>> the same
>>>>>>>>>>>> >>> information in the VM.info output. From my
>>>>>>>>>>>> experience in
>>>>>>>>>>>> >>> Sustaining/Support, I can't think of any
>>>>>>>>>>>> information you
>>>>>>>>>>>> would want
>>>>>>>>>>>> >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>>>>>> file
>>>>>>>>>>>> and the
>>>>>>>>>>>> >>> other way around, apart from details about the crash
>>>>>>>>>>>> (obviously).
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> I don't see a reason to exactly enumerate what
>>>>>>>>>>>> information
>>>>>>>>>>>> VM.info
>>>>>>>>>>>> >>> should provide. From a sustaining/support
>>>>>>>>>>>> perspective, we
>>>>>>>>>>>> want a
>>>>>>>>>>>> >>> one-stop command to gather as much useful
>>>>>>>>>>>> information as
>>>>>>>>>>>> possible,
>>>>>>>>>>>> >>> which is the same idea we have for the hs_err file.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> The reason to enumerate what is required is to see
>>>>>>>>>>>> where
>>>>>>>>>>>> that
>>>>>>>>>>>> >> information already exists and can be collected
>>>>>>>>>>>> from. The
>>>>>>>>>>>> VMError
>>>>>>>>>>>> >> report has to be very careful about what it does and
>>>>>>>>>>>> how it
>>>>>>>>>>>> does it
>>>>>>>>>>>> >> because of the fact we may have crashed and the
>>>>>>>>>>>> general
>>>>>>>>>>>> process
>>>>>>>>>>>> state
>>>>>>>>>>>> >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>>>>>> information is
>>>>>>>>>>>> >> required from the available sources in whatever
>>>>>>>>>>>> order or
>>>>>>>>>>>> format
>>>>>>>>>>>> desired.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I have no problem with this command, just how it
>>>>>>>>>>>> has been
>>>>>>>>>>>> proposed to
>>>>>>>>>>>> >> implement it.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> David
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>> Kind Regards
>>>>>>>>>>>> >>> /Mattis
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> -----Original Message-----
>>>>>>>>>>>> >>> From: David Holmes
>>>>>>>>>>>> >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>>>>> >>> To: Coleen Phillimore;
>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>> >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>>> VM.info
>>>>>>>>>>>> to get
>>>>>>>>>>>> >>> hs_err print-out
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>>>>> >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> So I actually disagree. I don't think there
>>>>>>>>>>>> should
>>>>>>>>>>>> be an
>>>>>>>>>>>> >>>>>> additional separate mechanism to get the same
>>>>>>>>>>>> information
>>>>>>>>>>>> that we
>>>>>>>>>>>> >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>>>>>> feature like
>>>>>>>>>>>> >>>>>> this for a long time.
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> I pre-reviewed this change and I thought it
>>>>>>>>>>>> looked
>>>>>>>>>>>> good in
>>>>>>>>>>>> general.
>>>>>>>>>>>> >>>>>> I didn't see the thread iteration problem that
>>>>>>>>>>>> Fred
>>>>>>>>>>>> refers to
>>>>>>>>>>>> >>>>>> below, but I think the individual problems can be
>>>>>>>>>>>> fixed.
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> The last thing I want is this code to be copied
>>>>>>>>>>>> somewhere else.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Factored as needed not copied. VMError is not an
>>>>>>>>>>>> "info"
>>>>>>>>>>>> reporting
>>>>>>>>>>>> >>>>> mechanism.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> If you look at the things that are reported in each
>>>>>>>>>>>> "STEP",
>>>>>>>>>>>> there's a
>>>>>>>>>>>> >>>> small amount of code and the order is important.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> The order is less important in an "info" request
>>>>>>>>>>>> I would
>>>>>>>>>>>> think.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>> I'd like the vm info to use the same order and
>>>>>>>>>>>> report
>>>>>>>>>>>> what it
>>>>>>>>>>>> can do
>>>>>>>>>>>> >>>> safely. Refactoring 5 lines of code into
>>>>>>>>>>>> functions
>>>>>>>>>>>> doesn't
>>>>>>>>>>>> make
>>>>>>>>>>>> >>>> sense.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> I need to consider exactly what it is the "info"
>>>>>>>>>>>> needs to
>>>>>>>>>>>> report in
>>>>>>>>>>>> >>> more detail. There are existing facilities (system
>>>>>>>>>>>> properties,
>>>>>>>>>>>> >>> management
>>>>>>>>>>>> >>> APIs) for various bits of runtime information, which
>>>>>>>>>>>> VMError can't
>>>>>>>>>>>> >>> utilize but a Dcmd can.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> David
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>> Coleen
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> David
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>> Coleen
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>>>>> >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>>>>>> should
>>>>>>>>>>>> not be
>>>>>>>>>>>> >>>>>>> piggy-backed onto VMError handling for the
>>>>>>>>>>>> reasons
>>>>>>>>>>>> stated
>>>>>>>>>>>> (and the
>>>>>>>>>>>> >>>>>>> error handling logic is complicated enough as it
>>>>>>>>>>>> is!). For
>>>>>>>>>>>> things
>>>>>>>>>>>> >>>>>>> like thread lists there are already safe
>>>>>>>>>>>> management
>>>>>>>>>>>> functions that
>>>>>>>>>>>> >>>>>>> can/should be used.
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>>>> >>>>>>> David H.
>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>> >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>>>>> >>>>>>>> Hi David,
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>>>>>> concerned
>>>>>>>>>>>> with the
>>>>>>>>>>>> >>>>>>>> fact that the diagnostic command is calling
>>>>>>>>>>>> VMError::report().
>>>>>>>>>>>> >>>>>>>> This method has been implemented to be executed
>>>>>>>>>>>> in the
>>>>>>>>>>>> particular
>>>>>>>>>>>> >>>>>>>> context of fatal errors, and its usage while
>>>>>>>>>>>> the
>>>>>>>>>>>> VM is
>>>>>>>>>>>> running
>>>>>>>>>>>> >>>>>>>> normally seems dangerous.
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> For instance, VMError::report() consciously
>>>>>>>>>>>> avoids
>>>>>>>>>>>> grabbing locks
>>>>>>>>>>>> >>>>>>>> because of the risk of deadlock during the
>>>>>>>>>>>> error
>>>>>>>>>>>> reporting.
>>>>>>>>>>>> >>>>>>>> The consequence is that some data structures
>>>>>>>>>>>> are
>>>>>>>>>>>> browsed in an
>>>>>>>>>>>> >>>>>>>> unsafe way. One example: VMError::report()
>>>>>>>>>>>> calls
>>>>>>>>>>>> >>>>>>>> Threads::print_on_error() which iterates
>>>>>>>>>>>> over the
>>>>>>>>>>>> thread list
>>>>>>>>>>>> >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> The implementation of the diagnostic command
>>>>>>>>>>>> seems
>>>>>>>>>>>> also to
>>>>>>>>>>>> >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>>>>>> VMError::report()
>>>>>>>>>>>> >>>>>>>> method. Have you considered implementing a new
>>>>>>>>>>>> MT-safe
>>>>>>>>>>>> reporting
>>>>>>>>>>>> >>>>>>>> method rather than trying to modify the special
>>>>>>>>>>>> VMError::report()
>>>>>>>>>>>> >>>>>>>> methods? (Note that some code factorization
>>>>>>>>>>>> between
>>>>>>>>>>>> >>>>>>>> VMError::report() and this new method should be
>>>>>>>>>>>> possible).
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> Thanks,
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> Fred
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>>>>> >>>>>>>>> Hi!
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> Please review my change for this small
>>>>>>>>>>>> enhancement.
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> bug:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>>>>> >>>>>>>>> webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>> >>>>>>>>> Cheers,
>>>>>>>>>>>> >>>>>>>>> -Buck
>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Bertrand Delsart, Grenoble Engineering
>>>>>>>>>>>> Center
>>>>>>>>>>>> Oracle, 180 av. de l'Europe, ZIRST de
>>>>>>>>>>>> Montbonnot
>>>>>>>>>>>> 38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>>>>> bertrand.delsart at oracle.com
>>>>>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>>>>> Phone : +33 4 76 18 81 23
>>>>>>>>>>>>
>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>
>>>>>>>>>>>> NOTICE: This email message is for the sole use of the
>>>>>>>>>>>> intended
>>>>>>>>>>>> recipient(s) and may contain confidential and privileged
>>>>>>>>>>>> information. Any unauthorized review, use, disclosure or
>>>>>>>>>>>> distribution is prohibited. If you are not the intended
>>>>>>>>>>>> recipient,
>>>>>>>>>>>> please contact the sender by reply email and destroy all
>>>>>>>>>>>> copies of
>>>>>>>>>>>> the original message.
>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>
More information about the hotspot-runtime-dev
mailing list