RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)
Coleen Phillimore
coleen.phillimore at oracle.com
Thu Nov 12 21:41:02 UTC 2015
Hi, I should learn my lesson that Frederic, Bertrand and David are
generally right.
My restructuring went afoul of the STEP framework, and so I have made
VMError::print_vm_info() report the safe things that VMError::report()
reports. Hopefully, this will address the concern about safety. It
also allows us to remove the lock and static buffer in the VM.info
function. The duplication isn't as bad as I thought and is a better
solution than piggy backing from VMError::report().
Please review:
open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
bug link https://bugs.openjdk.java.net/browse/JDK-8027429
I tested with jcmd <pid> VM.info concurrently and with an inserted crash
during VM.info call to make sure the original hs_err_pid report comes
out okay.
Thanks,
Coleen
On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>
>
> On 11/10/15 7:12 PM, David Holmes wrote:
>> Hi Coleen,
>>
>> I still do not like the intertwining of the info and error code, but
>> I won't say more on that.
>
> Yes, I understand the approach has good arguments on both sides. I'm
> trying to find a compromise that can move this feature forward.
>>
>> However you may now have a concurrency problem:
>>
>> 842 // Report for the vm_info_cmd, skipping crash specific code
>> 843 void VMError::report_info(outputStream* st) {
>> 844
>> 845 // don't allocate large buffer on stack
>> 846 static char buf[O_BUFLEN];
>>
>> Can you have concurrent info requests in flight? For the info case
>> the buffer could be allocated on the stack I think, or even
>> dynamically. (Aside: will this be executed in the service thread?)
>
> There is a lock added to protect the VM.info dcmd, so there would be
> only sequential access to the static buffer. I don't think this is
> executed in the Service Thread. I see no evidence of this anyway.
>
>>
>> report_process could possibly be refactored further as there's a lot
>> of initial stuff that won't apply to the info case:
>> - we won't print all threads
>> - the VM state will always be "not at safepoint"
>> - The Universe must be initialized (not clear at what point during
>> shutdown a dcmd can still come in?)
>> - we don't print owned locks
>
> Yes, I can do this. It actually improves how this looks.
>
> Thanks,
> Coleen
>>
>> That's all from me.
>>
>> Thanks,
>> David
>>
>>
>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>
>>> Hi,
>>>
>>> I've done some refactoring and cleanup on this which the refactoring
>>> enabled. Again, it's important that the VM.info output be mostly the
>>> same order as the hs_err_pid* report file so that sustaining can easily
>>> match and find output that they are looking for. It's also important
>>> not to copy this code because there may be new information added
>>> that we
>>> also want in the VM.info output. If any future information that we
>>> want
>>> to print seems unsafe, the vm_info_cmd can be used to exclude it or it
>>> can be printed only in the error reporting function. Rejecting the
>>> entire change because of the theoretical unsafe addition of code isn't
>>> the right thing to do, although yeah, I've argued the other way in the
>>> past. This is something that is helpful to sustaining and our
>>> licensees.
>>>
>>> In this patch, the functions that VM.info calls are safe while the JVM
>>> is running, but the stepping is required because at crash time, they
>>> might not be. Although I've never seen the error reporting crash in
>>> these places.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>
>>> This refactoring has been run through RBT.
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>> Also note that some of the crash information is not interesting for
>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>> registers/stack data for any specific thread. If people want thread
>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>> system state.
>>>>
>>>> I'm a little surprised that thread info is not of interest, but as you
>>>> say that is available by other means - which was part of my earlier
>>>> point about there already being safe means to get some of the info.
>>>>
>>>> If this is really only about the small subset of safe "system
>>>> information" that VMError reports, then I see even less motivation to
>>>> piggy-back VM-info onto VMerror and entangle the two. If each block of
>>>> information can be considered a single "step" in the error case then
>>>> we can simply factor out the actual printing parts and share them. If
>>>> the error case wants fine-grained steps then we can't do that.
>>>>
>>>> David
>>>>
>>>>> Kind Regards
>>>>> /Mattis
>>>>>
>>>>> -----Original Message-----
>>>>> From: Coleen Phillimore
>>>>> Sent: den 3 november 2015 16:25
>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>> hs_err print-out
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've done a bit of refactoring with this change and have tested it
>>>>> with and without intentional crashes to the VM.info reporting. If it
>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>> where it comes from in the hs_err file. And the hs_err_pid file is
>>>>> correctly produced. If we can make the jcmd come with a warning that
>>>>> it *could* potentially crash the vm, I think this feature is a good
>>>>> addition for sustaining.
>>>>>
>>>>> Without crashing, VM.info gives a lot of information that we normally
>>>>> give with the hs_err file. The reason for the careful STEP'ping in
>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>> count on anything. The VM.info command is designed to be run on a
>>>>> moderately stable JVM so is less likely to crash on the things that
>>>>> we're printing out.
>>>>>
>>>>> I think we should fix the Threads walking code that Fred pointed out
>>>>> to take the Threads_lock, but I don't think we should *copy* code for
>>>>> VM.info or copy print_on_error functions.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>> Thanks Thomas,
>>>>>>>
>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>> if we
>>>>>>> are not careful.
>>>>>>>
>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious source of
>>>>>>> potential crashes if you take into account the fact that the
>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>
>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>
>>>>>> I'm not sure of why we do not print the threads when there is notion
>>>>>> of current thread (and I would wonder whether we should print
>>>>>> them for
>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>> crash.
>>>>>>
>>>>>> However, I still do not like it from a design point of view for the
>>>>>> other reasons stated in my e-mail.
>>>>>>
>>>>>> Bertrand.
>>>>>>
>>>>>>>
>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>> what
>>>>>>> it calls) has been designed.
>>>>>>>
>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient unless you
>>>>>>> pass that argument to all layers. This is the biggest design issue.
>>>>>>> For instance all calls to print_on_error could have this additional
>>>>>>> argument so that the different components can adapt their behavior
>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>> case,
>>>>>>> but the same if true for all methods called by VMError::report. I
>>>>>>> would go as far as adding a comment in all print_on_error methods
>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>> locks
>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>> VMError:report is supposed to skip.
>>>>>>> The current design, where the various components may not be aware
>>>>>>> that they are called for two different purposes, is not robust
>>>>>>> enough.
>>>>>>>
>>>>>>> Note also that being 'robust enough for the "crash" case' is not a
>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>> of the
>>>>>>> info STEP crashes the VM instead of just bringing us to the next
>>>>>>> error reporting STEP. Now, as stated before, I could agree if we
>>>>>>> use
>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Bertrand
>>>>>>>
>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>
>>>>>>>> However, there are some things to keep in mind when doing this:
>>>>>>>>
>>>>>>>> Error handling is different between the "info-only" and "crash"
>>>>>>>> case. In
>>>>>>>> "info-only" case, VMError::report() is called without enclosing
>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>> handler
>>>>>>>> intact, not with the secondary signal handler. So, normal signal
>>>>>>>> handling works, which is good. But it also means that any error
>>>>>>>> inside one of the error reporting STEPs will bring down the VM,
>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>> ("Error
>>>>>>>> occurred during error reporting").
>>>>>>>>
>>>>>>>> The problem is that a number of error reporting steps are a bit
>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>> happen
>>>>>>>> is that the STEP will be interrupted. These risky steps should not
>>>>>>>> be executed in the "only-info" case.
>>>>>>>>
>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>> "_vm_info_cmd"
>>>>>>>> and conditionally excludes dangerous or pointless error reporting
>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>
>>>>>>>> As far as I can see, the other concerns concern only the "crash"
>>>>>>>> case and simply do not apply to the "info-only" case. If an error
>>>>>>>> reporting step is robust enough for the "crash" case (low stack
>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it should be
>>>>>>>> safe enough for the "info" case.
>>>>>>>>
>>>>>>>> Kind Regards, Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>> <mattis.castegren at oracle.com <mailto:mattis.castegren at oracle.com>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Of course, when I wrote "Review and push this fix" I meant
>>>>>>>> doing a
>>>>>>>> thorough technical review. We do not want VM.info to be
>>>>>>>> unsafe, and
>>>>>>>> I will let David reply to the technical questions that has
>>>>>>>> been
>>>>>>>> raised. I know he did consider some of these issues, but last
>>>>>>>> week
>>>>>>>> he was on Java one, and this week he is out. I will let him
>>>>>>>> comment
>>>>>>>> once he gets back.
>>>>>>>>
>>>>>>>> Kind Regards
>>>>>>>> /Mattis
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Bertrand Delsart
>>>>>>>> Sent: den 2 november 2015 10:13
>>>>>>>> To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>> to get
>>>>>>>> hs_err print-out
>>>>>>>>
>>>>>>>> On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>> > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>> >> Hi
>>>>>>>> >>
>>>>>>>> >> I agree that there may be things we could add to VM.info
>>>>>>>> that we
>>>>>>>> >> couldn't add in an hs_err file due to the fact that we
>>>>>>>> are
>>>>>>>> crashing.
>>>>>>>> >> However, I really do believe that we should use the same
>>>>>>>> code.
>>>>>>>> The
>>>>>>>> >> goal of the hs_err file is exactly the same as the
>>>>>>>> goal of
>>>>>>>> VM.info,
>>>>>>>> >> to print as much information about the system as
>>>>>>>> possible so
>>>>>>>> that
>>>>>>>> >> when we look at an hs_err file or the output from
>>>>>>>> VM.info,
>>>>>>>> we get as
>>>>>>>> >> much information about the system as possible. That way
>>>>>>>> we can
>>>>>>>> reduce
>>>>>>>> >> the need to go back and forth with the customer to ask
>>>>>>>> for more
>>>>>>>> information.
>>>>>>>> >>
>>>>>>>> >> If we enumerate what we need in VM.info, the list
>>>>>>>> would look
>>>>>>>> very
>>>>>>>> >> much like what we already do in VMError, and therefore I
>>>>>>>> think we
>>>>>>>> >> should use the same code. If we find something that we
>>>>>>>> would
>>>>>>>> want,
>>>>>>>> >> that is not currently printed in hs_err files, I think we
>>>>>>>> have
>>>>>>>> two options:
>>>>>>>> >>
>>>>>>>> >> 1) Add it to hs_err files as well. If it is
>>>>>>>> interesting when
>>>>>>>> >> gathering customer information before a crash, it
>>>>>>>> probably is
>>>>>>>> >> interesting after a crash
>>>>>>>> >> 2) If it for some reason cannot be captured when
>>>>>>>> crashing,
>>>>>>>> add it to
>>>>>>>> >> the code with the condition that we don't print it during
>>>>>>>> crash
>>>>>>>> time.
>>>>>>>> >>
>>>>>>>> >> I would therefore suggest the following plan of action
>>>>>>>> >>
>>>>>>>> >> 1) Review and push this fix to see if there are any
>>>>>>>> technical issues
>>>>>>>> >> with regards to thread safety, etc. I know David has had
>>>>>>>> this in
>>>>>>>> mind
>>>>>>>> >> when doing the change. After that, we have a command that
>>>>>>>> support can
>>>>>>>> >> use.
>>>>>>>> >> 2) Think about if there is any information missing, both
>>>>>>>> from hs_err
>>>>>>>> >> files and from VM.info. Here, we may want to ask support
>>>>>>>> about if
>>>>>>>> >> there is any additional information they need to ask for
>>>>>>>> when they
>>>>>>>> >> get this data. If possible, this information should be
>>>>>>>> added
>>>>>>>> to both
>>>>>>>> >> hs_err files and VM.info, we don't want support to
>>>>>>>> have to
>>>>>>>> ask for
>>>>>>>> >> VM.info if we already have an hs_err file. Of course, if
>>>>>>>> there is
>>>>>>>> >> anything Dev would want to see when analyzing crashes in
>>>>>>>> testing, we
>>>>>>>> >> should add that too.
>>>>>>>> >
>>>>>>>> > I have no issue with VMError and VM.Info using the same
>>>>>>>> code
>>>>>>>> (where
>>>>>>>> > appropriate), but I think it wrong to try and implement
>>>>>>>> VM.Info
>>>>>>>> > directly using the actual VMError mechanism. It is
>>>>>>>> crude and
>>>>>>>> makes it
>>>>>>>> > difficult to make independent changes to either facility.
>>>>>>>> >
>>>>>>>> > Sorry.
>>>>>>>> >
>>>>>>>> > David
>>>>>>>>
>>>>>>>> Furthermore, we cannot just "Review and push this fix to
>>>>>>>> see if
>>>>>>>> there are any technical issues". Two reviewers have already
>>>>>>>> stated
>>>>>>>> that there are issues and these issues must be addressed
>>>>>>>> before
>>>>>>>> pushing.
>>>>>>>>
>>>>>>>> The answer cannot just be that we need this feature, it must
>>>>>>>> address
>>>>>>>> the technical issues that were reported.
>>>>>>>>
>>>>>>>> One possible answer is that we all agree that VMInfo would be
>>>>>>>> unsafe.
>>>>>>>> The method would have to be renamed UnsafeVMInfo and it
>>>>>>>> must be
>>>>>>>> clear that its 'impact' would be very risky, potentially
>>>>>>>> crashing
>>>>>>>> the JVM.
>>>>>>>>
>>>>>>>> If this is sufficient for your needs then we could agree with
>>>>>>>> it. My
>>>>>>>> only additional requirement would be to clearly mark that the
>>>>>>>> method
>>>>>>>> has been called so that we do not waste time on non-issues if
>>>>>>>> calling VMInfo leads to a crash (or, even worse but less
>>>>>>>> likely, to
>>>>>>>> a silent data corruption that may lead to problems later).
>>>>>>>> The bare
>>>>>>>> minimum would be to double check that a crash during an
>>>>>>>> execution of
>>>>>>>> VM.info is clearly identifiable (this is worth double
>>>>>>>> checking
>>>>>>>> because of the use of static buffers for the error reporting,
>>>>>>>> see
>>>>>>>> below).
>>>>>>>>
>>>>>>>> If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>> wonder
>>>>>>>> whether a safe version of it is still a "small enhancement".
>>>>>>>> A JEP
>>>>>>>> may be required.
>>>>>>>>
>>>>>>>> VMError::report and the methods it calls (print_on_error)
>>>>>>>> have been
>>>>>>>> written with assumptions clearly stated in the comments above
>>>>>>>> VMError::report (see below the two most important ones).
>>>>>>>> HotSpot
>>>>>>>> error reporting strategy differed from JRockit. This explains
>>>>>>>> why
>>>>>>>> the JRockit approach cannot be applied as is.
>>>>>>>>
>>>>>>>> I agree with David that you can try leveraging some of the
>>>>>>>> logic
>>>>>>>> from VMError::report. However, to be sure there is no
>>>>>>>> misunderstanding, let's be clearer about David's "(where
>>>>>>>> appropriate)": we would for instance have to review all the
>>>>>>>> called
>>>>>>>> methods, check which ones purposefully ignore locking
>>>>>>>> (potentially
>>>>>>>> crashing) and provide alternate code for the unsafe ones
>>>>>>>> (which may
>>>>>>>> not be trivial in some cases). As a simple example, a method
>>>>>>>> called
>>>>>>>> "print_on_error" should not be called when we are not in an
>>>>>>>> erroneous condition. However, in some components of the
>>>>>>>> JVM, the
>>>>>>>> low
>>>>>>>> level implementations of 'print_on_error' and
>>>>>>>> 'print_info_safely'
>>>>>>>> could share the same code.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Bertrand.
>>>>>>>>
>>>>>>>> Here are the important comments about VMError::report
>>>>>>>> assumptions:
>>>>>>>>
>>>>>>>> // In general, a hang in error handler is much worse than a
>>>>>>>> crash or
>>>>>>>> internal
>>>>>>>> // error, as it's harder to recover from a hang. Deadlock can
>>>>>>>> happen
>>>>>>>> if we
>>>>>>>> // try to grab a lock that is already owned by current
>>>>>>>> thread,
>>>>>>>> or if the
>>>>>>>> // owner is blocked forever (e.g. in
>>>>>>>> os::infinite_sleep()). If
>>>>>>>> possible, the
>>>>>>>> // error handler and all the functions it called should avoid
>>>>>>>> grabbing any
>>>>>>>> // lock. An important thing to notice is that memory
>>>>>>>> allocation
>>>>>>>> needs a
>>>>>>>> lock.
>>>>>>>> //
>>>>>>>> // We should avoid using large stack allocated buffers. Many
>>>>>>>> errors
>>>>>>>> happen
>>>>>>>> // when stack space is already low. Making things even
>>>>>>>> worse is
>>>>>>>> that
>>>>>>>> there
>>>>>>>> // could be nested report_and_die() calls on stack (see
>>>>>>>> above).
>>>>>>>> Only one
>>>>>>>> // thread can report error, so large buffers are statically
>>>>>>>> allocated in
>>>>>>>> data
>>>>>>>> // segment.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >> Kind Regards
>>>>>>>> >> /Mattis
>>>>>>>> >>
>>>>>>>> >> -----Original Message-----
>>>>>>>> >> From: David Holmes
>>>>>>>> >> Sent: den 29 oktober 2015 23:20
>>>>>>>> >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>> >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>> >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>> VM.info to
>>>>>>>> get
>>>>>>>> >> hs_err print-out
>>>>>>>> >>
>>>>>>>> >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>> >>> Some background: We have this command in JRockit. The
>>>>>>>> information you
>>>>>>>> >>> gather when you crash to give a good summary of what
>>>>>>>> system
>>>>>>>> you run
>>>>>>>> >>> on is pretty much exactly the information you need to
>>>>>>>> get a
>>>>>>>> good
>>>>>>>> >>> summary on a system that has not crashed. The JRockit
>>>>>>>> command is
>>>>>>>> >>> extremely useful for support, and saves a lot of work
>>>>>>>> going
>>>>>>>> back and
>>>>>>>> >>> forth asking about system information.
>>>>>>>> >>>
>>>>>>>> >>> Also, if we write something new in the hs_err file,
>>>>>>>> like if
>>>>>>>> there has
>>>>>>>> >>> been any out of memory errors, we often would want
>>>>>>>> the same
>>>>>>>> >>> information in the VM.info output. From my experience in
>>>>>>>> >>> Sustaining/Support, I can't think of any information you
>>>>>>>> would want
>>>>>>>> >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>> file
>>>>>>>> and the
>>>>>>>> >>> other way around, apart from details about the crash
>>>>>>>> (obviously).
>>>>>>>> >>>
>>>>>>>> >>> I don't see a reason to exactly enumerate what
>>>>>>>> information
>>>>>>>> VM.info
>>>>>>>> >>> should provide. From a sustaining/support
>>>>>>>> perspective, we
>>>>>>>> want a
>>>>>>>> >>> one-stop command to gather as much useful information as
>>>>>>>> possible,
>>>>>>>> >>> which is the same idea we have for the hs_err file.
>>>>>>>> >>
>>>>>>>> >> The reason to enumerate what is required is to see where
>>>>>>>> that
>>>>>>>> >> information already exists and can be collected from. The
>>>>>>>> VMError
>>>>>>>> >> report has to be very careful about what it does and
>>>>>>>> how it
>>>>>>>> does it
>>>>>>>> >> because of the fact we may have crashed and the general
>>>>>>>> process
>>>>>>>> state
>>>>>>>> >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>> information is
>>>>>>>> >> required from the available sources in whatever order or
>>>>>>>> format
>>>>>>>> desired.
>>>>>>>> >>
>>>>>>>> >> I have no problem with this command, just how it has been
>>>>>>>> proposed to
>>>>>>>> >> implement it.
>>>>>>>> >>
>>>>>>>> >> David
>>>>>>>> >>
>>>>>>>> >>> Kind Regards
>>>>>>>> >>> /Mattis
>>>>>>>> >>>
>>>>>>>> >>> -----Original Message-----
>>>>>>>> >>> From: David Holmes
>>>>>>>> >>> Sent: den 29 oktober 2015 13:08
>>>>>>>> >>> To: Coleen Phillimore;
>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>> >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>> VM.info
>>>>>>>> to get
>>>>>>>> >>> hs_err print-out
>>>>>>>> >>>
>>>>>>>> >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>> >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> So I actually disagree. I don't think there should
>>>>>>>> be an
>>>>>>>> >>>>>> additional separate mechanism to get the same
>>>>>>>> information
>>>>>>>> that we
>>>>>>>> >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>> feature like
>>>>>>>> >>>>>> this for a long time.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> I pre-reviewed this change and I thought it looked
>>>>>>>> good in
>>>>>>>> general.
>>>>>>>> >>>>>> I didn't see the thread iteration problem that Fred
>>>>>>>> refers to
>>>>>>>> >>>>>> below, but I think the individual problems can be
>>>>>>>> fixed.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> The last thing I want is this code to be copied
>>>>>>>> somewhere else.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Factored as needed not copied. VMError is not an
>>>>>>>> "info"
>>>>>>>> reporting
>>>>>>>> >>>>> mechanism.
>>>>>>>> >>>>
>>>>>>>> >>>> If you look at the things that are reported in each
>>>>>>>> "STEP",
>>>>>>>> there's a
>>>>>>>> >>>> small amount of code and the order is important.
>>>>>>>> >>>
>>>>>>>> >>> The order is less important in an "info" request I would
>>>>>>>> think.
>>>>>>>> >>>
>>>>>>>> >>>> I'd like the vm info to use the same order and report
>>>>>>>> what it
>>>>>>>> can do
>>>>>>>> >>>> safely. Refactoring 5 lines of code into functions
>>>>>>>> doesn't
>>>>>>>> make
>>>>>>>> >>>> sense.
>>>>>>>> >>>
>>>>>>>> >>> I need to consider exactly what it is the "info"
>>>>>>>> needs to
>>>>>>>> report in
>>>>>>>> >>> more detail. There are existing facilities (system
>>>>>>>> properties,
>>>>>>>> >>> management
>>>>>>>> >>> APIs) for various bits of runtime information, which
>>>>>>>> VMError can't
>>>>>>>> >>> utilize but a Dcmd can.
>>>>>>>> >>>
>>>>>>>> >>> David
>>>>>>>> >>>
>>>>>>>> >>>> Coleen
>>>>>>>> >>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> David
>>>>>>>> >>>>>
>>>>>>>> >>>>>> Coleen
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>> >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>> should
>>>>>>>> not be
>>>>>>>> >>>>>>> piggy-backed onto VMError handling for the reasons
>>>>>>>> stated
>>>>>>>> (and the
>>>>>>>> >>>>>>> error handling logic is complicated enough as it
>>>>>>>> is!). For
>>>>>>>> things
>>>>>>>> >>>>>>> like thread lists there are already safe management
>>>>>>>> functions that
>>>>>>>> >>>>>>> can/should be used.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Thanks,
>>>>>>>> >>>>>>> David H.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>> >>>>>>>> Hi David,
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>> concerned
>>>>>>>> with the
>>>>>>>> >>>>>>>> fact that the diagnostic command is calling
>>>>>>>> VMError::report().
>>>>>>>> >>>>>>>> This method has been implemented to be executed
>>>>>>>> in the
>>>>>>>> particular
>>>>>>>> >>>>>>>> context of fatal errors, and its usage while the
>>>>>>>> VM is
>>>>>>>> running
>>>>>>>> >>>>>>>> normally seems dangerous.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> For instance, VMError::report() consciously avoids
>>>>>>>> grabbing locks
>>>>>>>> >>>>>>>> because of the risk of deadlock during the error
>>>>>>>> reporting.
>>>>>>>> >>>>>>>> The consequence is that some data structures are
>>>>>>>> browsed in an
>>>>>>>> >>>>>>>> unsafe way. One example: VMError::report() calls
>>>>>>>> >>>>>>>> Threads::print_on_error() which iterates over the
>>>>>>>> thread list
>>>>>>>> >>>>>>>> *without owning the Threads_lock*.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> The implementation of the diagnostic command seems
>>>>>>>> also to
>>>>>>>> >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>> VMError::report()
>>>>>>>> >>>>>>>> method. Have you considered implementing a new
>>>>>>>> MT-safe
>>>>>>>> reporting
>>>>>>>> >>>>>>>> method rather than trying to modify the special
>>>>>>>> VMError::report()
>>>>>>>> >>>>>>>> methods? (Note that some code factorization between
>>>>>>>> >>>>>>>> VMError::report() and this new method should be
>>>>>>>> possible).
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Thanks,
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Fred
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>> >>>>>>>>> Hi!
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Please review my change for this small
>>>>>>>> enhancement.
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> bug:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>> >>>>>>>>> webrev:
>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Cheers,
>>>>>>>> >>>>>>>>> -Buck
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Bertrand Delsart, Grenoble Engineering
>>>>>>>> Center
>>>>>>>> Oracle, 180 av. de l'Europe, ZIRST de
>>>>>>>> Montbonnot
>>>>>>>> 38330 Montbonnot Saint Martin, FRANCE
>>>>>>>> bertrand.delsart at oracle.com
>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>> Phone : +33 4 76 18 81 23
>>>>>>>>
>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>> NOTICE: This email message is for the sole use of the
>>>>>>>> intended
>>>>>>>> recipient(s) and may contain confidential and privileged
>>>>>>>> information. Any unauthorized review, use, disclosure or
>>>>>>>> distribution is prohibited. If you are not the intended
>>>>>>>> recipient,
>>>>>>>> please contact the sender by reply email and destroy all
>>>>>>>> copies of
>>>>>>>> the original message.
>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>
More information about the hotspot-runtime-dev
mailing list