RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)

Thu Nov 12 21:41:02 UTC 2015

Hi,  I should learn my lesson that Frederic, Bertrand and David are 
generally right.

My restructuring went afoul of the STEP framework, and so I have made 
VMError::print_vm_info() report the safe things that VMError::report() 
reports.  Hopefully, this will address the concern about safety.   It 
also allows us to remove the lock and static buffer in the VM.info 
function.   The duplication isn't as bad as I thought and is a better 
solution than piggy backing from VMError::report().

Please review:

open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
bug link https://bugs.openjdk.java.net/browse/JDK-8027429

I tested with jcmd <pid> VM.info concurrently and with an inserted crash 
during VM.info call to make sure the original hs_err_pid report comes 
out okay.

Thanks,
Coleen

On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>
>
> On 11/10/15 7:12 PM, David Holmes wrote:
>> Hi Coleen,
>>
>> I still do not like the intertwining of the info and error code, but 
>> I won't say more on that.
>
> Yes, I understand the approach has good arguments on both sides. I'm 
> trying to find a compromise that can move this feature forward.
>>
>> However you may now have a concurrency problem:
>>
>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>  843 void VMError::report_info(outputStream* st) {
>>  844
>>  845   // don't allocate large buffer on stack
>>  846   static char buf[O_BUFLEN];
>>
>> Can you have concurrent info requests in flight? For the info case 
>> the buffer could be allocated on the stack I think, or even 
>> dynamically. (Aside: will this be executed in the service thread?)
>
> There is a lock added to protect the VM.info dcmd, so there would be 
> only sequential access to the static buffer.  I don't think this is 
> executed in the Service Thread.  I see no evidence of this anyway.
>
>>
>> report_process could possibly be refactored further as there's a lot 
>> of initial stuff that won't apply to the info case:
>>  - we won't print all threads
>>  - the VM state will always be "not at safepoint"
>>  - The Universe must be initialized (not clear at what point during 
>> shutdown a dcmd can still come in?)
>>  - we don't print owned locks
>
> Yes, I can do this.  It actually improves how this looks.
>
> Thanks,
> Coleen
>>
>> That's all from me.
>>
>> Thanks,
>> David
>>
>>
>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>
>>> Hi,
>>>
>>> I've done some refactoring and cleanup on this which the refactoring
>>> enabled.   Again, it's important that the VM.info output be mostly the
>>> same order as the hs_err_pid* report file so that sustaining can easily
>>> match and find output that they are looking for.   It's also important
>>> not to copy this code because there may be new information added 
>>> that we
>>> also want in the VM.info output.  If any future information that we 
>>> want
>>> to print seems unsafe, the vm_info_cmd can be used to exclude it or it
>>> can be printed only in the error reporting function. Rejecting the
>>> entire change because of the theoretical unsafe addition of code isn't
>>> the right thing to do, although yeah, I've argued the other way in the
>>> past.  This is something that is helpful to sustaining and our 
>>> licensees.
>>>
>>> In this patch, the functions that VM.info calls are safe while the JVM
>>> is running, but the stepping is required because at crash time, they
>>> might not be.  Although I've never seen the error reporting crash in
>>> these places.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>
>>> This refactoring has been run through RBT.
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>> Also note that some of the crash information is not interesting for
>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>> registers/stack data for any specific thread. If people want thread
>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>> system state.
>>>>
>>>> I'm a little surprised that thread info is not of interest, but as you
>>>> say that is available by other means  - which was part of my earlier
>>>> point about there already being safe means to get some of the info.
>>>>
>>>> If this is really only about the small subset of safe "system
>>>> information" that VMError reports, then I see even less motivation to
>>>> piggy-back VM-info onto VMerror and entangle the two. If each block of
>>>> information can be considered a single "step" in the error case then
>>>> we can simply factor out the actual printing parts and share them. If
>>>> the error case wants fine-grained steps then we can't do that.
>>>>
>>>> David
>>>>
>>>>> Kind Regards
>>>>> /Mattis
>>>>>
>>>>> -----Original Message-----
>>>>> From: Coleen Phillimore
>>>>> Sent: den 3 november 2015 16:25
>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>> hs_err print-out
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've done a bit of refactoring with this change and have tested it
>>>>> with and without intentional crashes to the VM.info reporting.  If it
>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>> where it comes from in the hs_err file.  And the hs_err_pid file is
>>>>> correctly produced.  If we can make the jcmd come with a warning that
>>>>> it *could* potentially crash the vm, I think this feature is a good
>>>>> addition for sustaining.
>>>>>
>>>>> Without crashing, VM.info gives a lot of information that we normally
>>>>> give with the hs_err file.  The reason for the careful STEP'ping in
>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>> count on anything.  The VM.info command is designed to be run on a
>>>>> moderately stable JVM so is less likely to crash on the things that
>>>>> we're printing out.
>>>>>
>>>>> I think we should fix the Threads walking code that Fred pointed out
>>>>> to take the Threads_lock, but I don't think we should *copy* code for
>>>>> VM.info or copy print_on_error functions.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>> Thanks Thomas,
>>>>>>>
>>>>>>> I fully agree it can work. The problem is that it may also fail 
>>>>>>> if we
>>>>>>> are not careful.
>>>>>>>
>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious source of
>>>>>>> potential crashes if you take into account the fact that the
>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>
>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>
>>>>>> I'm not sure of why we do not print the threads when there is notion
>>>>>> of current thread (and I would wonder whether we should print 
>>>>>> them for
>>>>>> VMInfo) but luckily that may have been sufficient to prevent the 
>>>>>> crash.
>>>>>>
>>>>>> However, I still do not like it from a design point of view for the
>>>>>> other reasons stated in my e-mail.
>>>>>>
>>>>>> Bertrand.
>>>>>>
>>>>>>>
>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>> useless, not what might be dangerous due to how the method (and 
>>>>>>> what
>>>>>>> it calls) has been designed.
>>>>>>>
>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient unless you
>>>>>>> pass that argument to all layers. This is the biggest design issue.
>>>>>>> For instance all calls to print_on_error could have this additional
>>>>>>> argument so that the different components can adapt their behavior
>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>> Threads::print_on_error, grabbing the threads lock in the info 
>>>>>>> case,
>>>>>>> but the same if true for all methods called by VMError::report. I
>>>>>>> would go as far as adding a comment in all print_on_error methods
>>>>>>> stating why the method is safe for both use cases (e.g. why no 
>>>>>>> locks
>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>> VMError:report is supposed to skip.
>>>>>>> The current design, where the various components may not be aware
>>>>>>> that they are called for two different purposes, is not robust 
>>>>>>> enough.
>>>>>>>
>>>>>>> Note also that being 'robust enough for the "crash" case' is not a
>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>> particularly if, as you point out, an error that occurs in one 
>>>>>>> of the
>>>>>>> info STEP crashes the VM instead of just bringing us to the next
>>>>>>> error reporting STEP. Now, as stated before, I could agree if we 
>>>>>>> use
>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Bertrand
>>>>>>>
>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>
>>>>>>>> However, there are some things to keep in mind when doing this:
>>>>>>>>
>>>>>>>> Error handling is different between the "info-only" and "crash"
>>>>>>>> case. In
>>>>>>>> "info-only" case, VMError::report() is called without enclosing
>>>>>>>> VMError::report_and_die() and we run with the normal signal 
>>>>>>>> handler
>>>>>>>> intact, not with the secondary signal handler. So, normal signal
>>>>>>>> handling works, which is good. But it also means that any error
>>>>>>>> inside one of the error reporting STEPs will bring down the VM,
>>>>>>>> instead of just bringing us to the next error reporting STEP 
>>>>>>>> ("Error
>>>>>>>> occurred during error reporting").
>>>>>>>>
>>>>>>>> The problem is that a number of error reporting steps are a bit
>>>>>>>> risky and written with the assurance that the worst which can 
>>>>>>>> happen
>>>>>>>> is that the STEP will be interrupted. These risky steps should not
>>>>>>>> be executed in the "only-info" case.
>>>>>>>>
>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>> "_vm_info_cmd"
>>>>>>>> and conditionally excludes dangerous or pointless error reporting
>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>
>>>>>>>> As far as I can see, the other concerns concern only the "crash"
>>>>>>>> case and simply do not apply to the "info-only" case. If an error
>>>>>>>> reporting step is robust enough for the "crash" case (low stack
>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it should be
>>>>>>>> safe enough for the "info" case.
>>>>>>>>
>>>>>>>> Kind Regards, Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>> <mattis.castegren at oracle.com <mailto:mattis.castegren at oracle.com>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>      Hi
>>>>>>>>
>>>>>>>>      Of course, when I wrote "Review and push this fix" I meant
>>>>>>>> doing a
>>>>>>>>      thorough technical review. We do not want VM.info to be
>>>>>>>> unsafe, and
>>>>>>>>      I will let David reply to the technical questions that has 
>>>>>>>> been
>>>>>>>>      raised. I know he did consider some of these issues, but last
>>>>>>>> week
>>>>>>>>      he was on Java one, and this week he is out. I will let him
>>>>>>>> comment
>>>>>>>>      once he gets back.
>>>>>>>>
>>>>>>>>      Kind Regards
>>>>>>>>      /Mattis
>>>>>>>>
>>>>>>>>      -----Original Message-----
>>>>>>>>      From: Bertrand Delsart
>>>>>>>>      Sent: den 2 november 2015 10:13
>>>>>>>>      To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>      hotspot-runtime-dev at openjdk.java.net
>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>      Subject: Re: RFR: 8027429: Add diagnostic command VM.info 
>>>>>>>> to get
>>>>>>>>      hs_err print-out
>>>>>>>>
>>>>>>>>      On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>       > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>       >> Hi
>>>>>>>>       >>
>>>>>>>>       >> I agree that there may be things we could add to VM.info
>>>>>>>> that we
>>>>>>>>       >> couldn't add in an hs_err file due to the fact that we 
>>>>>>>> are
>>>>>>>> crashing.
>>>>>>>>       >> However, I really do believe that we should use the same
>>>>>>>> code.
>>>>>>>> The
>>>>>>>>       >> goal of the hs_err file is exactly the same as the 
>>>>>>>> goal of
>>>>>>>> VM.info,
>>>>>>>>       >> to print as much information about the system as 
>>>>>>>> possible so
>>>>>>>> that
>>>>>>>>       >> when we look at an hs_err file or the output from 
>>>>>>>> VM.info,
>>>>>>>> we get as
>>>>>>>>       >> much information about the system as possible. That way
>>>>>>>> we can
>>>>>>>>      reduce
>>>>>>>>       >> the need to go back and forth with the customer to ask
>>>>>>>> for more
>>>>>>>>      information.
>>>>>>>>       >>
>>>>>>>>       >> If we enumerate what we need in VM.info, the list 
>>>>>>>> would look
>>>>>>>> very
>>>>>>>>       >> much like what we already do in VMError, and therefore I
>>>>>>>> think we
>>>>>>>>       >> should use the same code. If we find something that we 
>>>>>>>> would
>>>>>>>> want,
>>>>>>>>       >> that is not currently printed in hs_err files, I think we
>>>>>>>> have
>>>>>>>>      two options:
>>>>>>>>       >>
>>>>>>>>       >> 1) Add it to hs_err files as well. If it is 
>>>>>>>> interesting when
>>>>>>>>       >> gathering customer information before a crash, it
>>>>>>>> probably is
>>>>>>>>       >> interesting after a crash
>>>>>>>>       >> 2) If it for some reason cannot be captured when 
>>>>>>>> crashing,
>>>>>>>> add it to
>>>>>>>>       >> the code with the condition that we don't print it during
>>>>>>>> crash
>>>>>>>>      time.
>>>>>>>>       >>
>>>>>>>>       >> I would therefore suggest the following plan of action
>>>>>>>>       >>
>>>>>>>>       >> 1) Review and push this fix to see if there are any
>>>>>>>> technical issues
>>>>>>>>       >> with regards to thread safety, etc. I know David has had
>>>>>>>> this in
>>>>>>>>      mind
>>>>>>>>       >> when doing the change. After that, we have a command that
>>>>>>>>      support can
>>>>>>>>       >> use.
>>>>>>>>       >> 2) Think about if there is any information missing, both
>>>>>>>> from hs_err
>>>>>>>>       >> files and from VM.info. Here, we may want to ask support
>>>>>>>> about if
>>>>>>>>       >> there is any additional information they need to ask for
>>>>>>>> when they
>>>>>>>>       >> get this data. If possible, this information should be 
>>>>>>>> added
>>>>>>>> to both
>>>>>>>>       >> hs_err files and VM.info, we don't want support to 
>>>>>>>> have to
>>>>>>>> ask for
>>>>>>>>       >> VM.info if we already have an hs_err file. Of course, if
>>>>>>>> there is
>>>>>>>>       >> anything Dev would want to see when analyzing crashes in
>>>>>>>> testing, we
>>>>>>>>       >> should add that too.
>>>>>>>>       >
>>>>>>>>       > I have no issue with VMError and VM.Info using the same 
>>>>>>>> code
>>>>>>>> (where
>>>>>>>>       > appropriate), but I think it wrong to try and implement
>>>>>>>> VM.Info
>>>>>>>>       > directly using the actual VMError mechanism. It is 
>>>>>>>> crude and
>>>>>>>> makes it
>>>>>>>>       > difficult to make independent changes to either facility.
>>>>>>>>       >
>>>>>>>>       > Sorry.
>>>>>>>>       >
>>>>>>>>       > David
>>>>>>>>
>>>>>>>>      Furthermore, we cannot just "Review and push this fix to 
>>>>>>>> see if
>>>>>>>>      there are any technical issues". Two reviewers have already
>>>>>>>> stated
>>>>>>>>      that there are issues and these issues must be addressed 
>>>>>>>> before
>>>>>>>> pushing.
>>>>>>>>
>>>>>>>>      The answer cannot just be that we need this feature, it must
>>>>>>>> address
>>>>>>>>      the technical issues that were reported.
>>>>>>>>
>>>>>>>>      One possible answer is that we all agree that VMInfo would be
>>>>>>>> unsafe.
>>>>>>>>      The method would have to be renamed UnsafeVMInfo and it 
>>>>>>>> must be
>>>>>>>>      clear that its 'impact' would be very risky, potentially
>>>>>>>> crashing
>>>>>>>>      the JVM.
>>>>>>>>
>>>>>>>>      If this is sufficient for your needs then we could agree with
>>>>>>>> it. My
>>>>>>>>      only additional requirement would be to clearly mark that the
>>>>>>>> method
>>>>>>>>      has been called so that we do not waste time on non-issues if
>>>>>>>>      calling VMInfo leads to a crash (or, even worse but less
>>>>>>>> likely, to
>>>>>>>>      a silent data corruption that may lead to problems later).
>>>>>>>> The bare
>>>>>>>>      minimum would be to double check that a crash during an
>>>>>>>> execution of
>>>>>>>>      VM.info is clearly identifiable (this is worth double 
>>>>>>>> checking
>>>>>>>>      because of the use of static buffers for the error reporting,
>>>>>>>> see
>>>>>>>>      below).
>>>>>>>>
>>>>>>>>      If an 'unsafe' VMInfo DCmd is not sufficient, you should 
>>>>>>>> wonder
>>>>>>>>      whether a safe version of it is still a "small enhancement".
>>>>>>>> A JEP
>>>>>>>>      may be required.
>>>>>>>>
>>>>>>>>      VMError::report and the methods it calls (print_on_error)
>>>>>>>> have been
>>>>>>>>      written with assumptions clearly stated in the comments above
>>>>>>>>      VMError::report (see below the two most important ones). 
>>>>>>>> HotSpot
>>>>>>>>      error reporting strategy differed from JRockit. This explains
>>>>>>>> why
>>>>>>>>      the JRockit approach cannot be applied as is.
>>>>>>>>
>>>>>>>>      I agree with David that you can try leveraging some of the 
>>>>>>>> logic
>>>>>>>>      from VMError::report. However, to be sure there is no
>>>>>>>>      misunderstanding, let's be clearer about David's "(where
>>>>>>>>      appropriate)": we would for instance have to review all the
>>>>>>>> called
>>>>>>>>      methods, check which ones purposefully ignore locking
>>>>>>>> (potentially
>>>>>>>>      crashing) and provide alternate code for the unsafe ones
>>>>>>>> (which may
>>>>>>>>      not be trivial in some cases). As a simple example, a method
>>>>>>>> called
>>>>>>>>      "print_on_error" should not be called when we are not in an
>>>>>>>>      erroneous condition. However, in some components of the 
>>>>>>>> JVM, the
>>>>>>>> low
>>>>>>>>      level implementations of 'print_on_error' and
>>>>>>>> 'print_info_safely'
>>>>>>>>      could share the same code.
>>>>>>>>
>>>>>>>>      Regards,
>>>>>>>>
>>>>>>>>      Bertrand.
>>>>>>>>
>>>>>>>>      Here are the important comments about VMError::report
>>>>>>>> assumptions:
>>>>>>>>
>>>>>>>>      // In general, a hang in error handler is much worse than a
>>>>>>>> crash or
>>>>>>>>      internal
>>>>>>>>      // error, as it's harder to recover from a hang. Deadlock can
>>>>>>>> happen
>>>>>>>>      if we
>>>>>>>>      // try to grab a lock that is already owned by current 
>>>>>>>> thread,
>>>>>>>> or if the
>>>>>>>>      // owner is blocked forever (e.g. in 
>>>>>>>> os::infinite_sleep()). If
>>>>>>>>      possible, the
>>>>>>>>      // error handler and all the functions it called should avoid
>>>>>>>>      grabbing any
>>>>>>>>      // lock. An important thing to notice is that memory 
>>>>>>>> allocation
>>>>>>>> needs a
>>>>>>>>      lock.
>>>>>>>>      //
>>>>>>>>      // We should avoid using large stack allocated buffers. Many
>>>>>>>> errors
>>>>>>>>      happen
>>>>>>>>      // when stack space is already low. Making things even 
>>>>>>>> worse is
>>>>>>>> that
>>>>>>>>      there
>>>>>>>>      // could be nested report_and_die() calls on stack (see 
>>>>>>>> above).
>>>>>>>> Only one
>>>>>>>>      // thread can report error, so large buffers are statically
>>>>>>>> allocated in
>>>>>>>>      data
>>>>>>>>      // segment.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>       >
>>>>>>>>       >
>>>>>>>>       >> Kind Regards
>>>>>>>>       >> /Mattis
>>>>>>>>       >>
>>>>>>>>       >> -----Original Message-----
>>>>>>>>       >> From: David Holmes
>>>>>>>>       >> Sent: den 29 oktober 2015 23:20
>>>>>>>>       >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>       >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>       >> Subject: Re: RFR: 8027429: Add diagnostic command 
>>>>>>>> VM.info to
>>>>>>>> get
>>>>>>>>       >> hs_err print-out
>>>>>>>>       >>
>>>>>>>>       >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>       >>> Some background: We have this command in JRockit. The
>>>>>>>>      information you
>>>>>>>>       >>> gather when you crash to give a good summary of what 
>>>>>>>> system
>>>>>>>> you run
>>>>>>>>       >>> on is pretty much exactly the information you need to 
>>>>>>>> get a
>>>>>>>> good
>>>>>>>>       >>> summary on a system that has not crashed. The JRockit
>>>>>>>> command is
>>>>>>>>       >>> extremely useful for support, and saves a lot of work 
>>>>>>>> going
>>>>>>>>      back and
>>>>>>>>       >>> forth asking about system information.
>>>>>>>>       >>>
>>>>>>>>       >>> Also, if we write something new in the hs_err file, 
>>>>>>>> like if
>>>>>>>>      there has
>>>>>>>>       >>> been any out of memory errors, we often would want 
>>>>>>>> the same
>>>>>>>>       >>> information in the VM.info output. From my experience in
>>>>>>>>       >>> Sustaining/Support, I can't think of any information you
>>>>>>>> would want
>>>>>>>>       >>> in VM.info that you wouldn't also want in the hs_err 
>>>>>>>> file
>>>>>>>> and the
>>>>>>>>       >>> other way around, apart from details about the crash
>>>>>>>> (obviously).
>>>>>>>>       >>>
>>>>>>>>       >>> I don't see a reason to exactly enumerate what 
>>>>>>>> information
>>>>>>>> VM.info
>>>>>>>>       >>> should provide. From a sustaining/support 
>>>>>>>> perspective, we
>>>>>>>> want a
>>>>>>>>       >>> one-stop command to gather as much useful information as
>>>>>>>> possible,
>>>>>>>>       >>> which is the same idea we have for the hs_err file.
>>>>>>>>       >>
>>>>>>>>       >> The reason to enumerate what is required is to see where
>>>>>>>> that
>>>>>>>>       >> information already exists and can be collected from. The
>>>>>>>> VMError
>>>>>>>>       >> report has to be very careful about what it does and 
>>>>>>>> how it
>>>>>>>> does it
>>>>>>>>       >> because of the fact we may have crashed and the general
>>>>>>>> process
>>>>>>>>      state
>>>>>>>>       >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>> information is
>>>>>>>>       >> required from the available sources in whatever order or
>>>>>>>> format
>>>>>>>>      desired.
>>>>>>>>       >>
>>>>>>>>       >> I have no problem with this command, just how it has been
>>>>>>>>      proposed to
>>>>>>>>       >> implement it.
>>>>>>>>       >>
>>>>>>>>       >> David
>>>>>>>>       >>
>>>>>>>>       >>> Kind Regards
>>>>>>>>       >>> /Mattis
>>>>>>>>       >>>
>>>>>>>>       >>> -----Original Message-----
>>>>>>>>       >>> From: David Holmes
>>>>>>>>       >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>       >>> To: Coleen Phillimore; 
>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>       >>> Subject: Re: RFR: 8027429: Add diagnostic command 
>>>>>>>> VM.info
>>>>>>>> to get
>>>>>>>>       >>> hs_err print-out
>>>>>>>>       >>>
>>>>>>>>       >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>       >>>>
>>>>>>>>       >>>>
>>>>>>>>       >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>       >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>       >>>>>>
>>>>>>>>       >>>>>> So I actually disagree.  I don't think there should
>>>>>>>> be an
>>>>>>>>       >>>>>> additional separate mechanism to get the same
>>>>>>>> information
>>>>>>>>      that we
>>>>>>>>       >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>> feature like
>>>>>>>>       >>>>>> this for a long time.
>>>>>>>>       >>>>>>
>>>>>>>>       >>>>>> I pre-reviewed this change and I thought it looked
>>>>>>>> good in
>>>>>>>>      general.
>>>>>>>>       >>>>>> I didn't see the thread iteration problem that Fred
>>>>>>>> refers to
>>>>>>>>       >>>>>> below, but I think the individual problems can be 
>>>>>>>> fixed.
>>>>>>>>       >>>>>>
>>>>>>>>       >>>>>> The last thing I want is this code to be copied
>>>>>>>> somewhere else.
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> Factored as needed not copied. VMError is not an 
>>>>>>>> "info"
>>>>>>>> reporting
>>>>>>>>       >>>>> mechanism.
>>>>>>>>       >>>>
>>>>>>>>       >>>> If you look at the things that are reported in each
>>>>>>>> "STEP",
>>>>>>>>      there's a
>>>>>>>>       >>>> small amount of code and the order is important.
>>>>>>>>       >>>
>>>>>>>>       >>> The order is less important in an "info" request I would
>>>>>>>> think.
>>>>>>>>       >>>
>>>>>>>>       >>>> I'd like the vm info to use the same order and report
>>>>>>>> what it
>>>>>>>>      can do
>>>>>>>>       >>>> safely.   Refactoring 5 lines of code into functions
>>>>>>>> doesn't
>>>>>>>> make
>>>>>>>>       >>>> sense.
>>>>>>>>       >>>
>>>>>>>>       >>> I need to consider exactly what it is the "info" 
>>>>>>>> needs to
>>>>>>>> report in
>>>>>>>>       >>> more detail. There are existing facilities (system
>>>>>>>> properties,
>>>>>>>>       >>> management
>>>>>>>>       >>> APIs) for various bits of runtime information, which
>>>>>>>> VMError can't
>>>>>>>>       >>> utilize but a Dcmd can.
>>>>>>>>       >>>
>>>>>>>>       >>> David
>>>>>>>>       >>>
>>>>>>>>       >>>> Coleen
>>>>>>>>       >>>>
>>>>>>>>       >>>>>
>>>>>>>>       >>>>> David
>>>>>>>>       >>>>>
>>>>>>>>       >>>>>> Coleen
>>>>>>>>       >>>>>>
>>>>>>>>       >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>       >>>>>>> I agree with Fred, this kind of info reporting 
>>>>>>>> should
>>>>>>>> not be
>>>>>>>>       >>>>>>> piggy-backed onto VMError handling for the reasons
>>>>>>>> stated
>>>>>>>>      (and the
>>>>>>>>       >>>>>>> error handling logic is complicated enough as it
>>>>>>>> is!). For
>>>>>>>>      things
>>>>>>>>       >>>>>>> like thread lists there are already safe management
>>>>>>>>      functions that
>>>>>>>>       >>>>>>> can/should be used.
>>>>>>>>       >>>>>>>
>>>>>>>>       >>>>>>> Thanks,
>>>>>>>>       >>>>>>> David H.
>>>>>>>>       >>>>>>>
>>>>>>>>       >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>       >>>>>>>> Hi David,
>>>>>>>>       >>>>>>>>
>>>>>>>>       >>>>>>>> I haven't review all the code yet, but I'm 
>>>>>>>> concerned
>>>>>>>> with the
>>>>>>>>       >>>>>>>> fact that the diagnostic command is calling
>>>>>>>> VMError::report().
>>>>>>>>       >>>>>>>> This method has been implemented to be executed 
>>>>>>>> in the
>>>>>>>>      particular
>>>>>>>>       >>>>>>>> context of fatal errors, and its usage while the 
>>>>>>>> VM is
>>>>>>>> running
>>>>>>>>       >>>>>>>> normally seems dangerous.
>>>>>>>>       >>>>>>>>
>>>>>>>>       >>>>>>>> For instance, VMError::report() consciously avoids
>>>>>>>>      grabbing locks
>>>>>>>>       >>>>>>>> because of the risk of deadlock during the error
>>>>>>>> reporting.
>>>>>>>>       >>>>>>>> The consequence is that some data structures are
>>>>>>>> browsed in an
>>>>>>>>       >>>>>>>> unsafe way. One example: VMError::report() calls
>>>>>>>>       >>>>>>>> Threads::print_on_error() which iterates over the
>>>>>>>> thread list
>>>>>>>>       >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>       >>>>>>>>
>>>>>>>>       >>>>>>>> The implementation of the diagnostic command seems
>>>>>>>> also to
>>>>>>>>       >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>> VMError::report()
>>>>>>>>       >>>>>>>> method. Have you considered implementing a new 
>>>>>>>> MT-safe
>>>>>>>>      reporting
>>>>>>>>       >>>>>>>> method rather than trying to modify the special
>>>>>>>>      VMError::report()
>>>>>>>>       >>>>>>>> methods? (Note that some code factorization between
>>>>>>>>       >>>>>>>> VMError::report() and this new method should be
>>>>>>>> possible).
>>>>>>>>       >>>>>>>>
>>>>>>>>       >>>>>>>> Thanks,
>>>>>>>>       >>>>>>>>
>>>>>>>>       >>>>>>>> Fred
>>>>>>>>       >>>>>>>>
>>>>>>>>       >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>       >>>>>>>>> Hi!
>>>>>>>>       >>>>>>>>>
>>>>>>>>       >>>>>>>>> Please review my change for this small 
>>>>>>>> enhancement.
>>>>>>>>       >>>>>>>>>
>>>>>>>>       >>>>>>>>> bug: 
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>       >>>>>>>>> webrev:
>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>       >>>>>>>>>
>>>>>>>>       >>>>>>>>> Cheers,
>>>>>>>>       >>>>>>>>> -Buck
>>>>>>>>       >>>>>>>>
>>>>>>>>       >>>>>>
>>>>>>>>       >>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>      --
>>>>>>>>      Bertrand Delsart,                     Grenoble Engineering
>>>>>>>> Center
>>>>>>>>      Oracle,         180 av. de l'Europe, ZIRST de
>>>>>>>> Montbonnot
>>>>>>>>      38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>      bertrand.delsart at oracle.com 
>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>                Phone : +33 4 76 18 81 23
>>>>>>>>
>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>      NOTICE: This email message is for the sole use of the 
>>>>>>>> intended
>>>>>>>>      recipient(s) and may contain confidential and privileged
>>>>>>>>      information. Any unauthorized review, use, disclosure or
>>>>>>>>      distribution is prohibited. If you are not the intended
>>>>>>>> recipient,
>>>>>>>>      please contact the sender by reply email and destroy all
>>>>>>>> copies of
>>>>>>>>      the original message.
>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>