RFR: 8027429: Add diagnostic command VM.info to get hs_err print-out (NEW WEBREV)

Fri Nov 13 19:19:58 UTC 2015

Thank you, Harold!
Coleen

On 11/13/15 1:13 PM, harold seigel wrote:
> Hi Coleen,
>
> Your latest webrev looks good.
>
> Harold
>
> On 11/13/2015 10:11 AM, Coleen Phillimore wrote:
>>
>> David,  Thank you for looking at this again and our offline discussions.
>>
>> On 11/12/15 11:52 PM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> On 13/11/2015 7:41 AM, Coleen Phillimore wrote:
>>>>
>>>> Hi,  I should learn my lesson that Frederic, Bertrand and David are
>>>> generally right.
>>>>
>>>> My restructuring went afoul of the STEP framework, and so I have made
>>>> VMError::print_vm_info() report the safe things that VMError::report()
>>>> reports.  Hopefully, this will address the concern about safety.   It
>>>> also allows us to remove the lock and static buffer in the VM.info
>>>> function.   The duplication isn't as bad as I thought and is a better
>>>> solution than piggy backing from VMError::report().
>>>>
>>>> Please review:
>>>>
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.03/
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>
>>> This looks good to me. A few small queries:
>>>
>>> 888   if (Universe::is_fully_initialized()) {
>>>
>>> I don't think it is possible to run a Dcmd before the VM is fully 
>>> initialized.
>>
>> in the interest of keeping semi-duplicated code looking the similar, 
>> and what Frederic says, I'd like to keep the test there. It's not 
>> hurting anything.
>>>
>>> 889     Universe::heap()->print_on_error(st);
>>>
>>> Is this safe? I'm not sure what it does - print heap memory regions ??
>>
>> yes, it appears to be safe.
>>>
>>>  891     st->print_cr("Polling page: " INTPTR_FORMAT, 
>>> p2i(os::get_polling_page()));
>>>
>>> Is this useful ??
>>
>> Sure, why not?   Somebody added it to the hs_err_pid file for some 
>> reason.
>>
>> Thanks!
>> Coleen
>>>
>>> Thanks,
>>> David
>>>
>>>> I tested with jcmd <pid> VM.info concurrently and with an inserted 
>>>> crash
>>>> during VM.info call to make sure the original hs_err_pid report comes
>>>> out okay.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 11/11/15 4:16 PM, Coleen Phillimore wrote:
>>>>>
>>>>>
>>>>> On 11/10/15 7:12 PM, David Holmes wrote:
>>>>>> Hi Coleen,
>>>>>>
>>>>>> I still do not like the intertwining of the info and error code, but
>>>>>> I won't say more on that.
>>>>>
>>>>> Yes, I understand the approach has good arguments on both sides. I'm
>>>>> trying to find a compromise that can move this feature forward.
>>>>>>
>>>>>> However you may now have a concurrency problem:
>>>>>>
>>>>>> 842 // Report for the vm_info_cmd, skipping crash specific code
>>>>>>  843 void VMError::report_info(outputStream* st) {
>>>>>>  844
>>>>>>  845   // don't allocate large buffer on stack
>>>>>>  846   static char buf[O_BUFLEN];
>>>>>>
>>>>>> Can you have concurrent info requests in flight? For the info case
>>>>>> the buffer could be allocated on the stack I think, or even
>>>>>> dynamically. (Aside: will this be executed in the service thread?)
>>>>>
>>>>> There is a lock added to protect the VM.info dcmd, so there would be
>>>>> only sequential access to the static buffer.  I don't think this is
>>>>> executed in the Service Thread.  I see no evidence of this anyway.
>>>>>
>>>>>>
>>>>>> report_process could possibly be refactored further as there's a lot
>>>>>> of initial stuff that won't apply to the info case:
>>>>>>  - we won't print all threads
>>>>>>  - the VM state will always be "not at safepoint"
>>>>>>  - The Universe must be initialized (not clear at what point during
>>>>>> shutdown a dcmd can still come in?)
>>>>>>  - we don't print owned locks
>>>>>
>>>>> Yes, I can do this.  It actually improves how this looks.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>>
>>>>>> That's all from me.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On 6/11/2015 7:01 AM, Coleen Phillimore wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've done some refactoring and cleanup on this which the 
>>>>>>> refactoring
>>>>>>> enabled.   Again, it's important that the VM.info output be 
>>>>>>> mostly the
>>>>>>> same order as the hs_err_pid* report file so that sustaining can 
>>>>>>> easily
>>>>>>> match and find output that they are looking for. It's also 
>>>>>>> important
>>>>>>> not to copy this code because there may be new information added
>>>>>>> that we
>>>>>>> also want in the VM.info output.  If any future information that we
>>>>>>> want
>>>>>>> to print seems unsafe, the vm_info_cmd can be used to exclude it 
>>>>>>> or it
>>>>>>> can be printed only in the error reporting function. Rejecting the
>>>>>>> entire change because of the theoretical unsafe addition of code 
>>>>>>> isn't
>>>>>>> the right thing to do, although yeah, I've argued the other way 
>>>>>>> in the
>>>>>>> past.  This is something that is helpful to sustaining and our
>>>>>>> licensees.
>>>>>>>
>>>>>>> In this patch, the functions that VM.info calls are safe while 
>>>>>>> the JVM
>>>>>>> is running, but the stepping is required because at crash time, 
>>>>>>> they
>>>>>>> might not be.  Although I've never seen the error reporting 
>>>>>>> crash in
>>>>>>> these places.
>>>>>>>
>>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8027429.01/
>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>
>>>>>>> This refactoring has been run through RBT.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>> On 11/4/15 2:25 AM, David Holmes wrote:
>>>>>>>> On 4/11/2015 1:36 AM, Mattis Castegren wrote:
>>>>>>>>> Also note that some of the crash information is not 
>>>>>>>>> interesting for
>>>>>>>>> VM.info. For example, we don't want to print a thread dump or the
>>>>>>>>> registers/stack data for any specific thread. If people want 
>>>>>>>>> thread
>>>>>>>>> dumps, they can use the separate Dcmd. We only want the general
>>>>>>>>> system state.
>>>>>>>>
>>>>>>>> I'm a little surprised that thread info is not of interest, but 
>>>>>>>> as you
>>>>>>>> say that is available by other means  - which was part of my 
>>>>>>>> earlier
>>>>>>>> point about there already being safe means to get some of the 
>>>>>>>> info.
>>>>>>>>
>>>>>>>> If this is really only about the small subset of safe "system
>>>>>>>> information" that VMError reports, then I see even less 
>>>>>>>> motivation to
>>>>>>>> piggy-back VM-info onto VMerror and entangle the two. If each 
>>>>>>>> block of
>>>>>>>> information can be considered a single "step" in the error case 
>>>>>>>> then
>>>>>>>> we can simply factor out the actual printing parts and share 
>>>>>>>> them. If
>>>>>>>> the error case wants fine-grained steps then we can't do that.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Kind Regards
>>>>>>>>> /Mattis
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Coleen Phillimore
>>>>>>>>> Sent: den 3 november 2015 16:25
>>>>>>>>> To: Bertrand Delsart; Thomas Stüfe; Mattis Castegren
>>>>>>>>> Cc: David Holmes; hotspot-runtime-dev at openjdk.java.net
>>>>>>>>> Subject: Re: RFR: 8027429: Add diagnostic command VM.info to get
>>>>>>>>> hs_err print-out
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've done a bit of refactoring with this change and have 
>>>>>>>>> tested it
>>>>>>>>> with and without intentional crashes to the VM.info 
>>>>>>>>> reporting.  If it
>>>>>>>>> crashes within VM.info, there's a stack trace that clearly shows
>>>>>>>>> where it comes from in the hs_err file.  And the hs_err_pid 
>>>>>>>>> file is
>>>>>>>>> correctly produced.  If we can make the jcmd come with a 
>>>>>>>>> warning that
>>>>>>>>> it *could* potentially crash the vm, I think this feature is a 
>>>>>>>>> good
>>>>>>>>> addition for sustaining.
>>>>>>>>>
>>>>>>>>> Without crashing, VM.info gives a lot of information that we 
>>>>>>>>> normally
>>>>>>>>> give with the hs_err file.  The reason for the careful 
>>>>>>>>> STEP'ping in
>>>>>>>>> hs_err production is that the JVM has already crashed so we can't
>>>>>>>>> count on anything.  The VM.info command is designed to be run 
>>>>>>>>> on a
>>>>>>>>> moderately stable JVM so is less likely to crash on the things 
>>>>>>>>> that
>>>>>>>>> we're printing out.
>>>>>>>>>
>>>>>>>>> I think we should fix the Threads walking code that Fred 
>>>>>>>>> pointed out
>>>>>>>>> to take the Threads_lock, but I don't think we should *copy* 
>>>>>>>>> code for
>>>>>>>>> VM.info or copy print_on_error functions.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>> On 11/3/15 10:18 AM, Bertrand Delsart wrote:
>>>>>>>>>> On 03/11/2015 15:06, Bertrand Delsart wrote:
>>>>>>>>>>> Thanks Thomas,
>>>>>>>>>>>
>>>>>>>>>>> I fully agree it can work. The problem is that it may also fail
>>>>>>>>>>> if we
>>>>>>>>>>> are not careful.
>>>>>>>>>>>
>>>>>>>>>>> Fred noticed for instance that "Threads::print_on_error" is not
>>>>>>>>>>> protected by _vm_info_cmd (step 290). This is an obvious 
>>>>>>>>>>> source of
>>>>>>>>>>> potential crashes if you take into account the fact that the
>>>>>>>>>>> print_on_error methods were designed not to use locks.
>>>>>>>>>>
>>>>>>>>>> Fred just pointed out that this may be OK since _thread is NULL.
>>>>>>>>>>
>>>>>>>>>> I'm not sure of why we do not print the threads when there is 
>>>>>>>>>> notion
>>>>>>>>>> of current thread (and I would wonder whether we should print
>>>>>>>>>> them for
>>>>>>>>>> VMInfo) but luckily that may have been sufficient to prevent the
>>>>>>>>>> crash.
>>>>>>>>>>
>>>>>>>>>> However, I still do not like it from a design point of view 
>>>>>>>>>> for the
>>>>>>>>>> other reasons stated in my e-mail.
>>>>>>>>>>
>>>>>>>>>> Bertrand.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It seems that the focus has been on removing what was clearly
>>>>>>>>>>> useless, not what might be dangerous due to how the method (and
>>>>>>>>>>> what
>>>>>>>>>>> it calls) has been designed.
>>>>>>>>>>>
>>>>>>>>>>> In addition, adding _vm_info_cmd is IMHO not sufficient 
>>>>>>>>>>> unless you
>>>>>>>>>>> pass that argument to all layers. This is the biggest design 
>>>>>>>>>>> issue.
>>>>>>>>>>> For instance all calls to print_on_error could have this 
>>>>>>>>>>> additional
>>>>>>>>>>> argument so that the different components can adapt their 
>>>>>>>>>>> behavior
>>>>>>>>>>> (and are aware there are now two use cases). I would start by
>>>>>>>>>>> Threads::print_on_error, grabbing the threads lock in the info
>>>>>>>>>>> case,
>>>>>>>>>>> but the same if true for all methods called by 
>>>>>>>>>>> VMError::report. I
>>>>>>>>>>> would go as far as adding a comment in all print_on_error 
>>>>>>>>>>> methods
>>>>>>>>>>> stating why the method is safe for both use cases (e.g. why no
>>>>>>>>>>> locks
>>>>>>>>>>> are needed) or adding "assert(!vm_info_cmd)" for methods
>>>>>>>>>>> VMError:report is supposed to skip.
>>>>>>>>>>> The current design, where the various components may not be 
>>>>>>>>>>> aware
>>>>>>>>>>> that they are called for two different purposes, is not robust
>>>>>>>>>>> enough.
>>>>>>>>>>>
>>>>>>>>>>> Note also that being 'robust enough for the "crash" case' is 
>>>>>>>>>>> not a
>>>>>>>>>>> strong enough guarantee for a general purpose info DCmd...
>>>>>>>>>>> particularly if, as you point out, an error that occurs in one
>>>>>>>>>>> of the
>>>>>>>>>>> info STEP crashes the VM instead of just bringing us to the 
>>>>>>>>>>> next
>>>>>>>>>>> error reporting STEP. Now, as stated before, I could agree 
>>>>>>>>>>> if we
>>>>>>>>>>> use
>>>>>>>>>>> this DCmd only when a crash is acceptable.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Bertrand
>>>>>>>>>>>
>>>>>>>>>>> On 02/11/2015 14:32, Thomas Stüfe wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> we have implemented in the SAP JVM the same mechanism. Our
>>>>>>>>>>>> implementation is in use since a number of years and is quite
>>>>>>>>>>>> useful. So I would agree with Coleen.
>>>>>>>>>>>>
>>>>>>>>>>>> However, there are some things to keep in mind when doing 
>>>>>>>>>>>> this:
>>>>>>>>>>>>
>>>>>>>>>>>> Error handling is different between the "info-only" and 
>>>>>>>>>>>> "crash"
>>>>>>>>>>>> case. In
>>>>>>>>>>>> "info-only" case, VMError::report() is called without 
>>>>>>>>>>>> enclosing
>>>>>>>>>>>> VMError::report_and_die() and we run with the normal signal
>>>>>>>>>>>> handler
>>>>>>>>>>>> intact, not with the secondary signal handler. So, normal 
>>>>>>>>>>>> signal
>>>>>>>>>>>> handling works, which is good. But it also means that any 
>>>>>>>>>>>> error
>>>>>>>>>>>> inside one of the error reporting STEPs will bring down the 
>>>>>>>>>>>> VM,
>>>>>>>>>>>> instead of just bringing us to the next error reporting STEP
>>>>>>>>>>>> ("Error
>>>>>>>>>>>> occurred during error reporting").
>>>>>>>>>>>>
>>>>>>>>>>>> The problem is that a number of error reporting steps are a 
>>>>>>>>>>>> bit
>>>>>>>>>>>> risky and written with the assurance that the worst which can
>>>>>>>>>>>> happen
>>>>>>>>>>>> is that the STEP will be interrupted. These risky steps 
>>>>>>>>>>>> should not
>>>>>>>>>>>> be executed in the "only-info" case.
>>>>>>>>>>>>
>>>>>>>>>>>> But the proposed change already does this, it introduces
>>>>>>>>>>>> "_vm_info_cmd"
>>>>>>>>>>>> and conditionally excludes dangerous or pointless error 
>>>>>>>>>>>> reporting
>>>>>>>>>>>> STEPs (e.g. callstack is pointless).
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I can see, the other concerns concern only the 
>>>>>>>>>>>> "crash"
>>>>>>>>>>>> case and simply do not apply to the "info-only" case. If an 
>>>>>>>>>>>> error
>>>>>>>>>>>> reporting step is robust enough for the "crash" case (low 
>>>>>>>>>>>> stack
>>>>>>>>>>>> usage, not allocating memory to avoid deadlocks etc), it 
>>>>>>>>>>>> should be
>>>>>>>>>>>> safe enough for the "info" case.
>>>>>>>>>>>>
>>>>>>>>>>>> Kind Regards, Thomas
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 2, 2015 at 12:31 PM, Mattis Castegren
>>>>>>>>>>>> <mattis.castegren at oracle.com 
>>>>>>>>>>>> <mailto:mattis.castegren at oracle.com>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>      Hi
>>>>>>>>>>>>
>>>>>>>>>>>>      Of course, when I wrote "Review and push this fix" I 
>>>>>>>>>>>> meant
>>>>>>>>>>>> doing a
>>>>>>>>>>>>      thorough technical review. We do not want VM.info to be
>>>>>>>>>>>> unsafe, and
>>>>>>>>>>>>      I will let David reply to the technical questions that 
>>>>>>>>>>>> has
>>>>>>>>>>>> been
>>>>>>>>>>>>      raised. I know he did consider some of these issues, 
>>>>>>>>>>>> but last
>>>>>>>>>>>> week
>>>>>>>>>>>>      he was on Java one, and this week he is out. I will 
>>>>>>>>>>>> let him
>>>>>>>>>>>> comment
>>>>>>>>>>>>      once he gets back.
>>>>>>>>>>>>
>>>>>>>>>>>>      Kind Regards
>>>>>>>>>>>>      /Mattis
>>>>>>>>>>>>
>>>>>>>>>>>>      -----Original Message-----
>>>>>>>>>>>>      From: Bertrand Delsart
>>>>>>>>>>>>      Sent: den 2 november 2015 10:13
>>>>>>>>>>>>      To: David Holmes; Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>>>      hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>>      Subject: Re: RFR: 8027429: Add diagnostic command VM.info
>>>>>>>>>>>> to get
>>>>>>>>>>>>      hs_err print-out
>>>>>>>>>>>>
>>>>>>>>>>>>      On 02/11/2015 02:45, David Holmes wrote:
>>>>>>>>>>>>       > On 30/10/2015 8:38 PM, Mattis Castegren wrote:
>>>>>>>>>>>>       >> Hi
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> I agree that there may be things we could add to 
>>>>>>>>>>>> VM.info
>>>>>>>>>>>> that we
>>>>>>>>>>>>       >> couldn't add in an hs_err file due to the fact 
>>>>>>>>>>>> that we
>>>>>>>>>>>> are
>>>>>>>>>>>> crashing.
>>>>>>>>>>>>       >> However, I really do believe that we should use 
>>>>>>>>>>>> the same
>>>>>>>>>>>> code.
>>>>>>>>>>>> The
>>>>>>>>>>>>       >> goal of the hs_err file is exactly the same as the
>>>>>>>>>>>> goal of
>>>>>>>>>>>> VM.info,
>>>>>>>>>>>>       >> to print as much information about the system as
>>>>>>>>>>>> possible so
>>>>>>>>>>>> that
>>>>>>>>>>>>       >> when we look at an hs_err file or the output from
>>>>>>>>>>>> VM.info,
>>>>>>>>>>>> we get as
>>>>>>>>>>>>       >> much information about the system as possible. 
>>>>>>>>>>>> That way
>>>>>>>>>>>> we can
>>>>>>>>>>>>      reduce
>>>>>>>>>>>>       >> the need to go back and forth with the customer to 
>>>>>>>>>>>> ask
>>>>>>>>>>>> for more
>>>>>>>>>>>>      information.
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> If we enumerate what we need in VM.info, the list
>>>>>>>>>>>> would look
>>>>>>>>>>>> very
>>>>>>>>>>>>       >> much like what we already do in VMError, and 
>>>>>>>>>>>> therefore I
>>>>>>>>>>>> think we
>>>>>>>>>>>>       >> should use the same code. If we find something 
>>>>>>>>>>>> that we
>>>>>>>>>>>> would
>>>>>>>>>>>> want,
>>>>>>>>>>>>       >> that is not currently printed in hs_err files, I 
>>>>>>>>>>>> think we
>>>>>>>>>>>> have
>>>>>>>>>>>>      two options:
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> 1) Add it to hs_err files as well. If it is
>>>>>>>>>>>> interesting when
>>>>>>>>>>>>       >> gathering customer information before a crash, it
>>>>>>>>>>>> probably is
>>>>>>>>>>>>       >> interesting after a crash
>>>>>>>>>>>>       >> 2) If it for some reason cannot be captured when
>>>>>>>>>>>> crashing,
>>>>>>>>>>>> add it to
>>>>>>>>>>>>       >> the code with the condition that we don't print it 
>>>>>>>>>>>> during
>>>>>>>>>>>> crash
>>>>>>>>>>>>      time.
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> I would therefore suggest the following plan of 
>>>>>>>>>>>> action
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> 1) Review and push this fix to see if there are any
>>>>>>>>>>>> technical issues
>>>>>>>>>>>>       >> with regards to thread safety, etc. I know David 
>>>>>>>>>>>> has had
>>>>>>>>>>>> this in
>>>>>>>>>>>>      mind
>>>>>>>>>>>>       >> when doing the change. After that, we have a 
>>>>>>>>>>>> command that
>>>>>>>>>>>>      support can
>>>>>>>>>>>>       >> use.
>>>>>>>>>>>>       >> 2) Think about if there is any information 
>>>>>>>>>>>> missing, both
>>>>>>>>>>>> from hs_err
>>>>>>>>>>>>       >> files and from VM.info. Here, we may want to ask 
>>>>>>>>>>>> support
>>>>>>>>>>>> about if
>>>>>>>>>>>>       >> there is any additional information they need to 
>>>>>>>>>>>> ask for
>>>>>>>>>>>> when they
>>>>>>>>>>>>       >> get this data. If possible, this information 
>>>>>>>>>>>> should be
>>>>>>>>>>>> added
>>>>>>>>>>>> to both
>>>>>>>>>>>>       >> hs_err files and VM.info, we don't want support to
>>>>>>>>>>>> have to
>>>>>>>>>>>> ask for
>>>>>>>>>>>>       >> VM.info if we already have an hs_err file. Of 
>>>>>>>>>>>> course, if
>>>>>>>>>>>> there is
>>>>>>>>>>>>       >> anything Dev would want to see when analyzing 
>>>>>>>>>>>> crashes in
>>>>>>>>>>>> testing, we
>>>>>>>>>>>>       >> should add that too.
>>>>>>>>>>>>       >
>>>>>>>>>>>>       > I have no issue with VMError and VM.Info using the 
>>>>>>>>>>>> same
>>>>>>>>>>>> code
>>>>>>>>>>>> (where
>>>>>>>>>>>>       > appropriate), but I think it wrong to try and 
>>>>>>>>>>>> implement
>>>>>>>>>>>> VM.Info
>>>>>>>>>>>>       > directly using the actual VMError mechanism. It is
>>>>>>>>>>>> crude and
>>>>>>>>>>>> makes it
>>>>>>>>>>>>       > difficult to make independent changes to either 
>>>>>>>>>>>> facility.
>>>>>>>>>>>>       >
>>>>>>>>>>>>       > Sorry.
>>>>>>>>>>>>       >
>>>>>>>>>>>>       > David
>>>>>>>>>>>>
>>>>>>>>>>>>      Furthermore, we cannot just "Review and push this fix to
>>>>>>>>>>>> see if
>>>>>>>>>>>>      there are any technical issues". Two reviewers have 
>>>>>>>>>>>> already
>>>>>>>>>>>> stated
>>>>>>>>>>>>      that there are issues and these issues must be addressed
>>>>>>>>>>>> before
>>>>>>>>>>>> pushing.
>>>>>>>>>>>>
>>>>>>>>>>>>      The answer cannot just be that we need this feature, 
>>>>>>>>>>>> it must
>>>>>>>>>>>> address
>>>>>>>>>>>>      the technical issues that were reported.
>>>>>>>>>>>>
>>>>>>>>>>>>      One possible answer is that we all agree that VMInfo 
>>>>>>>>>>>> would be
>>>>>>>>>>>> unsafe.
>>>>>>>>>>>>      The method would have to be renamed UnsafeVMInfo and it
>>>>>>>>>>>> must be
>>>>>>>>>>>>      clear that its 'impact' would be very risky, potentially
>>>>>>>>>>>> crashing
>>>>>>>>>>>>      the JVM.
>>>>>>>>>>>>
>>>>>>>>>>>>      If this is sufficient for your needs then we could 
>>>>>>>>>>>> agree with
>>>>>>>>>>>> it. My
>>>>>>>>>>>>      only additional requirement would be to clearly mark 
>>>>>>>>>>>> that the
>>>>>>>>>>>> method
>>>>>>>>>>>>      has been called so that we do not waste time on 
>>>>>>>>>>>> non-issues if
>>>>>>>>>>>>      calling VMInfo leads to a crash (or, even worse but less
>>>>>>>>>>>> likely, to
>>>>>>>>>>>>      a silent data corruption that may lead to problems 
>>>>>>>>>>>> later).
>>>>>>>>>>>> The bare
>>>>>>>>>>>>      minimum would be to double check that a crash during an
>>>>>>>>>>>> execution of
>>>>>>>>>>>>      VM.info is clearly identifiable (this is worth double
>>>>>>>>>>>> checking
>>>>>>>>>>>>      because of the use of static buffers for the error 
>>>>>>>>>>>> reporting,
>>>>>>>>>>>> see
>>>>>>>>>>>>      below).
>>>>>>>>>>>>
>>>>>>>>>>>>      If an 'unsafe' VMInfo DCmd is not sufficient, you should
>>>>>>>>>>>> wonder
>>>>>>>>>>>>      whether a safe version of it is still a "small 
>>>>>>>>>>>> enhancement".
>>>>>>>>>>>> A JEP
>>>>>>>>>>>>      may be required.
>>>>>>>>>>>>
>>>>>>>>>>>>      VMError::report and the methods it calls (print_on_error)
>>>>>>>>>>>> have been
>>>>>>>>>>>>      written with assumptions clearly stated in the 
>>>>>>>>>>>> comments above
>>>>>>>>>>>>      VMError::report (see below the two most important ones).
>>>>>>>>>>>> HotSpot
>>>>>>>>>>>>      error reporting strategy differed from JRockit. This 
>>>>>>>>>>>> explains
>>>>>>>>>>>> why
>>>>>>>>>>>>      the JRockit approach cannot be applied as is.
>>>>>>>>>>>>
>>>>>>>>>>>>      I agree with David that you can try leveraging some of 
>>>>>>>>>>>> the
>>>>>>>>>>>> logic
>>>>>>>>>>>>      from VMError::report. However, to be sure there is no
>>>>>>>>>>>>      misunderstanding, let's be clearer about David's "(where
>>>>>>>>>>>>      appropriate)": we would for instance have to review 
>>>>>>>>>>>> all the
>>>>>>>>>>>> called
>>>>>>>>>>>>      methods, check which ones purposefully ignore locking
>>>>>>>>>>>> (potentially
>>>>>>>>>>>>      crashing) and provide alternate code for the unsafe ones
>>>>>>>>>>>> (which may
>>>>>>>>>>>>      not be trivial in some cases). As a simple example, a 
>>>>>>>>>>>> method
>>>>>>>>>>>> called
>>>>>>>>>>>>      "print_on_error" should not be called when we are not 
>>>>>>>>>>>> in an
>>>>>>>>>>>>      erroneous condition. However, in some components of the
>>>>>>>>>>>> JVM, the
>>>>>>>>>>>> low
>>>>>>>>>>>>      level implementations of 'print_on_error' and
>>>>>>>>>>>> 'print_info_safely'
>>>>>>>>>>>>      could share the same code.
>>>>>>>>>>>>
>>>>>>>>>>>>      Regards,
>>>>>>>>>>>>
>>>>>>>>>>>>      Bertrand.
>>>>>>>>>>>>
>>>>>>>>>>>>      Here are the important comments about VMError::report
>>>>>>>>>>>> assumptions:
>>>>>>>>>>>>
>>>>>>>>>>>>      // In general, a hang in error handler is much worse 
>>>>>>>>>>>> than a
>>>>>>>>>>>> crash or
>>>>>>>>>>>>      internal
>>>>>>>>>>>>      // error, as it's harder to recover from a hang. 
>>>>>>>>>>>> Deadlock can
>>>>>>>>>>>> happen
>>>>>>>>>>>>      if we
>>>>>>>>>>>>      // try to grab a lock that is already owned by current
>>>>>>>>>>>> thread,
>>>>>>>>>>>> or if the
>>>>>>>>>>>>      // owner is blocked forever (e.g. in
>>>>>>>>>>>> os::infinite_sleep()). If
>>>>>>>>>>>>      possible, the
>>>>>>>>>>>>      // error handler and all the functions it called 
>>>>>>>>>>>> should avoid
>>>>>>>>>>>>      grabbing any
>>>>>>>>>>>>      // lock. An important thing to notice is that memory
>>>>>>>>>>>> allocation
>>>>>>>>>>>> needs a
>>>>>>>>>>>>      lock.
>>>>>>>>>>>>      //
>>>>>>>>>>>>      // We should avoid using large stack allocated 
>>>>>>>>>>>> buffers. Many
>>>>>>>>>>>> errors
>>>>>>>>>>>>      happen
>>>>>>>>>>>>      // when stack space is already low. Making things even
>>>>>>>>>>>> worse is
>>>>>>>>>>>> that
>>>>>>>>>>>>      there
>>>>>>>>>>>>      // could be nested report_and_die() calls on stack (see
>>>>>>>>>>>> above).
>>>>>>>>>>>> Only one
>>>>>>>>>>>>      // thread can report error, so large buffers are 
>>>>>>>>>>>> statically
>>>>>>>>>>>> allocated in
>>>>>>>>>>>>      data
>>>>>>>>>>>>      // segment.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>       >
>>>>>>>>>>>>       >
>>>>>>>>>>>>       >> Kind Regards
>>>>>>>>>>>>       >> /Mattis
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> -----Original Message-----
>>>>>>>>>>>>       >> From: David Holmes
>>>>>>>>>>>>       >> Sent: den 29 oktober 2015 23:20
>>>>>>>>>>>>       >> To: Mattis Castegren; Coleen Phillimore;
>>>>>>>>>>>>       >> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>>       >> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>>> VM.info to
>>>>>>>>>>>> get
>>>>>>>>>>>>       >> hs_err print-out
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> On 29/10/2015 11:36 PM, Mattis Castegren wrote:
>>>>>>>>>>>>       >>> Some background: We have this command in JRockit. 
>>>>>>>>>>>> The
>>>>>>>>>>>>      information you
>>>>>>>>>>>>       >>> gather when you crash to give a good summary of what
>>>>>>>>>>>> system
>>>>>>>>>>>> you run
>>>>>>>>>>>>       >>> on is pretty much exactly the information you 
>>>>>>>>>>>> need to
>>>>>>>>>>>> get a
>>>>>>>>>>>> good
>>>>>>>>>>>>       >>> summary on a system that has not crashed. The 
>>>>>>>>>>>> JRockit
>>>>>>>>>>>> command is
>>>>>>>>>>>>       >>> extremely useful for support, and saves a lot of 
>>>>>>>>>>>> work
>>>>>>>>>>>> going
>>>>>>>>>>>>      back and
>>>>>>>>>>>>       >>> forth asking about system information.
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>> Also, if we write something new in the hs_err file,
>>>>>>>>>>>> like if
>>>>>>>>>>>>      there has
>>>>>>>>>>>>       >>> been any out of memory errors, we often would want
>>>>>>>>>>>> the same
>>>>>>>>>>>>       >>> information in the VM.info output. From my 
>>>>>>>>>>>> experience in
>>>>>>>>>>>>       >>> Sustaining/Support, I can't think of any 
>>>>>>>>>>>> information you
>>>>>>>>>>>> would want
>>>>>>>>>>>>       >>> in VM.info that you wouldn't also want in the hs_err
>>>>>>>>>>>> file
>>>>>>>>>>>> and the
>>>>>>>>>>>>       >>> other way around, apart from details about the crash
>>>>>>>>>>>> (obviously).
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>> I don't see a reason to exactly enumerate what
>>>>>>>>>>>> information
>>>>>>>>>>>> VM.info
>>>>>>>>>>>>       >>> should provide. From a sustaining/support
>>>>>>>>>>>> perspective, we
>>>>>>>>>>>> want a
>>>>>>>>>>>>       >>> one-stop command to gather as much useful 
>>>>>>>>>>>> information as
>>>>>>>>>>>> possible,
>>>>>>>>>>>>       >>> which is the same idea we have for the hs_err file.
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> The reason to enumerate what is required is to see 
>>>>>>>>>>>> where
>>>>>>>>>>>> that
>>>>>>>>>>>>       >> information already exists and can be collected 
>>>>>>>>>>>> from. The
>>>>>>>>>>>> VMError
>>>>>>>>>>>>       >> report has to be very careful about what it does and
>>>>>>>>>>>> how it
>>>>>>>>>>>> does it
>>>>>>>>>>>>       >> because of the fact we may have crashed and the 
>>>>>>>>>>>> general
>>>>>>>>>>>> process
>>>>>>>>>>>>      state
>>>>>>>>>>>>       >> is indeterminate. A Dcmd can simply gather whatever
>>>>>>>>>>>> information is
>>>>>>>>>>>>       >> required from the available sources in whatever 
>>>>>>>>>>>> order or
>>>>>>>>>>>> format
>>>>>>>>>>>>      desired.
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> I have no problem with this command, just how it 
>>>>>>>>>>>> has been
>>>>>>>>>>>>      proposed to
>>>>>>>>>>>>       >> implement it.
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >> David
>>>>>>>>>>>>       >>
>>>>>>>>>>>>       >>> Kind Regards
>>>>>>>>>>>>       >>> /Mattis
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>> -----Original Message-----
>>>>>>>>>>>>       >>> From: David Holmes
>>>>>>>>>>>>       >>> Sent: den 29 oktober 2015 13:08
>>>>>>>>>>>>       >>> To: Coleen Phillimore;
>>>>>>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>>>>>>> <mailto:hotspot-runtime-dev at openjdk.java.net>
>>>>>>>>>>>>       >>> Subject: Re: RFR: 8027429: Add diagnostic command
>>>>>>>>>>>> VM.info
>>>>>>>>>>>> to get
>>>>>>>>>>>>       >>> hs_err print-out
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>> On 29/10/2015 10:02 PM, Coleen Phillimore wrote:
>>>>>>>>>>>>       >>>>
>>>>>>>>>>>>       >>>>
>>>>>>>>>>>>       >>>> On 10/29/15 7:58 AM, David Holmes wrote:
>>>>>>>>>>>>       >>>>> On 29/10/2015 9:26 PM, Coleen Phillimore wrote:
>>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>>       >>>>>> So I actually disagree.  I don't think there 
>>>>>>>>>>>> should
>>>>>>>>>>>> be an
>>>>>>>>>>>>       >>>>>> additional separate mechanism to get the same
>>>>>>>>>>>> information
>>>>>>>>>>>>      that we
>>>>>>>>>>>>       >>>>>> get with hs_err reporting. I've wanted to have a
>>>>>>>>>>>> feature like
>>>>>>>>>>>>       >>>>>> this for a long time.
>>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>>       >>>>>> I pre-reviewed this change and I thought it 
>>>>>>>>>>>> looked
>>>>>>>>>>>> good in
>>>>>>>>>>>>      general.
>>>>>>>>>>>>       >>>>>> I didn't see the thread iteration problem that 
>>>>>>>>>>>> Fred
>>>>>>>>>>>> refers to
>>>>>>>>>>>>       >>>>>> below, but I think the individual problems can be
>>>>>>>>>>>> fixed.
>>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>>       >>>>>> The last thing I want is this code to be copied
>>>>>>>>>>>> somewhere else.
>>>>>>>>>>>>       >>>>>
>>>>>>>>>>>>       >>>>> Factored as needed not copied. VMError is not an
>>>>>>>>>>>> "info"
>>>>>>>>>>>> reporting
>>>>>>>>>>>>       >>>>> mechanism.
>>>>>>>>>>>>       >>>>
>>>>>>>>>>>>       >>>> If you look at the things that are reported in each
>>>>>>>>>>>> "STEP",
>>>>>>>>>>>>      there's a
>>>>>>>>>>>>       >>>> small amount of code and the order is important.
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>> The order is less important in an "info" request 
>>>>>>>>>>>> I would
>>>>>>>>>>>> think.
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>>> I'd like the vm info to use the same order and 
>>>>>>>>>>>> report
>>>>>>>>>>>> what it
>>>>>>>>>>>>      can do
>>>>>>>>>>>>       >>>> safely.   Refactoring 5 lines of code into 
>>>>>>>>>>>> functions
>>>>>>>>>>>> doesn't
>>>>>>>>>>>> make
>>>>>>>>>>>>       >>>> sense.
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>> I need to consider exactly what it is the "info"
>>>>>>>>>>>> needs to
>>>>>>>>>>>> report in
>>>>>>>>>>>>       >>> more detail. There are existing facilities (system
>>>>>>>>>>>> properties,
>>>>>>>>>>>>       >>> management
>>>>>>>>>>>>       >>> APIs) for various bits of runtime information, which
>>>>>>>>>>>> VMError can't
>>>>>>>>>>>>       >>> utilize but a Dcmd can.
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>> David
>>>>>>>>>>>>       >>>
>>>>>>>>>>>>       >>>> Coleen
>>>>>>>>>>>>       >>>>
>>>>>>>>>>>>       >>>>>
>>>>>>>>>>>>       >>>>> David
>>>>>>>>>>>>       >>>>>
>>>>>>>>>>>>       >>>>>> Coleen
>>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>>       >>>>>> On 10/28/15 8:48 PM, David Holmes wrote:
>>>>>>>>>>>>       >>>>>>> I agree with Fred, this kind of info reporting
>>>>>>>>>>>> should
>>>>>>>>>>>> not be
>>>>>>>>>>>>       >>>>>>> piggy-backed onto VMError handling for the 
>>>>>>>>>>>> reasons
>>>>>>>>>>>> stated
>>>>>>>>>>>>      (and the
>>>>>>>>>>>>       >>>>>>> error handling logic is complicated enough as it
>>>>>>>>>>>> is!). For
>>>>>>>>>>>>      things
>>>>>>>>>>>>       >>>>>>> like thread lists there are already safe 
>>>>>>>>>>>> management
>>>>>>>>>>>>      functions that
>>>>>>>>>>>>       >>>>>>> can/should be used.
>>>>>>>>>>>>       >>>>>>>
>>>>>>>>>>>>       >>>>>>> Thanks,
>>>>>>>>>>>>       >>>>>>> David H.
>>>>>>>>>>>>       >>>>>>>
>>>>>>>>>>>>       >>>>>>> On 29/10/2015 3:29 AM, Frederic Parain wrote:
>>>>>>>>>>>>       >>>>>>>> Hi David,
>>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>>       >>>>>>>> I haven't review all the code yet, but I'm
>>>>>>>>>>>> concerned
>>>>>>>>>>>> with the
>>>>>>>>>>>>       >>>>>>>> fact that the diagnostic command is calling
>>>>>>>>>>>> VMError::report().
>>>>>>>>>>>>       >>>>>>>> This method has been implemented to be executed
>>>>>>>>>>>> in the
>>>>>>>>>>>>      particular
>>>>>>>>>>>>       >>>>>>>> context of fatal errors, and its usage while 
>>>>>>>>>>>> the
>>>>>>>>>>>> VM is
>>>>>>>>>>>> running
>>>>>>>>>>>>       >>>>>>>> normally seems dangerous.
>>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>>       >>>>>>>> For instance, VMError::report() consciously 
>>>>>>>>>>>> avoids
>>>>>>>>>>>>      grabbing locks
>>>>>>>>>>>>       >>>>>>>> because of the risk of deadlock during the 
>>>>>>>>>>>> error
>>>>>>>>>>>> reporting.
>>>>>>>>>>>>       >>>>>>>> The consequence is that some data structures 
>>>>>>>>>>>> are
>>>>>>>>>>>> browsed in an
>>>>>>>>>>>>       >>>>>>>> unsafe way. One example: VMError::report() 
>>>>>>>>>>>> calls
>>>>>>>>>>>>       >>>>>>>> Threads::print_on_error() which iterates 
>>>>>>>>>>>> over the
>>>>>>>>>>>> thread list
>>>>>>>>>>>>       >>>>>>>> *without owning the Threads_lock*.
>>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>>       >>>>>>>> The implementation of the diagnostic command 
>>>>>>>>>>>> seems
>>>>>>>>>>>> also to
>>>>>>>>>>>>       >>>>>>>> exclude a lot of reporting from the initial
>>>>>>>>>>>> VMError::report()
>>>>>>>>>>>>       >>>>>>>> method. Have you considered implementing a new
>>>>>>>>>>>> MT-safe
>>>>>>>>>>>>      reporting
>>>>>>>>>>>>       >>>>>>>> method rather than trying to modify the special
>>>>>>>>>>>>      VMError::report()
>>>>>>>>>>>>       >>>>>>>> methods? (Note that some code factorization 
>>>>>>>>>>>> between
>>>>>>>>>>>>       >>>>>>>> VMError::report() and this new method should be
>>>>>>>>>>>> possible).
>>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>>       >>>>>>>> Thanks,
>>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>>       >>>>>>>> Fred
>>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>>       >>>>>>>> On 28/10/2015 17:18, david buck wrote:
>>>>>>>>>>>>       >>>>>>>>> Hi!
>>>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>>>       >>>>>>>>> Please review my change for this small
>>>>>>>>>>>> enhancement.
>>>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>>>       >>>>>>>>> bug:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8027429
>>>>>>>>>>>>       >>>>>>>>> webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~dbuck/8027429_jdk9_01/
>>>>>>>>>>>>       >>>>>>>>>
>>>>>>>>>>>>       >>>>>>>>> Cheers,
>>>>>>>>>>>>       >>>>>>>>> -Buck
>>>>>>>>>>>>       >>>>>>>>
>>>>>>>>>>>>       >>>>>>
>>>>>>>>>>>>       >>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>      --
>>>>>>>>>>>>      Bertrand Delsart, Grenoble Engineering
>>>>>>>>>>>> Center
>>>>>>>>>>>>      Oracle,         180 av. de l'Europe, ZIRST de
>>>>>>>>>>>> Montbonnot
>>>>>>>>>>>>      38330 Montbonnot Saint Martin, FRANCE
>>>>>>>>>>>>      bertrand.delsart at oracle.com
>>>>>>>>>>>> <mailto:bertrand.delsart at oracle.com>
>>>>>>>>>>>>                Phone : +33 4 76 18 81 23
>>>>>>>>>>>>
>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>>>>>>>>>>
>>>>>>>>>>>>      NOTICE: This email message is for the sole use of the
>>>>>>>>>>>> intended
>>>>>>>>>>>>      recipient(s) and may contain confidential and privileged
>>>>>>>>>>>>      information. Any unauthorized review, use, disclosure or
>>>>>>>>>>>>      distribution is prohibited. If you are not the intended
>>>>>>>>>>>> recipient,
>>>>>>>>>>>>      please contact the sender by reply email and destroy all
>>>>>>>>>>>> copies of
>>>>>>>>>>>>      the original message.
>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>