RFR(xs): 8227031: Print NMT statistics on fatal errors

Wed Jul 10 13:02:17 UTC 2019

Hi Martin,

thanks for the review! I will remove the unnecessary comment before pushing.

Cheers, Thomas

On Wed, Jul 10, 2019 at 2:40 PM Doerr, Martin <martin.doerr at sap.com> wrote:

> Hi Thomas,
>
> looks good to me.
>
> vmError.cpp:
> // Print out NMT statistics if this was desired.
>
> What is the value of this comment?
> I suggest to remove it if we don't have anything important to explain.
>
> Best regards,
> Martin
>
>
> > -----Original Message-----
> > From: hotspot-runtime-dev <hotspot-runtime-dev-
> > bounces at openjdk.java.net> On Behalf Of Thomas Stüfe
> > Sent: Dienstag, 9. Juli 2019 09:32
> > To: Baesken, Matthias <matthias.baesken at sap.com>
> > Cc: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
> > Subject: Re: RFR(xs): 8227031: Print NMT statistics on fatal errors
> >
> > Hi Matthias,
> >
> >
> > On Tue, Jul 9, 2019 at 9:15 AM Baesken, Matthias
> > <matthias.baesken at sap.com>
> > wrote:
> >
> > > Hi Thomas,   In wonder  about  the following :
> > >
> > > MemTracker::final_report   is called also from  print_statistics()  :
> > >
> > > hotspot/share/runtime/java.cpp
> > > -----------------------------------------------------
> > > void print_statistics() {
> > > ...
> > > 353  // Native memory tracking data
> > > 354  if (PrintNMTStatistics) {
> > > 355    MemTracker::final_report(tty);
> > > 356  }
> > >
> > >
> > > Would this mean  that when  called  before  from  print_statistics() ,
> we
> > > would not call it again  from vmError because of the
> > > g_final_report_did_run  check ?
> > >
> > > src/hotspot/share/services/memTracker.cpp
> > > -----------------------------------------------
> > >
> > > 179 static volatile bool g_final_report_did_run = false;
> > > 180 void MemTracker::final_report(outputStream* output) {
> > > 181   // This function is called during both error reporting and
> normal VM
> > > exit.
> > > 182   // However, it should only ever run once.  E.g. if the VM crashes
> > > after
> > > 183   // printing the final report during normal VM exit, it should not
> > > print
> > > 184   // the final report again. In addition, it should be guarded from
> > > 185   // recursive calls in case NMT reporting itself crashes.
> > > 186   if (Atomic::cmpxchg(true, &g_final_report_did_run, false) ==
> false) {
> > > 187     NMT_TrackingLevel level = tracking_level();
> > > 188     if (level >= NMT_summary) {
> > > 189       report(level == NMT_summary, output);
> > > 190     }
> > > 191   }
> > > 192 }
> > >
> > > Is  this really what we want ?   Of course we want to avoid printing it
> > > twice (or more than that ) from error reporting.
> > > But I think we would miss it from error reporting in some situations
> when
> > > we want it there .
> > >
> > >
> > This is exactly what I wanted:
> >
> > MemTracker::final_report(tty) is supposed to print the final report. It
> is
> > called in two places, during normal VM shutdown (A) and during error
> > handling (B). By only allowing the code to run once I get the behaviour I
> > wanted:
> >
> > Case 1: normal shutdown: we execute (A) from before_exit(), all is well.
> > Case 2: we crash before normal shutdown: we execute (B) from within the
> > error handler.
> > Case 3: we crash during normal shutdown: we executed already (A) and
> > hence
> > (B) is a noop
> > Case 4: crash within (!) MemTracker::final_report():
> >     Case 4.1: MemTracker::final_report() was called during normal
> shutdown
> > and crashed (A):  - we enter error handling, but we will not re-enter NMT
> > reporting at (B) which is good since NMT reporting is not reentrant.
> >     Case 4.2: MemTracker::final_report() was called during error handling
> > and crashed (B):  - we enter the secondary signal handler, restart
> > VMError::report_and_die(), but will not attempt to print NMT report again
> >
> > Especially Case 4 is important, since it can lead to hanging error
> > reporting since NMT is not reentrant and will suffocate on its own lock.
> >
> > Arguably, Case 2 and 3 are "just" aesthetics and prevent seeing the same
> > report twice.
> >
> >
> >
> > > Otherwise looks okay to me .
> > >
> > >
> > Thanks!
> >
> >
> > > Best regards, Matthias
> > >
> > >
> > Cheers, Thomas
> >
> > >
> > >
> > > >Hi all,
> > > >
> > > >We have -XX:+-PrintNMTStatistics, a very useful switch which will
> cause
> > > the
> > > >VM to print out the NMT statistics if the VM exits normally.
> > > >
> > > >Currently it does not work if the VM exits due to a fatal error. But
> > > >especially in fatal exits due to native OOM a NMT report would be very
> > > >helpful.
> > > >
> > > >JBS: https://bugs.openjdk.java.net/browse/JDK-8227031
> > > >
> > > >cr:
> > > >
> > > http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print-
> > nmt-report-on-oom/webrev.00/webrev/index.html
> > > >
> > > >Changes in this patch:
> > > >- handle PrintNMTStatistics on fatal error
> > > >- make sure the final report is not called twice accidentally and it
> is
> > > not
> > > >called recursively due to secondary error handling
> > > >- change the Metaspace report portion of the NMT report to only
> include
> > > the
> > > >brief metaspace report - that one can be called at any time, it does
> not
> > > >lock nor require any resources.
> > > >
> > > >Please note: this will not work when we are in an OOM situation and
> > > request
> > > >a detailed NMT report; that scenario needs more work since NMT
> > detailed
> > > >reports need memory as well. That is a separate issue.
> > >
> > >
> > >
> > >
>