RFR(xs): 8227031: Print NMT statistics on fatal errors

Tue Jul 9 07:31:36 UTC 2019

Hi Matthias,

On Tue, Jul 9, 2019 at 9:15 AM Baesken, Matthias <matthias.baesken at sap.com>
wrote:

> Hi Thomas,   In wonder  about  the following :
>
> MemTracker::final_report   is called also from  print_statistics()  :
>
> hotspot/share/runtime/java.cpp
> -----------------------------------------------------
> void print_statistics() {
> ...
> 353  // Native memory tracking data
> 354  if (PrintNMTStatistics) {
> 355    MemTracker::final_report(tty);
> 356  }
>
>
> Would this mean  that when  called  before  from  print_statistics() , we
> would not call it again  from vmError because of the
> g_final_report_did_run  check ?
>
> src/hotspot/share/services/memTracker.cpp
> -----------------------------------------------
>
> 179 static volatile bool g_final_report_did_run = false;
> 180 void MemTracker::final_report(outputStream* output) {
> 181   // This function is called during both error reporting and normal VM
> exit.
> 182   // However, it should only ever run once.  E.g. if the VM crashes
> after
> 183   // printing the final report during normal VM exit, it should not
> print
> 184   // the final report again. In addition, it should be guarded from
> 185   // recursive calls in case NMT reporting itself crashes.
> 186   if (Atomic::cmpxchg(true, &g_final_report_did_run, false) == false) {
> 187     NMT_TrackingLevel level = tracking_level();
> 188     if (level >= NMT_summary) {
> 189       report(level == NMT_summary, output);
> 190     }
> 191   }
> 192 }
>
> Is  this really what we want ?   Of course we want to avoid printing it
> twice (or more than that ) from error reporting.
> But I think we would miss it from error reporting in some situations when
> we want it there .
>
>
This is exactly what I wanted:

MemTracker::final_report(tty) is supposed to print the final report. It is
called in two places, during normal VM shutdown (A) and during error
handling (B). By only allowing the code to run once I get the behaviour I
wanted:

Case 1: normal shutdown: we execute (A) from before_exit(), all is well.
Case 2: we crash before normal shutdown: we execute (B) from within the
error handler.
Case 3: we crash during normal shutdown: we executed already (A) and hence
(B) is a noop
Case 4: crash within (!) MemTracker::final_report():
    Case 4.1: MemTracker::final_report() was called during normal shutdown
and crashed (A):  - we enter error handling, but we will not re-enter NMT
reporting at (B) which is good since NMT reporting is not reentrant.
    Case 4.2: MemTracker::final_report() was called during error handling
and crashed (B):  - we enter the secondary signal handler, restart
VMError::report_and_die(), but will not attempt to print NMT report again

Especially Case 4 is important, since it can lead to hanging error
reporting since NMT is not reentrant and will suffocate on its own lock.

Arguably, Case 2 and 3 are "just" aesthetics and prevent seeing the same
report twice.

> Otherwise looks okay to me .
>
>
Thanks!

> Best regards, Matthias
>
>
Cheers, Thomas

>
>
> >Hi all,
> >
> >We have -XX:+-PrintNMTStatistics, a very useful switch which will cause
> the
> >VM to print out the NMT statistics if the VM exits normally.
> >
> >Currently it does not work if the VM exits due to a fatal error. But
> >especially in fatal exits due to native OOM a NMT report would be very
> >helpful.
> >
> >JBS: https://bugs.openjdk.java.net/browse/JDK-8227031
> >
> >cr:
> >
> http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print-nmt-report-on-oom/webrev.00/webrev/index.html
> >
> >Changes in this patch:
> >- handle PrintNMTStatistics on fatal error
> >- make sure the final report is not called twice accidentally and it is
> not
> >called recursively due to secondary error handling
> >- change the Metaspace report portion of the NMT report to only include
> the
> >brief metaspace report - that one can be called at any time, it does not
> >lock nor require any resources.
> >
> >Please note: this will not work when we are in an OOM situation and
> request
> >a detailed NMT report; that scenario needs more work since NMT detailed
> >reports need memory as well. That is a separate issue.
>
>
>
>