RFR(xs): 8227031: Print NMT statistics on fatal errors
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Jul 10 13:02:17 UTC 2019
Hi Martin,
thanks for the review! I will remove the unnecessary comment before pushing.
Cheers, Thomas
On Wed, Jul 10, 2019 at 2:40 PM Doerr, Martin <martin.doerr at sap.com> wrote:
> Hi Thomas,
>
> looks good to me.
>
> vmError.cpp:
> // Print out NMT statistics if this was desired.
>
> What is the value of this comment?
> I suggest to remove it if we don't have anything important to explain.
>
> Best regards,
> Martin
>
>
> > -----Original Message-----
> > From: hotspot-runtime-dev <hotspot-runtime-dev-
> > bounces at openjdk.java.net> On Behalf Of Thomas Stüfe
> > Sent: Dienstag, 9. Juli 2019 09:32
> > To: Baesken, Matthias <matthias.baesken at sap.com>
> > Cc: Hotspot dev runtime <hotspot-runtime-dev at openjdk.java.net>
> > Subject: Re: RFR(xs): 8227031: Print NMT statistics on fatal errors
> >
> > Hi Matthias,
> >
> >
> > On Tue, Jul 9, 2019 at 9:15 AM Baesken, Matthias
> > <matthias.baesken at sap.com>
> > wrote:
> >
> > > Hi Thomas, In wonder about the following :
> > >
> > > MemTracker::final_report is called also from print_statistics() :
> > >
> > > hotspot/share/runtime/java.cpp
> > > -----------------------------------------------------
> > > void print_statistics() {
> > > ...
> > > 353 // Native memory tracking data
> > > 354 if (PrintNMTStatistics) {
> > > 355 MemTracker::final_report(tty);
> > > 356 }
> > >
> > >
> > > Would this mean that when called before from print_statistics() ,
> we
> > > would not call it again from vmError because of the
> > > g_final_report_did_run check ?
> > >
> > > src/hotspot/share/services/memTracker.cpp
> > > -----------------------------------------------
> > >
> > > 179 static volatile bool g_final_report_did_run = false;
> > > 180 void MemTracker::final_report(outputStream* output) {
> > > 181 // This function is called during both error reporting and
> normal VM
> > > exit.
> > > 182 // However, it should only ever run once. E.g. if the VM crashes
> > > after
> > > 183 // printing the final report during normal VM exit, it should not
> > > print
> > > 184 // the final report again. In addition, it should be guarded from
> > > 185 // recursive calls in case NMT reporting itself crashes.
> > > 186 if (Atomic::cmpxchg(true, &g_final_report_did_run, false) ==
> false) {
> > > 187 NMT_TrackingLevel level = tracking_level();
> > > 188 if (level >= NMT_summary) {
> > > 189 report(level == NMT_summary, output);
> > > 190 }
> > > 191 }
> > > 192 }
> > >
> > > Is this really what we want ? Of course we want to avoid printing it
> > > twice (or more than that ) from error reporting.
> > > But I think we would miss it from error reporting in some situations
> when
> > > we want it there .
> > >
> > >
> > This is exactly what I wanted:
> >
> > MemTracker::final_report(tty) is supposed to print the final report. It
> is
> > called in two places, during normal VM shutdown (A) and during error
> > handling (B). By only allowing the code to run once I get the behaviour I
> > wanted:
> >
> > Case 1: normal shutdown: we execute (A) from before_exit(), all is well.
> > Case 2: we crash before normal shutdown: we execute (B) from within the
> > error handler.
> > Case 3: we crash during normal shutdown: we executed already (A) and
> > hence
> > (B) is a noop
> > Case 4: crash within (!) MemTracker::final_report():
> > Case 4.1: MemTracker::final_report() was called during normal
> shutdown
> > and crashed (A): - we enter error handling, but we will not re-enter NMT
> > reporting at (B) which is good since NMT reporting is not reentrant.
> > Case 4.2: MemTracker::final_report() was called during error handling
> > and crashed (B): - we enter the secondary signal handler, restart
> > VMError::report_and_die(), but will not attempt to print NMT report again
> >
> > Especially Case 4 is important, since it can lead to hanging error
> > reporting since NMT is not reentrant and will suffocate on its own lock.
> >
> > Arguably, Case 2 and 3 are "just" aesthetics and prevent seeing the same
> > report twice.
> >
> >
> >
> > > Otherwise looks okay to me .
> > >
> > >
> > Thanks!
> >
> >
> > > Best regards, Matthias
> > >
> > >
> > Cheers, Thomas
> >
> > >
> > >
> > > >Hi all,
> > > >
> > > >We have -XX:+-PrintNMTStatistics, a very useful switch which will
> cause
> > > the
> > > >VM to print out the NMT statistics if the VM exits normally.
> > > >
> > > >Currently it does not work if the VM exits due to a fatal error. But
> > > >especially in fatal exits due to native OOM a NMT report would be very
> > > >helpful.
> > > >
> > > >JBS: https://bugs.openjdk.java.net/browse/JDK-8227031
> > > >
> > > >cr:
> > > >
> > > http://cr.openjdk.java.net/~stuefe/webrevs/8227031-optionally-print-
> > nmt-report-on-oom/webrev.00/webrev/index.html
> > > >
> > > >Changes in this patch:
> > > >- handle PrintNMTStatistics on fatal error
> > > >- make sure the final report is not called twice accidentally and it
> is
> > > not
> > > >called recursively due to secondary error handling
> > > >- change the Metaspace report portion of the NMT report to only
> include
> > > the
> > > >brief metaspace report - that one can be called at any time, it does
> not
> > > >lock nor require any resources.
> > > >
> > > >Please note: this will not work when we are in an OOM situation and
> > > request
> > > >a detailed NMT report; that scenario needs more work since NMT
> > detailed
> > > >reports need memory as well. That is a separate issue.
> > >
> > >
> > >
> > >
>
More information about the hotspot-runtime-dev
mailing list