RFR(s): 8214975: No hs-err file if fatal error is raised during dynamic initialization.
David Holmes
david.holmes at oracle.com
Tue Jan 15 04:15:30 UTC 2019
Hi Thomas,
That seems fine to me.
Please update copyright years to 2019.
Thanks,
David
On 15/01/2019 2:01 am, Thomas Stüfe wrote:
> Hi David, Dan,
>
> sorry for the extremely long delay. I have been away some weeks.
>
> Could you please give this patch a final glance?
>
> cr (full):
> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev.00/webrev/
> cr (delta);
> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev_delta.00/webrev/
>
> I only added some small comments and reworked the closing of the hs-err
> file since David was right - in case the hs-err file could not be
> opened, we'd call close(-1).
>
> Thank you!
>
> Cheers, Thomas
>
>
> On Tue, Dec 11, 2018 at 1:58 AM David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
> Hi Thomas,
>
> On 8/12/2018 9:21 pm, Thomas Stüfe wrote:
> > Hi,
> >
> > may I please have reviews for this small patch:
> >
> > https://bugs.openjdk.java.net/browse/JDK-8214975
> >
> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev.00/webrev
> >
> > A fatal error (e.g. assert), if triggered during dynamic
> > initialization, will plain kill the VM without a trace. (A)
> >
> > A alternative variant of this error: VM will get caught up in an
> > endless recursion, repeat "[Too many errors, abort]\n" endlessly,
> > quickly growing its RSS until the OOM Killer kills it. (B)
> >
> > These symptoms are all caused by VMError::report() attempting to
> write
> > to an uninitialized fdStream object (VMError::out and VMError::log).
> > These object instances are allocated at global file scope:
> >
> > 1199 fdStream VMError::out(defaultStream::output_fd());
> > 1200 fdStream VMError::log; // error log used by
> VMError::report_and_die()
> >
> > They are non-trivial (have vtables) and need to be initialized
> > themselves before being used. If the assert happens before they are
> > initialized, the vtables will not yet have been set up, and once we
> > attempt to call out::write() or log::write(), in an context where we
> > only have an outputStream* ptr (e.g. VMError::report()), we crash.
> >
> > Depending on which one of VMError::log and VMError::out are still
> > uninitialized, we end up in (A) or (B). In both cases, the secondary
> > signal handler (crash_handler()) will catch the signal, re-try error
> > reporting and crash again. This is an endless cycle. We do have
> > recursion detection in place to stop this cycle:
> >
> > 1388 if (recursive_error_count++ > 30) {
> > 1389 out.print_raw_cr("[Too many errors, abort]");
> > 1390 os::die();
> > 1391 }
> >
> > but this again uses methods of outputStream to write the "Too many
> > errors" text, which will crash again before ever reaching os::die().
> >
> > --
> >
> > Solution: do not rely on global non-trivial C++ objects, especially
> > not at global scope. Instead, keep the information which must be
> > preserved across recursions and different threads (which are only the
> > logfile handles themselves) in non-c++ (POD) ints.
> >
> > As added safety, do not keep them at global scope but at static
> > function-scope inside VMError::report_and_die() - that moves their
> > initialization back to the first time they are called. To print, we
> > still need outputStream objects of course, but they can be restricted
> > to function scope inside VMError::report_and_die(), and can be
> > recreated each time this function is entered, be it because of
> > recursions of from a different thread.
>
> Okay this seems reasonable - the log is local but the fd it uses is
> static. Only comment is that if prepare_log_file return -1 then we will
> call close(-1) here:
>
> 1476 if (fd_log != fd_out) {
> 1477 close(fd_log);
> 1478 }
>
> I would have expected to assign to a temporary and only update
> fd_log if
> not -1.
>
> > Additionally, a jtreg regression test is added.
>
> Nit: s/the hotspot/hotspot/
>
> I note that producing the hs_err log from the test actually induces
> numerous secondary errors. Seems there moay be opportunity to make
> other
> parts of error reporting even more robust.
>
> Thanks,
> David
>
> > Thanks, Thomas
> >
>
More information about the hotspot-runtime-dev
mailing list