RFR(s): 8214975: No hs-err file if fatal error is raised during dynamic initialization.

Tue Dec 11 00:57:56 UTC 2018

Hi Thomas,

On 8/12/2018 9:21 pm, Thomas Stüfe wrote:
> Hi,
> 
> may I please have reviews for this small patch:
> 
> https://bugs.openjdk.java.net/browse/JDK-8214975
> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev.00/webrev
> 
> A fatal error (e.g. assert), if triggered during dynamic
> initialization, will plain kill the VM without a trace. (A)
> 
> A alternative variant of this error: VM will get caught up in an
> endless recursion, repeat "[Too many errors, abort]\n" endlessly,
> quickly growing its RSS until the OOM Killer kills it. (B)
> 
> These symptoms are all caused by VMError::report() attempting to write
> to an uninitialized fdStream object (VMError::out and VMError::log).
> These object instances are allocated at global file scope:
> 
> 1199 fdStream VMError::out(defaultStream::output_fd());
> 1200 fdStream VMError::log; // error log used by VMError::report_and_die()
> 
> They are non-trivial (have vtables) and need to be initialized
> themselves before being used. If the assert happens before they are
> initialized, the vtables will not yet have been set up, and once we
> attempt to call out::write() or log::write(), in an context where we
> only have an outputStream* ptr (e.g. VMError::report()), we crash.
> 
> Depending on which one of VMError::log and VMError::out are still
> uninitialized, we end up in (A) or (B). In both cases, the secondary
> signal handler (crash_handler()) will catch the signal, re-try error
> reporting and crash again. This is an endless cycle. We do have
> recursion detection in place to stop this cycle:
> 
> 1388 if (recursive_error_count++ > 30) {
> 1389 out.print_raw_cr("[Too many errors, abort]");
> 1390 os::die();
> 1391 }
> 
> but this again uses methods of outputStream to write the "Too many
> errors" text, which will crash again before ever reaching os::die().
> 
> --
> 
> Solution: do not rely on global non-trivial C++ objects, especially
> not at global scope. Instead, keep the information which must be
> preserved across recursions and different threads (which are only the
> logfile handles themselves) in non-c++ (POD) ints.
> 
> As added safety, do not keep them at global scope but at static
> function-scope inside VMError::report_and_die() - that moves their
> initialization back to the first time they are called. To print, we
> still need outputStream objects of course, but they can be restricted
> to function scope inside VMError::report_and_die(), and can be
> recreated each time this function is entered, be it because of
> recursions of from a different thread.

Okay this seems reasonable - the log is local but the fd it uses is 
static. Only comment is that if prepare_log_file return -1 then we will 
call close(-1) here:

1476     if (fd_log != fd_out) {
1477       close(fd_log);
1478     }

I would have expected to assign to a temporary and only update fd_log if 
not -1.

> Additionally, a jtreg regression test is added.

Nit: s/the hotspot/hotspot/

I note that producing the hs_err log from the test actually induces 
numerous secondary errors. Seems there moay be opportunity to make other 
parts of error reporting even more robust.

Thanks,
David

> Thanks, Thomas
>