RFR(s): 8214975: No hs-err file if fatal error is raised during dynamic initialization.

Wed Jan 16 17:58:25 UTC 2019

> cr (delta); 
> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev_delta.00/webrev/

src/hotspot/share/utilities/vmError.cpp
     No comments.

Thumbs up.

Dan

On 1/14/19 11:15 PM, David Holmes wrote:
> Hi Thomas,
>
> That seems fine to me.
>
> Please update copyright years to 2019.
>
> Thanks,
> David
>
> On 15/01/2019 2:01 am, Thomas Stüfe wrote:
>> Hi David, Dan,
>>
>> sorry for the extremely long delay. I have been away some weeks.
>>
>> Could you please give this patch a final glance?
>>
>> cr (full): 
>> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev.00/webrev/
>> cr (delta); 
>> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev_delta.00/webrev/
>>
>> I only added some small comments and reworked the closing of the 
>> hs-err file since David was right - in case the hs-err file could not 
>> be opened, we'd call close(-1).
>>
>> Thank you!
>>
>> Cheers, Thomas
>>
>>
>> On Tue, Dec 11, 2018 at 1:58 AM David Holmes <david.holmes at oracle.com 
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>     On 8/12/2018 9:21 pm, Thomas Stüfe wrote:
>>      > Hi,
>>      >
>>      > may I please have reviews for this small patch:
>>      >
>>      > https://bugs.openjdk.java.net/browse/JDK-8214975
>>      >
>> http://cr.openjdk.java.net/~stuefe/webrevs/8214975-no-hs-errfile-on-early-assert/webrev.00/webrev
>>      >
>>      > A fatal error (e.g. assert), if triggered during dynamic
>>      > initialization, will plain kill the VM without a trace. (A)
>>      >
>>      > A alternative variant of this error: VM will get caught up in an
>>      > endless recursion, repeat "[Too many errors, abort]\n" endlessly,
>>      > quickly growing its RSS until the OOM Killer kills it. (B)
>>      >
>>      > These symptoms are all caused by VMError::report() attempting to
>>     write
>>      > to an uninitialized fdStream object (VMError::out and 
>> VMError::log).
>>      > These object instances are allocated at global file scope:
>>      >
>>      > 1199 fdStream VMError::out(defaultStream::output_fd());
>>      > 1200 fdStream VMError::log; // error log used by
>>     VMError::report_and_die()
>>      >
>>      > They are non-trivial (have vtables) and need to be initialized
>>      > themselves before being used. If the assert happens before 
>> they are
>>      > initialized, the vtables will not yet have been set up, and 
>> once we
>>      > attempt to call out::write() or log::write(), in an context 
>> where we
>>      > only have an outputStream* ptr (e.g. VMError::report()), we 
>> crash.
>>      >
>>      > Depending on which one of VMError::log and VMError::out are still
>>      > uninitialized, we end up in (A) or (B). In both cases, the 
>> secondary
>>      > signal handler (crash_handler()) will catch the signal, re-try 
>> error
>>      > reporting and crash again. This is an endless cycle. We do have
>>      > recursion detection in place to stop this cycle:
>>      >
>>      > 1388 if (recursive_error_count++ > 30) {
>>      > 1389 out.print_raw_cr("[Too many errors, abort]");
>>      > 1390 os::die();
>>      > 1391 }
>>      >
>>      > but this again uses methods of outputStream to write the "Too 
>> many
>>      > errors" text, which will crash again before ever reaching 
>> os::die().
>>      >
>>      > --
>>      >
>>      > Solution: do not rely on global non-trivial C++ objects, 
>> especially
>>      > not at global scope. Instead, keep the information which must be
>>      > preserved across recursions and different threads (which are 
>> only the
>>      > logfile handles themselves) in non-c++ (POD) ints.
>>      >
>>      > As added safety, do not keep them at global scope but at static
>>      > function-scope inside VMError::report_and_die() - that moves 
>> their
>>      > initialization back to the first time they are called. To 
>> print, we
>>      > still need outputStream objects of course, but they can be 
>> restricted
>>      > to function scope inside VMError::report_and_die(), and can be
>>      > recreated each time this function is entered, be it because of
>>      > recursions of from a different thread.
>>
>>     Okay this seems reasonable - the log is local but the fd it uses is
>>     static. Only comment is that if prepare_log_file return -1 then 
>> we will
>>     call close(-1) here:
>>
>>     1476     if (fd_log != fd_out) {
>>     1477       close(fd_log);
>>     1478     }
>>
>>     I would have expected to assign to a temporary and only update
>>     fd_log if
>>     not -1.
>>
>>      > Additionally, a jtreg regression test is added.
>>
>>     Nit: s/the hotspot/hotspot/
>>
>>     I note that producing the hs_err log from the test actually induces
>>     numerous secondary errors. Seems there moay be opportunity to make
>>     other
>>     parts of error reporting even more robust.
>>
>>     Thanks,
>>     David
>>
>>      > Thanks, Thomas
>>      >
>>