RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux)

Thomas Stuefe stuefe at openjdk.java.net
Mon Oct 26 14:19:13 UTC 2020


On Mon, 26 Oct 2020 04:49:24 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> hi all,
>> 
>> Please review this fix for POSIX platforms, which addresses a timeout that occurs while handling a crash with UseOSErrorReporting ON.
>> 
>> The timeout was caused by the crash handling code looping infinitely. It incorrectly assumed that the signal handlers had been reset to their system defaults, when in fact the code was resetting them to our own (VM) signal handler.
>> 
>> To avoid similar confusion in the future I did the following:
>> 
>> - renamed VMError::reset_signal_handlers() to VMError::rearm_signal_handlers()
>> - introduced a new API, VMError::clear_signal_handlers(), which is implemented in PosixSignals
>> 
>> PosixSignals::clear_signal_handlers() is where the actual fix is done and it simply resets all signal handlers to their system defaults.
>> 
>> A similar problem occurs on Windows. The only difference is that, instead of lingering until the process times out (which takes 2 minutes), it runs out of stack space after about 250 iterations, and that is the only reason it does not hang for that long. The Windows issue is tracked separately by https://bugs.openjdk.java.net/browse/JDK-8250782
>> 
>> Note: The expectation for "UseOSErrorReporting" is for the OS to catch the crashed process and produce its own crash log (in addition to HotSpot creating the hs_err log file); see https://bugs.openjdk.java.net/browse/JDK-8237727 for relevant discussion. It does not affect whether a core dump is written (that is controlled by CreateCoredumpOnCrash).
>
> Changing review status to "Request changes".

> <snip>
> So my preferred approaches here would be:
> 
> 1. Make UseOSErrorReporting Windows only; or
> 2. Make UseOSErrorReporting Windows and macOS only. Then on macOS do a targeted crash after report_and_die() returns.
> 

I like (2). It is sure to preserve the stack of the crashing thread. Not perfect, but maybe it's close to what Gerard would like to see on macOS.

One remark: this gets very close to what we already do, since os::abort() calls ::abort(), which raises SIGABRT. But according to Gerard, abort() does not seem to be noticed by macOS crash handling, so artificially triggering a fault may be better.

..Thomas

> Thanks,
> David

-------------

PR: https://git.openjdk.java.net/jdk/pull/813

