RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux)

Thomas Stuefe stuefe at openjdk.java.net
Thu Oct 22 20:16:15 UTC 2020


On Thu, 22 Oct 2020 16:40:43 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

> hi all,
> 
> Please review this fix for POSIX platforms, which addresses a time out that occurs while handling a crash with UseOSErrorReporting ON
> 
> The timeout was caused by the crash handling code, looping infinitively, because it incorrectly assumed that the signal handlers were reset to their defaults, while what really was happening was that the code was resetting the signal handlers to our default signal handler.
> 
> To avoid similar confusion in the future I did the following:
> 
> - renamed the VMError::reset_signal_handlers() to VMError:: rearm_signal_handlers()
> - introduced a new API VMError::clear_signal_handlers() which is implemented in PosixSignals
> 
> PosixSignals::clear_signal_handlers() is where the actual fix is done and it simply resets all signal handlers to their system defaults.
> 
> A similar problem occurs on Windows, with the only difference being that before a process times out (takes 2 minutes) it runs out of stack space in about 250 loops, so that's the only reason it doesn't linger for that long. Windows issue is tracked separately by https://bugs.openjdk.java.net/browse/JDK-8250782

Hi Gerard,

I have general concerns about the usefulness of this switch, see the comments in the JBS issue. Beyond that, some remarks below.

Cheers, Thomas

src/hotspot/os/posix/vmError_posix.cpp line 141:

> 139: }
> 140: 
> 141: void VMError::rearm_signal_handlers() {

I think "re-arm" is a misnomer. Nothing is re-armed if by "armed" you mean "a signal handler was installed". There was one installed before and there gets a different one installed now. 

I agree that "reset_signal_handlers" is just plain wrong, and has bugged me too. May I suggest "install_secondary_signal_handlers()" ? Or possibly "install_secondary_crash_handlers" which would have the added benefit of removing the term "signal" out of the shared name space, which is confusing on windows.

src/hotspot/os/posix/signals_posix.cpp line 1425:

> 1423:   for (int i = 0; i < NSIG + 1; i++) {
> 1424:     sigaction(i, &defaulthandler, NULL);
> 1425:   }

Some issues:
- 0 is not a valid signal number
- NSIG is obscure; may be ridiculously large (e.g. AIX I believe 1024) and still may not contain all signals for some platforms (e.g. real time signals on BSD (?) are not included)
- You now flatten here all signal handlers across the board. What if some third party app had installed their own. I would only reset those which had been originally installed for hotspot usage (beware ReduceSignalUsage). 

Which brings me to the point that if we fail to re-raise the original error signal, now we run with signal handlers completely disabled :( ... see my original concerns.

-------------

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/813


More information about the hotspot-dev mailing list