RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux)
Gerard Ziemski
gziemski at openjdk.java.net
Mon Oct 26 15:35:14 UTC 2020
On Mon, 26 Oct 2020 04:33:03 GMT, David Holmes <dholmes at openjdk.org> wrote:
> Hi Gerard,
>
> I think we have a fundamental problem here that UseOSErrorReporting was only ever intended for use on Windows. It simply allows VMError::report_and_die to return instead of actually making the VM "die". For Windows this means we can continue to propagate the windows exception and thus allow Windows Error Reporting (WER) to take over. Whether this actually works correctly or not is a different matter.
>
> For non-Windows there is no pre-established alternative code path for report_and_die() returning.
>
> In the bug report you write:
>
> > On Mac/Linux it would look more like this:
> > #1 catch signal in our handler
> > #2 generate hs_err log
> > #3 turn off our signal handler
> > #4 continue the process normally, allowing it to crash again in the same spot, with the same signal being generated
>
> To me you are now inventing what UseOSErrorReporting should mean on non-Windows, and I don't agree with it. I don't think it should mean that we re-crash using the "default" signal response and consider that as using "OS error reporting". To me that is just not valid, especially when we cannot return from a signal handling context in many cases without incurring undefined behaviour. To me #4 is not a valid expectation as we have no way to know what will happen next if the signal handler returns. It would also be wrong to just continue execution after an assertion or guarantee fails.
>
> I'm assuming that the motivation here is that on macOS if we use the default signal handling modes then macOS will do its own error reporting? If so I would suggest that the right response may be to return from report_and_die (on macOS only) and then deliberately crash after restoring the default handler. Obviously that will change which "crash" the OS reports but that is likely to happen anyway as you cannot guarantee how you will crash after trying to continue (and this goes beyond our general "best effort" approaches in signal handling.)
>
> Beyond that I share Thomas's concerns about making sweeping changes to installed signal handlers.
>
> So my preferred approaches here would be:
>
> 1. Make UseOSErrorReporting Windows only; or
> 2. Make UseOSErrorReporting Windows and macOS only. Then on macOS do a targeted crash after report_and_die() returns.
Hi David,
Many thanks for the review and for finding the background on the history of this issue.
How we do things when a user turns ON the "UseOSErrorReporting" flag is just an implementation detail.
On Windows we forward the crash to the OS to handle. In this fix we "just" turn off our signal handlers, reset them to SIG_DFL, and return so that the process crashes again, but that does not make it a less meaningful way to forward the crash to the OS, if that is how the OS wants it done. Please see this technical note from Apple, https://developer.apple.com/forums/thread/113742, where Apple suggests that the way to let macOS handle the crash is to:
"unregister your signal handler (set it to SIG_DFL) and then return. This will cause the crashed process to continue execution, crash again, and generate a crash report via the Apple crash reporter."
That's how Apple suggests we do it on the Mac.
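To make the pattern concrete, here is a minimal standalone sketch of "reset to SIG_DFL and return". It is not code from the PR and not how HotSpot is structured: the names crash_handler and crash_child_and_get_signal are made up for illustration, the fault is provoked by writing to a PROT_NONE page, and the crash happens in a forked child only so that the parent can observe how the child died.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and sigaction under strict -std modes */
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: stands in for "write the hs_err log", then
 * restores the default disposition and returns. Only async-signal-safe
 * functions (write, signal) are called here. */
static void crash_handler(int sig) {
    static const char msg[] = "handler: report done, restoring SIG_DFL\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    signal(sig, SIG_DFL);
    /* Returning re-executes the faulting instruction; the second fault
     * now gets the default action, so the OS sees the crash. */
}

/* Hypothetical driver: forks a child that crashes as described above and
 * returns the signal that terminated it (0 if it exited normally,
 * -1 on fork/waitpid failure). */
int crash_child_and_get_signal(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = crash_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS, &sa, NULL);  /* some platforms report SIGBUS here */

        /* Map one inaccessible page so the write below faults reliably. */
        long page = sysconf(_SC_PAGESIZE);
        void *mem = mmap(NULL, (size_t)page, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) {
            _exit(2);
        }
        volatile char *p = (volatile char *)mem;
        p[0] = 1;   /* first fault runs crash_handler, second one kills us */
        _exit(3);   /* reached only if the write unexpectedly succeeded */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling crash_child_and_get_signal() from a test program should report that the child was terminated by a signal rather than by a normal exit; whether that signal is SIGSEGV or SIGBUS depends on the platform. Note that this only works because the fault is synchronous and repeats when the instruction is retried, which is the case the Apple note describes.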
I can limit the scope of this fix to just macOS, as I was planning for JDK-8237727, and worry about Linux in a separate issue.
-------------
PR: https://git.openjdk.java.net/jdk/pull/813