RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting
Thomas Stuefe
stuefe at openjdk.org
Tue Feb 28 07:42:06 UTC 2023
On Mon, 27 Feb 2023 23:31:49 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> Please review this change to the implementation of the Windows-specific option
>> UseOSErrorReporting, toward allowing crash reporting functions to be declared
>> noreturn. VMError::report_and_die no longer conditionally returns if the
>> Windows-only option UseOSErrorReporting is true.
>>
>> The Windows-only sections of report_and_die now call RaiseFailFastException
>> (https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception),
>> which immediately invokes WER (Windows Error Reporting) if it is enabled,
>> without executing structured exception handler. If WER is not enabled, it
>> just immediately terminates the program. Thus, we no longer return to walk up
>> thestructured exception handler chain to pop out at the top as unhandled in
>> order to invoke WER.
>>
>> This permits declaring report_and_die as [[noreturn]], once some functions
>> from the os class are also so declared. Also adding that attribute as
>> appropriate to other functions in the os class. This of course assumes
>> the use of [[noreturn]] in HotSpot code is approved (JDK-8302124).
>>
>> There is a pre-existing bug that I'll be reporting separately. If
>> UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an
>> empty .mdmp file. We shouldn't create that file when UseOSErrorReporting.
>>
>> Testing:
>> mach5 tier1-3
>>
>> Manual testing with the following, to verify desired behavior.
>>
>> -XX:ErrorHandlerTest=N
>> 1: assertion failure
>> 2: guarantee failure
>> 14: SIGSEGV
>> 15: divide by zero
>> path/to/bin/java \
>> -XX:+UnlockDiagnosticVMOptions \
>> -XX:+ErrorLogSecondaryErrorDetails \
>> -XX:+UseOSErrorReporting \
>> -XX:ErrorHandlerTest=1 \
>> TestDebug.java
>>
>> --- TestDebug.java ---
>> import java.lang.String;
>> public class TestDebug {
>> static private volatile String dummy;
>> public static void main(String[] args) throws Exception {
>> while (true) {
>> dummy = new String("foo bar");
>> }
>> }
>> }
>> --- end TestDebug.java ---
>>
>> The state of WER can be examined and modified using Power Shell commands
>> {Get,Enable,Disable}-WindowsErrorReporting.
>>
>> The state of reporting WER captured errors can be examined and modified using
>> Control Panel > Security and Maintenance > Maintenance : Report Problems [on,off]
>>
>> With Report Problems off, reports are placed in
>> c:\ProgramData\Microsoft\Windows\WER\ReportArchive
>>
>> I verified that executing the above test with WER enabled adds an entry in
>> that directory, but not when it's disabled. Also nothing is added there when
>> the test is run with -XX:-UseOSErrorReporting.
>
> src/hotspot/share/utilities/vmError.cpp line 1448:
>
>> 1446: if (UseOSErrorReporting && log_done) {
>> 1447: raise_fail_fast(_siginfo, _context);
>> 1448: }
>
> I think this bit might not be needed anymore. I think the reason for the old code here is to deal with
> the repeated calls to report_and_die as we walk up the Windows exception filter list and return because
> of UseOSErrorReporting. We don't do that anymore. I think we could instead now not do anything here
> and treat repeated calls "normally", reporting them as such and eventually dying, either via Windows
> fail-fast or abort.
I think we should remove this part. Otherwise concurrent threads running into faults cause the VM to end abruptly while the primary thread is its in the last stages of error reporting: after log_done is set but before os::die was called (we do some things here: JFR shutdown, NMT statistics, compiler replay file writing).
In fact, this may be an existing bug which could explain missing/corrupt replay files or JFR files if UseOSErrorReporting is set.
-------------
PR: https://git.openjdk.org/jdk/pull/12759
More information about the hotspot-dev
mailing list