RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting

Kim Barrett kbarrett at openjdk.org
Wed Feb 22 07:35:39 UTC 2023


On Wed, 22 Feb 2023 05:20:25 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Please review this change to the implementation of the Windows-specific option
>> UseOSErrorReporting, toward allowing crash reporting functions to be declared
>> noreturn.  VMError::report_and_die no longer conditionally returns if the
>> Windows-only option UseOSErrorReporting is true.
>> 
>> The core of VM crash reporting has been renamed and privatized, and now takes
>> an additional argument indicating whether the caller just wants reporting or
>> also wants termination after reporting.  The Windows top-level and unhandled
>> exception filters call into that to request reporting, and also termination if
>> UseOSErrorReporting is false, otherwise returning from the call and taking
>> further action.  All other callers of VM crash reporting include termination.
>> 
>> So if we get a structured exception on Windows and the option is true then we
>> report it and then end up walking up the exception handlers until running out,
>> when Windows crash behavior gets a shot.  This is essentially the same as
>> before the change, just checking the option in a different place (in the
>> Windows-specific caller, rather than in an #ifdef block in the shared code).
>> 
>> In addition, the core VM crash reporter uses BREAKPOINT before termination if 
>> UseOSErrorReporting is true.  So if we have an assertion failure we report it
>> and then end up walking up the exception handlers (because of the breakpoint
>> exception) until running out, when Windows crash behavior gets a shot.
>> 
>> So if there is an attached debugger (or JIT debugging is enabled), this leads
>> an assertion failure to hit the debugger inside the crash reporter rather than
>> after it returned (due to the old behavior for UseOSErrorReporting) from the
>> failed assertion (hitting the BREAKPOINT there).
>> 
>> Note that on Windows, ::breakpoint() calls DebugBreak(), raising a breakpoint
>> exception, which ends up being unhandled if UseOSErrorReporting is true.
>> 
>> There is a pre-existing bug that I'll be reporting separately.  If
>> UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an
>> empty .mdmp file.  We shouldn't create that file when UseOSErrorReporting.
>> 
>> Testing:
>> mach5 tier1-3
>> 
>> Manual testing with the following, to verify expected behavior that is the
>> same as before the change.
>> 
>> # -XX:ErrorHandlerTest=N
>> # 1: assertion failure
>> # 2: guarantee failure
>> # 14: SIGSEGV
>> # 15: divide by zero
>> path/to/bin/java \
>>   -XX:+UnlockDiagnosticVMOptions \
>>   -XX:+ErrorLogSecondaryErrorDetails \
>>   -XX:+UseOSErrorReporting \
>>   -XX:ErrorHandlerTest=1 \
>>   TestDebug.java
>> 
>> --- TestDebug.java ---
>> import java.lang.String;
>> public class TestDebug {
>>     static private volatile String dummy;
>>     public static void main(String[] args) throws Exception {
>>         while (true) {
>>             dummy = new String("foo bar");
>>         }
>>     }
>> }
>> --- end TestDebug.java ---
>> 
>> I need someone with a better understanding of Windows to test the interaction
>> with Windows Error Reporting (WER).
>
> src/hotspot/share/utilities/vmError.cpp line 1710:
> 
>> 1708:     // Caller requested reporting without termination.
>> 1709:     return;
>> 1710:   } else if (after_report_action == AfterReportAction::Die) {
> 
> Nit: as there are only 2 values for ARA we don't really need an `else if` here.

I didn't want to add an assert.  But see reply to your next comment, about the ShouldNotReachHere at the end.

> src/hotspot/share/utilities/vmError.cpp line 1730:
> 
>> 1728:   }
>> 1729:   // We REALLY better not reach here.
>> 1730:   ShouldNotReachHere();
> 
> Maybe just `os::die()` again rather than a recursive call back into VMError code? (Not that we ever expect to execute this but ...

This blob of code could have been equivalently written as

  switch (after_report_action) {
  case AfterReportAction::Return:
    return;
  case AfterReportAction::Die:
    ... breakpoint and abort ...
    os::die();
    // should be unreachable because die is noreturn.  perhaps fall through
    // if get here, though we're in UB territory.
  default:
    ShouldNotReachHere();
  }

I didn't write it that way because the Die code is a bit longer and more
complicated than I like to put in a case clause.

-------------

PR: https://git.openjdk.org/jdk/pull/12651


More information about the hotspot-dev mailing list