RFR (M) 8237727: Mac: after we handle a crash, Apple's crash reporter is left with incorrect state

Mon Jul 20 04:15:56 UTC 2020

Hi Gerard,

On 18/07/2020 5:21 am, gerard ziemski wrote:
> 
> Hi all,
> 
> Please review this enhancement, which changes how we handle a crash on 
> macOS, so that the native macOS CrashReporter can create its own crash 
> report alongside ours, with correct crash signal and frame.

So ... we have a flag, UseOSErrorReporting, to control whether the VM 
handles error reporting or whether it lets the OS handle things. The VM 
doesn't directly interact with OS error reporting, but obviously if the 
OS reporting differs depending on whether ::exit or ::abort is called, 
then the VM does affect it. So my initial question is:

Should the user not just set UseOSErrorReporting if they want the 
CrashReporter to have full control, and indeed will it work correctly if 
the user sets that?

I can see some motivation for having the VM handle things and get a nice 
hs_err file, whilst also having the OS handle things and get, e.g., 
stacktraces of all threads (yes I've often wished for that myself!).

That said, there are practical implications of this change. Primarily, 
from a testing perspective: IIUC, currently if we disable core dumps then 
the CrashReporter will not be involved at all, but with this change it 
will be - correct? In which case, do we now risk filling up test machine 
disks with these error logs?

My own feeling here is that we may want this for the abort case but if 
we do an exit (because the user doesn't want a core dump) then we've 
chosen to treat this failure as not-a-crash, and so it is appropriate 
for the OS crash reporter to not be involved.

Cheers,
David
-----

> Normally after we handle a crash we terminate the process with either 
> exit() or abort():
> 
> #1 - When we terminate with abort() (the case when the core dump is 
> enabled, i.e. -XX:+CreateCoredumpOnCrash) macOS CrashReporter doesn’t 
> see the original crash that we handled, but only sees the abort, which 
> it correctly, but confusingly, reports as the termination reason.
> 
> #2 - When we terminate with exit() (the case when the core dump is 
> disabled, i.e. -XX:-CreateCoredumpOnCrash) macOS CrashReporter doesn’t 
> see the crash and does not generate a report at all.
> 
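
To make the two cases above concrete, here is a minimal C++ sketch, not 
HotSpot code, of how each termination path looks to an outside observer. 
terminate_after_report() and run_in_child() are hypothetical names, and 
dump_core merely stands in for CreateCoredumpOnCrash. A parent calling 
waitpid() plays the observer, on the assumption that the OS crash 
reporter likewise only sees how the process finally died.

```cpp
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the last step after our own report is written.
// dump_core plays the role of CreateCoredumpOnCrash.
[[noreturn]] void terminate_after_report(bool dump_core) {
    if (dump_core) {
        abort();  // dies from SIGABRT, whatever the original fault was
    }
    exit(1);      // ordinary exit: nothing that looks like a crash
}

// Runs one scenario in a forked child and returns the waitpid() status,
// or -1 if fork/waitpid failed.
int run_in_child(bool dump_core) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // Keep the demonstration from leaving core files behind.
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);
        terminate_after_report(dump_core);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return status;
}
```

With dump_core set, the wait status describes a process killed by 
SIGABRT; without it, a normal exit with status 1. Neither status carries 
any trace of the fault that was originally handled.
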
> 
> With this proposed fix we handle the crash as usual, but then instead of 
> aborting/exiting, we allow the process to crash again, which allows the 
> macOS CrashReporter to generate its crash log with correct exception 
> type and termination signal, showing the actual frame that crashed, in 
> all cases (regardless of whether the core dump is enabled or disabled).
> 
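
The "crash again" idea can be sketched in plain POSIX C++ as below. This 
is an illustration under stated assumptions, not the code in the webrev: 
report_then_recrash(), install_crash_handler() and crash_in_child() are 
hypothetical names, the report is a placeholder write(), and raise() 
stands in for a real fault so the example avoids undefined behaviour. 
For a genuine hardware fault, a handler could instead restore the 
default disposition and simply return, so that the faulting instruction 
re-executes and the top frame is the real crash site.

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handler: emit our own report, then restore the default
// disposition and re-raise so the process dies from the original signal.
extern "C" void report_then_recrash(int sig) {
    static const char msg[] = "placeholder for writing the hs_err report\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;

    struct sigaction dfl = {};
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    sigaction(sig, &dfl, nullptr);
    // sig is blocked while this handler runs, so it stays pending and is
    // delivered with the default action once the handler returns.
    raise(sig);
}

bool install_crash_handler(int sig) {
    struct sigaction sa = {};
    sa.sa_handler = report_then_recrash;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, nullptr) == 0;
}

// Crashes a forked child with sig and returns the waitpid() status,
// or -1 if fork/waitpid failed.
int crash_in_child(int sig) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (!install_crash_handler(sig)) {
            _exit(2);
        }
        raise(sig);  // stand-in for the real fault
        _exit(3);    // only reached if the re-crash did not happen
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return status;
}
```

Calling crash_in_child(SIGSEGV) yields a status for which WIFSIGNALED is 
true and WTERMSIG is SIGSEGV: the observer sees the original signal, not 
an abort and not a clean exit.
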
> Before, the CrashReporter would only indicate the (abort) exception type 
> with no termination signal:
> 
> 
> Exception Type:  EXC_BAD_ACCESS (SIGABRT)
> Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000008
> Exception Note:        EXC_CORPSE_NOTIFY
> 
> 
> But now we get (correct) exception type and termination signal:
> 
> 
> Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
> Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
> Exception Note:        EXC_CORPSE_NOTIFY
> 
> Termination Signal:    Segmentation fault: 11
> Termination Reason:    Namespace SIGNAL, Code 0xb
> Terminating Process:   exc handler [1497]
> 
> 
> In addition, instead of a frame like this:
> 
> 
> Thread 2 Crashed:
> 0   libsystem_kernel.dylib 0x00007fff6ec4633a __pthread_kill + 10
> 1   libsystem_pthread.dylib 0x00007fff6ed02e60 pthread_kill + 430
> 2   libsystem_c.dylib 0x00007fff6ebcd808 abort + 120
> 3   libjvm.dylib 0x00000001037c6da1 os::abort(bool, void*, void const*) + 49 (os_bsd.cpp:1069)
> 4   libjvm.dylib 0x0000000103a9a249 VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long) + 3017 (vmError.cpp:1639)
> 5   libjvm.dylib 0x0000000103a99655 VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...) + 149 (vmError.cpp:1315)
> 6   libjvm.dylib 0x0000000103a9a341 VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*) + 33 (vmError.cpp:1322)
> 7   libjvm.dylib 0x00000001037cbdfa JVM_handle_bsd_signal + 618 (os_bsd_x86.cpp:763)
> 8   libjvm.dylib 0x00000001037c8ce9 signalHandler(int, __siginfo*, void*) + 89 (os_bsd.cpp:2589)
> 9   libsystem_platform.dylib 0x00007fff6ecf75fd _sigtramp + 29
> 10  ??? 000000000000000000 0 + 0
> 11  libjvm.dylib 0x0000000103a4aa51 Unsafe_PutInt(JNIEnv_*, _jobject*, _jobject*, long, int) + 241 (unsafe.cpp:313)
> 12  ??? 0x00000001060baade 0 + 4396395230
> 13  ??? 0x00000001060b207e 0 + 4396359806
> 14  ??? 0x00000001060b207e 0 + 4396359806
> 15  ??? 0x00000001060b207e 0 + 4396359806
> 16  ??? 0x00000001060a89ca 0 + 4396321226
> 17  libjvm.dylib 0x0000000103225792 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*) + 1426 (javaCalls.cpp:430)
> 18  libjvm.dylib 0x00000001032e1ee1 jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) + 417 (jni.cpp:975)
> 19  libjvm.dylib 0x00000001032e9605 jni_CallStaticVoidMethod + 645 (jni.cpp:1830)
> 20  libjli.dylib 0x0000000101122d3f JavaMain + 2495 (java.c:556)
> 21  libjli.dylib 0x00000001011254a9 ThreadJavaMain + 9 (java_md_macosx.m:720)
> 22  libsystem_pthread.dylib 0x00007fff6ed03109 _pthread_start + 148
> 23  libsystem_pthread.dylib 0x00007fff6ecfeb8b thread_start + 15
> 
> 
> CrashReporter will now show:
> 
> 
> Thread 2 Crashed:
> 0   libjvm.dylib 0x0000000108034415 MemoryAccess<int>::put(int) + 205 (unsafe.cpp:233)
> 1   libjvm.dylib 0x00000001080249a0 Unsafe_PutInt(JNIEnv_*, _jobject*, _jobject*, long, int) + 206 (unsafe.cpp:313)
> 2   ??? 0x0000000111ad9ade 0 + 4591557342
> 3   ??? 0x0000000111ad107e 0 + 4591521918
> 4   ??? 0x0000000111ad107e 0 + 4591521918
> 5   ??? 0x0000000111ad107e 0 + 4591521918
> 6   ??? 0x0000000111ac79ca 0 + 4591483338
> 7   libjvm.dylib 0x0000000107f37656 JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*) + 1006 (javaCalls.cpp:430)
> 8   libjvm.dylib 0x0000000108077b7b jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) + 260 (jni.cpp:975)
> 9   libjvm.dylib 0x000000010807dffc jni_CallStaticVoidMethod + 529 (jni.cpp:1830)
> 10  libjli.dylib 0x0000000104452d3f JavaMain + 2495 (java.c:556)
> 11  libjli.dylib 0x00000001044554a9 ThreadJavaMain + 9 (java_md_macosx.m:720)
> 12  libsystem_pthread.dylib 0x00007fff6ed03109 _pthread_start + 148
> 13  libsystem_pthread.dylib 0x00007fff6ecfeb8b thread_start + 15
> 
> 
> Which correctly identifies the top frame that caused the crash (instead 
> of "??? 000000000000000000 0 + 0").
> 
> Also, in our hs_err_pidX log we now show an approximate location of 
> the crash log report produced by the CrashReporter, e.g.:
> 
> 
> ---------------  S Y S T E M  ---------------
> 
> ...
> 
> CrashReporter log: /Users/gerard/Library/Logs/DiagnosticReports/java_*
> 
> END.
> 
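
A hint like the one in the excerpt above could be assembled roughly as 
follows. This is only a sketch: crash_reporter_log_hint() is a 
hypothetical helper, and it assumes the per-user reports directory lives 
under the home directory as in the quoted path, with a trailing "java_*" 
glob because the exact file name is chosen by the CrashReporter (hence 
"approximate").

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: builds the approximate CrashReporter log location
// for a given home directory. The directory layout and the "java_*"
// pattern are taken from the example path quoted in this thread.
std::string crash_reporter_log_hint(const std::string& home) {
    return home + "/Library/Logs/DiagnosticReports/java_*";
}

// Convenience wrapper that reads HOME from the environment; returns an
// empty string when HOME is not set rather than guessing a location.
std::string crash_reporter_log_hint_from_env() {
    const char* home = std::getenv("HOME");
    if (home == nullptr) {
        return std::string();
    }
    return crash_reporter_log_hint(home);
}
```

For a home directory of /Users/gerard this gives the same string as the 
"CrashReporter log:" line in the excerpt.
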
> 
> Lastly, having macOS CrashReporter produce its crash log, in addition to 
> our own hs_err_pidX log, is valuable because it shows back traces for 
> all the native threads (ours only shows the crashed one) as well as line 
> offsets to the files (ours shows binary offsets). It’s also nice to 
> validate that the info in our crash log is correct.
> 
> For full before and after CrashReporter logs please see the bug.
> 
> bug link at https://bugs.openjdk.java.net/browse/JDK-8237727
> open webrev at http://cr.openjdk.java.net/~gziemski/8237727_rev1
> testing: passes Mach5 hs_tier1,2,3,4,5
> 
> cheers
>