RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v11]

Ralf Schmelter rschmelter at openjdk.org
Thu Dec 1 16:07:08 UTC 2022


On Thu, 1 Dec 2022 10:10:28 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017.
>> 
>> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. 
>> 
>> ---
>> 
>> Patch
>> 
>> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`.
>> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead.
>> - Removed a stray newline from print_native_stack to clean output.
>> - added regression testing for this feature. I removed my name and the bug id from the test since we don't do this anymore (?).
>> - added clarifying comments to the test and code
>> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895)
>> 
>> Output looks like this:
>> 
>> 
>> $ java ... -XX:+ErrorLogSecondaryErrorDetails
>> 
>> 
>> will produce, for secondary errors, siginfo and call stack.
>> 
>> 
>> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61]
>> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000]
>> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x1ceea61]  VMError::controlled_crash(int)+0x241  (vmError.cpp:1946)
>> V  [libjvm.so+0x1cf413f]  VMError::report(outputStream*, bool)+0x46bf  (vmError.cpp:564)
>> V  [libjvm.so+0x1cf516b]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b  (vmError.cpp:1709)
>> V  [libjvm.so+0x1cf5e8f]  VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f  (vmError.cpp:1467)
>> V  [libjvm.so+0x1cf5ec2]  VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22  (vmError.cpp:1473)
>> V  [libjvm.so+0x1a549e7]  JVM_handle_linux_signal+0x1f7  (signals_posix.cpp:656)
>> C  [libc.so.6+0x43090]
>> V  [libjvm.so+0x11d6965]  JNI_CreateJavaVM+0x5b5  (jni.cpp:3662)
>> C  [libjli.so+0x4013]  JavaMain+0x93  (java.c:1457)
>> C  [libjli.so+0x800d]  ThreadJavaMain+0xd  (java_md.c:650)
>> ]
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert accidental change

Loks good to me. I woulld really advocate to make it usable in product builds.

src/hotspot/share/runtime/globals.hpp line 520:

> 518:           range(0, (uint64_t)max_jlong/1000)                                \
> 519:                                                                             \
> 520:   develop(bool, ErrorLogSecondaryErrorDetails, false,                       \

Making this a debug flag severely limits its usefulness. You would have to switch the VM to a debug VM, which might be hard for some customers. Since the change doesn't adds much code and there is no performance impact, I would make it diagnostic or product.

src/hotspot/share/utilities/vmError.cpp line 1507:

> 1505:         // Any information (signal, context, siginfo etc) printed here should use the function
> 1506:         // arguments, not the information stored in *this, since those describe the primary crash.
> 1507:         static char tmp[256]; // cannot use global scratch buffer

This is not thread safe. You could have parallel threads which trigger a secondary crash. Not sure if it will be a problem in practice, since it is rather rare. I would still use a local buffer, since 256 bytes is small.

src/hotspot/share/utilities/vmError.cpp line 1527:

> 1525:         {
> 1526:           if (ErrorLogSecondaryErrorDetails) {
> 1527:             static bool recursed = false;

Same as above. Would not work if we have parallel crashes, but I don't think it warrants making the code more complicated to catch this.

-------------

Marked as reviewed by rschmelter (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11118


More information about the hotspot-dev mailing list