RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v11]
Ralf Schmelter
rschmelter at openjdk.org
Thu Dec 1 16:07:08 UTC 2022
On Thu, 1 Dec 2022 10:10:28 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017.
>>
>> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds.
>>
>> ---
>>
>> Patch
>>
>> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`.
>> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead.
>> - Removed a stray newline from print_native_stack to clean output.
>> - added regression testing for this feature. I removed my name and the bug id from the test since we don't do this anymore (?).
>> - added clarifying comments to the test and code
>> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895)
>>
>> Output looks like this:
>>
>>
>> $ java ... -XX:+ErrorLogSecondaryErrorDetails
>>
>>
>> will produce, for secondary errors, siginfo and call stack.
>>
>>
>> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61]
>> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000]
>> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V [libjvm.so+0x1ceea61] VMError::controlled_crash(int)+0x241 (vmError.cpp:1946)
>> V [libjvm.so+0x1cf413f] VMError::report(outputStream*, bool)+0x46bf (vmError.cpp:564)
>> V [libjvm.so+0x1cf516b] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b (vmError.cpp:1709)
>> V [libjvm.so+0x1cf5e8f] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f (vmError.cpp:1467)
>> V [libjvm.so+0x1cf5ec2] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22 (vmError.cpp:1473)
>> V [libjvm.so+0x1a549e7] JVM_handle_linux_signal+0x1f7 (signals_posix.cpp:656)
>> C [libc.so.6+0x43090]
>> V [libjvm.so+0x11d6965] JNI_CreateJavaVM+0x5b5 (jni.cpp:3662)
>> C [libjli.so+0x4013] JavaMain+0x93 (java.c:1457)
>> C [libjli.so+0x800d] ThreadJavaMain+0xd (java_md.c:650)
>> ]
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert accidental change
Loks good to me. I woulld really advocate to make it usable in product builds.
src/hotspot/share/runtime/globals.hpp line 520:
> 518: range(0, (uint64_t)max_jlong/1000) \
> 519: \
> 520: develop(bool, ErrorLogSecondaryErrorDetails, false, \
Making this a debug flag severely limits its usefulness. You would have to switch the VM to a debug VM, which might be hard for some customers. Since the change doesn't adds much code and there is no performance impact, I would make it diagnostic or product.
src/hotspot/share/utilities/vmError.cpp line 1507:
> 1505: // Any information (signal, context, siginfo etc) printed here should use the function
> 1506: // arguments, not the information stored in *this, since those describe the primary crash.
> 1507: static char tmp[256]; // cannot use global scratch buffer
This is not thread safe. You could have parallel threads which trigger a secondary crash. Not sure if it will be a problem in practice, since it is rather rare. I would still use a local buffer, since 256 bytes is small.
src/hotspot/share/utilities/vmError.cpp line 1527:
> 1525: {
> 1526: if (ErrorLogSecondaryErrorDetails) {
> 1527: static bool recursed = false;
Same as above. Would not work if we have parallel crashes, but I don't think it warrants making the code more complicated to catch this.
-------------
Marked as reviewed by rschmelter (Reviewer).
PR: https://git.openjdk.org/jdk/pull/11118
More information about the hotspot-dev
mailing list