RFR: JDK-8296907: VMError: add optional callstacks, siginfo for secondary errors [v3]

Axel Boldt-Christmas aboldtch at openjdk.org
Wed Nov 16 17:44:07 UTC 2022


On Mon, 14 Nov 2022 07:29:25 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> This was motivated by discussions we had in https://github.com/openjdk/jdk/pull/11017.
>> 
>> To aid in analyzing secondary errors during error reporting, it would be useful to see their callstacks for secondary errors. But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting. Also, the hs-err file would be quite verbose. Therefore this feature is optional and limited to debug builds. 
>> 
>> ---
>> 
>> Patch
>> 
>> - adds optional callstack/siginfo printing via debug-only switch `-XX:+ErrorLogSecondaryErrorDetails`.
>> - fixes a bug in secondary error handling where we would use the global scratch buffer recursively (via stringStream); that could lead to confusing output since it is used by the error log stream already. We can print directly to that one instead.
>> - Removed a stray newline from print_native_stack to clean output.
>> - added regression testing for this feature. I removed my name from the test since we don't do this anymore.
>> - added clarifying comments to the test and code
>> - added SAP copyright to the regression test (we introduced it years ago for JDK-8065895)
>> 
>> Output looks like this:
>> 
>> 
>> $ java ... -XX:+ErrorLogSecondaryErrorDetails
>> 
>> 
>> will produce, for secondary errors, siginfo and call stack.
>> 
>> 
>> [error occurred during error reporting (test secondary crash 1), id 0xb, SIGSEGV (0xb) at pc=0x00007fddfe8a0a61]
>> [siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000]
>> [stack: Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V  [libjvm.so+0x1ceea61]  VMError::controlled_crash(int)+0x241  (vmError.cpp:1946)
>> V  [libjvm.so+0x1cf413f]  VMError::report(outputStream*, bool)+0x46bf  (vmError.cpp:564)
>> V  [libjvm.so+0x1cf516b]  VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x19b  (vmError.cpp:1709)
>> V  [libjvm.so+0x1cf5e8f]  VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*, char const*, ...)+0x8f  (vmError.cpp:1467)
>> V  [libjvm.so+0x1cf5ec2]  VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x22  (vmError.cpp:1473)
>> V  [libjvm.so+0x1a549e7]  JVM_handle_linux_signal+0x1f7  (signals_posix.cpp:656)
>> C  [libc.so.6+0x43090]
>> V  [libjvm.so+0x11d6965]  JNI_CreateJavaVM+0x5b5  (jni.cpp:3662)
>> C  [libjli.so+0x4013]  JavaMain+0x93  (java.c:1457)
>> C  [libjli.so+0x800d]  ThreadJavaMain+0xd  (java_md.c:650)
>> ]
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Merge branch 'JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors' of github.com:tstuefe/jdk into JDK-8296907-VMError-add-optional-callstacks-siginfo-for-secondary-errors
>  - Feedback David

Always nice with more, simple and unobtrusive development tooling. 
Just have a couple of comments/questions?

src/hotspot/share/utilities/vmError.cpp line 1635:

> 1633:         // Any information (signal, context, siginfo etc) printed here should use the function
> 1634:         // arguments, not the information stored in *this, since those describe the primary crash.
> 1635:         char tmp[256]; // cannot use global scratch buffer

Is there any problems with making this static? Given that we care about the stack depth when we have repeated crashes.

src/hotspot/share/utilities/vmError.cpp line 1641:

> 1639:                    _current_step_info, id);
> 1640:         if (os::exception_name(id, tmp, sizeof(tmp))) {
> 1641:           st->print(", %s (0x%x) at pc=" PTR_FORMAT, tmp, id, p2i(pc));

Not really relevant for this PR. But do not like the inconsistency that the id is printed as hex here, decimal in os::print_siginfo for posix, and hex in os::print_siginfo for windows.

src/hotspot/share/utilities/vmError.cpp line 1653:

> 1651:         st->print_cr("]");
> 1652: #ifdef ASSERT
> 1653:         if (ErrorLogSecondaryErrorDetails) {

> But printing callstacks during error reporting is unsafe - if we get a second crash or assert, it will cause infinite recursion and interrupt error reporting.

Should there be logic here that at least guarantee progress?
```c++
if (ErrorLogSecondaryErrorDetails && !_some_bool) {
  _some_bool = true;
  [...]
}
_some_bool = false;

-------------

PR: https://git.openjdk.org/jdk/pull/11118


More information about the hotspot-dev mailing list