RFR: 8273608: Deadlock when jcmd of OnError attaches to itself
Thomas Stuefe
stuefe at openjdk.java.net
Thu Sep 23 05:54:53 UTC 2021
On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu <xliu at openjdk.org> wrote:
> This patch allows the custom commands of OnError to attach to HotSpot itself.
> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd).
> This prevents commands that require safepoint synchronization from deadlocking,
> e.g. OnError='jcmd %p Thread.print'.
>
> Without this patch, we will encounter a deadlock at safepoint synchronization.
> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`.
>
>
> Aborting due to java.lang.OutOfMemoryError: Java heap space
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (debug.cpp:364), pid=94632, tid=94633
> # fatal error: OutOfMemory encountered: Java heap space
> #
> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
> #
> # -XX:OnError="jcmd %p Thread.print"
> # Executing /bin/sh -c "jcmd 94632 Thread.print" ...
> 94632:
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000]
> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
Hi Xin,
Comments inline. As I said, I think this is useful (and probably should be backported at least to 17).
Can you please provide a regression test for this? This is not just for aesthetics: these switches are used a lot more than one might think, and knowing that they work and have not bitrotted would be reassuring :)
You could maybe expand runtime/ErrorHandling/TestOnError.
It would be nice to have a test for `jcmd <my pid> XXX` (e.g. call `VM.info` and then scan the output for the pid).
TestOnError uses `-XX:ErrorHandlerTest`, which triggers a deliberate crash in CreateJavaVM, after VM initialization. It would be nice to have a second mode for TestOnError that tests OnError in OOM situations. That would cover both variants (real crashes and OOMs). The latter could also be tested in release VMs (`-XX:ErrorHandlerTest` is debug only).
Finally (and optionally, depending on how far you want to go) we should test OnError with a sequence of commands too.
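To make the OOM and command-sequence variants a bit more concrete, here is a rough sketch of the pieces such a test might need. It is only an illustration under assumptions: the class name, the helper names and the `-Xmx64m` value are made up, it does not use the jtreg test library, and it assumes the child main class simply allocates until the heap is exhausted. It relies on `-XX:+CrashOnOutOfMemoryError` to turn the OOM into a fatal error, and on jcmd printing `<pid>:` as its first output line, as in the log quoted above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the existing TestOnError.
final class OnErrorSelfAttachSketch {

    // Builds the child JVM command line. CrashOnOutOfMemoryError makes the OOM
    // fatal, so the OnError commands also run in release VMs.
    static List<String> buildCommand(String javaBinary, String onError, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBinary);
        cmd.add("-Xmx64m");                      // assumed small heap to reach OOM quickly
        cmd.add("-XX:+CrashOnOutOfMemoryError");
        cmd.add("-XX:-CreateCoredumpOnCrash");   // no core files from the test run
        cmd.add("-XX:OnError=" + onError);
        cmd.add(mainClass);                      // assumed to allocate until OOM
        return cmd;
    }

    // Joins several OnError commands; the VM runs them one after another,
    // separated by ';'.
    static String sequence(String... commands) {
        return String.join(";", commands);
    }

    // True if some output line is exactly "<pid>:", the header jcmd prints
    // before the output of the requested command.
    static boolean jcmdReportedPid(String output, long pid) {
        String header = pid + ":";
        for (String line : output.split("\\R")) {
            if (line.trim().equals(header)) {
                return true;
            }
        }
        return false;
    }

    // Starts the child, collects stdout and stderr, and checks that jcmd
    // attached to the crashing VM itself. Without the fix the child would
    // block here until the safepoint timeout instead.
    static boolean childAttachedToItself(List<String> command)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).redirectErrorStream(true).start();
        long pid = child.pid();
        String output = new String(child.getInputStream().readAllBytes(),
                                   StandardCharsets.UTF_8);
        child.waitFor();
        return jcmdReportedPid(output, pid);
    }
}
```

A test could then call `buildCommand(java, sequence("jcmd %p VM.info", "jcmd %p Thread.print"), "<oom main class>")` and assert that `childAttachedToItself` returns true; the real test would more likely reuse the process helpers TestOnError already depends on.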
Thanks, Thomas
src/hotspot/share/utilities/vmError.cpp line 1330:
> 1328: _thread = JavaThread::cast(t);
> 1329: assert(_thread == Thread::current(), "must be current thread");
> 1330: assert(_thread->thread_state() == _thread_in_vm, "must be in VM");
Don't assert here. We are in the middle of error handling, so a failing assert would just cause a recursive error and very probably end in "too many errors, abort".
-------------
Changes requested by stuefe (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5590