RFR: 8273608: Deadlock when jcmd of OnError attaches to itself
David Holmes
dholmes at openjdk.java.net
Tue Sep 21 05:39:57 UTC 2021
On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu <xliu at openjdk.org> wrote:
> This patch allows the custom commands of OnError to attach to HotSpot itself.
> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd).
> This prevents cmds which require safepoint synchronization from deadlocking,
> e.g. OnError='jcmd %p Thread.print'.
>
> Without this patch, we will encounter a deadlock at safepoint synchronization.
> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`.
>
>
> Aborting due to java.lang.OutOfMemoryError: Java heap space
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (debug.cpp:364), pid=94632, tid=94633
> # fatal error: OutOfMemory encountered: Java heap space
> #
> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
> #
> # -XX:OnError="jcmd %p Thread.print"
> # Executing /bin/sh -c "jcmd 94632 Thread.print" ...
> 94632:
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000]
> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
Hi Xin,
The basic idea is reasonable but I think some of the details need changing.
Thanks,
David
src/hotspot/share/utilities/vmError.cpp line 1325:
> 1323:
> 1324: public:
> 1325: VMErrorThreadToNativeFromVM(Thread* t) : _thread(nullptr) {
If `t` must be the current thread then it should not be passed in as that gives the impression you can pass any thread.
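As a rough illustration of that shape, here is a standalone sketch with mock types. It is not HotSpot code: `MockThread`, its `current_or_null()` and `ToNativeGuard` are made-up stand-ins, and the real transition logic is reduced to flipping an enum.

```cpp
#include <cassert>

// Mock stand-ins for illustration only; these are NOT the HotSpot types.
enum ThreadState { in_vm, in_native };

struct MockThread {
  ThreadState state = in_vm;
  // Hypothetical analogue of a "current thread or null" lookup.
  static MockThread* current_or_null() { return _current; }
  static MockThread* _current;
};
MockThread* MockThread::_current = nullptr;

// The guard looks up the current thread itself, so a caller cannot
// hand it some other thread by mistake.
class ToNativeGuard {
  MockThread* _thread;
 public:
  ToNativeGuard() : _thread(MockThread::current_or_null()) {
    if (_thread != nullptr && _thread->state == in_vm) {
      _thread->state = in_native;
    } else {
      _thread = nullptr;  // nothing to restore in the destructor
    }
  }
  ~ToNativeGuard() {
    if (_thread != nullptr) {
      _thread->state = in_vm;  // restore the state on scope exit
    }
  }
};
```

With no parameter on the constructor, the call site reads as `ToNativeGuard guard;` and can only ever affect the calling thread.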
src/hotspot/share/utilities/vmError.cpp line 1333:
> 1331: }
> 1332:
> 1333: if (_thread != nullptr) {
No need to terminate the first if block.
src/hotspot/share/utilities/vmError.cpp line 1334:
> 1332:
> 1333: if (_thread != nullptr) {
> 1334: assert(!_thread->owns_locks(), "must release all locks when leaving VM");
This can't be an assertion as the thread is not knowingly leaving the VM without first releasing locks. If it does hold locks then that could lead to additional problems and strange errors when running the external command. We have to decide whether it is safest/best to simply not transition to native if holding locks, or whether it is okay to proceed knowing that there are risks of secondary crashes, or hangs, if we do.
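To make the first option concrete, here is a minimal standalone sketch of a runtime check in place of the assert. The `MockThread` type and `try_transition_to_native()` helper are hypothetical and only model the decision, not the actual HotSpot transition.

```cpp
#include <cassert>

// Mock stand-in for illustration only; NOT the HotSpot Thread class.
struct MockThread {
  bool in_native = false;
  bool holds_locks = false;
  bool owns_locks() const { return holds_locks; }
};

// Runtime check instead of an assert. Returns true if the thread was
// moved to native, false if it was left alone.
bool try_transition_to_native(MockThread* t) {
  if (t == nullptr || t->owns_locks()) {
    // Stay in the VM. An external command that needs a safepoint
    // can then still hang, but we avoid running it with locks held
    // by a thread that claims to be in native.
    return false;
  }
  t->in_native = true;
  return true;
}
```

The second option would drop the `owns_locks()` test and always transition, accepting the risk of secondary crashes or hangs.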
src/hotspot/share/utilities/vmError.cpp line 1343:
> 1341: ThreadStateTransition::transition_from_native(_thread, _thread_in_vm);
> 1342: assert(!_thread->is_pending_jni_exception_check(), "Pending JNI Exception Check");
> 1343: // We don't need to clear_walkable because it will happen automagically when we return to java
We are not executing JNI code when we do the fork_and_exec so this does not seem necessary.
The comment about `clear_walkable` also doesn't make sense here - we are crashing so we are not returning to Java at all.
src/hotspot/share/utilities/vmError.cpp line 1663:
> 1661: out.print_raw_cr("\" ...");
> 1662:
> 1663: VMErrorThreadToNativeFromVM ttnfv(JavaThread::current_or_null());
Surely the current thread need not be a JavaThread here.
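One way to handle that case, sketched with mock types (the class names and `java_thread_or_null()` are hypothetical, and this assumes only Java threads carry a state that can be transitioned):

```cpp
#include <cassert>

// Mock stand-ins for illustration only; NOT the HotSpot classes.
struct MockThread {
  virtual ~MockThread() = default;
  virtual bool is_Java_thread() const { return false; }
};

struct MockJavaThread : MockThread {
  bool in_native = false;
  bool is_Java_thread() const override { return true; }
};

// Check the thread kind before casting. A non-Java thread, or no
// attached thread at all, yields nullptr and is left untouched.
MockJavaThread* java_thread_or_null(MockThread* current) {
  if (current != nullptr && current->is_Java_thread()) {
    return static_cast<MockJavaThread*>(current);
  }
  return nullptr;
}
```

A guard built on this would transition only when the result is non-null and be a no-op otherwise.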
-------------
Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5590