RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v8]
Xin Liu
xliu at openjdk.java.net
Wed Oct 13 06:56:00 UTC 2021
On Tue, 12 Oct 2021 07:02:12 GMT, Xin Liu <xliu at openjdk.org> wrote:
>> This patch allows the custom commands of OnError to attach to HotSpot itself.
>> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd).
>> This prevents commands that require safepoint synchronization from deadlocking,
>> e.g. OnError='jcmd %p Thread.print'.
>>
>> Without this patch, we will encounter a deadlock at safepoint synchronization.
>> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`.
>>
>>
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (debug.cpp:364), pid=94632, tid=94633
>> # fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> # Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000]
>> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> Change to VM unconditionally as long as current thread is JavaThread.
>
> Hoist VMErrorForceNative out of While.
Hi, Reviewers,
I ran the Renaissance benchmark and sent SIGSEGV to randomly chosen Java threads.
$ java -XX:+SafepointTimeout -XX:OnError='jcmd %p Thread.print' -jar ./renaissance-gpl-0.13.0.jar log-regression &
$ kill -11 11131 (not pid=11130, but the tid of its first Java thread "main" #1, or of some other Java thread)
It seems to be okay for both `_thread_in_Java` and `_thread_in_vm` in most cases. However, it is not safe when the Java thread was in `_thread_blocked`, for instance when it was in `java.lang.Runtime.gc()V+0 java.base`. It is easy to get stuck when the VM was at a safepoint.
# -XX:OnError="jcmd %p Thread.print"
original thread state = _thread_blocked
# Executing /bin/sh -c "jcmd 37330 Thread.print" ...
37330:
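To make the state reasoning concrete, here is a toy model of the original timeout. It is a sketch under assumptions, not HotSpot code: the enum only loosely mirrors the `JavaThreadState` names, and both helper functions are hypothetical. It assumes the usual rule that a safepoint has to wait for threads in VM or Java state to stop by themselves, while threads in native or blocked state are treated as already safe.

```cpp
// Toy model only: not HotSpot source. The enum loosely mirrors HotSpot's
// JavaThreadState names; both functions below are hypothetical helpers.
enum class ThreadState { in_native, in_vm, in_java, blocked };

// True if a safepoint must wait for the thread to poll and stop by itself.
// Threads in native or blocked state are treated as already safe, because
// they are stopped when they try to transition back.
constexpr bool waits_for_safepoint_poll(ThreadState s) {
  return s == ThreadState::in_vm || s == ThreadState::in_java;
}

// The reporting thread waits for the OnError child and never polls, so the
// safepoint requested by the child (jcmd Thread.print) can complete only if
// the safepoint does not have to wait for that thread.
constexpr bool safepoint_can_complete(ThreadState reporting_thread) {
  return !waits_for_safepoint_poll(reporting_thread);
}

// Reporting thread left in VM state: the safepoint times out, as in the log.
static_assert(!safepoint_can_complete(ThreadState::in_vm), "expected timeout");
// Reporting thread switched to native: this is what the patch relies on.
static_assert(safepoint_can_complete(ThreadState::in_native), "expected progress");
```

The model says nothing about whether it is safe to overwrite the state of a thread that was already `_thread_blocked`, which is the case that got stuck in the test above.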
I just realized this is a reckless hack. We can't guarantee that it avoids deadlock, so we shouldn't go this way. How about we drop this?
-------------
PR: https://git.openjdk.java.net/jdk/pull/5590