RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v7]

David Holmes dholmes at openjdk.java.net
Tue Oct 12 05:35:51 UTC 2021


On Tue, 12 Oct 2021 04:35:17 GMT, Xin Liu <xliu at openjdk.org> wrote:

>> This patch allows the custom commands of OnError to attach to HotSpot itself. 
>> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). 
>> This prevents commands which require safepoint synchronization from deadlocking,
>> e.g. OnError='jcmd %p Thread.print'.
>> 
>> Without this patch, we will encounter a deadlock at safepoint synchronization. 
>> `"main" #1`  is the very thread which executes `os::fork_and_exec(cmd)`.  
>> 
>> 
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (debug.cpp:364), pid=94632, tid=94633
>> #  fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> #   Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable  [0x00007f01b7a08000]
>> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revert JavaThreadInVMAndNative change.
>   
>   Only change the state to Native if current thread is Java Thread and it's in VM.
>   Restore thread state after fork_and_exec() if it's changed.

src/hotspot/share/runtime/interfaceSupport.inline.hpp line 215:

> 213: };
> 214: 
> 215: class JavaThreadInVMAndNative : public StackObj {

Shouldn't this have been removed again?

src/hotspot/share/utilities/vmError.cpp line 45:

> 43: #include "runtime/frame.inline.hpp"
> 44: #include "runtime/init.hpp"
> 45: #include "runtime/interfaceSupport.inline.hpp"

Why is this needed now?

src/hotspot/share/utilities/vmError.cpp line 1348:

> 1346:  public:
> 1347:   VMErrorForceInNative(Thread* t): _jt(t != NULL && t->is_Java_thread() ? JavaThread::cast(t) : NULL) {
> 1348:     if (_jt != NULL && _jt->thread_state() == _thread_in_vm) {

What if it is `_thread_in_Java`? Isn't that possible?

I think you can unconditionally change the state to `_thread_in_native` and then restore it.
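
To illustrate the unconditional save-and-restore idea, here is a minimal standalone sketch. It is not the patch itself: `ThreadState`, `MockJavaThread` and `ForceInNative` are hypothetical stand-ins for the real HotSpot types, and it ignores everything a real state transition in the VM would need to consider.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's thread state and JavaThread;
// the real definitions in the JDK sources are much richer.
enum ThreadState { thread_in_native, thread_in_vm, thread_in_Java };

struct MockJavaThread {
  ThreadState state;
};

// RAII guard: remember whatever state the thread is in, switch to
// native unconditionally, and restore the saved state on scope exit.
class ForceInNative {
  MockJavaThread* _jt;
  ThreadState _saved;
 public:
  explicit ForceInNative(MockJavaThread* jt)
      : _jt(jt), _saved(jt != nullptr ? jt->state : thread_in_native) {
    if (_jt != nullptr) {
      _jt->state = thread_in_native;
    }
  }
  ~ForceInNative() {
    if (_jt != nullptr) {
      _jt->state = _saved;
    }
  }
  ForceInNative(const ForceInNative&) = delete;
  ForceInNative& operator=(const ForceInNative&) = delete;
};

// Usage sketch: native inside the scope, original state afterwards.
inline void demo() {
  MockJavaThread t{thread_in_Java};
  {
    ForceInNative guard(&t);
    assert(t.state == thread_in_native);
  }
  assert(t.state == thread_in_Java);
}
```

Because the guard saves the previous state rather than testing for `_thread_in_vm`, the same code path covers a thread that starts out in Java, in the VM, or already in native, and a null thread is simply left alone.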

src/hotspot/share/utilities/vmError.cpp line 1659:

> 1657:       VMErrorForceInNative fn(Thread::current_or_null());
> 1658:       if (os::fork_and_exec(cmd) < 0) {
> 1659:         out.print_cr("os::fork_and_exec failed: %s (%s=%d)",

I'm still concerned about using `out.print_cr` while the thread is `_thread_in_native`: is it safe? If it is, and if `print_raw` is also safe to execute when in native, then you could just apply the `VMErrorForceInNative` before the while loop.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590

