RFR: 8273608: Deadlock when jcmd of OnError attaches to itself

Xin Liu xliu at openjdk.java.net
Thu Sep 23 07:28:56 UTC 2021


On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu <xliu at openjdk.org> wrote:

> This patch allows the custom commands of OnError to attach to HotSpot itself.
> It transitions the thread executing report_and_die() to Native before calling os::fork_and_exec(cmd).
> This prevents commands that require safepoint synchronization from deadlocking,
> e.g. OnError='jcmd %p Thread.print'.
> 
> Without this patch, we will encounter a deadlock at safepoint synchronization. 
> `"main" #1`  is the very thread which executes `os::fork_and_exec(cmd)`.  
> 
> 
> Aborting due to java.lang.OutOfMemoryError: Java heap space
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (debug.cpp:364), pid=94632, tid=94633
> #  fatal error: OutOfMemory encountered: Java heap space
> #
> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
> #
> # -XX:OnError="jcmd %p Thread.print"
> #   Executing /bin/sh -c "jcmd 94632 Thread.print" ...
> 94632:
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable  [0x00007f01b7a08000]
> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)

Hi reviewers,
Thanks for the comments. 
> Can you please provide a regression test for this? 

Yes, it should have a test. I am working on one.

> We can always walk _mutex_array like we do in print_owned_locks_on_error(). Note that locks created outside mutex_init() will not be visible though. Maybe we should fix that.

Thanks, Patricio, for this idea. I did a quick scan. I would like to have Thread::owns_locks() in release builds, but as you said, walking _mutex_array is different from `Thread::owns_locks() { return owned_locks() != NULL; }`: it only covers locks created in mutex_init(). I think it should be a standalone issue.
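
To illustrate the limitation, here is a toy model of the array walk. Everything in it (`ToyThread`, `ToyMutex`, `toy_mutex_array`, `toy_register`, `toy_count_owned`) is a hypothetical stand-in and not HotSpot code; it only assumes that locks registered at init time land in a global array and that each lock records its owner.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are NOT HotSpot's classes.
struct ToyThread { int id; };

struct ToyMutex {
  const char* name;
  const ToyThread* owner;  // nullptr means the mutex is not held
};

// Plays the role of the global array that is filled during mutex_init().
static std::vector<ToyMutex*> toy_mutex_array;

// Register a mutex the way init-time creation would.
// A mutex created elsewhere is never added and so stays invisible to the walk.
inline void toy_register(ToyMutex* m) { toy_mutex_array.push_back(m); }

// Walk the array and count the registered mutexes currently owned by t.
inline std::size_t toy_count_owned(const ToyThread* t) {
  std::size_t n = 0;
  for (const ToyMutex* m : toy_mutex_array) {
    if (m->owner == t) ++n;
  }
  return n;
}
```

In this model a lock that was never passed to `toy_register` is not counted even when the thread holds it, which is the gap Patricio pointed out.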

Could you also take a look at `jfrEmergencyDump::on_vm_shutdown` in jfrEmergencyDump.cpp?
I think the logic there is very similar. Even if we don't use RAII, I think it's still possible to have a reusable procedure.


// Preconditions: 1) current != null 2) current->is_Java_Thread 3) current state is VM 4) all locks owned by current have been released successfully.
// Returns: true on success.
bool transition_current_to_native();
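
A rough sketch of the shape such a helper could take, written against a toy model rather than the real Thread classes. `ToyState`, `ToyThread`, `owned_locks` and `toy_transition_to_native` are hypothetical names, and the sketch assumes the caller has already released its locks, so precondition 4 is only checked here, not performed.

```cpp
#include <cassert>

// Toy thread states for illustration; not HotSpot's JavaThreadState.
enum class ToyState { in_java, in_vm, in_native };

// Toy thread record; not HotSpot's Thread or JavaThread.
struct ToyThread {
  bool is_java_thread;
  ToyState state;
  int owned_locks;  // number of locks the thread still holds
};

// Moves the thread from in_vm to in_native if every precondition holds.
// Returns true on success and leaves the thread untouched on failure.
inline bool toy_transition_to_native(ToyThread* current) {
  if (current == nullptr) return false;                 // 1) must have a current thread
  if (!current->is_java_thread) return false;           // 2) must be a Java thread
  if (current->state != ToyState::in_vm) return false;  // 3) must be in the VM state
  if (current->owned_locks != 0) return false;          // 4) must hold no locks
  current->state = ToyState::in_native;
  return true;
}
```

A caller would fork the OnError command only when the helper returns true, and otherwise fall back to whatever the current behaviour is.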

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590

