RFR: 8273608: Deadlock when jcmd of OnError attaches to itself

Thu Sep 23 05:54:54 UTC 2021

On Wed, 22 Sep 2021 19:26:28 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

>> This patch allows the custom commands of OnError to attach to HotSpot itself.
>> It switches the thread executing report_and_die() to the native state before os::fork_and_exec(cmd).
>> This prevents commands that require safepoint synchronization from deadlocking,
>> e.g. OnError='jcmd %p Thread.print'.
>> 
>> Without this patch, the VM deadlocks at safepoint synchronization.
>> `"main" #1` is the very thread that executes `os::fork_and_exec(cmd)`.
>> 
>> 
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (debug.cpp:364), pid=94632, tid=94633
>> #  fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> #   Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable  [0x00007f01b7a08000]
>> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> src/hotspot/share/utilities/vmError.cpp line 1341:
> 
>> 1339:   ~VMErrorThreadToNativeFromVM() {
>> 1340:     if (_thread != nullptr) {
>> 1341:       ThreadStateTransition::transition_from_native(_thread, _thread_in_vm);
> 
> Just a thought on this part. Ideally we should avoid calling process_if_requested_with_exit_check(), since attempting to process handshakes/stackwatermarks at this point might lead to all sorts of other issues. An alternative could be to just set the original state back and continue. But maybe we don't care about this, because we are almost done with the error reporting and the OnError commands were already executed. That last point would argue for moving the wrapper before the while loop.

I agree with @pchilano (if I understand him correctly): I don't think an RAII object is even needed here. There is no need to reinstate the prior thread state.
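
To make the two options concrete, here is a rough toy sketch. It is not HotSpot code: ToyThread, ThreadState, ScopedNativeState and enter_native_for_good are hypothetical names invented for illustration, and the state change is a plain field store with no safepoint or handshake processing. Variant A is a scoped wrapper that puts the old state back on exit; variant B is the one-way switch I have in mind, which never restores anything.

```cpp
// Toy model only: every name here is hypothetical and none of it is HotSpot's API.
enum class ThreadState { InVM, InNative };

struct ToyThread {
  ThreadState state = ThreadState::InVM;
};

// Variant A: RAII wrapper that switches to native and restores the
// previous state when the scope ends.
class ScopedNativeState {
 public:
  explicit ScopedNativeState(ToyThread* thread)
      : _thread(thread), _saved(ThreadState::InVM) {
    if (_thread != nullptr) {
      _saved = _thread->state;
      _thread->state = ThreadState::InNative;
    }
  }
  ~ScopedNativeState() {
    if (_thread != nullptr) {
      _thread->state = _saved;  // plain store back, no polling of any kind
    }
  }
  ScopedNativeState(const ScopedNativeState&) = delete;
  ScopedNativeState& operator=(const ScopedNativeState&) = delete;

 private:
  ToyThread* _thread;
  ThreadState _saved;
};

// Variant B: one-way switch. The thread stays native for the rest of
// error reporting, so there is nothing to undo and no destructor to run.
inline void enter_native_for_good(ToyThread* thread) {
  if (thread != nullptr) {
    thread->state = ThreadState::InNative;
  }
}

// Stand-ins for "run the OnError commands, then carry on".
ThreadState state_after_scoped(ToyThread& thread) {
  {
    ScopedNativeState guard(&thread);
    // the OnError commands would be forked here
  }
  return thread.state;
}

ThreadState state_after_one_way(ToyThread& thread) {
  enter_native_for_good(&thread);
  // the OnError commands would be forked here
  return thread.state;
}
```

With variant B there is no transition back at all, so the question of whether the return path should process handshakes or stack watermarks does not come up.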

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590