RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v6]

David Holmes dholmes at openjdk.java.net
Mon Oct 11 00:27:09 UTC 2021


On Thu, 7 Oct 2021 09:15:45 GMT, Xin Liu <xliu at openjdk.org> wrote:

>> This patch allows the custom commands of OnError to attach to HotSpot itself. 
>> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). 
>> This prevents cmds which require safepoint synchronization from deadlock.
>> eg. OnError='jcmd %p Thread.print'.
>> 
>> Without this patch, we will encounter a deadlock at safepoint synchronization. 
>> `"main" #1`  is the very thread which executes `os::fork_and_exec(cmd)`.  
>> 
>> 
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  Internal Error (debug.cpp:364), pid=94632, tid=94633
>> #  fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> #   Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable  [0x00007f01b7a08000]
>> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Don't unlock the holding mutexes for the thread State transition.

I wrote earlier:

> We have to decide whether it is safest/best to simply not transition to native if holding locks, or whether it is okay to proceed knowing that there are risks of secondary crashes, or hangs, if we do.

so it seems you are now taking the second option. Okay fair enough - it is a simpler approach.

But I don't see the point in using `JavaThreadInVMAndNative` here - nor "promoting" it to look like some kind of general utility function. AFAICS this boils down to just doing an explicit and simple:

`set_thread_state(_thread_in_native);`

which can be done before the onError loop. And you can save the state first and restore after the loop. Basically just three lines of code.

Thanks,
David

src/hotspot/share/utilities/vmError.cpp line 1639:

> 1637:       jtivm.transition_to_native();
> 1638:       if (os::fork_and_exec(cmd) < 0) {
> 1639:         out.print_cr("os::fork_and_exec failed: %s (%s=%d)",

I think you may need to restore the state if fork_and_exec fails - or perhaps switch to using print_raw_cr?

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5590


More information about the hotspot-dev mailing list