RFR: 8279124: VirtualMachineImpl should check signal handler has installed before sending SIGQUIT

David Holmes dholmes at openjdk.java.net
Mon Jan 17 06:56:28 UTC 2022


On Mon, 10 Jan 2022 05:19:26 GMT, Xin Liu <xliu at openjdk.org> wrote:

> There is a handshake protocol between attach and HotSpot. Linux VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been initialized. It expects "Signal Dispatcher" to handle SIGBREAK(same as SIGQUIT) and create AttachListener. However, it is possible that attach starts "handshake" before os::initialize_jdk_signal_support() is called. The signal handler which handles SIGQUIT has not been installed. Prior to os::initialize_jdk_signal_support(), universe_init() is called. Its time complexity will be linear to the initial size of heap with 'AlwaysPreTouch'.  It takes 20~30 seconds to initialize 128g heap on a server-class host(eg. EC2 m4.10xlarge). Many tools such jcmd, jstack etc may force initializing HotSpot quit prematurely. 
> 
> This patch checks '/proc/$pid/stat' SigCgt bitmask to ensure the signal will be caught by the target process before striking it with SIGQUIT.  It will make HotSpot more robust.  The fields of procfs are well [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10) and have supported since Linux 2.6.30. libattach.so will not the only consumer of it. I see that os_perf_linux.cpp supports it in libjvm.so. 
> 
> 
> Testing 
> 
> Before, this patch, once initialization takes long time, jcmd may quit the java process. 
> 
> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr -XX:ParallelGCThreads=1 &
> [1] 9589
> [0.028s][debug][gc,heap] Minimum heap 68719476736  Initial heap 68719476736  Maximum heap 68719476736
> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work units pre-touching 68719476736B.
> $jcmd 9589 VM.flags
> 9589:
> [1]  + 9589 quit       java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr
> java.io.IOException: No such process
>         at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native Method)
>         at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
>         at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>         at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
> 
> With this patch, jcmd will timeout but won't disrupt 15274. 
> 
> $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
> [1] 15274
> $ jcmd 15274 VM.flags
> 15274:
> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond within 10500ms or HotSpot VM not loaded
>         at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:105)
>         at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
>         at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
>         at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)

Hi Xin,

A couple of comments below. I'm still thinking about this one ... seems okay but I'm not certain ...

Thanks,
David

src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 119:

> 117:  * Return -1 when it runs into any error.
> 118:  */
> 119: static int check_sigquit_caught(jint pid) {

can't this just be a bool function? Even if int we don't need a ternary return value as only zero is of interest.

src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 164:

> 162: {
> 163:     // Only give up sending SIGQUIT if we see that SigCgt is not set.
> 164:     if (check_sigquit_caught(pid) == 0) return;

Suggestion (assumes bool function):

// Only send the SIGQUIT if we can see that the target JVM is ready to catch it.
if (check_sigquit_caught(pid) && kill((pid_t)pid, SIGQUIT) != 0) {
    JNU_ThrowIOExceptionWithLastError(env, "kill");
}

-------------

PR: https://git.openjdk.java.net/jdk/pull/7003


More information about the serviceability-dev mailing list