[crac] RFR: Handle open file descriptors with configurable policies [v3]

Jan Kratochvil jkratochvil at openjdk.org
Fri Jun 2 11:57:44 UTC 2023


On Fri, 12 May 2023 13:29:08 GMT, Radim Vansa <duke at openjdk.org> wrote:

>> When the application does not close some file descriptors through Resources we can use `jdk.crac.fd-policy.checkpoint` and `jdk.crac.fd-policy.restore` to configure the behaviour.
>> 
>> These properties can specify a list of File.pathSeparator-separated key=value pairs, where the key can be one of:
>> 
>> * numeric file descriptor
>> * path using 'glob' pattern matching (see FileSystem.getPathMatcher() for details)
>> * keywords FIFO and SOCKET that match pipes and sockets
>> 
>> The value should match one of possible values from OpenFDPolicies.BeforeCheckpoint and OpenFDPolicies.AfterRestore
>
> Radim Vansa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Effectively revert previous commit: Initialize logger in <clinit>

CRaC has now started complaining when I try to snapshot https://github.com/CRaC/example-jetty:

Suppressed: jdk.crac.impl.CheckpointOpenFileException: FileDescriptor 12 left open: tcp6 local /[0:0:0:0:0:0:0:0]:8080 remote not bound 


1. I haven't found how to specify a property. Below I have tried the normal `-D` command line parameter; is that correct?
2. Specifying the command line parameter has no effect, though. The snapshot still fails:

$ (cd ../example-jetty/;. JAVA_HOME-crac-git-slowdebug;(sleep 4;sudo rm -f /tmp/coredump.*;sudo jcmd target/example-jetty-1.0-SNAPSHOT.jar JDK.checkpoint) & sudo rm -rf cr-host2/;sudo bash -c 'ulimit -c unlimited;. JAVA_HOME-crac-git-slowdebug;$JAVA_HOME/bin/java -XX:CRaCCheckpointTo=cr-host2 -Djdk.crac.collect-fd-stacktraces=true -Djdk.crac.fd-policy.checkpoint=CLOSE -jar target/example-jetty-1.0-SNAPSHOT.jar';echo restore;sudo $JAVA_HOME/bin/java -XX:CRaCRestoreFrom=cr-host2)
2023-06-02 13:43:57.835:INFO::main: Logging initialized @2712ms to org.eclipse.jetty.util.log.StdErrLog
2023-06-02 13:43:58.647:INFO:oejs.Server:main: jetty-9.4.30.v20200611; built: 2020-06-11T12:34:51.929Z; git: 271836e4c1f4612f12b7bb13ef5a92a927634b0d; jvm 17-internal+0-adhoc.azul.crac-git
1519:
2023-06-02 13:43:59.647:INFO:oejs.AbstractConnector:main: Started ServerConnector at 55a1c291{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2023-06-02 13:43:59.735:INFO:oejs.Server:main: Started @4744ms
Jun 02, 2023 1:44:00 PM jdk.internal.crac.LoggerContainer info
INFO: /home/azul/azul/example-jetty/target/dependency/jetty-io-9.4.30.v20200611.jar is recorded as always available on restore
Jun 02, 2023 1:44:00 PM jdk.internal.crac.LoggerContainer info
INFO: /home/azul/azul/example-jetty/target/dependency/jetty-util-9.4.30.v20200611.jar is recorded as always available on restore
Jun 02, 2023 1:44:00 PM jdk.internal.crac.LoggerContainer info
INFO: /home/azul/azul/example-jetty/target/dependency/jetty-http-9.4.30.v20200611.jar is recorded as always available on restore
Jun 02, 2023 1:44:00 PM jdk.internal.crac.LoggerContainer info
INFO: /home/azul/azul/example-jetty/target/dependency/javax.servlet-api-3.1.0.jar is recorded as always available on restore
Jun 02, 2023 1:44:00 PM jdk.internal.crac.LoggerContainer info
INFO: /home/azul/azul/example-jetty/target/dependency/jetty-server-9.4.30.v20200611.jar is recorded as always available on restore
Jun 02, 2023 1:44:00 PM jdk.internal.crac.LoggerContainer info
INFO: /home/azul/azul/example-jetty/target/example-jetty-1.0-SNAPSHOT.jar is recorded as always available on restore
An exception during a checkpoint operation:
jdk.crac.CheckpointException
        at java.base/jdk.crac.Core.checkpointRestore1(Core.java:129)
        at java.base/jdk.crac.Core.checkpointRestore(Core.java:264)
        at java.base/jdk.crac.Core.checkpointRestoreInternal(Core.java:280)
        Suppressed: jdk.crac.impl.CheckpointOpenFileException: FileDescriptor 10 left open: tcp6 local /[0:0:0:0:0:0:0:0]:8080 remote not bound
                at java.base/java.io.FileDescriptor.beforeCheckpoint(FileDescriptor.java:391)
                at java.base/java.io.FileDescriptor$Resource.beforeCheckpoint(FileDescriptor.java:84)
                at java.base/jdk.crac.impl.PriorityContext$SubContext.invokeBeforeCheckpoint(PriorityContext.java:107)
                at java.base/jdk.crac.impl.OrderedContext.runBeforeCheckpoint(OrderedContext.java:70)
                at java.base/jdk.crac.impl.AbstractContextImpl.beforeCheckpoint(AbstractContextImpl.java:81)
                at java.base/jdk.crac.impl.AbstractContextImpl.invokeBeforeCheckpoint(AbstractContextImpl.java:41)
                at java.base/jdk.crac.impl.PriorityContext.runBeforeCheckpoint(PriorityContext.java:70)
                at java.base/jdk.crac.impl.AbstractContextImpl.beforeCheckpoint(AbstractContextImpl.java:81)
                at java.base/jdk.internal.crac.JDKContext.beforeCheckpoint(JDKContext.java:97)
                at java.base/jdk.crac.impl.AbstractContextImpl.invokeBeforeCheckpoint(AbstractContextImpl.java:41)
                at java.base/jdk.crac.impl.OrderedContext.runBeforeCheckpoint(OrderedContext.java:70)
                at java.base/jdk.crac.impl.AbstractContextImpl.beforeCheckpoint(AbstractContextImpl.java:81)
                at java.base/jdk.crac.Core.checkpointRestore1(Core.java:127)
                ... 2 more
        Caused by: java.lang.Exception: This file descriptor was created by main at epoch:1685706239263 here
                at java.base/java.io.FileDescriptor$Resource.<init>(FileDescriptor.java:75)
                at java.base/java.io.FileDescriptor.<init>(FileDescriptor.java:104)
                at java.base/sun.nio.ch.IOUtil.newFD(IOUtil.java:544)
                at java.base/sun.nio.ch.Net.serverSocket(Net.java:534)
                at java.base/sun.nio.ch.ServerSocketChannelImpl.<init>(ServerSocketChannelImpl.java:128)
                at java.base/sun.nio.ch.ServerSocketChannelImpl.<init>(ServerSocketChannelImpl.java:109)
                at java.base/sun.nio.ch.SelectorProviderImpl.openServerSocketChannel(SelectorProviderImpl.java:72)
                at java.base/java.nio.channels.ServerSocketChannel.open(ServerSocketChannel.java:145)
                at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:339)
                at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:310)
                at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
                at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:234)
                at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
                at org.eclipse.jetty.server.Server.doStart(Server.java:386)
                at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
                at com.example.ServerManager.<init>(App.java:27)
                at com.example.App.main(App.java:73)
restore
open cppath: No such file or directory

I find it curious that it says `Suppressed:` there, yet the snapshot still fails with `An exception during a checkpoint operation:`.
3. The setting `-Djdk.crac.fd-policy.checkpoint=CLOSE` would be wrong anyway. I do not want to silently close ongoing TCP client-server communication. In my case, however, a socket in the LISTEN state could safely be restored automatically without any coordination from the Java application: `tcp6 local /[0:0:0:0:0:0:0:0]:8080 remote not bound`
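
For reference, the key=value format from the quoted PR description can be sketched in plain Java as below. This is only an illustration of that description, not the code in the PR: the class `FdPolicySketch`, its methods and the `KeyKind` enum are hypothetical names, splitting on the last `=` is an assumption, and `CLOSE` is used as a value only because it appears in the command line above.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the described format; not part of the CRaC API. */
public class FdPolicySketch {

    /** The three kinds of key named in the PR description. */
    public enum KeyKind { NUMERIC_FD, KEYWORD, GLOB_PATH }

    /** Splits a property value into key=value pairs on the given separator. */
    public static Map<String, String> parse(String value, String separator) {
        Map<String, String> policies = new LinkedHashMap<>();
        if (value == null || value.isEmpty()) {
            return policies;
        }
        for (String entry : value.split(Pattern.quote(separator))) {
            // Assumption: policy names contain no '=', so split on the last one.
            int eq = entry.lastIndexOf('=');
            String key = eq < 0 ? "" : entry.substring(0, eq).trim();
            String policy = eq < 0 ? "" : entry.substring(eq + 1).trim();
            if (key.isEmpty() || policy.isEmpty()) {
                throw new IllegalArgumentException("Expected key=value, got: " + entry);
            }
            policies.put(key, policy);
        }
        return policies;
    }

    /** Classifies a non-empty key as a numeric descriptor, a keyword, or a glob path. */
    public static KeyKind classify(String key) {
        if (key.chars().allMatch(Character::isDigit)) {
            return KeyKind.NUMERIC_FD;
        }
        if (key.equals("FIFO") || key.equals("SOCKET")) {
            return KeyKind.KEYWORD;
        }
        return KeyKind.GLOB_PATH;
    }

    public static void main(String[] args) {
        // Sample value built with the platform path separator (':' on Linux).
        String sample = String.join(File.pathSeparator,
                "SOCKET=CLOSE", "12=CLOSE", "/tmp/*.log=CLOSE");
        parse(sample, File.pathSeparator).forEach((key, policy) ->
                System.out.println(classify(key) + " " + key + " -> " + policy));
    }
}
```

Under this reading, a value such as `SOCKET=CLOSE` names a key and a policy, while a bare `CLOSE` has no key and is rejected by the sketch. Whether the actual implementation treats a bare value the same way is not something this sketch establishes.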

-------------

PR Comment: https://git.openjdk.org/crac/pull/69#issuecomment-1573614068

