[crac] RFR: Linux file system watcher support

Radim Vansa duke at openjdk.org
Mon May 15 07:10:16 UTC 2023


On Sun, 14 May 2023 10:00:26 GMT, joeylee <duke at openjdk.org> wrote:

> inotify monitors changes on filesystem, this support automatic restore for LinuxFileWatcher.
> 
> FileWatcherAfterRestoreTest verifies watcher service works after restore.
> FileWatcherTest verifies automatic closing inotiify fd
> 
> The watcher keys are still managed by user, so exception will be thrown if no watcher keys are leaked, as in FileWatcherWithOpenKeysTest

Thanks for the contribution!

I wonder if you have an example of code where it's useful to automatically suspend the service but leave `WatchKey` management up to the user. To me this would be either fully transparent or fully managed, breaking the cycle that polls the events and closing the service as well.

src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 176:

> 174:         private final long address;
> 175:         private volatile CheckpointRestoreState checkpointState = CheckpointRestoreState.NORMAL_OPERATION;
> 176:         private final Object checkpointLock = new Object();

Do we need another object just for locking, or would synchronizing on the Poller instance be sufficient?

src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 185:

> 183:             this.address = unsafe.allocateMemory(BUFFER_SIZE);
> 184:             this.socketpair = new int[2];
> 185:             initFDs();

When `initFD` throws an exception the native memory would stay allocated (you've flipped the order of allocation and FD initialization).

src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 368:

> 366:                     try {
> 367:                         do {
> 368:                             nReady = poll(ifd, socketpair[0]);

When the `poll` is followed by a C/R we call it second time, ignoring the old value. Will this forget any events?
Obviously all the events happening when the application is in snapshot will be lost, but I wonder whether we should queue and replay anything that's already recorded. In the future (not necessarily in this PR) it would be nice to detect if anything happened when the application was in snapshot and generate events to keep its view up to date.

src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 501:

> 499:             synchronized (checkpointLock) {
> 500:                 checkpointState = CheckpointRestoreState.CHECKPOINT_TRANSITION;
> 501:                 write(socketpair[1], address, 1);

Any reason to do this directly rather than calling `wakeup()`? It seems that there's some handling for translating errno into messages...

src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 509:

> 507:                 }
> 508:                 if (checkpointState == CheckpointRestoreState.CHECKPOINT_ERROR) {
> 509:                     throw new IllegalSelectorException();

This is not the right exception type.

src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 519:

> 517:                 return;
> 518:             }
> 519:             synchronized (checkpointLock) {

I wonder, if the `initFD` throws during restore, shouldn't we throw here as well? I know you were inspired in EPollSelector, but let's give it a thought.

test/jdk/jdk/crac/fileDescriptors/FileWatcherAfterRestoreTest.java line 48:

> 46:     @Override
> 47:     public void exec() throws Exception {
> 48:         WatchService watchService = FileSystems.getDefault().newWatchService();

Could you call `close()` on this service at the end of the test to check that it works as it should after C/R?

test/jdk/jdk/crac/fileDescriptors/FileWatcherAfterRestoreTest.java line 54:

> 52:         Path directory = Paths.get(System.getProperty("user.dir"));
> 53:         WatchKey key = directory.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
> 54:         Thread.sleep(200);

Why the sleep?

test/jdk/jdk/crac/fileDescriptors/FileWatcherTest.java line 36:

> 34:  * @run driver jdk.test.lib.crac.CracTest
> 35:  */
> 36: public class FileWatcherTest implements CracTest {

Does this test anything more than what `FileWatcherAfterRestoreTest` does anyway?

-------------

PR Review: https://git.openjdk.org/crac/pull/72#pullrequestreview-1425849060
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193381783
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193383268
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193392586
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193394358
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193394899
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193400167
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193405625
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193401546
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1193400802


More information about the crac-dev mailing list