[crac] RFR: Linux file system watcher support [v2]

joeylee duke at openjdk.org
Wed May 17 12:36:15 UTC 2023


On Mon, 15 May 2023 06:48:08 GMT, Radim Vansa <duke at openjdk.org> wrote:

>> joeylee has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>> 
>>   Linux file system watcher support
>
> src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 367:
> 
>> 365:                     // wait for close or inotify event
>> 366:                     try {
>> 367:                         do {
> 
> When the `poll` is followed by a C/R we call it second time, ignoring the old value. Will this forget any events?
> Obviously all the events happening when the application is in snapshot will be lost, but I wonder whether we should queue and replay anything that's already recorded. In the future (not necessarily in this PR) it would be nice to detect if anything happened when the application was in snapshot and generate events to keep its view up to date.

I think it's ok to ignore the previous poll result, in this case:
1. wakeup() called by other thread, and polling thread received the notify
2. the process begin checkpoint and block at processCheckpointRestore, forget the previous notify
3. the process restored, and proceed with no notify.

my previous design was to auto save and reopen the inotify and socketpair, and let user control the watch keys, the only place where notify is used is during request, after restore all keys should be re-registered by user, so I thought it's ok to drop the notify.

`I wonder if you have an example of code where it's useful to automatically suspend the service but leave WatchKey management up to the user. `
I thought the watch key path is dependent on the running environment, so during restore I couldn't take over, because at restore step the path might not exist or change, so I am leaving watch keys management for users.

> src/java.base/linux/classes/sun/nio/fs/LinuxWatchService.java line 519:
> 
>> 517:                 return;
>> 518:             }
>> 519:             synchronized (checkpointLock) {
> 
> I wonder, if the `initFD` throws during restore, shouldn't we throw here as well? I know you were inspired in EPollSelector, but let's give it a thought.

I added a check at after restore to check if initFD succeed.

                if (checkpointState != CheckpointRestoreState.NORMAL_OPERATION) {
                    throw new CheckpointException("LinuxWatchService restore exception");
                }

> test/jdk/jdk/crac/fileDescriptors/FileWatcherAfterRestoreTest.java line 54:
> 
>> 52:         Path directory = Paths.get(System.getProperty("user.dir"));
>> 53:         WatchKey key = directory.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
>> 54:         Thread.sleep(200);
> 
> Why the sleep?

removed useless sleep

> test/jdk/jdk/crac/fileDescriptors/FileWatcherTest.java line 36:
> 
>> 34:  * @run driver jdk.test.lib.crac.CracTest
>> 35:  */
>> 36: public class FileWatcherTest implements CracTest {
> 
> Does this test anything more than what `FileWatcherAfterRestoreTest` does anyway?

removed duplicate test

-------------

PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1196437208
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1196440583
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1196441418
PR Review Comment: https://git.openjdk.org/crac/pull/72#discussion_r1196441121


More information about the crac-dev mailing list