Zombie FileHandler locks can exhaust all available log file locks.
Daniel, Jim, In JDK8 the FileHandler file locking was changed to use FileChannel.open with CREATE_NEW. If the file exists (locked or not) the FileHandler will rotate due to safety concerns about writing to the same log file. The FileHandler has an upper bound of 100 as the number of file locks that can be attempted to be acquired. Given the right pattern, enough time, and enough failures it seems at it is possible for a program to end up in a state where the FileHandler is surrounded by zombie lock files refuses log or perform a clean up action. (A.K.A Farmer Rick Grimes). This means that the lck files have to manually be deleted or the FileHandler will just fail with an IOException every time it is created. A simple test to emulate crashing is to run (JDK7 vs. JDK8) the following code twice without deleting the log and lck files: public static void main(String[] args) throws Exception { System.out.println(System.getProperty("java.runtime.version")); ReferenceQueue<FileHandler> q = new ReferenceQueue<>(); for (int i=0; i<100; i++) { WeakReference<FileHandler> h = new WeakReference<>( new FileHandler("./test_%u.log", 10000, 2, true), q); while (q.poll() != h) { System.runFinalization(); System.gc(); System.runFinalization(); Thread.yield(); } } } I understand that if you can't trust that the file locking always works then there isn't much that can be done. Leaving the number of locks as unbounded isn't really an option either. Seems like there should be a way to identify zombie lock files (ATOMIC_MOVE) and delete them. Any thoughts on this? The source discussion on this can be found at http://stackoverflow.com/questions/24321098/is-java-util-logging-filehandler... Regards, Jason
Hi Jason, Looking at the diff for 6244047 - I see that, as you pointed out, the unwanted behavior described comes from the fact that the new code is using CREATE_NEW - which prevents the 'zombie lock files' from being reused. I am not an expert in file locks - but I wonder whether we could revert to using CREATE instead: wouldn't tryLock() later tell us if the file is used by another entity? Another possibility would be to combine CREATE_NEW and DELETE_ON_CLOSE, which according to StandardOpenOptions will attempt to delete the file on JVM termination if close() hasn't been called. This probably wouldn't help in case on VM crash, but it would help in the case demonstrated by your test below. I have however some reluctance because I see that we call FileChannel.close() in the case were the lock can't be obtained, and I'm not sure what that would do... Also StandardOpenOptions has some warning about using DELETE_ON_CLOSE for files that can be opened concurrently by other entities - so I'm not sure it would be appropriate. The last possibility I see would be to change the lock HashMap to take instances of a subclass of WeakReference<FileHandler> as values (instead of String), and add some code that attempts to close & remove the lock file when the FileHandler is no longer referenced. Again - this will probably not help in the case of crash, and also adds the question on when the weak reference queue should be polled to ensure that the no longer referenced FileChannel are closed in a timely fashion. All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-) On another track - we could also make MAX configurable - but that would just be shifting the issue around - wouldn't it? best regards, -- daniel On 6/20/14 10:54 PM, Jason Mehrens wrote:
Daniel, Jim,
In JDK8 the FileHandler file locking was changed to use FileChannel.open with CREATE_NEW. If the file exists (locked or not) the FileHandler will rotate due to safety concerns about writing to the same log file. The FileHandler has an upper bound of 100 as the number of file locks that can be attempted to be acquired. Given the right pattern, enough time, and enough failures it seems at it is possible for a program to end up in a state where the FileHandler is surrounded by zombie lock files refuses log or perform a clean up action. (A.K.A Farmer Rick Grimes). This means that the lck files have to manually be deleted or the FileHandler will just fail with an IOException every time it is created.
A simple test to emulate crashing is to run (JDK7 vs. JDK8) the following code twice without deleting the log and lck files:
public static void main(String[] args) throws Exception { System.out.println(System.getProperty("java.runtime.version")); ReferenceQueue<FileHandler> q = new ReferenceQueue<>();
for (int i=0; i<100; i++) { WeakReference<FileHandler> h = new WeakReference<>( new FileHandler("./test_%u.log", 10000, 2, true), q); while (q.poll() != h) { System.runFinalization(); System.gc(); System.runFinalization(); Thread.yield(); } }
}
I understand that if you can't trust that the file locking always works then there isn't much that can be done. Leaving the number of locks as unbounded isn't really an option either. Seems like there should be a way to identify zombie lock files (ATOMIC_MOVE) and delete them. Any thoughts on this?
The source discussion on this can be found at http://stackoverflow.com/questions/24321098/is-java-util-logging-filehandler...
Regards,
Jason
Daniel, My understanding is that changing CREATE_NEW to use CREATE would make it work like does in JDK7. Closing the lock files when the FileHandler is unreferenced I is probably the fix for JDK-6774110: lock file is not deleted when child logger is used. If we could prove that system FileLock is mandatory we could use the JDK7 behavior otherwise if they are advisory then use the JDK8 behavior. If we knew the boot time of the JVM host O/S we could delete all lock files older than the boot time + some constant. Jason ----------------------------------------
Date: Mon, 23 Jun 2014 11:48:18 +0200 From: daniel.fuchs@oracle.com To: jason_mehrens@hotmail.com; core-libs-dev@openjdk.java.net Subject: Re: Zombie FileHandler locks can exhaust all available log file locks.
Hi Jason,
Looking at the diff for 6244047 - I see that, as you pointed out, the unwanted behavior described comes from the fact that the new code is using CREATE_NEW - which prevents the 'zombie lock files' from being reused.
I am not an expert in file locks - but I wonder whether we could revert to using CREATE instead: wouldn't tryLock() later tell us if the file is used by another entity?
Another possibility would be to combine CREATE_NEW and DELETE_ON_CLOSE, which according to StandardOpenOptions will attempt to delete the file on JVM termination if close() hasn't been called. This probably wouldn't help in case on VM crash, but it would help in the case demonstrated by your test below. I have however some reluctance because I see that we call FileChannel.close() in the case were the lock can't be obtained, and I'm not sure what that would do... Also StandardOpenOptions has some warning about using DELETE_ON_CLOSE for files that can be opened concurrently by other entities - so I'm not sure it would be appropriate.
The last possibility I see would be to change the lock HashMap to take instances of a subclass of WeakReference<FileHandler> as values (instead of String), and add some code that attempts to close & remove the lock file when the FileHandler is no longer referenced. Again - this will probably not help in the case of crash, and also adds the question on when the weak reference queue should be polled to ensure that the no longer referenced FileChannel are closed in a timely fashion.
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
On another track - we could also make MAX configurable - but that would just be shifting the issue around - wouldn't it?
best regards,
-- daniel
On 6/20/14 10:54 PM, Jason Mehrens wrote:
Daniel, Jim,
In JDK8 the FileHandler file locking was changed to use FileChannel.open with CREATE_NEW. If the file exists (locked or not) the FileHandler will rotate due to safety concerns about writing to the same log file. The FileHandler has an upper bound of 100 as the number of file locks that can be attempted to be acquired. Given the right pattern, enough time, and enough failures it seems at it is possible for a program to end up in a state where the FileHandler is surrounded by zombie lock files refuses log or perform a clean up action. (A.K.A Farmer Rick Grimes). This means that the lck files have to manually be deleted or the FileHandler will just fail with an IOException every time it is created.
A simple test to emulate crashing is to run (JDK7 vs. JDK8) the following code twice without deleting the log and lck files:
public static void main(String[] args) throws Exception { System.out.println(System.getProperty("java.runtime.version")); ReferenceQueue<FileHandler> q = new ReferenceQueue<>();
for (int i=0; i<100; i++) { WeakReference<FileHandler> h = new WeakReference<>( new FileHandler("./test_%u.log", 10000, 2, true), q); while (q.poll() != h) { System.runFinalization(); System.gc(); System.runFinalization(); Thread.yield(); } }
}
I understand that if you can't trust that the file locking always works then there isn't much that can be done. Leaving the number of locks as unbounded isn't really an option either. Seems like there should be a way to identify zombie lock files (ATOMIC_MOVE) and delete them. Any thoughts on this?
The source discussion on this can be found at http://stackoverflow.com/questions/24321098/is-java-util-logging-filehandler...
Regards,
Jason
Hi Jason, On 6/23/14 2:51 PM, Jason Mehrens wrote:
Daniel,
My understanding is that changing CREATE_NEW to use CREATE would make it work like does in JDK7. Closing the lock files when the FileHandler is unreferenced I is probably the fix for JDK-6774110: lock file is not deleted when child logger is used.
I'll see if I can dig more info about the reason for the change. Thanks for pointing at JDK-6774110 - I'll have a look at that as well. -- daniel
If we could prove that system FileLock is mandatory we could use the JDK7 behavior otherwise if they are advisory then use the JDK8 behavior. If we knew the boot time of the JVM host O/S we could delete all lock files older than the boot time + some constant.
Jason
----------------------------------------
Date: Mon, 23 Jun 2014 11:48:18 +0200 From: daniel.fuchs@oracle.com To: jason_mehrens@hotmail.com; core-libs-dev@openjdk.java.net Subject: Re: Zombie FileHandler locks can exhaust all available log file locks.
Hi Jason,
Looking at the diff for 6244047 - I see that, as you pointed out, the unwanted behavior described comes from the fact that the new code is using CREATE_NEW - which prevents the 'zombie lock files' from being reused.
I am not an expert in file locks - but I wonder whether we could revert to using CREATE instead: wouldn't tryLock() later tell us if the file is used by another entity?
Another possibility would be to combine CREATE_NEW and DELETE_ON_CLOSE, which according to StandardOpenOptions will attempt to delete the file on JVM termination if close() hasn't been called. This probably wouldn't help in case on VM crash, but it would help in the case demonstrated by your test below. I have however some reluctance because I see that we call FileChannel.close() in the case were the lock can't be obtained, and I'm not sure what that would do... Also StandardOpenOptions has some warning about using DELETE_ON_CLOSE for files that can be opened concurrently by other entities - so I'm not sure it would be appropriate.
The last possibility I see would be to change the lock HashMap to take instances of a subclass of WeakReference<FileHandler> as values (instead of String), and add some code that attempts to close & remove the lock file when the FileHandler is no longer referenced. Again - this will probably not help in the case of crash, and also adds the question on when the weak reference queue should be polled to ensure that the no longer referenced FileChannel are closed in a timely fashion.
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
On another track - we could also make MAX configurable - but that would just be shifting the issue around - wouldn't it?
best regards,
-- daniel
On 6/20/14 10:54 PM, Jason Mehrens wrote:
Daniel, Jim,
In JDK8 the FileHandler file locking was changed to use FileChannel.open with CREATE_NEW. If the file exists (locked or not) the FileHandler will rotate due to safety concerns about writing to the same log file. The FileHandler has an upper bound of 100 as the number of file locks that can be attempted to be acquired. Given the right pattern, enough time, and enough failures it seems at it is possible for a program to end up in a state where the FileHandler is surrounded by zombie lock files refuses log or perform a clean up action. (A.K.A Farmer Rick Grimes). This means that the lck files have to manually be deleted or the FileHandler will just fail with an IOException every time it is created.
A simple test to emulate crashing is to run (JDK7 vs. JDK8) the following code twice without deleting the log and lck files:
public static void main(String[] args) throws Exception { System.out.println(System.getProperty("java.runtime.version")); ReferenceQueue<FileHandler> q = new ReferenceQueue<>();
for (int i=0; i<100; i++) { WeakReference<FileHandler> h = new WeakReference<>( new FileHandler("./test_%u.log", 10000, 2, true), q); while (q.poll() != h) { System.runFinalization(); System.gc(); System.runFinalization(); Thread.yield(); } }
}
I understand that if you can't trust that the file locking always works then there isn't much that can be done. Leaving the number of locks as unbounded isn't really an option either. Seems like there should be a way to identify zombie lock files (ATOMIC_MOVE) and delete them. Any thoughts on this?
The source discussion on this can be found at http://stackoverflow.com/questions/24321098/is-java-util-logging-filehandler...
Regards,
Jason
Hi, I wonder whether the following patch would solve the issue. In the case where the lock file already exists, we check whether it's writable, and whether its directory is writable, and if so, we open it for 'write+append', otherwise, we continue with the next lock name. I we have manage to open the lock file, then we proceed with trying to lock it, as before. If locking succeeds - we use that file. If locking throws exception - we only fall through if we actually created the file, otherwise we proceed with the next lock name. Would that make sense? It passes the CheckLockLocationTest that was added for 6244047. Note: I also added a deleteOnExit() hook for good measure. -- daniel --- a/src/share/classes/java/util/logging/FileHandler.java +++ b/src/share/classes/java/util/logging/FileHandler.java @@ -25,6 +25,7 @@ package java.util.logging; +import static java.nio.file.StandardOpenOption.APPEND; import static java.nio.file.StandardOpenOption.CREATE_NEW; import static java.nio.file.StandardOpenOption.WRITE; @@ -35,6 +36,8 @@ import java.io.OutputStream; import java.nio.channels.FileChannel; import java.nio.file.FileAlreadyExistsException; +import java.nio.file.Files; +import java.nio.file.Path; import java.nio.file.Paths; import java.security.AccessController; import java.security.PrivilegedAction; @@ -434,24 +437,44 @@ continue; } + final Path lockFilePath = Paths.get(lockFileName); + boolean fileCreated; try { - lockFileChannel = FileChannel.open(Paths.get(lockFileName), + lockFileChannel = FileChannel.open(lockFilePath, CREATE_NEW, WRITE); + fileCreated = true; } catch (FileAlreadyExistsException ix) { + // This may be a zombie file left over by a previous + // execution. Reuse it - but since we didn't create + // it we won't delete it either. + if (Files.isRegularFile(lockFilePath) + && Files.isWritable(lockFilePath) + && Files.isWritable(lockFilePath.getParent())) { + lockFileChannel = FileChannel.open(lockFilePath, + WRITE, APPEND); + fileCreated = false; + } else { // try the next lock file name in the sequence continue; } + } boolean available; try { available = lockFileChannel.tryLock() != null; + if (available) { + // we got the lock - ensure that the lock file + // will be deleted on exit. + lockFilePath.toFile().deleteOnExit(); + } // We got the lock OK. } catch (IOException ix) { // We got an IOException while trying to get the lock. // This normally indicates that locking is not supported // on the target directory. We have to proceed without - // getting a lock. Drop through. - available = true; + // getting a lock. Drop through, but only if we did + // create the file... + available = fileCreated; } if (available) { // We got the lock. Remember it. On 6/23/14 3:18 PM, Daniel Fuchs wrote:
Hi Jason,
On 6/23/14 2:51 PM, Jason Mehrens wrote:
Daniel,
My understanding is that changing CREATE_NEW to use CREATE would make it work like does in JDK7. Closing the lock files when the FileHandler is unreferenced I is probably the fix for JDK-6774110: lock file is not deleted when child logger is used.
I'll see if I can dig more info about the reason for the change. Thanks for pointing at JDK-6774110 - I'll have a look at that as well.
-- daniel
If we could prove that system FileLock is mandatory we could use the JDK7 behavior otherwise if they are advisory then use the JDK8 behavior. If we knew the boot time of the JVM host O/S we could delete all lock files older than the boot time + some constant.
Jason
----------------------------------------
Date: Mon, 23 Jun 2014 11:48:18 +0200 From: daniel.fuchs@oracle.com To: jason_mehrens@hotmail.com; core-libs-dev@openjdk.java.net Subject: Re: Zombie FileHandler locks can exhaust all available log file locks.
Hi Jason,
Looking at the diff for 6244047 - I see that, as you pointed out, the unwanted behavior described comes from the fact that the new code is using CREATE_NEW - which prevents the 'zombie lock files' from being reused.
I am not an expert in file locks - but I wonder whether we could revert to using CREATE instead: wouldn't tryLock() later tell us if the file is used by another entity?
Another possibility would be to combine CREATE_NEW and DELETE_ON_CLOSE, which according to StandardOpenOptions will attempt to delete the file on JVM termination if close() hasn't been called. This probably wouldn't help in case on VM crash, but it would help in the case demonstrated by your test below. I have however some reluctance because I see that we call FileChannel.close() in the case were the lock can't be obtained, and I'm not sure what that would do... Also StandardOpenOptions has some warning about using DELETE_ON_CLOSE for files that can be opened concurrently by other entities - so I'm not sure it would be appropriate.
The last possibility I see would be to change the lock HashMap to take instances of a subclass of WeakReference<FileHandler> as values (instead of String), and add some code that attempts to close & remove the lock file when the FileHandler is no longer referenced. Again - this will probably not help in the case of crash, and also adds the question on when the weak reference queue should be polled to ensure that the no longer referenced FileChannel are closed in a timely fashion.
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
On another track - we could also make MAX configurable - but that would just be shifting the issue around - wouldn't it?
best regards,
-- daniel
On 6/20/14 10:54 PM, Jason Mehrens wrote:
Daniel, Jim,
In JDK8 the FileHandler file locking was changed to use FileChannel.open with CREATE_NEW. If the file exists (locked or not) the FileHandler will rotate due to safety concerns about writing to the same log file. The FileHandler has an upper bound of 100 as the number of file locks that can be attempted to be acquired. Given the right pattern, enough time, and enough failures it seems at it is possible for a program to end up in a state where the FileHandler is surrounded by zombie lock files refuses log or perform a clean up action. (A.K.A Farmer Rick Grimes). This means that the lck files have to manually be deleted or the FileHandler will just fail with an IOException every time it is created.
A simple test to emulate crashing is to run (JDK7 vs. JDK8) the following code twice without deleting the log and lck files:
public static void main(String[] args) throws Exception { System.out.println(System.getProperty("java.runtime.version")); ReferenceQueue<FileHandler> q = new ReferenceQueue<>();
for (int i=0; i<100; i++) { WeakReference<FileHandler> h = new WeakReference<>( new FileHandler("./test_%u.log", 10000, 2, true), q); while (q.poll() != h) { System.runFinalization(); System.gc(); System.runFinalization(); Thread.yield(); } }
}
I understand that if you can't trust that the file locking always works then there isn't much that can be done. Leaving the number of locks as unbounded isn't really an option either. Seems like there should be a way to identify zombie lock files (ATOMIC_MOVE) and delete them. Any thoughts on this?
The source discussion on this can be found at http://stackoverflow.com/questions/24321098/is-java-util-logging-filehandler...
Regards,
Jason
On 23/06/2014 10:48, Daniel Fuchs wrote:
:
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
The logging use of FileLock is a bit unusual but looking at it now then I don't see the reason why it needs to use CREATE_NEW. I don't think DELETE_ON_CLOSE is suitable here as that work is for transient work files which isn't the case here. Instead it could use deleteOnExit to have some chance of removing it on a graceful exit. BTW: Do you know why locks is Map? Would a Set do? -Alan.
On 6/23/14 4:53 PM, Alan Bateman wrote:
On 23/06/2014 10:48, Daniel Fuchs wrote:
:
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
The logging use of FileLock is a bit unusual but looking at it now then I don't see the reason why it needs to use CREATE_NEW.
OK - thanks - in the discussion that ended up in the patch where CREATE_NEW was introduced Jason (I think it was Jason) pointed at https://bugs.openjdk.java.net/browse/JDK-4420020 I'm guessing here that maybe it would be wrong to use an already existing lock file if you don't have the rights to write to its directory - and that using CREATE_NEW ensures that you have all necessary access rights all along the path.
I don't think DELETE_ON_CLOSE is suitable here as that work is for transient work files which isn't the case here. Instead it could use deleteOnExit to have some chance of removing it on a graceful exit.
Right - I suggested a patch in another mail where I just use that.
BTW: Do you know why locks is Map? Would a Set do?
It certainly looks as if we could use a simple HashSet here. Thanks Alan! -- daniel
-Alan.
Daniel, With regard to JDK-4420020, in the original webrev Jim was incorrectly using java.io.File.canWrite() but that webrev was replaced by the current version. The NIO.2 code performs the effective access checks correctly. With regard to JDK-6244047 my concern was that checking the permissions and preconditions up front is a small 'TOCTOU' race and redundant because the next step after that is to open the FileChannel. I would assume both CREATE or CREATE_NEW plus WRITE would perform the effective access checks on open. The use of delete on exit is a deal breaker because you can't undo a registration on close of the FileHandler which can trigger an OOME. See https://bugs.openjdk.java.net/browse/JDK-4872014. I've used custom proxy handlers that generate a file name based off of the LogRecord which results in generating lots of file names and opening and close the FileHandler on demand. If the behavior is reverted to use CREATE instead of CREATE_NEW then we have to deal with JDK-8031438 because any user code can lock a file ahead of opening the FileHandler. Jason ----------------------------------------
Date: Mon, 23 Jun 2014 17:13:04 +0200 From: daniel.fuchs@oracle.com To: Alan.Bateman@oracle.com; jason_mehrens@hotmail.com; core-libs-dev@openjdk.java.net Subject: Re: Zombie FileHandler locks can exhaust all available log file locks.
On 6/23/14 4:53 PM, Alan Bateman wrote:
On 23/06/2014 10:48, Daniel Fuchs wrote:
:
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
The logging use of FileLock is a bit unusual but looking at it now then I don't see the reason why it needs to use CREATE_NEW.
OK - thanks - in the discussion that ended up in the patch where CREATE_NEW was introduced Jason (I think it was Jason) pointed at https://bugs.openjdk.java.net/browse/JDK-4420020
I'm guessing here that maybe it would be wrong to use an already existing lock file if you don't have the rights to write to its directory - and that using CREATE_NEW ensures that you have all necessary access rights all along the path.
I don't think DELETE_ON_CLOSE is suitable here as that work is for transient work files which isn't the case here. Instead it could use deleteOnExit to have some chance of removing it on a graceful exit.
Right - I suggested a patch in another mail where I just use that.
BTW: Do you know why locks is Map? Would a Set do?
It certainly looks as if we could use a simple HashSet here.
Thanks Alan!
-- daniel
-Alan.
Hi Jason, Thanks for the feedback! On 6/24/14 7:08 PM, Jason Mehrens wrote:
Daniel,
With regard to JDK-4420020, in the original webrev Jim was incorrectly using java.io.File.canWrite() but that webrev was replaced by the current version. The NIO.2 code performs the effective access checks correctly.
Thanks.
With regard to JDK-6244047 my concern was that checking the permissions and preconditions up front is a small 'TOCTOU' race and redundant because the next step after that is to open the FileChannel. I would assume both CREATE or CREATE_NEW plus WRITE would perform the effective access checks on open.
OK.
The use of delete on exit is a deal breaker because you can't undo a registration on close of the FileHandler which can trigger an OOME. See https://bugs.openjdk.java.net/browse/JDK-4872014. I've used custom proxy handlers that generate a file name based off of the LogRecord which results in generating lots of file names and opening and close the FileHandler on demand.
ah. hmmmm. I didn't think there would be that many FileHandlers... Well - I guess that if we find a way to reuse the zombie lock files then we don't really need the delete on exit. Someone creating a FileHandler programmatically should be responsible for closing it - so if an application creates FileHandlers without closing them then it's a bug in the application.
If the behavior is reverted to use CREATE instead of CREATE_NEW then we have to deal with JDK-8031438 because any user code can lock a file ahead of opening the FileHandler.
Yes - I discovered that while writing a test for my suggested fix ;-) Catching OverlappingFileLockException in FileHandler.openFiles and setting available to false when it's thrown fixes the issue. So let's just remove the deleteOnExit & add the catch for OverlappingFileLockException to my patch and we should be good. On the other hand we could just use replace CREATE_NEW by CREATE but I have some misgivings about the part that sets available to true when tryLock throws an IOException. I'm not sure we should be doing that if we haven't actually created the lock file. So I think I'd prefer keeping the two steps behavior - first try CREATE_NEW - then attempt to use CREATE if CREATE_NEW failed... I'm not sure CREATE will check the access rights for writing in the directory if the file already exists - so checking the directory for write access in that case is probably the right thing to do. Would you agree? -- daniel
Jason
----------------------------------------
Date: Mon, 23 Jun 2014 17:13:04 +0200 From: daniel.fuchs@oracle.com To: Alan.Bateman@oracle.com; jason_mehrens@hotmail.com; core-libs-dev@openjdk.java.net Subject: Re: Zombie FileHandler locks can exhaust all available log file locks.
On 6/23/14 4:53 PM, Alan Bateman wrote:
On 23/06/2014 10:48, Daniel Fuchs wrote:
:
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
The logging use of FileLock is a bit unusual but looking at it now then I don't see the reason why it needs to use CREATE_NEW.
OK - thanks - in the discussion that ended up in the patch where CREATE_NEW was introduced Jason (I think it was Jason) pointed at https://bugs.openjdk.java.net/browse/JDK-4420020
I'm guessing here that maybe it would be wrong to use an already existing lock file if you don't have the rights to write to its directory - and that using CREATE_NEW ensures that you have all necessary access rights all along the path.
I don't think DELETE_ON_CLOSE is suitable here as that work is for transient work files which isn't the case here. Instead it could use deleteOnExit to have some chance of removing it on a graceful exit.
Right - I suggested a patch in another mail where I just use that.
BTW: Do you know why locks is Map? Would a Set do?
It certainly looks as if we could use a simple HashSet here.
Thanks Alan!
-- daniel
-Alan.
Daniel, I agree with you. I think you have it all covered. Thanks for working on this! Jason ----------------------------------------
Date: Tue, 24 Jun 2014 19:37:24 +0200 From: daniel.fuchs@oracle.com To: jason_mehrens@hotmail.com; alan.bateman@oracle.com; core-libs-dev@openjdk.java.net Subject: Re: Zombie FileHandler locks can exhaust all available log file locks.
Hi Jason,
Thanks for the feedback!
On 6/24/14 7:08 PM, Jason Mehrens wrote:
Daniel,
With regard to JDK-4420020, in the original webrev Jim was incorrectly using java.io.File.canWrite() but that webrev was replaced by the current version. The NIO.2 code performs the effective access checks correctly.
Thanks.
With regard to JDK-6244047 my concern was that checking the permissions and preconditions up front is a small 'TOCTOU' race and redundant because the next step after that is to open the FileChannel. I would assume both CREATE or CREATE_NEW plus WRITE would perform the effective access checks on open.
OK.
The use of delete on exit is a deal breaker because you can't undo a registration on close of the FileHandler which can trigger an OOME. See https://bugs.openjdk.java.net/browse/JDK-4872014. I've used custom proxy handlers that generate a file name based off of the LogRecord which results in generating lots of file names and opening and close the FileHandler on demand.
ah. hmmmm. I didn't think there would be that many FileHandlers... Well - I guess that if we find a way to reuse the zombie lock files then we don't really need the delete on exit. Someone creating a FileHandler programmatically should be responsible for closing it - so if an application creates FileHandlers without closing them then it's a bug in the application.
If the behavior is reverted to use CREATE instead of CREATE_NEW then we have to deal with JDK-8031438 because any user code can lock a file ahead of opening the FileHandler.
Yes - I discovered that while writing a test for my suggested fix ;-) Catching OverlappingFileLockException in FileHandler.openFiles and setting available to false when it's thrown fixes the issue.
So let's just remove the deleteOnExit & add the catch for OverlappingFileLockException to my patch and we should be good.
On the other hand we could just use replace CREATE_NEW by CREATE but I have some misgivings about the part that sets available to true when tryLock throws an IOException. I'm not sure we should be doing that if we haven't actually created the lock file. So I think I'd prefer keeping the two steps behavior - first try CREATE_NEW - then attempt to use CREATE if CREATE_NEW failed... I'm not sure CREATE will check the access rights for writing in the directory if the file already exists - so checking the directory for write access in that case is probably the right thing to do.
Would you agree?
-- daniel
Jason
----------------------------------------
Date: Mon, 23 Jun 2014 17:13:04 +0200 From: daniel.fuchs@oracle.com To: Alan.Bateman@oracle.com; jason_mehrens@hotmail.com; core-libs-dev@openjdk.java.net Subject: Re: Zombie FileHandler locks can exhaust all available log file locks.
On 6/23/14 4:53 PM, Alan Bateman wrote:
On 23/06/2014 10:48, Daniel Fuchs wrote:
:
All in all - I feel our best options would be to revert to using CREATE, or use CREATE_NEW + DELETE_ON_CLOSE, or not do anything and live with the issue. Hopefully some nio experts will chime in ;-)
The logging use of FileLock is a bit unusual but looking at it now then I don't see the reason why it needs to use CREATE_NEW.
OK - thanks - in the discussion that ended up in the patch where CREATE_NEW was introduced Jason (I think it was Jason) pointed at https://bugs.openjdk.java.net/browse/JDK-4420020
I'm guessing here that maybe it would be wrong to use an already existing lock file if you don't have the rights to write to its directory - and that using CREATE_NEW ensures that you have all necessary access rights all along the path.
I don't think DELETE_ON_CLOSE is suitable here as that work is for transient work files which isn't the case here. Instead it could use deleteOnExit to have some chance of removing it on a graceful exit.
Right - I suggested a patch in another mail where I just use that.
BTW: Do you know why locks is Map? Would a Set do?
It certainly looks as if we could use a simple HashSet here.
Thanks Alan!
-- daniel
-Alan.
Hi, Please find below a patch [1] for [2]: 8048020 - Regression on java.util.logging.FileHandler This corresponds to the problem that was discussed on this list in the following email thread: "Zombie FileHandler locks can exhaust all available log file locks." http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-June/027296.html [1] http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.00/ [2] https://bugs.openjdk.java.net/browse/JDK-8048020 This patch would be a good candidate for being backported on 8, since the issue was introduced in 8. best regards, -- daniel
Daniel, FileChannel.open(WRITE,APPEND) could throw NoSuchFileException during a startup and shutdown race between two VMs. That case needs to either perform a bounded retry or continue and rotate. Jason ----------------------------------------
Date: Tue, 24 Jun 2014 20:34:46 +0200 From: daniel.fuchs@oracle.com To: core-libs-dev@openjdk.java.net CC: jason_mehrens@hotmail.com; Alan.Bateman@oracle.com Subject: RFR: 8048020 - Regression on java.util.logging.FileHandler
Hi,
Please find below a patch [1] for [2]:
8048020 - Regression on java.util.logging.FileHandler
This corresponds to the problem that was discussed on this list in the following email thread:
"Zombie FileHandler locks can exhaust all available log file locks." http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-June/027296.html
[1] http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.00/ [2] https://bugs.openjdk.java.net/browse/JDK-8048020
This patch would be a good candidate for being backported on 8, since the issue was introduced in 8.
best regards,
-- daniel
On 6/25/14 2:34 PM, Jason Mehrens wrote:
Daniel,
FileChannel.open(WRITE,APPEND) could throw NoSuchFileException during a startup and shutdown race between two VMs. That case needs to either perform a bounded retry or continue and rotate.
Hi Jason, Alan, Here is an updated version of the fix that does a bounded retry (retries once - and if it fails, proceeds with the next name). I have also added NO_FOLLOW_LINKS - for the case where we try to open an existing Lockfile, and suppressed the Files.isWritable check since that will be taken care of by the call to FileChannel.open. http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.01/ I also updated the comments to make it clear that the file could not have been locked by another instance of FileHandler (since that would have been taken care of by our global 'locks' synchronization). best regards, -- daniel
Jason
----------------------------------------
Date: Tue, 24 Jun 2014 20:34:46 +0200 From: daniel.fuchs@oracle.com To: core-libs-dev@openjdk.java.net CC: jason_mehrens@hotmail.com; Alan.Bateman@oracle.com Subject: RFR: 8048020 - Regression on java.util.logging.FileHandler
Hi,
Please find below a patch [1] for [2]:
8048020 - Regression on java.util.logging.FileHandler
This corresponds to the problem that was discussed on this list in the following email thread:
"Zombie FileHandler locks can exhaust all available log file locks." http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-June/027296.html
[1] http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.00/ [2] https://bugs.openjdk.java.net/browse/JDK-8048020
This patch would be a good candidate for being backported on 8, since the issue was introduced in 8.
best regards,
-- daniel
Daniel, Looks bullet proof to me. Jason ----------------------------------------
Date: Tue, 1 Jul 2014 11:25:33 +0200 From: daniel.fuchs@oracle.com To: jason_mehrens@hotmail.com; core-libs-dev@openjdk.java.net CC: alan.bateman@oracle.com Subject: Re: RFR: 8048020 - Regression on java.util.logging.FileHandler
On 6/25/14 2:34 PM, Jason Mehrens wrote:
Daniel,
FileChannel.open(WRITE,APPEND) could throw NoSuchFileException during a startup and shutdown race between two VMs. That case needs to either perform a bounded retry or continue and rotate.
Hi Jason, Alan,
Here is an updated version of the fix that does a bounded retry (retries once - and if it fails, proceeds with the next name). I have also added NO_FOLLOW_LINKS - for the case where we try to open an existing Lockfile, and suppressed the Files.isWritable check since that will be taken care of by the call to FileChannel.open.
http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.01/
I also updated the comments to make it clear that the file could not have been locked by another instance of FileHandler (since that would have been taken care of by our global 'locks' synchronization).
best regards,
-- daniel
Jason
----------------------------------------
Date: Tue, 24 Jun 2014 20:34:46 +0200 From: daniel.fuchs@oracle.com To: core-libs-dev@openjdk.java.net CC: jason_mehrens@hotmail.com; Alan.Bateman@oracle.com Subject: RFR: 8048020 - Regression on java.util.logging.FileHandler
Hi,
Please find below a patch [1] for [2]:
8048020 - Regression on java.util.logging.FileHandler
This corresponds to the problem that was discussed on this list in the following email thread:
"Zombie FileHandler locks can exhaust all available log file locks." http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-June/027296.html
[1] http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.00/ [2] https://bugs.openjdk.java.net/browse/JDK-8048020
This patch would be a good candidate for being backported on 8, since the issue was introduced in 8.
best regards,
-- daniel
On 01/07/2014 10:25, Daniel Fuchs wrote:
Hi Jason, Alan,
Here is an updated version of the fix that does a bounded retry (retries once - and if it fails, proceeds with the next name). I have also added NO_FOLLOW_LINKS - for the case where we try to open an existing Lockfile, and suppressed the Files.isWritable check since that will be taken care of by the call to FileChannel.open.
http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.01/
I also updated the comments to make it clear that the file could not have been locked by another instance of FileHandler (since that would have been taken care of by our global 'locks' synchronization).
best regards,
-- daniel
This looks okay to me except for the zombie file case. I think I missed the reason for using APPEND. Also you catch FileNotFoundException (which is not thrown by FileChannel.open) so I wonder if you mean to catch IOException so that you handle all possible I/O exceptions. If you meant IOException to handle any error then the isWritable on the parent directory isn't needed (we can make the isRegularFile check go away too if you want). In the test case then the the use of getAbsolutePath seems redundant. Also just to mention that setup method could use Files.createTempDirectory. -Alan
On 7/4/14 6:35 PM, Alan Bateman wrote:
On 01/07/2014 10:25, Daniel Fuchs wrote:
Hi Jason, Alan,
Here is an updated version of the fix that does a bounded retry (retries once - and if it fails, proceeds with the next name). I have also added NO_FOLLOW_LINKS - for the case where we try to open an existing Lockfile, and suppressed the Files.isWritable check since that will be taken care of by the call to FileChannel.open.
http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.01/
I also updated the comments to make it clear that the file could not have been locked by another instance of FileHandler (since that would have been taken care of by our global 'locks' synchronization).
best regards,
-- daniel
This looks okay to me except for the zombie file case. I think I missed the reason for using APPEND.
Given that nothing is going to be written to the file then maybe I don't need APPEND. I just don't want the call to FileChannel.open() to truncate the file.
Also you catch FileNotFoundException (which is not thrown by FileChannel.open) oh? What will FileChannel.open throw if the file does not exist then? Is there another exception? Or do you mean it's not possible to know why FileChannel.open fails? That would be bad... so I wonder if you mean to catch IOException so that you handle all possible I/O exceptions. If you meant IOException to handle any error then the isWritable on the parent directory isn't needed (we can make the isRegularFile check go away too if you want). No I just want to catch the case where the file does not exist.
In the test case then the the use of getAbsolutePath seems redundant. Also just to mention that setup method could use Files.createTempDirectory.
The test has several @run lines and depends on the lines to be invoked in sequence. In particular this sequence: 32 * @run main/othervm CheckZombieLockTest CLEANUP 33 * @run main/othervm CheckZombieLockTest WRITABLE 34 * @run main/othervm CheckZombieLockTest CREATE_FIRST 35 * @run main/othervm CheckZombieLockTest CREATE_NEXT 36 * @run main/othervm CheckZombieLockTest CREATE_NEXT 37 * @run main/othervm CheckZombieLockTest CLEANUP where it is important that the directory *is not* deleted between two invocations. The CLEANUP action will delete the directory if everything goes well, and the try { } catch { } in main will take care of it if the test fails... -- daniel
-Alan
On 04/07/2014 18:25, Daniel Fuchs wrote:
Given that nothing is going to be written to the file then maybe I don't need APPEND. I just don't want the call to FileChannel.open() to truncate the file.
APPEND should be harmless here, just not needed.
Also you catch FileNotFoundException (which is not thrown by FileChannel.open) oh? What will FileChannel.open throw if the file does not exist then? Is there another exception? Or do you mean it's not possible to know why FileChannel.open fails? That would be bad...
All the specific file system exceptions are defined in java.nio.file but it's not clear to me that you only want to handle NoSuchFileException. Don't you want to handle cases where the zombie file can't be opened for other reasons (file permission issues for example that would manifest as an AccessDeniedException)? -Alan.
On 7/4/14 7:58 PM, Alan Bateman wrote:
On 04/07/2014 18:25, Daniel Fuchs wrote:
Given that nothing is going to be written to the file then maybe I don't need APPEND. I just don't want the call to FileChannel.open() to truncate the file.
APPEND should be harmless here, just not needed.
Also you catch FileNotFoundException (which is not thrown by FileChannel.open) oh? What will FileChannel.open throw if the file does not exist then? Is there another exception? Or do you mean it's not possible to know why FileChannel.open fails? That would be bad...
All the specific file system exceptions are defined in java.nio.file Argh. Thanks for pointing that out. I need NoSuchFileException, not FileNotFoundException. but it's not clear to me that you only want to handle NoSuchFileException. Don't you want to handle cases where the zombie file can't be opened for other reasons (file permission issues for example that would manifest as an AccessDeniedException)? hmmm. yes - you're right. I should catch that to and break the loop in that case. So that would become:
465 } catch (NoSuchFileException x) { 466 // Race condition - retry once, and if that 467 // fails again just try the next name in 468 // the sequence. 469 continue; 470 } catch (IOException x) { // the file may not be writable for us. // try the next name in the sequence break; } Thanks for the feedback Alan! I had missed those cases. -- daniel
-Alan.
On 04/07/2014 19:10, Daniel Fuchs wrote:
hmmm. yes - you're right. I should catch that to and break the loop in that case. So that would become:
465 } catch (NoSuchFileException x) { 466 // Race condition - retry once, and if that 467 // fails again just try the next name in 468 // the sequence. 469 continue; 470 } catch (IOException x) { // the file may not be writable for us. // try the next name in the sequence break; }
Thanks for the feedback Alan! I had missed those cases. If that is the final change then I think I'm okay (no need to generate a new webrev). I do think it could be improved a bit more by dropping the isXXX checks but it's not important.
-Alan
Thanks Alan! Just for the record the 'final' webrev is here: http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.02 I will push this shortly... -- daniel On 7/4/14 8:24 PM, Alan Bateman wrote:
On 04/07/2014 19:10, Daniel Fuchs wrote:
hmmm. yes - you're right. I should catch that to and break the loop in that case. So that would become:
465 } catch (NoSuchFileException x) { 466 // Race condition - retry once, and if that 467 // fails again just try the next name in 468 // the sequence. 469 continue; 470 } catch (IOException x) { // the file may not be writable for us. // try the next name in the sequence break; }
Thanks for the feedback Alan! I had missed those cases. If that is the final change then I think I'm okay (no need to generate a new webrev). I do think it could be improved a bit more by dropping the isXXX checks but it's not important.
-Alan
On 24/06/2014 19:34, Daniel Fuchs wrote:
Hi,
Please find below a patch [1] for [2]:
8048020 - Regression on java.util.logging.FileHandler
This corresponds to the problem that was discussed on this list in the following email thread:
"Zombie FileHandler locks can exhaust all available log file locks." http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-June/027296.html
[1] http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.00/ [2] https://bugs.openjdk.java.net/browse/JDK-8048020
This patch would be a good candidate for being backported on 8, since the issue was introduced in 8. Catching OverlappingFileLockException is usually a sign of a mis-use. Can you summarize how this comes about (given the locking on "locks").
The three Files.isXXX are expensive. This may be a case where you just attempt to create the file with FileChannel.open. It probably should be CREATE + WRITE + APPEND to avoid the case of another process or VM removing the file at around the same time. -Alan.
On 6/25/14 5:07 PM, Alan Bateman wrote:
On 24/06/2014 19:34, Daniel Fuchs wrote:
Hi,
Please find below a patch [1] for [2]:
8048020 - Regression on java.util.logging.FileHandler
This corresponds to the problem that was discussed on this list in the following email thread:
"Zombie FileHandler locks can exhaust all available log file locks." http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-June/027296.html
[1] http://cr.openjdk.java.net/~dfuchs/webrev_8048020/webrev.00/ [2] https://bugs.openjdk.java.net/browse/JDK-8048020
This patch would be a good candidate for being backported on 8, since the issue was introduced in 8. Catching OverlappingFileLockException is usually a sign of a mis-use. Can you summarize how this comes about (given the locking on "locks").
A file is just a file. So nothing prevents some other piece of code in the same process to call FileChannel.lock() directly. For instance, another Handler implementation might do that - given that our 'locks' HashMap is private - it wouldn't be able to use it.
The three Files.isXXX are expensive. This may be a case where you just attempt to create the file with FileChannel.open. It probably should be CREATE + WRITE + APPEND to avoid the case of another process or VM removing the file at around the same time.
The issue here is that CREATE + WRITE + APPEND will probably succeed if the file exists - even though we can't write in its parent dir. I am not sure we should be concerned about the performance of opening the file for the FileHandler. It's usually not something you do very often. However - Jason is right - there is the potential for a race condition - so I should probably catch FileNotFoundException for the second call to FileChannel.open. -- daniel
-Alan.
On 25/06/2014 16:21, Daniel Fuchs wrote:
On 6/25/14 5:07 PM, Alan Bateman wrote:
Catching OverlappingFileLockException is usually a sign of a mis-use. Can you summarize how this comes about (given the locking on "locks").
A file is just a file. So nothing prevents some other piece of code in the same process to call FileChannel.lock() directly. For instance, another Handler implementation might do that - given that our 'locks' HashMap is private - it wouldn't be able to use it. Okay, so this concern isn'tFileHandler but rather malicious or misguided usages in other code running in the same VM. In that case I think the comment could be a bit clearer that it's something other than the FileHandler that is locking the file.
The issue here is that CREATE + WRITE + APPEND will probably succeed if the file exists - even though we can't write in its parent dir. I am not sure we should be concerned about the performance of opening the file for the FileHandler. It's usually not something you do very often.
I don't think the Files.isXXX is useful here, especially if there are other things changing the file system at the same time. There is no guarantee that FileChannel.open will succeed. This may be a case where you have to catch the IOException and continue (it's not nice to use exceptions for control-flow but this may be one of these cases where it becomes necessary. -Alan.
Alan,
Catching OverlappingFileLockException is usually a sign of a mis-use. Can you summarize how this comes about (given the locking on "locks").
Malicious intent or ignorant mis-use by code outside of the FileHandler can turn the FileHandler into a victim of that misuse. ======= public static void main(String[] args) throws Exception { Path p = Files.createTempFile("test", "test"); try (FileChannel pathogen = FileChannel.open(p, CREATE, WRITE)) { FileLock lck = pathogen.lock(); System.out.println(lck); try (FileChannel fileHandler = FileChannel.open(p, WRITE, APPEND)) { FileLock fhl = fileHandler.lock(); System.out.println(fhl); } } finally { Files.deleteIfExists(p); } } ========= sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid] Exception in thread "main" java.nio.channels.OverlappingFileLockException ================= Jason
participants (3)
-
Alan Bateman
-
Daniel Fuchs
-
Jason Mehrens