Recent Java 9 commit (e5b66323ae45) breaks fsync on directory

Uwe Schindler uschindler at apache.org
Fri Jan 9 15:46:28 UTC 2015


Hi,

I just subscribed to this mailing list on behalf of the Apache Lucene committers. You might know that we regularly test Apache Lucene/Solr and also Elasticsearch to detect problems, especially with Hotspot. We recently updated our testing infrastructure to make use of JDK 9 preview build 40. We mainly did this to check for issues around Jigsaw, but, toi toi toi, nothing breaks the build. :-)

Unfortunately, a recent commit in OpenJDK 9 caused some headaches: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/e5b66323ae45. The corresponding issue is https://bugs.openjdk.java.net/browse/JDK-8066915. To keep track of this, we opened an issue on our side, too: https://issues.apache.org/jira/browse/LUCENE-6169

Let me first describe what we currently do: Apache Lucene uses a "write once" approach (every file is written only once). When we "commit" a given "commit point" in Lucene, we have the following semantics: We write to some temporary file name, then we fsync this file (and all related files). This is easy with a file channel: Just call fc.force(). The final "publish" of the commit is done by an atomic rename using Files.move(Path, Path, StandardCopyOption.ATOMIC_MOVE). This works fine unless you have a real disaster :-) - like a power outage. In that case, on POSIX operating systems, the rename operation might not be visible at all afterwards. On Linux, the man page for fsync (http://linux.die.net/man/2/fsync) explicitly states: "Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed."
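The commit sequence described above can be sketched roughly as follows. This is only an illustration under assumptions, not Lucene's actual code: the class name CommitSketch, the method commit() and the ".tmp" suffix are hypothetical, and only the FileChannel.force() and Files.move(..., ATOMIC_MOVE) calls come from the description.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class CommitSketch {

    /** Writes data to a temporary name, fsyncs it, then publishes it by atomic rename. */
    static void commit(Path dir, String finalName, byte[] data) throws IOException {
        Path tmp = dir.resolve(finalName + ".tmp"); // assumed naming scheme

        // Step 1: write the file exactly once under a temporary name.
        try (FileChannel fc = FileChannel.open(tmp,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                fc.write(buf);
            }
            // Step 2: fsync the file contents (and metadata) to the storage device.
            fc.force(true);
        }

        // Step 3: "publish" the commit with an atomic rename.
        Files.move(tmp, dir.resolve(finalName), StandardCopyOption.ATOMIC_MOVE);

        // Note: the rename itself is a directory update and is not made durable here.
    }
}
```

After a power outage, the renamed directory entry may be missing even though the file contents were synced, which is why the directory needs its own fsync.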

Basically, we currently do the same in Apache Lucene as follows: We open a FileChannel (for READ) on the directory itself and then also call fc.force(). Of course, as this is not really documented in the Java API (but we know it always worked!), we do this on a "best guess" basis: We must not fail if this throws any IOException. We know, for example, that this does not work on Windows*).
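A minimal sketch of this best-effort directory fsync might look like the following. The class and method names are hypothetical, and returning a boolean instead of silently swallowing the exception is an assumption made here so that the outcome is observable; Lucene's real helper is the IOUtils.fsync() mentioned further down.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class DirFsyncSketch {

    /**
     * Tries to fsync a directory by opening it for READ and calling force().
     * Returns true if the sync succeeded, false if the platform or JDK refused
     * (for example on Windows, or with "Is a directory" on the affected JDK 9 builds).
     */
    static boolean fsyncDirectory(Path dir) {
        try (FileChannel fc = FileChannel.open(dir, StandardOpenOption.READ)) {
            fc.force(true);
            return true;
        } catch (IOException e) {
            // Best effort: this behaviour is undocumented, so failure must not abort the commit.
            return false;
        }
    }
}
```

A test harness can then assert that the return value is true on Linux and MacOSX, while production code ignores it.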

The issue is that the above commit now causes this approach to fail with a FileSystemException on OpenJDK 9: FileSystemException("Is a directory"). This does not break our Lucene releases in the field, because - as said before - we swallow any exceptions on this. But in our testing infrastructure, we at least assert that this works on Linux and MacOSX, and this assert failed. The current code is here: http://goo.gl/vKhtsW

We really would like to keep the possibility to fsync a directory on supported operating systems. We hope that the above commit will not be backported into the 8u40 and 7u80 releases! In Java 9 we can discuss other solutions for handling this:
- Keep the current semantics as of Java 7 and Java 8 and just fail if you really want to READ/WRITE from this FileChannel? This is how the underlying operating system and libc handle this. You can open a file descriptor on anything, file/directory/device/..., but not all operations work on this descriptor; some of them throw an exception or return an error.
- Add a new API for fsyncing a directory (maybe for any file type). Like Files.fsync(Path)? On Windows this could just be a no-op for directories? Basically something like our IOUtils.fsync() from the link above.
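To make the second option concrete, here is a rough sketch of what such a helper could do. It is not an existing JDK API: the class FsyncProposalSketch and its fsync(Path) method are hypothetical, and the Windows check via the os.name system property is an assumption about how the no-op could be decided.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Locale;

// Hypothetical stand-in for a possible Files.fsync(Path); not part of the JDK.
final class FsyncProposalSketch {

    private static final boolean WINDOWS =
            System.getProperty("os.name", "").toLowerCase(Locale.ROOT).startsWith("windows");

    /** Fsyncs a regular file or a directory; a no-op for directories on Windows. */
    static void fsync(Path path) throws IOException {
        boolean isDir = Files.isDirectory(path);
        if (isDir && WINDOWS) {
            return; // directories cannot be opened for this purpose on Windows
        }
        // Directories are opened for READ, regular files for WRITE.
        try (FileChannel fc = FileChannel.open(path,
                isDir ? StandardOpenOption.READ : StandardOpenOption.WRITE)) {
            fc.force(true);
        }
    }
}
```

Unlike a best-effort variant, this sketch lets the IOException propagate, so the caller decides whether a failed sync is fatal.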

What's your opinion and how should we proceed?

Uwe

*) But there, the semantics of the file system make sure that we can see the file, so this is not really an issue and is out of scope here. (Opening a directory for read causes an "Access Denied" error, mapped to a Java IOException, but that's perfectly fine.)

-----
Uwe Schindler
uschindler at apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/



