purpose of FileDispatcher.preClose()

Wed Jan 30 20:14:11 UTC 2008

On Wed, 30 Jan 2008, Alan Bateman wrote:

> Michael Allman wrote:
>> Hello,
>> 
>> Can someone with knowledge of such matters explain what 
>> FileDispatcher.preClose() is supposed to do on Solaris/Linux.  I mean, I 
>> see the code, but I don't understand why it exists or what problem it's 
>> supposed to avoid or something.
>> 
>> I ask because I'm trying to fix a file-locking problem on soylatte and it 
>> seems the solution to that problem is to remove this code (on that 
>> platform).  But before I charge ahead, I need a better understanding of why 
>> this code exists.
>> 
>> In particular, I'm really interested in the stuff that happens in 
>> FileDispatcher.c, functions Java_sun_nio_ch_FileDispatcher_init and 
>> Java_sun_nio_ch_FileDispatcher_preClose0.  They're setting something up 
>> that looks important, but I just don't get it.
>
> In a multi-threaded application it is always difficult to know when you can 
> safely close and release a file descriptor (or other resource). If one thread 
> is using a file descriptor to read or write and another thread releases 
> (closes) it, then it is possible for the first thread to read or write to the 
> wrong file or socket in the event that the file descriptor is recycled 
> quickly. The approach that we use in both classic networking and NIO is to 
> use a two-step process. In the first step we duplicate (dup2) the file 
> descriptor to another that is one end of a half shutdown socket pair. Other 
> threads that are reading or writing but haven't called the read or write 
> system calls yet will get an immediate EOF or pipe error when they do so. As 
> the threads complete the read or write method then they examine their state. 
> If there is a close pending then the last one releases the file descriptor. 
> Hopefully this brief overview gives you some idea what this code is about. 
> The FileDispatcher#init method is where the socketpair is created, and that 
> preClose0 method does the dup2. I haven't been following the Soylatte port 
> very closely so I'm curious what problem you are seeing - when you say "file 
> locking" do you mean FileChannel#lock? If so then the issue may be that the 
> asynchronous close mechanism isn't completely extended to FileChannel yet.
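
For reference, here is a minimal C sketch of the dup2 step as I understand it 
from the description above.  This is not the JDK code: the names preclose_fd, 
preclose_init and preclose are made up, and I am assuming that closing the 
other end of the socket pair is an acceptable stand-in for the "half shutdown" 
end.  The second step (the last thread doing the real close) is not shown.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical name: the socket end that is dup2'ed over descriptors
   that are in the process of being closed. */
static int preclose_fd = -1;

/* One-time setup: create a socket pair and close one end, so that reads
   on the end we keep return EOF.  Returns 0 on success, -1 on error. */
int preclose_init(void) {
    int sp[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0)
        return -1;
    preclose_fd = sp[0];
    close(sp[1]);
    return 0;
}

/* Step one of the close: atomically point fd at the dead socket.
   fd's reference to the original file is dropped by dup2, but the
   descriptor number stays allocated, so it cannot be recycled yet. */
int preclose(int fd) {
    if (preclose_fd < 0)
        return -1;
    return dup2(preclose_fd, fd) < 0 ? -1 : 0;
}
```

After preclose(fd), a thread that calls read(fd, ...) gets 0 (EOF) instead of 
data from whatever the descriptor used to refer to.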

I think I get it.  So let me explain the problem I'm seeing here.

If I close a file channel on which I have acquired (but not released) a 
file lock, I get an IOException: Bad file descriptor.  For example, the 
Lock regression test does this and fails (on soylatte).

I think the problem here is that FileChannelImpl.implCloseChannel() calls 
nd.preClose(fd) before the block that releases its file locks.  On 
non-Windows, nd.preClose(fd) doesn't just "pre close" fd, it closes it. 
Then implCloseChannel() tries to release its file locks.  fd now points to 
a socket descriptor and on Solaris/Linux, such an attempt seems to be 
harmless.  On Mac OS X, it fails with the EBADF error code.
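
To illustrate what I mean, here is a small C sketch, assuming the locks in 
question are POSIX fcntl() record locks.  The function names are made up, and 
replace_with_socket() is only a stand-in for the pre-close step, not the JDK 
code.  It shows that a lock held through fd is already gone once fd has been 
dup2'ed over.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take (type = F_WRLCK) or drop (type = F_UNLCK) a whole-file record lock. */
int set_whole_file_lock(int fd, short type) {
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Returns 1 if another process holds a conflicting lock on path, 0 if
   not, -1 on error.  It forks because F_GETLK never reports locks held
   by the calling process itself. */
int locked_by_other_process(const char *path) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = {0};
        int fd = open(path, O_RDWR);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}

/* Stand-in for the pre-close step: overwrite fd with one end of a
   socket pair.  dup2 implicitly closes whatever fd referred to. */
int replace_with_socket(int fd) {
    int sp[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) < 0)
        return -1;
    int rc = dup2(sp[0], fd);
    close(sp[0]);
    close(sp[1]);
    return rc < 0 ? -1 : 0;
}
```

Take a lock with set_whole_file_lock(fd, F_WRLCK), call 
replace_with_socket(fd), and locked_by_other_process() goes from 1 to 0 
without any explicit unlock.  A later set_whole_file_lock(fd, F_UNLCK) is then 
issued against the socket, which is the call that fails for me.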

It seems that the preClose semantics are not correctly handled by the 
FileChannelImpl.implCloseChannel() method.  On non-Windows, it attempts to 
release file locks that no longer exist (because preClose() releases 
them).  It seems that the file lock release block should be moved into 
NativeDispatcher.preClose().  That way it would run on Windows but not on 
non-Windows, which seems correct to me, given that on non-Windows 
preClose0 already releases the file locks.

Obviously, this kind of change is much more than a soylatte patch.  It 
changes code that already works on Windows, Solaris, and Linux.  But if my 
analysis is correct, it looks like it's just a silent bug.

Thoughts?

Michael