Real async file IO on Linux?
Tim Fox
timvolpe at gmail.com
Wed Jul 27 08:58:30 PDT 2011
On 27/07/2011 14:17, Alan Bateman wrote:
> Tim Fox wrote:
>> Hello All,
>>
>> In anticipation of the imminent Java 7 release, I took a look at the
>> source for asynchronous file IO, and it seems to be "faking" async IO
>> by hiding old synchronous IO behind a thread pool.
>>
>> I'm interested in understanding why real OS async file IO hasn't been
>> used for those operating systems that support it. I'm particularly
>> interested in Linux support.
> The issue at the time on Linux was that it wasn't supported for
> buffered file I/O (only direct I/O or block device). I haven't checked
> it recently to see if that was changed. It wouldn't be too hard to
> provide an implementation that uses io_submit etc. but it would likely
> require us to provide a special open option and also provide a means
> to ensure that applications get direct buffers that are aligned
> appropriately.
IMO, the value of async IO with buffered IO is not great. If you're just
writing into a cache and then flushing it from time to time with a sync,
then you may as well use synchronous IO and stick an executor in front
of it to make it appear async, which, AIUI, is what has been done in
Java 7 (so far). There's some value in providing that in the JDK, but to
be honest, any decent programmer could write such a wrapper in their own
application very easily -- something like the sketch below.
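Untested sketch, and the names are mine, but this is roughly all the
"fake async" wrapper amounts to -- a blocking write submitted to a pool
so the caller gets a Future back:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical wrapper: synchronous writes on a thread pool, so the
// caller sees an async Future, much as the JDK implementation does.
public class FakeAsyncFile {
    private final FileChannel channel;
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public FakeAsyncFile(FileChannel channel) {
        this.channel = channel;
    }

    // Returns immediately; the blocking write happens on a pool thread.
    public Future<Integer> write(final ByteBuffer buf, final long position) {
        return pool.submit(new Callable<Integer>() {
            public Integer call() throws IOException {
                return channel.write(buf, position);
            }
        });
    }
}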
Real direct async IO is the desirable feature, since it allows the
programmer to write applications that do a lot of persistence in a
scalable way -- something that isn't possible with synchronous or
buffered IO.
Consider the example of a server with many client connections, each
sending it data that has to be persisted. Once the data has been
persisted, the client needs to be informed so it can proceed. Servers
using this pattern include database servers, messaging systems, order
processing systems -- basically anything that needs to scalably and
reliably persist data.
Using buffered IO this is hard to implement scalably. A naive
implementation will write data as it arrives (into the OS buffer cache),
then call sync (or fsync or whatever), and when that call returns, send
its response back to the client saying "data persisted ok" -- roughly
the pattern sketched below. The problem is this doesn't scale when you
have many (possibly many thousands of) connections calling sync for each
write. Sync is a heavyweight operation.
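In sketch form (Connection is a hypothetical handle for the client, not
a real API):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

interface Connection { void sendAck(); }  // hypothetical client handle

class NaivePersistence {
    // Write, force to disk, then acknowledge. Every connection pays
    // the full cost of an fsync for every single write.
    static void persistAndAck(FileChannel log, ByteBuffer data,
                              Connection conn) throws IOException {
        log.write(data);   // lands in the OS buffer cache (a real
                           // implementation would check the count)
        log.force(false);  // fsync: heavyweight, once per write
        conn.sendAck();    // safe: the data is on disk
    }
}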
This can be worked around to some extent by "batching" fsync calls. E.g.
you sync at most, say, every 10 milliseconds, after which all writes
waiting for a sync can return their completions. The problem is this
introduces extra latency to completion for each client connection. It's
also fairly tricky to code -- something like the sketch below.
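A rough, untested sketch of the batching ("group commit") approach,
again with a hypothetical Connection type:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

interface Connection { void sendAck(); }  // hypothetical, as before

// Writes accumulate; a single fsync every 10 ms completes all of them
// at once, so connections share the cost of the sync.
public class BatchedSyncer {
    private final FileChannel log;
    private final List<Connection> waiting = new ArrayList<Connection>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    public BatchedSyncer(FileChannel log) {
        this.log = log;
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() { syncBatch(); }
        }, 10, 10, TimeUnit.MILLISECONDS);
    }

    public synchronized void write(ByteBuffer data, Connection conn)
            throws IOException {
        log.write(data);   // into the OS buffer cache only
        waiting.add(conn); // ack is deferred until the next sync
    }

    private synchronized void syncBatch() {
        if (waiting.isEmpty()) return;
        try {
            log.force(false);  // one fsync covers the whole batch
            for (Connection conn : waiting) conn.sendAck();
            waiting.clear();
        } catch (IOException e) {
            // real code would fail the waiting connections here
        }
    }
}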
True non-buffered async IO solves this problem: there's no explicit sync
at all; instead you get a callback when the data has actually made it to
disk. Once the callback has fired, the completion can be sent to the
client connection, since it's known the data is persisted. No sync is
required, and certainly no tricky batching of syncs is needed to make
syncing scale.
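Note the Java 7 API already has the right *shape* for this --
AsynchronousFileChannel takes a CompletionHandler. But since the current
implementation is sync IO behind a pool, the callback only means the
write hit the OS cache. With real kernel AIO over a direct (O_DIRECT)
file, the same callback could fire when the data is actually on the
disk, and the ack below would be safe (Connection again hypothetical):

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;

interface Connection {          // hypothetical client handle
    void sendAck();
    void sendError(Throwable t);
}

class CallbackPersistence {
    static void persistAndAck(AsynchronousFileChannel ch, ByteBuffer data,
                              long pos, final Connection conn) {
        ch.write(data, pos, null, new CompletionHandler<Integer, Void>() {
            public void completed(Integer bytesWritten, Void att) {
                conn.sendAck();  // no explicit sync, no batching needed
            }
            public void failed(Throwable exc, Void att) {
                conn.sendError(exc);
            }
        });
    }
}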
True async IO also provides latency benefits. Consider that, for a
typical sync call, the data being flushed from the buffer may be
scattered all over the disk, so it can require a complete rotation of
the disk to let everything pass under the head and be written. This
limits the sync rate to the rotation speed of the disk (usually around
200-300 syncs per second for a quality disk).
With direct async IO, each individual write (assuming it's not too big)
usually resides on a more localised part of the disk. So, on average, it
only takes half a revolution of the disk for that point to pass under
the write heads. This allows a direct async approach to have, on
average, half the write latency of a traditional buffered synchronous IO
approach. That's a big deal for messaging and database applications.
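To put rough numbers on it (my arithmetic, assuming a 15,000 RPM drive):
15,000 RPM is 250 revolutions per second, i.e. 4 ms per full rotation.
If a sync has to wait a full rotation you get at most ~250 syncs/sec,
which matches the 200-300 figure above; a write that only waits half a
rotation on average costs ~2 ms, roughly doubling the achievable rate.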
I believe non-buffered IO is what async IO is all about. Focussing on
the buffered use case is looking in the wrong place, IMHO.
Just my 2c! ;)