Real async file IO on Linux?

Tim Fox timvolpe at gmail.com
Thu Jul 28 03:27:31 PDT 2011


On 28/07/2011 07:25, Mingfai wrote:
>
>
>     Using buffered IO this is hard to implement scalably. A naive
>     implementation will write data as it arrives (into the OS buffer),
>     and then call sync (or fsync or whatever). When that call has
>     returned it sends its response back to the client saying "data
>     persisted ok". Problem is this doesn't scale when you have many
>     (possibly many thousands) of connections calling sync for each
>     write. Sync is a heavyweight operation.
>
>     This can be worked around to some extent, by "batching" of fsync
>     calls. E.g. you only sync at most, say, every 10 milliseconds,
>     after which all writes waiting for a sync can return their
>     completion. The problem with this is that it introduces extra
>     latency to completion for each client connection. It is also
>     fairly tricky to code.
>
>
> How about a server system that has a RAID controller card with a
> Battery/Flash Backed Write Cache (BBWC)? Your "doesn't scale well"
> statement is true in the case without BBWC, but if BBWC is available,
> it seems to me desirable to keep sending an fsync per write (instead
> of batching them), and let the RAID controller card take care of the
> batching. AFAIK, all db servers should have that piece of hardware,
> and it makes a huge difference in IO performance. *[1] What do you think?
If you are using a BBWC on a disk, then you will indeed be able to
sustain a higher fsync rate (maybe a few thousand per sec) than if the
cache were absent or disabled. But the OS still has to flush the OS
buffer to the disk buffer on every fsync, which is still a heavyweight
operation, and you still have to code the logic to batch the fsyncs, do
the locking, and return the completions when the fsync returns.
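
To make the batching logic concrete, here is a minimal sketch of the idea in Java. The class name BatchedSyncLog, its methods, and the 10 ms interval are all made up for illustration; this is not code from any real project. Writers append into the OS buffer and get back a future, and a single background thread calls force() at most once per interval and then completes every future that was waiting on it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper: many writers share one fsync per interval.
class BatchedSyncLog implements AutoCloseable {
    private final FileChannel channel;
    private final BlockingQueue<CompletableFuture<Void>> waiting = new LinkedBlockingQueue<>();
    private final Thread syncer;
    private volatile boolean running = true;

    BatchedSyncLog(Path file, long syncIntervalMillis) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        syncer = new Thread(() -> syncLoop(syncIntervalMillis), "batched-syncer");
        syncer.setDaemon(true);
        syncer.start();
    }

    // Writes into the OS buffer only. The returned future completes after
    // the next force() that covers this write.
    synchronized CompletableFuture<Void> append(byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        CompletableFuture<Void> done = new CompletableFuture<>();
        waiting.add(done); // enqueue only after the write has returned
        return done;
    }

    private void syncLoop(long intervalMillis) {
        List<CompletableFuture<Void>> batch = new ArrayList<>();
        while (running || !waiting.isEmpty()) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            batch.clear();
            // Drain first, then sync: every drained future belongs to a write
            // that finished before force() starts, so the sync covers it.
            waiting.drainTo(batch);
            if (batch.isEmpty()) {
                continue;
            }
            try {
                channel.force(false);
                for (CompletableFuture<Void> f : batch) {
                    f.complete(null);
                }
            } catch (IOException e) {
                for (CompletableFuture<Void> f : batch) {
                    f.completeExceptionally(e);
                }
            }
        }
    }

    @Override
    public void close() throws IOException {
        running = false;
        try {
            syncer.join(); // lets the last batch be synced before closing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        channel.close();
    }
}
```

A connection handler would call append() and send "data persisted ok" only when the returned future completes. Even in this small form you can see both costs mentioned above: each write waits up to one interval for its completion, and the ordering between the queue and the sync has to be right or a write could be acknowledged before it is on disk.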

Using direct AIO you can avoid the fsync altogether and get better latency.
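
For comparison, here is roughly what the completion-callback style looks like from Java. To be clear about the assumptions: this uses AsynchronousFileChannel opened with StandardOpenOption.DSYNC, which is not the same thing as kernel-level direct AIO (O_DIRECT plus libaio); the JDK may well service these writes from a thread pool. The class name DsyncAsyncWriter is made up. It only illustrates the shape of the approach: no explicit fsync call, and the completion fires once the write has gone through a channel that was asked to write data synchronously to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper: async writes with no separate fsync step.
class DsyncAsyncWriter implements AutoCloseable {
    private final AsynchronousFileChannel channel;

    DsyncAsyncWriter(Path file) throws IOException {
        // DSYNC asks that each update to file content be written
        // synchronously to the underlying storage device.
        channel = AsynchronousFileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.DSYNC);
    }

    // Starts the write and returns immediately. The future completes
    // when the whole buffer has been written.
    CompletableFuture<Void> write(long position, byte[] data) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        writeFully(ByteBuffer.wrap(data), position, done);
        return done;
    }

    private void writeFully(ByteBuffer buf, long pos, CompletableFuture<Void> done) {
        channel.write(buf, pos, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer written, Void ignored) {
                if (buf.hasRemaining()) {
                    // A write may be partial; continue from where it stopped.
                    writeFully(buf, pos + written, done);
                } else {
                    done.complete(null);
                }
            }

            @Override
            public void failed(Throwable t, Void ignored) {
                done.completeExceptionally(t);
            }
        });
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

There is no batching thread, no shared sync queue, and no locking around the sync here; each connection just gets its own completion. Whether this actually gives better latency depends on how the platform implements the channel, which is the whole question of this thread.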
>
> regards,
> mingfai
>
> *[1] an example of benchmark with/without write cache
> http://blog.a2o.si/2009/06/19/hp-dl380-g5-drive-write-cache-bbwc/