Real async file IO on Linux?

Dane Foster studdugie at gmail.com
Thu Jul 28 10:01:23 PDT 2011


Alan,

I completely & totally agree w/ Tim. I would love to see true non-buffered
async IO support instead of the emulation of async IO. The current
implementation reminds me of green threads from way back when: they never
actually addressed the issue of fully utilizing an SMP (or, in today's
terminology, multi-core) system; they just gave the programmer the illusion
that they did, w/ the added cost of being slow and inefficient. In the same
way, the current async IO implementation is an illusion of true async IO. I
never said anything before because I had assumed that the emulation
mechanism was a fallback for when async IO support was not available on a
platform. I did not realize, until this thread started, that platform async
IO support meant the buffered case to you.

To make the two approaches concrete, I have put rough sketches of the
fsync-batching workaround and of direct AIO below the quoted thread.


Dane Foster
CTO & Chief Architect
Reservation Systems Online, LLC.


On Thu, Jul 28, 2011 at 6:27 AM, Tim Fox <timvolpe at gmail.com> wrote:

>  On 28/07/2011 07:25, Mingfai wrote:
>
>> Using buffered IO this is hard to implement scalably. A naive
>> implementation will write data as it arrives (into the OS buffer), and then
>> call sync (or fsync or whatever). When that call has returned, it sends its
>> response back to the client saying "data persisted ok". The problem is this
>> doesn't scale when you have many (possibly many thousands of) connections
>> calling sync for each write. Sync is a heavyweight operation.
>>
>> This can be worked around to some extent by "batching" the fsync calls.
>> E.g. you only sync at most, say, every 10 milliseconds, after which all
>> writes waiting for a sync can return their completion. The problem with this
>> is that it introduces extra latency to completion for each client
>> connection. It is also fairly tricky to code.
>>
>>
> How about a server system that has a RAID controller card with a
> Battery/Flash Backed Write Cache (BBWC)? Your "doesn't scale well" statement
> is true in the case without BBWC, but if BBWC is available, it seems to me it
> is desirable to keep sending fsyncs (instead of batching them) and let the
> RAID controller card take care of the batching. AFAIK, every db server should
> have that piece of hardware, and it makes a huge difference in IO
> performance. [1] What do you think?
>
> If you are using a BBWC on a disk, then you will indeed be able to sustain a
> higher fsync rate than if you had the cache absent/disabled (maybe a few
> thousand per sec), but the OS still has to flush the OS buffer to the disk
> buffer on every fsync, which is still a heavyweight operation, and you still
> have to code the logic to batch the fsyncs, do the locking, and return the
> completions when the fsync returns.
>
> Using direct AIO you can avoid the fsync altogether and get better latency.
>
>
> regards,
> mingfai
>
> [1] An example benchmark with/without write cache:
> http://blog.a2o.si/2009/06/19/hp-dl380-g5-drive-write-cache-bbwc/
>
>
>
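For anyone who wants to see what the batching workaround Tim describes
looks like, here is a rough sketch in C (assumptions on my part: a single
already-open file descriptor fd, POSIX threads, and each connection calling
pwrite() before it waits). It only illustrates the group-commit idea, so
treat it as a sketch rather than production code:

    /* Writers append with pwrite() and then wait; one background thread
       calls fsync() at most every 10 ms and wakes every writer whose data
       was written before that fsync started ("group commit"). */
    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  synced = PTHREAD_COND_INITIALIZER;
    static uint64_t write_seq = 0;   /* bumped by each writer            */
    static uint64_t sync_seq  = 0;   /* highest seq covered by an fsync  */
    static int fd;                   /* assumed opened elsewhere         */

    /* Called by a connection after its pwrite() has returned; blocks
       until an fsync covering that write has completed. */
    void wait_until_durable(void)
    {
        pthread_mutex_lock(&lock);
        uint64_t my_seq = ++write_seq;
        while (sync_seq < my_seq)
            pthread_cond_wait(&synced, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* One background thread: at most one fsync per 10 ms. */
    void *sync_loop(void *arg)
    {
        (void)arg;
        for (;;) {
            usleep(10 * 1000);
            pthread_mutex_lock(&lock);
            uint64_t covered = write_seq;     /* all writes issued so far */
            pthread_mutex_unlock(&lock);
            if (covered == sync_seq)
                continue;                     /* nothing new to sync      */
            fsync(fd);                        /* one heavyweight call for
                                                 the whole batch          */
            pthread_mutex_lock(&lock);
            sync_seq = covered;
            pthread_cond_broadcast(&synced);  /* release waiting writers  */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

The extra latency Tim mentions is visible here: each completion waits up to
the 10 ms interval plus the fsync itself, and the lock/condition-variable
plumbing is exactly the "fairly tricky to code" part.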
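And for comparison, a rough sketch of the direct AIO path, using the Linux
kernel AIO interface (libaio) on a file opened with O_DIRECT. My
assumptions: libaio is installed (link with -laio), the device accepts
512-byte alignment, and the file name "data.bin" is just a placeholder.
Because O_DIRECT bypasses the OS buffer cache there is no dirty page cache
to fsync; whether the data is truly on stable storage at completion time
still depends on the drive/controller write cache (the BBWC case above), so
O_DSYNC may also be wanted:

    /* Build (sketch): gcc aio_sketch.c -laio */
    #define _GNU_SOURCE               /* for O_DIRECT */
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT requires the buffer, length and offset to be aligned. */
        void *buf;
        if (posix_memalign(&buf, 512, 4096) != 0) return 1;
        memset(buf, 'x', 4096);

        io_context_t ctx = 0;
        if (io_setup(128, &ctx) < 0) {
            fprintf(stderr, "io_setup failed\n");
            return 1;
        }

        struct iocb cb;
        struct iocb *cbs[1] = { &cb };
        io_prep_pwrite(&cb, fd, buf, 4096, 0);   /* async write at offset 0 */

        if (io_submit(ctx, 1, cbs) != 1) {       /* returns without blocking */
            fprintf(stderr, "io_submit failed\n");
            return 1;
        }

        /* A real server would reap completions from its event loop; here we
           simply block until this one write completes. */
        struct io_event ev;
        if (io_getevents(ctx, 1, 1, &ev, NULL) == 1)
            printf("write completed, res=%ld\n", (long)ev.res);

        io_destroy(ctx);
        free(buf);
        close(fd);
        return 0;
    }

No fsync, no batching thread, and no per-connection locking: the completion
event itself is what you hand back to the client, which is the latency win
Tim is pointing at.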