Differentiate synchronous and asynchronous error of an AIO operation

Fri Jun 10 16:41:48 PDT 2011

Greetings NIO team, kudos for your hard work and excellent results!

I have a problem. I was very surprised by the fact that a completion
handler may be invoked directly by the initiating thread. This is
quite strange, and it's not trivial to write a thread-safe handler
that copes with it properly. I think it would be much better if an IO
operation that completes immediately threw an exception instead of
invoking the handler directly.

I know it's very late and there's not much chance of an API change, but
I'll record my thoughts here for reference.

When an AIO operation is issued at OS level, it may complete
synchronously (due to immediate error), or it may complete
asynchronously (error or success). The Java methods however don't
syntactically differentiate the two cases. One handler is required to
handle both possibilities.

    void operation( handler )

However there's a difference in execution: in the sync case, the
handler may be invoked immediately inside operation() frame, before
the method returns.
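
To make the sync case concrete, here is a minimal sketch. It does not use a
real channel: operation() below is a hypothetical stand-in that always fails
immediately, and SyncCompletionDemo and run() are names invented for this
illustration. Only the CompletionHandler interface comes from the JDK. The
trace shows the handler running inside the operation() frame, on the
initiating thread.

```java
import java.io.IOException;
import java.nio.channels.CompletionHandler;
import java.util.ArrayList;
import java.util.List;

final class SyncCompletionDemo {
    // Hypothetical stand-in for an AIO operation that hits an immediate
    // error and reports it by calling the handler before returning.
    static void operation(CompletionHandler<Integer, Void> handler) {
        handler.failed(new IOException("immediate failure"), null);
    }

    // Issues the operation and records the order in which things happen.
    static List<String> run() {
        final List<String> trace = new ArrayList<>();
        final Thread initiator = Thread.currentThread();
        trace.add("before operation");
        operation(new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer result, Void attachment) {
                trace.add("completed");
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                // Runs while operation() is still on the call stack.
                trace.add("failed on initiating thread: "
                        + (Thread.currentThread() == initiator));
            }
        });
        trace.add("after operation");
        return trace;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The handler's entry lands between "before operation" and "after operation",
which is exactly the ordering the caller has to be prepared for.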

This is confusing on many levels. Let's examine how concurrency is affected.

    handler1
        onEvent()
            ...
            operation( handler2 )
            ...

In any nontrivial application, handler1 and handler2 must share some
mutable variables. Access to these variables must be properly
synchronized, because Java AIO doesn't specify any stronger
concurrency semantics (unlike e.g. the Swing event model). Here's a
seemingly innocent example:

    newConnection() {
        init();
        read( readHandler );
    }

The read() is the last statement of the method. It would seem that
synchronization is not needed - certainly the readHandler will be
invoked after init(). But without synchronization, there is no
"happens-before" guarantee - when readHandler is invoked, it may see
none of the writes from init().

So any Java AIO program needs adequate synchronization, a point that
deserves more public awareness. It does not require much work,
though: usually it is enough to associate a lock with each channel.
(Full-duplex applications need more locks.)

Now, back to handler1 and handler2. Their methods are now protected by
the lock. That's still not enough, because, surprise surprise,
handler2 may be invoked within the frame of handler1, and Java's
intrinsic locks (and ReentrantLock) are reentrant, so the lock does
not keep it out. It's natural for handler1 to think that it owns the
lock and therefore has exclusive access to system state, overlooking
the case where handler2 sneaks into the same frame, messing with the
same mutables.
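
The hazard can be reproduced in a few lines. This is a sketch under stated
assumptions: operation() is a hypothetical stand-in that always completes
synchronously, and the head/tail fields and their invariant are invented for
the illustration. handler2 takes the lock without blocking because the
current thread already holds it, and it observes the state handler1 left
half-updated.

```java
import java.util.concurrent.locks.ReentrantLock;

final class ReentrantHandlerDemo {
    private final ReentrantLock lock = new ReentrantLock();
    // Invariant whenever the lock is free: head == tail.
    private int head = 0;
    private int tail = 0;

    boolean handler2SawBrokenInvariant;
    int holdCountSeenByHandler2;

    // Hypothetical operation that completes synchronously, so the
    // handler runs inside the caller's frame.
    void operation(Runnable handler) {
        handler.run();
    }

    void handler1() {
        lock.lock();
        try {
            head++;                     // invariant temporarily broken
            operation(this::handler2);  // handler2 runs right here
            tail++;                     // invariant restored
        } finally {
            lock.unlock();
        }
    }

    void handler2() {
        lock.lock(); // does not block: same thread, reentrant lock
        try {
            holdCountSeenByHandler2 = lock.getHoldCount();
            handler2SawBrokenInvariant = (head != tail);
        } finally {
            lock.unlock();
        }
    }
}
```

After one call to handler1(), handler2 has recorded a hold count of 2 and a
broken invariant, even though both methods are "protected by the lock".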

handler2 may invoke handler3 and so on, in the same frame - or not. If
we don't want to drive ourselves insane by tracking all the
intermediate states, we must follow this principle: before calling any
operation(), always bring the system (with respect to a channel) to a
consistent state; after calling operation(), abandon all previous
knowledge of the system state. So in principle we should always code
in this pattern:

    lock
    ....
    // invariant must hold here
    operation( handler2 )
    // invariant may be updated
    ...
    unlock

That pattern is very unfamiliar and inconvenient. Ordinarily we only
need to preserve the invariant at the time of unlock; inside the
lock-unlock block we can put the system in an inconsistent state,
comfortable in the knowledge that nobody else is accessing it.

We don't want to follow that coding pattern; it is too burdensome. In
fact, the only time handler2 can be invoked in handler1's frame is
when operation() reports an immediate/synchronous error. So let's
adopt a different and simpler principle: whenever handler2 is invoked
for that reason, it doesn't read or write any system state; instead
it somehow notifies handler1 about the error and lets handler1 handle
it.

    lock
    ...
    operation( handler2 )
    if there_was_a_sync_error
        ...
    ...
    unlock

This is simple, familiar, safe and correct.

On top of the existing Java AIO API, we can implement wrapper methods
that detect an attempted synchronous handler invocation for an error
and throw that error to the caller instead. The handler will then only
be invoked for asynchronous completions, in a new frame.

    void w_operation( handler ) throws SynchronousError

    // usage
    lock
    ...
    try{
        w_operation( handler );
    }catch(SynchronousError e){
        error = e.getCause();
        ...
    }
    ...
    unlock
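
One possible way to build such a wrapper is sketched below. It is not the
only way and it rests on assumptions: AsyncOp is a hypothetical functional
interface standing in for any operation(handler) method, wOperation and
SyncErrorWrapper are invented names, and a synchronous invocation is
recognised as "failed() called on the initiating thread before the
operation has returned". Following the premise above, only failures are
intercepted; completed() is passed straight through.

```java
import java.nio.channels.CompletionHandler;

// Hypothetical checked exception carrying the immediate error as its cause.
final class SynchronousError extends Exception {
    SynchronousError(Throwable cause) {
        super(cause);
    }
}

// Hypothetical abstraction of "operation(handler)".
interface AsyncOp<V> {
    void start(CompletionHandler<V, Void> handler);
}

final class SyncErrorWrapper {
    static <V> void wOperation(AsyncOp<V> op,
                               final CompletionHandler<V, Void> handler)
            throws SynchronousError {
        final Thread initiator = Thread.currentThread();
        // Both cells are only touched by the initiating thread: the check
        // below short-circuits on any other thread.
        final boolean[] inCall = { true };
        final Throwable[] syncError = { null };

        op.start(new CompletionHandler<V, Void>() {
            @Override
            public void completed(V result, Void attachment) {
                handler.completed(result, attachment);
            }

            @Override
            public void failed(Throwable exc, Void attachment) {
                if (Thread.currentThread() == initiator && inCall[0]) {
                    syncError[0] = exc; // same frame: remember, don't dispatch
                } else {
                    handler.failed(exc, attachment); // asynchronous failure
                }
            }
        });

        inCall[0] = false; // the operation has returned
        if (syncError[0] != null) {
            throw new SynchronousError(syncError[0]);
        }
    }
}
```

With this in place the caller handles an immediate error in a catch block,
while still holding its lock and knowing its own state, and the user handler
only ever runs for asynchronous failures.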

Now, let's look back, and see how we got here. For all the reasons
listed above, it's clear that we have to always use the w_operation()
version. The original operation() is extremely difficult to use
correctly. (Or maybe it's just me - if you know a safe and simple way
to use operation() directly, do tell.)

If I'm correct, then we should really change the signature of
operation() to that of w_operation().

It appears to me that the current design stems from a concern for
economy: why not abstract the two error cases into one kind, and let
the one handler deal with it? But the abstraction leaks badly. The two
cases still differ in their mode of execution, the programmer must
know about that difference, and the programmer must then translate it
back into two different error cases, undoing the abstraction. The
economy turns out to be very costly for everybody!

If we simply throw the synchronous error instead of calling the
handler in the same frame, it'll drastically simplify both the JDK
code and the client code. Most importantly, it's much easier to
understand.

- Zhong Yu