Proposal for Hybrid Threading Model and simpler Async IO

Wed May 7 23:01:05 UTC 2014

I understand your approach better now, and yes, I agree: it doesn't make
sense to contribute it to OpenJDK. However, I really think that we need a
solution for this in the JVM.

It seems there are two levels at which this could be implemented:
- On the Java level: This would require re-implementing all blocking IO
  classes in the Java standard library with NIO and continuation support.
- On the JNI level: Here we would have to add a non-blocking implementation
  for every blocking JNI function.

Right now I am quite busy with other things, but I would really like to
implement a prototype of the first idea, which seems much simpler to me
(based on Hiroshi's work or something like Quasar).

By the way, here you can find Hiroshi's presentation about his
continuations implementation (including some benchmarks):
http://wiki.jvmlangsummit.com/images/2/25/ContinuationInServers.pdf

On Wed, May 7, 2014 at 7:54 PM, Jeremy Manson <jeremymanson at google.com> wrote:

> Our kernel hackers implemented the thread switching, so I can't give you
> too many details.  But, in principle, it isn't complicated - you have a
> stack pointer, a bunch of registers and a program counter associated with
> any given thread, so to do a context switch semi-manually, all you have to
> do is change them to the appropriate value for another thread.  This is
> drastically oversimplified, of course.
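
To make the "swap the stack pointer, registers and program counter" idea
concrete, here is a minimal user-space sketch built on the ucontext calls
(getcontext, makecontext, swapcontext) that glibc provides on Linux. It is
not the kernel-assisted mechanism described above, only an illustration of
the same principle. The names task, step, run_switch_demo and
TASK_STACK_SIZE are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define TASK_STACK_SIZE (64 * 1024)  /* arbitrary size for this sketch */

static ucontext_t main_ctx, task_ctx;
static int trace;  /* records the execution order as decimal digits */

static void step(int n) { trace = trace * 10 + n; }

/* Runs on its own stack; yields once, then finishes. */
static void task(void) {
    step(2);
    swapcontext(&task_ctx, &main_ctx);  /* save our registers, resume main */
    step(4);
    /* Returning from here resumes uc_link, i.e. main_ctx. */
}

/* Returns 1234 if control moved main -> task -> main -> task -> main. */
int run_switch_demo(void) {
    char *stack = malloc(TASK_STACK_SIZE);
    assert(stack != NULL);
    trace = 0;

    getcontext(&task_ctx);                  /* initialise the context */
    task_ctx.uc_stack.ss_sp = stack;        /* give it a private stack */
    task_ctx.uc_stack.ss_size = TASK_STACK_SIZE;
    task_ctx.uc_link = &main_ctx;           /* where to go when task ends */
    makecontext(&task_ctx, task, 0);

    step(1);
    swapcontext(&main_ctx, &task_ctx);  /* switch to the task's stack */
    step(3);
    swapcontext(&main_ctx, &task_ctx);  /* resume the task after its yield */

    free(stack);
    return trace;
}
```

Each swapcontext call saves the caller's registers and stack pointer and
loads the target's, so the caller always names the context it switches to.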
>
> One big win in our scenario is to add the ability to state which thread
> you want to switch to.  We can context switch to the thread that *should*
> wake up next (e.g., the next thread to acquire a lock, or the thread that
> will respond to an IO event) very quickly.  This is drastically
> oversimplified, of course.
>
> We already have this in C++.  Programmers love it, because they don't have
> to write asynchronous code, and it scales very well.
>
> To support it in Java, I'm basically making an API that does some of the
> user-level scheduling, and intercepting anywhere that the JVM blocks (for
> example, calls to pthread_cond_wait).  This involves no actual changes to
> the JDK - you just have to write a jsig-like interposer for pthreads and
> things like epoll.  I also have to write a JNI blob of my own and some
> supporting APIs for the user-level scheduling, but again, those things
> don't require JDK support.
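
For readers who have not seen an interposer before, the general technique
can be sketched as follows. This is an assumption-laden illustration, not
the code described above: it wraps epoll_wait only, it merely counts calls
and forwards them, and the names intercepted_calls and
intercepted_epoll_waits are hypothetical. It relies on glibc's RTLD_NEXT
extension to dlsym and is not thread-safe.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for RTLD_NEXT; must precede every #include */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/epoll.h>

typedef int (*epoll_wait_fn)(int, struct epoll_event *, int, int);

static int intercepted_calls;  /* hypothetical bookkeeping for the sketch */

int intercepted_epoll_waits(void) { return intercepted_calls; }

/* Same signature as libc's epoll_wait, so the dynamic linker binds
 * callers to this definition when it comes earlier in the search order. */
int epoll_wait(int epfd, struct epoll_event *events, int maxevents,
               int timeout) {
    static epoll_wait_fn real_epoll_wait;
    if (real_epoll_wait == NULL) {
        /* Look up the next definition in load order (normally libc's). */
        real_epoll_wait = (epoll_wait_fn)dlsym(RTLD_NEXT, "epoll_wait");
        assert(real_epoll_wait != NULL);
    }
    intercepted_calls++;
    /* A user-level scheduler could switch to another runnable stack here
     * instead of letting the kernel block this thread. This sketch only
     * forwards the call unchanged. */
    return real_epoll_wait(epfd, events, maxevents, timeout);
}
```

Such a file is typically built as a shared object and loaded ahead of libc
with LD_PRELOAD. Older glibc versions also need -ldl at link time.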
>
> I have the basic functionality in place, but I have to do some hardening /
> additional testing.  The major performance / scalability concerns include
> two things.  First, the default thread stack size is 1M, which is
> enormous, so the big win of being able to have lots and lots of threads
> might not be there because of RAM limitations.  Second, the JDK does a
> lot of spinning before acquiring locks, which is completely unnecessary
> when you switch directly and cheaply to a thread when it is that thread's
> turn to acquire a lock, so I'll have to take it out.
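
On the stack-size concern: at the pthreads level the size is a per-thread
attribute, so a smaller stack can be requested when a thread is created.
The sketch below shows this with pthread_attr_setstacksize. The function
name run_with_small_stack and the 64 KiB figure used in the note after the
code are assumptions chosen for illustration, not values from the work
described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body: records that it ran. */
static void *worker(void *arg) {
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/* Starts one thread on a stack of stack_bytes and waits for it.
 * Returns 0 if the thread ran, -1 on any error. */
int run_with_small_stack(size_t stack_bytes) {
    pthread_attr_t attr;
    pthread_t tid;
    int ran = 0;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* Fails if stack_bytes is below the platform minimum. */
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    int rc = pthread_create(&tid, &attr, worker, &ran);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    pthread_join(tid, NULL);
    return ran ? 0 : -1;
}
```

For example, run_with_small_stack(64 * 1024) asks for a 64 KiB stack. For
scale, 100,000 threads at 1 MiB each reserve roughly 98 GiB of address
space for stacks, against roughly 6 GiB at 64 KiB each. On the Java side
the usual knobs are the -Xss option and the stackSize argument of the
Thread constructor. Compile with -pthread.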
>
> I haven't done a lot of testing, partially because my managerial
> responsibilities have increased dramatically recently, and partially
> because priorities shifted a bit.  I'll probably circle back to it towards
> the end of the year.
>
> Since these are changes to libc or the kernel, or additions of non-JDK
> libraries, it doesn't make much sense to contribute them to OpenJDK.  The
> couple of times I've brought up these topics with JDK hackers, I've gotten
> the sense that there is less than full enthusiasm for them (basically, I
> get responses like David's), so I haven't pushed it.
>
> A note on doing this in the JVM: it's much harder!  My colleague Hiroshi
> Yamauchi tried to add continuations in 2010 (or so)
> <http://hiroshiyamauchi.blogspot.com/2012_10_01_archive.html>.  He gave a
> presentation at the JVM Language Summit on it.  There is a lot of
> user-controlled thread-local state to deal with in the JVM, and fixing
> that up is hard.  Plus, you have to make two compilers and an interpreter
> aware of it.  By contrast, all of the thread-local state in libc is
> stored at (or reached from) the bottom of the stack, and it is
> compiler-agnostic.
>
> Jeremy
>
>
>
> On Wed, May 7, 2014 at 12:17 AM, Joel Richard <ri.joel at gmail.com> wrote:
>
>> Hi Jeremy,
>>
>> Thank you very much for sharing this with us. Would you mind elaborating
>> on your work a little further? I am particularly interested in answers
>> to the following questions:
>>
>> How do you save the stack state before executing a pausing call and then
>> resume the same program flow on another thread? What changes have you
>> applied to JNI? Can you already tell us something about the performance
>> and scalability characteristics? Are there any plans to contribute your
>> work to OpenJDK, or will it remain an internal project?
>>
>> Thanks, Joel
>>
>>
>> On Tue, May 6, 2014 at 11:33 PM, Jeremy Manson <jeremymanson at google.com> wrote:
>>
>>> FWIW, we're looking at doing this internally by implementing the whole
>>> kit and caboodle in native code.  We're going to intercept all low-level
>>> blocking calls to libc so that they just result in a change to a different
>>> stack.  Requires approximately no Java-level changes (unless you want
>>> control over which stack you switch to, which is an easy addition).
>>>
>>> I spent a fair bit of time working on this last year, but had to
>>> back-burner it for a while in favor of some other work.  There is a *lot*
>>> of demand from our server developers, who all loathe the existing async IO
>>> APIs.  We'll probably circle back to it by the end of the year.
>>>
>>> Jeremy
>>>
>>>
>>> On Tue, May 6, 2014 at 6:15 AM, Florian Weimer <fweimer at redhat.com> wrote:
>>>
>>>> On 05/04/2014 02:19 PM, Joel Richard wrote:
>>>>
>>>>> Right now, InputStream.read(byte b[]) is a blocking method. Hence the
>>>>> native thread waits until input data is available. With my proposal,
>>>>> the underlying blocking native method (for example
>>>>> java.net.SocketInputStream#socketRead0) would no longer block the
>>>>> native thread. Instead, it would call the C function with the _async
>>>>> suffix and continue to process another task. As soon as the async
>>>>> operation has completed, it can then continue with the first task
>>>>> (maybe even in another native thread).
>>>>>
>>>>
>>>> You cannot do this without completely redesigning JNI.  You cannot
>>>> resume native code on a different native thread than the one it initially
>>>> ran on because that would change the set of thread-local variables, and
>>>> native code is compiled with the assumption that references to thread-local
>>>> variables are stable (relative to the current stack).
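
The thread-local point can be demonstrated in a few lines of C. In this
sketch, which uses made-up names (request_id, resume_elsewhere,
value_seen_after_migration), a value stored in a thread-local variable on
one native thread is not visible to code that "resumes" on another.

```c
#include <assert.h>
#include <pthread.h>

/* One independent copy of this variable exists per native thread. */
static _Thread_local int request_id;

/* Stands in for native code that continues on a different thread. */
static void *resume_elsewhere(void *out) {
    *(int *)out = request_id;  /* reads this thread's own copy */
    return NULL;
}

/* Stores `original` in the calling thread's copy, then reports the value
 * a second thread observes. Returns -1 if the thread cannot be started. */
int value_seen_after_migration(int original) {
    pthread_t tid;
    int seen = -1;

    request_id = original;
    if (pthread_create(&tid, NULL, resume_elsewhere, &seen) != 0)
        return -1;
    pthread_join(tid, NULL);
    return seen;
}
```

Here value_seen_after_migration(42) returns 0, not 42: the second thread
sees its own zero-initialised copy. Native code that cached the address
of, or a value from, its thread-local storage before being suspended
would therefore misbehave if it were resumed on another thread. Compile
with -pthread.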
>>>>
>>>> On the Java side, you'd likely have to duplicate thread local
>>>> variables, to preserve the current semantics.  Better support for
>>>> coroutines would be nice, but I don't think it's prudent to attempt to
>>>> provide this functionality purely at the JVM layer because of the resulting
>>>> interoperability issues.
>>>>
>>>>
>>>> --
>>>> Florian Weimer / Red Hat Product Security Team
>>>>
>>>
>>>
>>
>