Effectively final

Rémi Forax forax at univ-mlv.fr
Sat Jul 30 06:04:34 PDT 2011


On 07/30/2011 09:10 AM, Tim Fox wrote:
> Remi,
>
> Please see comments below
>
> On 30/07/2011 00:27, Rémi Forax wrote:
>> Hi Tim,
>> see below
>>
>> On 07/29/2011 08:42 PM, Tim Fox wrote:
>>> Brian,
>>>
>>> Thanks for your reply. Comments below
>>>
>>> On 29/07/2011 19:08, Brian Goetz wrote:
>>>> You are right to have these concerns.
>>>>
>>>> The canonical example is:
>>>>
>>>> int fooSum = 0;
>>>> list.forEach(#{ x -> fooSum += x.getFoo() });
>>>>
>>>> You are correct that there are purely sequential use cases that
>>>> benefit from this approach, which are not subject to data races.  (On
>>>> the other hand, it is nearly impossible to write the above
>>>> primitive-using code so that it is not subject to data races in a
>>>> parallel environment.)  We have explored approaches of capturing this
>>>> constraint in the language, so that we could prevent or detect when
>>>> such a "thread-confined" lambda is used from the wrong thread.  While
>>>> these are likely feasible, they add complexity.
>>>>
>>>> Examples like the above have been around for 50+ years.  However, it
>>>> is worth noting that they became popular in the context of a
>>>> sequential, uni-processor world.  Rather than expend energy and
>>>> introduce additional complexity to prop up an aging and increasingly
>>>> irrelevant programming idiom,
>>> I'd have to disagree that this approach is aging; node.js and Ruby
>>> EventMachine are good counter-examples. They both use the
>>> reactor pattern (i.e. a single event loop which executes everything), so
>>> the developer does not have to worry about concurrency concerns. This is
>>> a huge win in terms of simplicity for the developer.
>>>
>>> Frameworks like node and eventmachine scale over cores by spinning up
>>> more processes, not threads (since there's only one event loop per
>>> process). This is less than ideal when you want to share state between
>>> event loops.
>>>
>>> New frameworks like node.x https://github.com/purplefox/node.x (which is
>>> what I am working on) allow multiple event loops per process (typically
>>> one event loop per core), and then partition objects so they are "owned"
>>> by one of the event loops. The framework will then guarantee that all
>>> callbacks on those objects are always executed by the same event loop.
>>>
>>> What you get out of this is the user can write all their code as single
>>> threaded, but the system as a whole scales well over available cores
>>> without having to spin up more processes.
>>>
>>> If the framework can guarantee this code is always executed by the same
>>> thread, it seems wrong to force users to use AtomicReferences (or
>>> whatever) to co-ordinate results from different callbacks.
>> Then you will realize that if the computation you have to do
>> dominates the read and write phases, you will need worker threads
>> (for example, an HTTP parser (even an async one) can take more time
>> than a read or a write);
>> otherwise your server will be stuck waiting, because you need
>> access to a particular thread which is not available (it is busy
>> parsing another request).
> I'm not really sure what you're getting at here. Frameworks like these
> are typically used to write network servers, that support many
> connections, e.g. an HTTP server, the "computation" is in creating the
> response. Nothing should get "stuck". Can you elaborate?

Your HTTP server will deliver services like generating a SHA key,
computing a shortest path, etc.
Sometimes the network is the bottleneck, but sometimes the service
is the bottleneck.

The reactor pattern works well if the network is the bottleneck,
not if the service you deliver takes time to compute.
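A minimal sketch of the alternative being argued for (hypothetical code, not any real framework's API; the SHA-256 digest stands in for the slow service): the event-loop thread hands the CPU-bound work to a worker pool so the loop stays free to service other connections.

```java
import java.security.MessageDigest;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OffloadSketch {
    // The CPU-bound "service": hex-encoded SHA-256 of the input.
    static String sha256Hex(String s) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes("UTF-8"));
        StringBuilder sb = new StringBuilder();
        for (byte b : d) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(4); // worker pool
        // The "event loop" submits the slow computation instead of running it
        // inline, so it can keep reading and writing other connections.
        Future<String> result = workers.submit(() -> sha256Hex("hello"));
        System.out.println(result.get()); // a real loop would be notified, not block
        workers.shutdown();
    }
}
```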

>>
>> So you will refine your model again, saying it's ok to have a context
>> object (a scope)
>> that contains the results of each phase (read, decode, work, encode,
>> write, etc.)
>> and that is passed from thread to thread, because only one thread can
>> access that context at a time.
> I don't see why that would be necessary.
>> This model restricts side effects to only one object
> It restricts side effects to a set of objects. Typically the set of
> objects is all the objects created by a specific connection, i.e. all
> the objects that come off a specific connection live in an "island of
> single-threadedness". Since you typically have many more connections
> than cores, this is how you scale. All callbacks registered in that
> island will subsequently be invoked by the system in the same context
> (i.e. with the same event loop thread). The dev can therefore write
> their code as single threaded.
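Tim's "island of single-threadedness" can be sketched in a few lines (hypothetical code, not node.x's actual API): each connection is pinned to one single-threaded event loop, so every callback for that connection runs on the same thread and its state needs no synchronization.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IslandSketch {
    static final int CORES = 2;                         // one event loop per core
    static final ExecutorService[] LOOPS = new ExecutorService[CORES];
    static {
        for (int i = 0; i < CORES; i++) LOOPS[i] = Executors.newSingleThreadExecutor();
    }

    // All callbacks for a given connection land on the same loop.
    static ExecutorService loopFor(int connectionId) {
        return LOOPS[connectionId % CORES];
    }

    public static void main(String[] args) throws Exception {
        int conn = 7;
        Future<String> first  = loopFor(conn).submit(() -> Thread.currentThread().getName());
        Future<String> second = loopFor(conn).submit(() -> Thread.currentThread().getName());
        // Both callbacks ran on the connection's owning thread, so code for
        // this connection can be written as if single-threaded.
        System.out.println(first.get().equals(second.get())); // prints "true"
        for (ExecutorService e : LOOPS) e.shutdown();
    }
}
```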

The question is: why is your set of objects stored in the callbacks?
You have more freedom if you decouple the set of objects from
the callback.
Basically, you only need to ensure that only one thread accesses
the set of objects at a time (plus some publication trouble
that comes from the memory model); reusing the same thread,
as you do in your framework, is a simplification, not a generic model.

If you take a look at the AIO API provided in JDK 7,
http://download.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousByteChannel.html#read%28java.nio.ByteBuffer,%20A,%20java.nio.channels.CompletionHandler%29
a read takes a callback *and* an object (here called the attachment)
that encapsulates your set of objects.
When the callback is called, you can, for example, start a write,
sending the same attachment, or a newly created object computed from
the value of the current attachment, as the new attachment.
The write may be done by another thread, but that's ok because the two
threads will not access the same object at the same time.
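A runnable sketch of this attachment pattern, using JDK 7's AsynchronousFileChannel rather than a socket so the example is self-contained (the overload is analogous: buffer, attachment, CompletionHandler). The per-request state travels as the attachment; whichever pool thread fires the completion handler receives it, and no two threads touch it at once.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class AttachmentSketch {
    // The "set of objects" for one request, passed along as the attachment.
    static class RequestState {
        final ByteBuffer buf = ByteBuffer.allocate(64);
        volatile String decoded;
    }

    static String readAsync(Path p) throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        AsynchronousFileChannel ch = AsynchronousFileChannel.open(p, StandardOpenOption.READ);
        RequestState state = new RequestState();
        ch.read(state.buf, 0, state, new CompletionHandler<Integer, RequestState>() {
            public void completed(Integer bytes, RequestState st) {
                st.buf.flip();
                byte[] b = new byte[st.buf.remaining()];
                st.buf.get(b);
                st.decoded = new String(b);  // next phase could pass st onward
                done.countDown();
            }
            public void failed(Throwable exc, RequestState st) { done.countDown(); }
        });
        done.await();  // the handler may run on another pool thread; that's fine
        ch.close();
        return state.decoded;
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("aio", ".txt");
        Files.write(p, "hello".getBytes("UTF-8"));
        System.out.println(readAsync(p)); // prints "hello"
        Files.delete(p);
    }
}
```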

So my point is that this pattern is better than the reactor pattern
(multi-reactor or not)
because you can transform your data between a read and a write using
a thread pool. This pattern fails if you modify any state which is
accessible not from the attachment but, for example, bound to the
callback, so it's better to use a lambda here: a lambda isn't an
object (read: it has no fields)
and doesn't allow you to capture mutable local variables.
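That restriction is easy to demonstrate (shown here with today's `->` syntax rather than the `#{...}` prototype syntax of 2011): a lambda may only capture effectively final locals, so mutable per-request state cannot leak into the callback by accident; the mutation has to go through an explicit object.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CaptureSketch {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);

        // int sum = 0;
        // list.forEach(x -> sum += x);  // does not compile: sum is not effectively final

        // The shared state must be an explicit object, which makes the
        // mutation visible (and, with AtomicInteger, thread-safe):
        AtomicInteger sum = new AtomicInteger();
        list.forEach(x -> sum.addAndGet(x));
        System.out.println(sum.get()); // prints 6
    }
}
```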

> In many (most?) cases connections never need to talk to each other
> (again, e.g. an HTTP server); however, if they do, we allow them to
> communicate by sending messages (kind of like the actor model). Again
> this is done with callbacks, you register a callback with the system and
> get a handle. Given the handle you can send a message to a specific
> callback. The callback is executed in the context of the callee, so
> everything is still nice and single threaded from the PoV of the dev.
>
> Everything is done via callbacks, which always execute on the context of
> the callee. We also provide helpers which allow you to compose the
> results of callbacks in nice ways e.g.
>
> Something like:
>
> int callback1Result;
> int callback2Result;
> Composer.when(callback1, callback2)
>         .do(#{ sendResponse(callback1Result + callback2Result) })
>
> I.e. when both callback1 and callback2 have fired create a response and
> write it back to the client. Note, no thread blocks during this.
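For readers following along today, the `Composer.when(...).do(...)` sketch (node.x's hypothetical API) can be approximated with the JDK's CompletableFuture, available since Java 8: the response is built only once both callbacks have fired, and no thread blocks while waiting.

```java
import java.util.concurrent.CompletableFuture;

public class ComposeSketch {
    // Stand-in for writing the response back to the client.
    static void sendResponse(int body) {
        System.out.println("response: " + body);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> callback1 = new CompletableFuture<>();
        CompletableFuture<Integer> callback2 = new CompletableFuture<>();

        // Runs only after both callbacks have fired; nothing blocks meanwhile.
        callback1.thenCombine(callback2, Integer::sum)
                 .thenAccept(ComposeSketch::sendResponse);

        callback1.complete(40);  // the callbacks may fire in any order
        callback2.complete(2);   // prints "response: 42"
    }
}
```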
>> and
>> you will be glad that Java lambdas are not able to capture local state :)
>>
>> Note that you can still refine the model above by adding work stealing
>> between all reader (resp. writer) threads and fork/join between worker
>> threads.
> We don't have a distinction between "reader" or "writer" threads.
>
> If you're interested in hearing more about how node.x works (or discuss
> reactor or multi-reactor model in more detail, I suggest the node.x
> forum is the appropriate place, rather than this list. :)
>
> Cheers

cheers,
Rémi




More information about the lambda-dev mailing list