Effectively final

Tim Fox timvolpe at gmail.com
Sat Jul 30 00:10:07 PDT 2011


Remi,

Please see comments below

On 30/07/2011 00:27, Rémi Forax wrote:
> Hi Tim,
> see below
>
> On 07/29/2011 08:42 PM, Tim Fox wrote:
>> Brian,
>>
>> Thanks for your reply. Comments below
>>
>> On 29/07/2011 19:08, Brian Goetz wrote:
>>> You are right to have these concerns.
>>>
>>> The canonical example is:
>>>
>>> int fooSum = 0;
>>> list.forEach(#{ x -> fooSum += x.getFoo() });
>>>
>>> You are correct that there are purely sequential use cases that
>>> benefit from this approach, which are not subject to data races.  (On
>>> the other hand, it is nearly impossible to write the above
>>> primitive-using code so that it is not subject to data races in a
>>> parallel environment.)  We have explored approaches of capturing this
>>> constraint in the language, so that we could prevent or detect when
>>> such a "thread-confined" lambda is used from the wrong thread.  While
>>> these are likely feasible, they add complexity.
>>>
>>> Examples like the above have been around for 50+ years.  However, it
>>> is worth noting that they became popular in the context of a
>>> sequential, uni-processor world.  Rather than expend energy and
>>> introduce additional complexity to prop up an aging and increasingly
>>> irrelevant programming idiom,
>> I'd have to disagree that this approach is aging; the success of node.js
>> and Ruby's EventMachine provides good counter-examples. Both use the
>> reactor pattern (i.e. a single event loop which executes everything), so
>> the developer does not have to worry about concurrency concerns. This is
>> a huge win in terms of simplicity for the developer.
>>
>> Frameworks like node and eventmachine scale over cores by spinning up
>> more processes, not threads (since there's only one event loop per
>> process). This is less than ideal when you want to share state between
>> event loops.
>>
>> New frameworks like node.x https://github.com/purplefox/node.x (which is
>> what I am working on) allow multiple event loops per process (typically
>> one event loop per core), and then partition objects so they are "owned"
>> by one of the event loops. The framework will then guarantee that all
>> callbacks on those objects are always executed by the same event loop.
>>
>> What you get out of this is the user can write all their code as single
>> threaded, but the system as a whole scales well over available cores
>> without having to spin up more processes.
>>
>> If the framework can guarantee this code is always executed by the same
>> thread, it seems wrong to force users to use AtomicReferences (or
>> whatever) to co-ordinate results from different callbacks.
> Then you will realize that if the computation dominates the read phase
> and the write phase, you will need worker threads (for example, the
> HTTP parser, even an async one, can take more time than a read or a
> write); otherwise your server will be stuck waiting, because you need
> access to a particular thread which is not available (because it is
> parsing another request).
I'm not really sure what you're getting at here. Frameworks like these 
are typically used to write network servers that support many 
connections, e.g. an HTTP server, where the "computation" is creating 
the response. Nothing should get "stuck". Can you elaborate?
>
>
> So you will refine your model again, saying it's ok to have a context
> object (a scope) that contains the results of each phase (read, decode,
> work, encode, write, etc.) and that is passed from thread to thread,
> because only one thread can access that context at a time.
I don't see why that would be necessary.
>
> This model restricts side effects to only one object
It restricts side effects to a set of objects. Typically that set is 
all the objects created by a specific connection; i.e. all the objects 
that come off a specific connection live in an "island of 
single-threadedness". Since you typically have many more connections 
than cores, this is how you scale. All callbacks registered in that 
island will subsequently be invoked by the system in the same context 
(i.e. with the same event loop thread). The dev can therefore write 
their code as single-threaded.
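A minimal sketch of that ownership model (illustrative names only, not 
the actual node.x API): each connection is pinned to one 
single-threaded event loop, and every callback for that connection is 
dispatched to its owning loop, so handler bodies can mutate the 
connection's state without locks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of "islands of single-threadedness".
public class IslandSketch {
    // One single-threaded event loop per core (two here, for the demo).
    static final ExecutorService[] LOOPS = {
        Executors.newSingleThreadExecutor(),
        Executors.newSingleThreadExecutor()
    };

    static class Connection {
        final ExecutorService loop;  // the loop that "owns" this connection
        int bytesReceived = 0;       // mutated only on the owning loop

        Connection(int id) {
            this.loop = LOOPS[id % LOOPS.length];
        }

        // Every callback for this connection runs on its own loop,
        // so the handler body is effectively single-threaded code.
        void onData(int n, Runnable done) {
            loop.execute(() -> {
                bytesReceived += n;  // no locks or atomics needed
                done.run();
            });
        }
    }
}
```

Because two callbacks for the same connection can never run 
concurrently, the plain `int` field above is safe even though many 
loops run in one process.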

In many (most?) cases connections never need to talk to each other 
(again, e.g. an HTTP server). However, if they do, we allow them to 
communicate by sending messages (somewhat like the actor model). Again 
this is done with callbacks: you register a callback with the system 
and get a handle. Given the handle, you can send a message to that 
specific callback. The callback is executed in the context of the 
callee, so everything is still nice and single-threaded from the PoV 
of the dev.
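That handle mechanism could be sketched roughly as follows (the names 
are illustrative, not the actual node.x API): registering a callback 
records the registrant's own event loop, and sending to the handle 
runs the callback on that loop, i.e. in the callee's context.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch of handle-based messaging between islands.
public class Handles {
    // A registration remembers which loop owns the callback.
    record Registration(ExecutorService loop, Consumer<String> callback) {}

    static final Map<Long, Registration> REGISTRY = new ConcurrentHashMap<>();
    static final AtomicLong NEXT_HANDLE = new AtomicLong();

    // Register a callback on the caller's own loop; get back a handle.
    static long register(ExecutorService ownLoop, Consumer<String> cb) {
        long handle = NEXT_HANDLE.incrementAndGet();
        REGISTRY.put(handle, new Registration(ownLoop, cb));
        return handle;
    }

    // Send a message: the callback executes in the callee's context,
    // so the callee's code stays single-threaded.
    static void send(long handle, String msg) {
        Registration r = REGISTRY.get(handle);
        r.loop().execute(() -> r.callback().accept(msg));
    }
}
```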

Everything is done via callbacks, which always execute in the context 
of the callee. We also provide helpers which allow you to compose the 
results of callbacks in nice ways, e.g. something like:

int callback1Result;
int callback2Result;
Composer.when(callback1, callback2)
        .do(#{ sendResponse(callback1Result + callback2Result) });

I.e. when both callback1 and callback2 have fired, create a response 
and write it back to the client. Note: no thread blocks during this.
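The same non-blocking composition idea can be illustrated with 
CompletableFuture (a later JDK addition; the Composer API above is 
node.x's own, and the callback bodies here are stand-ins): the 
combining action runs only once both results have arrived, and no 
thread blocks waiting for them.

```java
import java.util.concurrent.CompletableFuture;

// Illustration of composing two async callback results.
public class ComposeSketch {
    static CompletableFuture<Integer> callback1() {
        return CompletableFuture.supplyAsync(() -> 2);  // stand-in result
    }

    static CompletableFuture<Integer> callback2() {
        return CompletableFuture.supplyAsync(() -> 3);  // stand-in result
    }

    static CompletableFuture<String> respond() {
        // "When both have fired, create a response": thenCombine runs
        // the lambda once both futures complete, without blocking.
        return callback1().thenCombine(callback2(),
                (a, b) -> "result=" + (a + b));
    }
}
```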
> and
> you will be cheerful that Java lambda is not able to capture local state :)
>
> Note that you can still refine the model above by adding work stealing
> between all reader (resp. writer) threads and fork/join between worker
> thread.
We don't have a distinction between "reader" and "writer" threads.

If you're interested in hearing more about how node.x works (or in 
discussing the reactor or multi-reactor model in more detail), I 
suggest the node.x forum is the appropriate place, rather than this 
list. :)

Cheers


More information about the lambda-dev mailing list