Simplifying sequential() / parallel()

Fri Mar 22 08:09:56 PDT 2013

I'm just asking about the rules for each() in groovy. gpars is special
purpose, akin to adding parallel() to ones code.
On Mar 22, 2013 10:57 AM, "Remi Forax" <forax at univ-mlv.fr> wrote:

> On 03/22/2013 03:07 PM, Joe Bowbeer wrote:
>
>>
>> Stateful programming has its issues but that ship has already sailed (in
>> Java).
>>
>> The programs where these new expressions will live are full of state...
>>
>> With the introduction of streams, programmers and refactoring tools will
>> be introducing the cool new expressions into existing code. (forEach is the
>> groovy guy's for loop, right?)
>>
>> I don't want to create danger zones in the code where these
>> transformations are accidents waiting to happen. Also think of the code
>> maintainers trying to determine, as they are enhancing and debugging the
>> code, where they are allowed to add state.
>>
>> Before, the existence of parallel() created a danger zone, but
>> sequential() restored safety. That's an easy rule to understand.
>>
>> BTW, what are the rules in Scala and groovy?
>>
>> Joe
>>
>>
> for Groovy, gpars the groovy parallel library doesn't allow stateful
> closure,
> you have too use special constructs like Agent for that.
>
> http://gpars.codehaus.org/
>
> Rémi
>
>  On Mar 22, 2013 9:33 AM, "Brian Goetz" <brian.goetz at oracle.com <mailto:
>> brian.goetz at oracle.com**>> wrote:
>>
>>     The problem with stateful lambdas is that, unless one block of
>>     code has control over the entire pipeline, it is an accident
>>     waiting to happen.
>>
>>     Let's say you receive a stream as a parameter:
>>
>>       void foo(Stream s) { ... }
>>
>>     and you want to do something that requires a stateful mapper:
>>
>>       void foo(Stream s) {
>>           s.map(... stateful ...)...
>>       }
>>
>>     That's a bug already.  Because you don't know that the stream you
>>     were passed in is sequential.  But I doubt that people will
>>     remember, even most of the time, they need to do:
>>
>>           s.sequential().map(... stateful ...)...
>>
>>     instead.  Won't happen.
>>
>>     Stateful lambdas introduce the need for non-modular reasoning
>>     about stream pipelines (who created this? who will consume this?
>>     in what state was it created?).  And, it has all the same
>>     self-deception problems as thread-safety.  People convince
>>     themselves "I don't need to think about synchronization because no
>>     one will ever use this object concurrently."
>>
>>     So, while I sympathize with the desire to let people say "I know
>>     that this entire stream pipeline has been carefully controlled
>>     such as to not have statefulness distort its results", I think in
>>     reality, this will quickly turn into "statefulness is OK" in most
>>     people's minds.  With the attended inevitable foot-shooting.
>>
>>     On 3/21/2013 9:57 PM, Joe Bowbeer wrote:
>>
>>         I'm traveling now and won't be able to respond promptly but
>>         this topic
>>         has been raised a couple of times already. Feel free to copy
>>         and paste
>>         my response from previous discussions:)
>>
>>         Rephrasing, I'm OK with non-interference but I object to banning
>>         stateful in sequential ops.
>>
>>         I think there should be a one-one correspondence between any
>>         for loop
>>         and a sequential forEach.
>>
>>         Can you compare your restrictions with those in Scala and
>>         Groovy? Scala
>>         in particular, because it is more strictly defined, and I'm
>>         pretty sure
>>         I've combined stateful expressions with functional forms in
>>         Scala, to
>>         good effect. (One of the benefits of being multi-paradigmatic?)
>>
>>         In addition, I'm wary of the new form of forEach. If anything,
>>         I'd like
>>         its name to be simpler, e.g., each, not longer.
>>
>>         Joe
>>
>>         On Mar 21, 2013 3:48 PM, "Brian Goetz" <brian.goetz at oracle.com
>>         <mailto:brian.goetz at oracle.com**>
>>         <mailto:brian.goetz at oracle.com
>>         <mailto:brian.goetz at oracle.com**>>> wrote:
>>
>>             Doug and I have been revisiting sequential() and
>>         parallel().  I
>>             think there's a nice simplification here.
>>
>>             The original motivation for sequential() was because we
>>         originally
>>             had .into(collection), and many collections were not
>>         thread-safe so
>>             we needed a means to bring the computation back to the current
>>             thread.  The .sequential() method did that, but it also
>>         brought back
>>             a constraint of encounter ordering with it, because if
>>         people did:
>>
>>                 stuff.parallel().map(...).__**sequential().into(new
>>         ArrayList<>());
>>
>>             not respecting encounter order would violate the principle
>>         of least
>>             astonishment.
>>
>>             So the original motivation for sequential() was "bring the
>>             computation back to the current thread, in order."  This
>>         was doable,
>>             but has a high price -- a full barrier where we buffer the
>>         contents
>>             of the entire stream before doing any forEach'ing.
>>
>>             Most of the time, sequential() only appears right before the
>>             terminal operation.  But, once implemented, there was no
>>         reason to
>>             constrain this to appear right before the forEach/into, so
>>         we didn't.
>>
>>             Once we discovered a need for .parallel(), it made sense
>>         it be the
>>             dual of .sequential() -- fully unconstrained.  And again, the
>>             implementation wasn't *that* bad -- better than
>>         .sequential().  But
>>             again, the most desirable position for .parallel() is
>>         right after
>>             the source.
>>
>>             Then we killed into() and replaced it with reduction,
>>         which is a
>>             much smarter way of managing ordering.  Eliminating half the
>>             justification for .sequential().
>>
>>             As far as I can tell, the remaining use cases for
>>         .sequential() are
>>             just modifiers to forEach to constrain it, in order, to
>>         the current
>>             thread.
>>
>>             As in:
>>                ints().parallel().filter(i -> isPrime(i))
>>                      .sequential().forEach(System._**_out::println)
>>
>>             Which could be replaced by
>>             .__**forEachSequentialAndOrderedInC**__urrentThread(), with a
>>         suitably
>>             better name.  Which could further be simplified to ditch
>>         the "in
>>             current thread" part by doing some locking in the
>>         implementation,
>>             which brings us to .forEachOrdered(action).  Which nicely
>>             complements .collectUnordered, and the two actually stand
>>         better
>>             with their duals present (reduce is by default ordered;
>>         forEach is
>>             by default unordered.)
>>
>>             The "put it anywhere" behavior of .parallel was completely
>>             bootstrapped on the "put it anywhere" nature of
>>         .sequential; we
>>             never really set out to support transitions in the API.
>>
>>             So, pulling the rug out from under the house of cards, I
>>         think we
>>             can fall back to:
>>
>>             1.  Modify semantics of .sequential and .parallel to apply
>>         globally
>>             to the entire pipeline.  This works because pipelines are
>>         fully lazy
>>             anyway, so we don't commit to seq-ness/par-ness until we
>>         hit the
>>             terminal op.  So they are functional versions of "set the
>>         seq/par
>>             bit in the source".  And that simplifies the specification of
>>             seq/par down to a single property of the entire pipeline
>>         -- much
>>             easier to spec.
>>
>>             2.  Add .forEachOrdered.  For sequential streams, this is just
>>             .forEach.  For par streams, we use a lock to avoid concurrent
>>             invocation of the lambda, and loosen up the current
>>         behavior from
>>             "full barrier" to "partial barrier", so that when the next
>>         chunk is
>>             available, we can start working right away.  This is easy to
>>             accomplish using the existing AbstractTask machinery.
>>
>>
>>             Before we go there, does anyone have use cases for
>>         .sequential() /
>>             .parallel() that *don't* put the parallel right after the
>>         source, or
>>             the sequential right before a forEach?
>>
>>
>>
>