Simplifying sequential() / parallel()

Fri Mar 22 07:53:02 PDT 2013

On 03/22/2013 03:07 PM, Joe Bowbeer wrote:
>
> Stateful programming has its issues but that ship has already sailed 
> (in Java).
>
> The programs where these new expressions will live are full of state...
>
> With the introduction of streams, programmers and refactoring tools 
> will be introducing the cool new expressions into existing code. 
> (forEach is the groovy guy's for loop, right?)
>
> I don't want to create danger zones in the code where these 
> transformations are accidents waiting to happen. Also think of the 
> code maintainers trying to determine, as they are enhancing and 
> debugging the code, where they are allowed to add state.
>
> Before, the existence of parallel() created a danger zone, but 
> sequential() restored safety. That's an easy rule to understand.
>
> BTW, what are the rules in Scala and groovy?
>
> Joe
>

for Groovy, gpars the groovy parallel library doesn't allow stateful 
closure,
you have too use special constructs like Agent for that.

http://gpars.codehaus.org/

Rémi

> On Mar 22, 2013 9:33 AM, "Brian Goetz" <brian.goetz at oracle.com 
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     The problem with stateful lambdas is that, unless one block of
>     code has control over the entire pipeline, it is an accident
>     waiting to happen.
>
>     Let's say you receive a stream as a parameter:
>
>       void foo(Stream s) { ... }
>
>     and you want to do something that requires a stateful mapper:
>
>       void foo(Stream s) {
>           s.map(... stateful ...)...
>       }
>
>     That's a bug already.  Because you don't know that the stream you
>     were passed in is sequential.  But I doubt that people will
>     remember, even most of the time, they need to do:
>
>           s.sequential().map(... stateful ...)...
>
>     instead.  Won't happen.
>
>     Stateful lambdas introduce the need for non-modular reasoning
>     about stream pipelines (who created this? who will consume this?
>     in what state was it created?).  And, it has all the same
>     self-deception problems as thread-safety.  People convince
>     themselves "I don't need to think about synchronization because no
>     one will ever use this object concurrently."
>
>     So, while I sympathize with the desire to let people say "I know
>     that this entire stream pipeline has been carefully controlled
>     such as to not have statefulness distort its results", I think in
>     reality, this will quickly turn into "statefulness is OK" in most
>     people's minds.  With the attended inevitable foot-shooting.
>
>     On 3/21/2013 9:57 PM, Joe Bowbeer wrote:
>
>         I'm traveling now and won't be able to respond promptly but
>         this topic
>         has been raised a couple of times already. Feel free to copy
>         and paste
>         my response from previous discussions:)
>
>         Rephrasing, I'm OK with non-interference but I object to banning
>         stateful in sequential ops.
>
>         I think there should be a one-one correspondence between any
>         for loop
>         and a sequential forEach.
>
>         Can you compare your restrictions with those in Scala and
>         Groovy? Scala
>         in particular, because it is more strictly defined, and I'm
>         pretty sure
>         I've combined stateful expressions with functional forms in
>         Scala, to
>         good effect. (One of the benefits of being multi-paradigmatic?)
>
>         In addition, I'm wary of the new form of forEach. If anything,
>         I'd like
>         its name to be simpler, e.g., each, not longer.
>
>         Joe
>
>         On Mar 21, 2013 3:48 PM, "Brian Goetz" <brian.goetz at oracle.com
>         <mailto:brian.goetz at oracle.com>
>         <mailto:brian.goetz at oracle.com
>         <mailto:brian.goetz at oracle.com>>> wrote:
>
>             Doug and I have been revisiting sequential() and
>         parallel().  I
>             think there's a nice simplification here.
>
>             The original motivation for sequential() was because we
>         originally
>             had .into(collection), and many collections were not
>         thread-safe so
>             we needed a means to bring the computation back to the current
>             thread.  The .sequential() method did that, but it also
>         brought back
>             a constraint of encounter ordering with it, because if
>         people did:
>
>                 stuff.parallel().map(...).__sequential().into(new
>         ArrayList<>());
>
>             not respecting encounter order would violate the principle
>         of least
>             astonishment.
>
>             So the original motivation for sequential() was "bring the
>             computation back to the current thread, in order."  This
>         was doable,
>             but has a high price -- a full barrier where we buffer the
>         contents
>             of the entire stream before doing any forEach'ing.
>
>             Most of the time, sequential() only appears right before the
>             terminal operation.  But, once implemented, there was no
>         reason to
>             constrain this to appear right before the forEach/into, so
>         we didn't.
>
>             Once we discovered a need for .parallel(), it made sense
>         it be the
>             dual of .sequential() -- fully unconstrained.  And again, the
>             implementation wasn't *that* bad -- better than
>         .sequential().  But
>             again, the most desirable position for .parallel() is
>         right after
>             the source.
>
>             Then we killed into() and replaced it with reduction,
>         which is a
>             much smarter way of managing ordering.  Eliminating half the
>             justification for .sequential().
>
>             As far as I can tell, the remaining use cases for
>         .sequential() are
>             just modifiers to forEach to constrain it, in order, to
>         the current
>             thread.
>
>             As in:
>                ints().parallel().filter(i -> isPrime(i))
>                      .sequential().forEach(System.__out::println)
>
>             Which could be replaced by
>             .__forEachSequentialAndOrderedInC__urrentThread(), with a
>         suitably
>             better name.  Which could further be simplified to ditch
>         the "in
>             current thread" part by doing some locking in the
>         implementation,
>             which brings us to .forEachOrdered(action).  Which nicely
>             complements .collectUnordered, and the two actually stand
>         better
>             with their duals present (reduce is by default ordered;
>         forEach is
>             by default unordered.)
>
>             The "put it anywhere" behavior of .parallel was completely
>             bootstrapped on the "put it anywhere" nature of
>         .sequential; we
>             never really set out to support transitions in the API.
>
>             So, pulling the rug out from under the house of cards, I
>         think we
>             can fall back to:
>
>             1.  Modify semantics of .sequential and .parallel to apply
>         globally
>             to the entire pipeline.  This works because pipelines are
>         fully lazy
>             anyway, so we don't commit to seq-ness/par-ness until we
>         hit the
>             terminal op.  So they are functional versions of "set the
>         seq/par
>             bit in the source".  And that simplifies the specification of
>             seq/par down to a single property of the entire pipeline
>         -- much
>             easier to spec.
>
>             2.  Add .forEachOrdered.  For sequential streams, this is just
>             .forEach.  For par streams, we use a lock to avoid concurrent
>             invocation of the lambda, and loosen up the current
>         behavior from
>             "full barrier" to "partial barrier", so that when the next
>         chunk is
>             available, we can start working right away.  This is easy to
>             accomplish using the existing AbstractTask machinery.
>
>
>             Before we go there, does anyone have use cases for
>         .sequential() /
>             .parallel() that *don't* put the parallel right after the
>         source, or
>             the sequential right before a forEach?
>
>