Simplifying sequential() / parallel()

Thu Mar 21 18:57:37 PDT 2013

I'm traveling now and won't be able to respond promptly, but this topic has
been raised a couple of times already. Feel free to copy and paste my
response from previous discussions :)

To rephrase: I'm OK with requiring non-interference, but I object to banning
stateful lambdas in sequential ops.

I think there should be a one-to-one correspondence between any for loop and a
sequential forEach.
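
To make that correspondence concrete, here is a minimal sketch written against the java.util.stream API as it later shipped in Java 8 (an assumption, since the API was still in flux at the time of this thread). The class and method names are hypothetical. Both methods update a running total held outside the loop body, which is the kind of stateful lambda under discussion. Note that the shipped documentation only promises encounter order for forEachOrdered, so the in-order behaviour of a sequential forEach shown here is what the reference implementation does in practice, not a specified guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopVsForEach {
    // Plain for loop: the body reads and updates state declared outside it.
    static List<String> withLoop(List<String> words) {
        List<String> out = new ArrayList<>();
        int[] total = {0};                 // running total of word lengths
        for (String w : words) {
            total[0] += w.length();
            out.add(w + ":" + total[0]);
        }
        return out;
    }

    // The same body as a stateful lambda passed to a sequential forEach.
    // This is only safe because the stream is sequential; in a parallel
    // stream the unsynchronized updates would race.
    static List<String> withForEach(List<String> words) {
        List<String> out = new ArrayList<>();
        int[] total = {0};
        words.stream().forEach(w -> {
            total[0] += w.length();
            out.add(w + ":" + total[0]);
        });
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(withLoop(words));
        System.out.println(withForEach(words));
    }
}
```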

Can you compare your restrictions with those in Scala and Groovy? Scala in
particular, because it is more strictly defined, and I'm pretty sure I've
combined stateful expressions with functional forms in Scala, to good
effect. (One of the benefits of being multi-paradigmatic?)

In addition, I'm wary of the new form of forEach. If anything, I'd like its
name to be simpler, e.g., each, not longer.

Joe
 On Mar 21, 2013 3:48 PM, "Brian Goetz" <brian.goetz at oracle.com> wrote:

> Doug and I have been revisiting sequential() and parallel().  I think
> there's a nice simplification here.
>
> The original motivation for sequential() was that we originally had
> .into(collection), and many collections were not thread-safe, so we needed a
> means to bring the computation back to the current thread.  The
> .sequential() method did that, but it also brought back a constraint of
> encounter ordering with it, because if people did:
>
>    stuff.parallel().map(...).sequential().into(new ArrayList<>());
>
> not respecting encounter order would violate the principle of least
> astonishment.
>
> So the original motivation for sequential() was "bring the computation
> back to the current thread, in order."  This was doable, but has a high
> price -- a full barrier where we buffer the contents of the entire stream
> before doing any forEach'ing.
>
> Most of the time, sequential() only appears right before the terminal
> operation.  But, once implemented, there was no reason to constrain this to
> appear right before the forEach/into, so we didn't.
>
> Once we discovered a need for .parallel(), it made sense for it to be the
> dual of .sequential() -- fully unconstrained.  And again, the implementation wasn't
> *that* bad -- better than .sequential().  But again, the most desirable
> position for .parallel() is right after the source.
>
> Then we killed into() and replaced it with reduction, which is a much
> smarter way of managing ordering, eliminating half the justification for
> .sequential().
>
> As far as I can tell, the remaining use cases for .sequential() are just
> modifiers to forEach to constrain it, in order, to the current thread.
>
> As in:
>   ints().parallel().filter(i -> isPrime(i))
>         .sequential().forEach(System.out::println)
>
> Which could be replaced by .forEachSequentialAndOrderedInCurrentThread(),
> with a suitably better name.  Which could further be simplified to ditch
> the "in current thread" part by doing some locking in the implementation,
> which brings us to .forEachOrdered(action).  Which nicely complements
> .collectUnordered, and the two actually stand better with their duals
> present (reduce is by default ordered; forEach is by default unordered.)
>
> The "put it anywhere" behavior of .parallel was completely bootstrapped on
> the "put it anywhere" nature of .sequential; we never really set out to
> support transitions in the API.
>
> So, pulling the rug out from under the house of cards, I think we can fall
> back to:
>
> 1.  Modify semantics of .sequential and .parallel to apply globally to the
> entire pipeline.  This works because pipelines are fully lazy anyway, so we
> don't commit to seq-ness/par-ness until we hit the terminal op.  So they
> are functional versions of "set the seq/par bit in the source".  And that
> simplifies the specification of seq/par down to a single property of the
> entire pipeline -- much easier to spec.
>
> 2.  Add .forEachOrdered.  For sequential streams, this is just .forEach.
>  For par streams, we use a lock to avoid concurrent invocation of the
> lambda, and loosen up the current behavior from "full barrier" to "partial
> barrier", so that when the next chunk is available, we can start working
> right away.  This is easy to accomplish using the existing AbstractTask
> machinery.
>
>
> Before we go there, does anyone have use cases for .sequential() /
> .parallel() that *don't* put the parallel right after the source, or the
> sequential right before a forEach?
>
>
>
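
The two-part proposal quoted above can be sketched against the java.util.stream API as it shipped in Java 8, where sequential() and parallel() do set a single mode for the whole pipeline and forEachOrdered exists. This is an illustration under that assumption, not the code discussed in the thread: IntStream.range stands in for the ints() source in the quoted example, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class SeqParSketch {
    // Simple trial-division primality check used as the filter predicate.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Point 1: the mode is a property of the entire pipeline, so the last
    // call wins. Here sequential() comes last, so the pipeline as a whole
    // is sequential, including the filter stage before it.
    static boolean modeAfterMixedCalls() {
        return IntStream.range(0, 100)
                .parallel()
                .filter(SeqParSketch::isPrime)
                .sequential()
                .isParallel();
    }

    // Point 2: forEachOrdered on a parallel stream runs the action in
    // encounter order, one element at a time, so a plain ArrayList can
    // be used as the sink without extra synchronization.
    static List<Integer> primesInEncounterOrder(int limit) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, limit)
                .parallel()
                .filter(SeqParSketch::isPrime)
                .forEachOrdered(i -> out.add(i));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(modeAfterMixedCalls());        // prints false
        System.out.println(primesInEncounterOrder(30));
        // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    }
}
```

Replacing forEachOrdered with forEach in the second method would remove both the ordering and the one-at-a-time guarantee, so the ArrayList sink would no longer be safe on a parallel stream.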