Simplifying sequential() / parallel()
Remi Forax
forax at univ-mlv.fr
Fri Mar 22 06:43:29 PDT 2013
On 03/22/2013 02:33 PM, Brian Goetz wrote:
> The problem with stateful lambdas is that, unless one block of code
> has control over the entire pipeline, it is an accident waiting to
> happen.
>
> Let's say you receive a stream as a parameter:
>
> void foo(Stream s) { ... }
>
> and you want to do something that requires a stateful mapper:
>
> void foo(Stream s) {
> s.map(... stateful ...)...
> }
>
> That's a bug already. Because you don't know that the stream you were
> passed in is sequential. But I doubt that people will remember, even
> most of the time, they need to do:
>
> s.sequential().map(... stateful ...)...
>
> instead. Won't happen.
>
> Stateful lambdas introduce the need for non-modular reasoning about
> stream pipelines (who created this? who will consume this? in what
> state was it created?). And, it has all the same self-deception
> problems as thread-safety. People convince themselves "I don't need
> to think about synchronization because no one will ever use this
> object concurrently."
>
> So, while I sympathize with the desire to let people say "I know that
> this entire stream pipeline has been carefully controlled such as to
> not have statefulness distort its results", I think in reality, this
> will quickly turn into "statefulness is OK" in most people's minds.
> With the attended inevitable foot-shooting.
Yes,
recognizing what is stateful and what is stateless is enough
(says the guy the "concurrency in Java" teacher in me :)
+1 for 'each', foreach is already used for the enhanced for loop "for(:)",
in that case, forEachOrdered() should be eachOrdered().
Rémi
>
> On 3/21/2013 9:57 PM, Joe Bowbeer wrote:
>> I'm traveling now and won't be able to respond promptly but this topic
>> has been raised a couple of times already. Feel free to copy and paste
>> my response from previous discussions:)
>>
>> Rephrasing, I'm OK with non-interference but I object to banning
>> stateful in sequential ops.
>>
>> I think there should be a one-one correspondence between any for loop
>> and a sequential forEach.
>>
>> Can you compare your restrictions with those in Scala and Groovy? Scala
>> in particular, because it is more strictly defined, and I'm pretty sure
>> I've combined stateful expressions with functional forms in Scala, to
>> good effect. (One of the benefits of being multi-paradigmatic?)
>>
>> In addition, I'm wary of the new form of forEach. If anything, I'd like
>> its name to be simpler, e.g., each, not longer.
>>
>> Joe
>>
>> On Mar 21, 2013 3:48 PM, "Brian Goetz" <brian.goetz at oracle.com
>> <mailto:brian.goetz at oracle.com>> wrote:
>>
>> Doug and I have been revisiting sequential() and parallel(). I
>> think there's a nice simplification here.
>>
>> The original motivation for sequential() was because we originally
>> had .into(collection), and many collections were not thread-safe so
>> we needed a means to bring the computation back to the current
>> thread. The .sequential() method did that, but it also brought back
>> a constraint of encounter ordering with it, because if people did:
>>
>> stuff.parallel().map(...).__sequential().into(new
>> ArrayList<>());
>>
>> not respecting encounter order would violate the principle of least
>> astonishment.
>>
>> So the original motivation for sequential() was "bring the
>> computation back to the current thread, in order." This was doable,
>> but has a high price -- a full barrier where we buffer the contents
>> of the entire stream before doing any forEach'ing.
>>
>> Most of the time, sequential() only appears right before the
>> terminal operation. But, once implemented, there was no reason to
>> constrain this to appear right before the forEach/into, so we
>> didn't.
>>
>> Once we discovered a need for .parallel(), it made sense it be the
>> dual of .sequential() -- fully unconstrained. And again, the
>> implementation wasn't *that* bad -- better than .sequential(). But
>> again, the most desirable position for .parallel() is right after
>> the source.
>>
>> Then we killed into() and replaced it with reduction, which is a
>> much smarter way of managing ordering. Eliminating half the
>> justification for .sequential().
>>
>> As far as I can tell, the remaining use cases for .sequential() are
>> just modifiers to forEach to constrain it, in order, to the current
>> thread.
>>
>> As in:
>> ints().parallel().filter(i -> isPrime(i))
>> .sequential().forEach(System.__out::println)
>>
>> Which could be replaced by
>> .__forEachSequentialAndOrderedInC__urrentThread(), with a suitably
>> better name. Which could further be simplified to ditch the "in
>> current thread" part by doing some locking in the implementation,
>> which brings us to .forEachOrdered(action). Which nicely
>> complements .collectUnordered, and the two actually stand better
>> with their duals present (reduce is by default ordered; forEach is
>> by default unordered.)
>>
>> The "put it anywhere" behavior of .parallel was completely
>> bootstrapped on the "put it anywhere" nature of .sequential; we
>> never really set out to support transitions in the API.
>>
>> So, pulling the rug out from under the house of cards, I think we
>> can fall back to:
>>
>> 1. Modify semantics of .sequential and .parallel to apply globally
>> to the entire pipeline. This works because pipelines are fully lazy
>> anyway, so we don't commit to seq-ness/par-ness until we hit the
>> terminal op. So they are functional versions of "set the seq/par
>> bit in the source". And that simplifies the specification of
>> seq/par down to a single property of the entire pipeline -- much
>> easier to spec.
>>
>> 2. Add .forEachOrdered. For sequential streams, this is just
>> .forEach. For par streams, we use a lock to avoid concurrent
>> invocation of the lambda, and loosen up the current behavior from
>> "full barrier" to "partial barrier", so that when the next chunk is
>> available, we can start working right away. This is easy to
>> accomplish using the existing AbstractTask machinery.
>>
>>
>> Before we go there, does anyone have use cases for .sequential() /
>> .parallel() that *don't* put the parallel right after the source, or
>> the sequential right before a forEach?
>>
>>
More information about the lambda-libs-spec-experts
mailing list