Simplifying sequential() / parallel()
Joe Bowbeer
joe.bowbeer at gmail.com
Fri Mar 22 08:09:56 PDT 2013
I'm just asking about the rules for each() in groovy. gpars is special
purpose, akin to adding parallel() to ones code.
On Mar 22, 2013 10:57 AM, "Remi Forax" <forax at univ-mlv.fr> wrote:
> On 03/22/2013 03:07 PM, Joe Bowbeer wrote:
>
>>
>> Stateful programming has its issues but that ship has already sailed (in
>> Java).
>>
>> The programs where these new expressions will live are full of state...
>>
>> With the introduction of streams, programmers and refactoring tools will
>> be introducing the cool new expressions into existing code. (forEach is the
>> groovy guy's for loop, right?)
>>
>> I don't want to create danger zones in the code where these
>> transformations are accidents waiting to happen. Also think of the code
>> maintainers trying to determine, as they are enhancing and debugging the
>> code, where they are allowed to add state.
>>
>> Before, the existence of parallel() created a danger zone, but
>> sequential() restored safety. That's an easy rule to understand.
>>
>> BTW, what are the rules in Scala and groovy?
>>
>> Joe
>>
>>
> for Groovy, gpars the groovy parallel library doesn't allow stateful
> closure,
> you have too use special constructs like Agent for that.
>
> http://gpars.codehaus.org/
>
> Rémi
>
> On Mar 22, 2013 9:33 AM, "Brian Goetz" <brian.goetz at oracle.com <mailto:
>> brian.goetz at oracle.com**>> wrote:
>>
>> The problem with stateful lambdas is that, unless one block of
>> code has control over the entire pipeline, it is an accident
>> waiting to happen.
>>
>> Let's say you receive a stream as a parameter:
>>
>> void foo(Stream s) { ... }
>>
>> and you want to do something that requires a stateful mapper:
>>
>> void foo(Stream s) {
>> s.map(... stateful ...)...
>> }
>>
>> That's a bug already. Because you don't know that the stream you
>> were passed in is sequential. But I doubt that people will
>> remember, even most of the time, they need to do:
>>
>> s.sequential().map(... stateful ...)...
>>
>> instead. Won't happen.
>>
>> Stateful lambdas introduce the need for non-modular reasoning
>> about stream pipelines (who created this? who will consume this?
>> in what state was it created?). And, it has all the same
>> self-deception problems as thread-safety. People convince
>> themselves "I don't need to think about synchronization because no
>> one will ever use this object concurrently."
>>
>> So, while I sympathize with the desire to let people say "I know
>> that this entire stream pipeline has been carefully controlled
>> such as to not have statefulness distort its results", I think in
>> reality, this will quickly turn into "statefulness is OK" in most
>> people's minds. With the attended inevitable foot-shooting.
>>
>> On 3/21/2013 9:57 PM, Joe Bowbeer wrote:
>>
>> I'm traveling now and won't be able to respond promptly but
>> this topic
>> has been raised a couple of times already. Feel free to copy
>> and paste
>> my response from previous discussions:)
>>
>> Rephrasing, I'm OK with non-interference but I object to banning
>> stateful in sequential ops.
>>
>> I think there should be a one-one correspondence between any
>> for loop
>> and a sequential forEach.
>>
>> Can you compare your restrictions with those in Scala and
>> Groovy? Scala
>> in particular, because it is more strictly defined, and I'm
>> pretty sure
>> I've combined stateful expressions with functional forms in
>> Scala, to
>> good effect. (One of the benefits of being multi-paradigmatic?)
>>
>> In addition, I'm wary of the new form of forEach. If anything,
>> I'd like
>> its name to be simpler, e.g., each, not longer.
>>
>> Joe
>>
>> On Mar 21, 2013 3:48 PM, "Brian Goetz" <brian.goetz at oracle.com
>> <mailto:brian.goetz at oracle.com**>
>> <mailto:brian.goetz at oracle.com
>> <mailto:brian.goetz at oracle.com**>>> wrote:
>>
>> Doug and I have been revisiting sequential() and
>> parallel(). I
>> think there's a nice simplification here.
>>
>> The original motivation for sequential() was because we
>> originally
>> had .into(collection), and many collections were not
>> thread-safe so
>> we needed a means to bring the computation back to the current
>> thread. The .sequential() method did that, but it also
>> brought back
>> a constraint of encounter ordering with it, because if
>> people did:
>>
>> stuff.parallel().map(...).__**sequential().into(new
>> ArrayList<>());
>>
>> not respecting encounter order would violate the principle
>> of least
>> astonishment.
>>
>> So the original motivation for sequential() was "bring the
>> computation back to the current thread, in order." This
>> was doable,
>> but has a high price -- a full barrier where we buffer the
>> contents
>> of the entire stream before doing any forEach'ing.
>>
>> Most of the time, sequential() only appears right before the
>> terminal operation. But, once implemented, there was no
>> reason to
>> constrain this to appear right before the forEach/into, so
>> we didn't.
>>
>> Once we discovered a need for .parallel(), it made sense
>> it be the
>> dual of .sequential() -- fully unconstrained. And again, the
>> implementation wasn't *that* bad -- better than
>> .sequential(). But
>> again, the most desirable position for .parallel() is
>> right after
>> the source.
>>
>> Then we killed into() and replaced it with reduction,
>> which is a
>> much smarter way of managing ordering. Eliminating half the
>> justification for .sequential().
>>
>> As far as I can tell, the remaining use cases for
>> .sequential() are
>> just modifiers to forEach to constrain it, in order, to
>> the current
>> thread.
>>
>> As in:
>> ints().parallel().filter(i -> isPrime(i))
>> .sequential().forEach(System._**_out::println)
>>
>> Which could be replaced by
>> .__**forEachSequentialAndOrderedInC**__urrentThread(), with a
>> suitably
>> better name. Which could further be simplified to ditch
>> the "in
>> current thread" part by doing some locking in the
>> implementation,
>> which brings us to .forEachOrdered(action). Which nicely
>> complements .collectUnordered, and the two actually stand
>> better
>> with their duals present (reduce is by default ordered;
>> forEach is
>> by default unordered.)
>>
>> The "put it anywhere" behavior of .parallel was completely
>> bootstrapped on the "put it anywhere" nature of
>> .sequential; we
>> never really set out to support transitions in the API.
>>
>> So, pulling the rug out from under the house of cards, I
>> think we
>> can fall back to:
>>
>> 1. Modify semantics of .sequential and .parallel to apply
>> globally
>> to the entire pipeline. This works because pipelines are
>> fully lazy
>> anyway, so we don't commit to seq-ness/par-ness until we
>> hit the
>> terminal op. So they are functional versions of "set the
>> seq/par
>> bit in the source". And that simplifies the specification of
>> seq/par down to a single property of the entire pipeline
>> -- much
>> easier to spec.
>>
>> 2. Add .forEachOrdered. For sequential streams, this is just
>> .forEach. For par streams, we use a lock to avoid concurrent
>> invocation of the lambda, and loosen up the current
>> behavior from
>> "full barrier" to "partial barrier", so that when the next
>> chunk is
>> available, we can start working right away. This is easy to
>> accomplish using the existing AbstractTask machinery.
>>
>>
>> Before we go there, does anyone have use cases for
>> .sequential() /
>> .parallel() that *don't* put the parallel right after the
>> source, or
>> the sequential right before a forEach?
>>
>>
>>
>
More information about the lambda-libs-spec-observers
mailing list