Constructing parallel streams

Mon Dec 10 08:08:18 PST 2012

The only reason to discourage it is that it may not perform as well as
the user expects.

That is because one of the big performance tricks we use is "jamming".
When you do

   foos.filter(...).map(...).reduce(...)

we can do the filtering, mapping, and reducing in a single pass (serial
or parallel).  If you do

   foos.sequential().filter(...).parallel().map(...).sequential().reduce(...)

then you may be introducing "barriers" into the computation, where
the pipeline has to stop and collect intermediate results before
proceeding.  This gives up much of the performance benefit of the streams
model.  (Stateful ops, like sorting or limit, generally have a similar
effect.)
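
To make the difference visible, here is a minimal sketch.  It assumes the
java.util.stream API as it later shipped in Java 8, which may differ from
the draft discussed in this thread, and the class and method names
(FusionDemo, trace) are made up for illustration.  It logs the order in
which the filter and map lambdas run on a sequential stream, once fused
and once with a stateful sorted() between them acting as the barrier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; not part of any library.
final class FusionDemo {
    // Returns the order in which the filter and map lambdas were invoked.
    static List<String> trace(boolean withBarrier) {
        List<String> log = new ArrayList<>();
        Stream<Integer> s = Stream.of(3, 1, 2)
            .filter(x -> { log.add("filter " + x); return true; });
        if (withBarrier) {
            // Stateful op: it must see every element before emitting any.
            s = s.sorted();
        }
        s.map(x -> { log.add("map " + x); return x * 10; })
         .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        System.out.println("fused:   " + trace(false));
        System.out.println("barrier: " + trace(true));
    }
}
```

Without the barrier each element flows through filter and then map before
the next element is touched; with sorted() in the middle, every filter call
finishes before the first map call starts.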

However, since we don't know anything about what the user is doing in 
those lambdas, it is conceivable that it is still a win.

We do elide sequential()/parallel() calls if the stream already has that
orientation (e.g., parallel() on an already-parallel stream is a no-op).
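
A small sketch of how the orientation can be inspected, again assuming the
java.util.stream API as it shipped in Java 8 rather than the draft under
discussion here (OrientationDemo and its methods are hypothetical names).
Note that in the shipped API the orientation is a property of the whole
pipeline, so a later sequential() or parallel() call overrides an earlier
one instead of creating a separate segment.

```java
import java.util.stream.Stream;

// Hypothetical demo class; not part of any library.
final class OrientationDemo {
    // parallel() on an already-parallel stream leaves it parallel.
    static boolean parallelTwice() {
        return Stream.of(1, 2, 3).parallel().parallel().isParallel();
    }

    // In the shipped API the last orientation call applies to the whole pipeline.
    static boolean parallelThenSequential() {
        return Stream.of(1, 2, 3)
            .parallel()
            .map(x -> x + 1)
            .sequential()
            .isParallel();
    }

    public static void main(String[] args) {
        System.out.println("parallel().parallel()           -> " + parallelTwice());
        System.out.println("parallel().map(..).sequential() -> " + parallelThenSequential());
    }
}
```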

Overall I'm mostly in the "don't try to save the user from themselves"
camp here.  We should document how the model works and let
performance-sensitive users measure for themselves.  So while it is most 
effective to put the parallel() at the head of the pipe, my distaste for 
having it in the middle is merely mild and overall I can live with it.
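
For users who do want to measure, a rough sketch of what that could look
like with parallel() at the head of the pipe.  It assumes the Java 8
java.util.stream API; the workload, the class name MeasureDemo, and the
crude System.nanoTime timing are illustrative assumptions only, and a
proper harness such as JMH would give far more trustworthy numbers.

```java
import java.util.stream.LongStream;

// Hypothetical demo class; not part of any library.
final class MeasureDemo {
    // Made-up workload: sum of 3*x over the even x in [1, n].
    static long run(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel(); // orientation chosen once, at the head
        }
        return s.filter(x -> x % 2 == 0).map(x -> x * 3).sum();
    }

    // Wall-clock time of a single run, in nanoseconds (very noisy).
    static long timeNanos(long n, boolean parallel) {
        long start = System.nanoTime();
        run(n, parallel);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long n = 5_000_000L;
        run(n, false);
        run(n, true); // crude warm-up before timing
        System.out.println("sequential ns: " + timeNanos(n, false));
        System.out.println("parallel   ns: " + timeNanos(n, true));
    }
}
```

Both orientations compute the same result; which one is faster depends on
the data size, the cost of the lambdas, and the machine.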

On 12/10/2012 11:01 AM, Joe Bowbeer wrote:
> I can easily imagine a pipeline that has alternating
> sequential/parallel/sequential segments.  Is there any reason to
> discourage a programmer from using the parallel/sequential methods to
> express this?
>
> On Dec 10, 2012 7:50 AM, "Brian Goetz" <brian.goetz at oracle.com> wrote:
>
>         I don't like users being able to call parallel in the middle of the
>         stream construction.
>
>
>     I don't love it either.  The semantics are perfectly tractable, and
>     the implementation is perfectly straightforward, but the performance
>     is unlikely to be a win in most cases.  (I mentioned earlier we
>     would doc that this really should only be done at the head of the
>     pipeline.)
>
>         I propose to have an interface ParallelizableStream that lets
>         the user choose the sequential or the parallel stream
>         upfront.
>
>
>     Yeah, we investigated this direction first.  Combinatorial
>     explosion: IntParallelizableStream, etc.
>
>     However, this could trivially become a dynamic property of streams
>     (fits easily into the existing stream flags mechanism).  Then only
>     the head streams would have the property, and if you tried to do
>     parallel() farther down the stream, we could ignore it or even throw
>     ISE.
>