stream() / parallel()

Tue Nov 27 14:39:43 PST 2012

On 11/27/2012 07:39 PM, Brian Goetz wrote:
> The "bun" operations of stream() and parallel() on Collections and 
> friends (which currently live in Streamable, but I am not sure that 
> Streamable carries its weight) work acceptably well for collections:
>
>   c.stream().ops(...)
> and
>   c.parallel().ops(...)
>
> The same thing works OK with the stream factories in Streams (which is 
> intended for use by libraries, not end-users):
>
>   Stream.stream(spliterator, flags)
>   Stream.parallel(spliterator, flags)
>   Arrays.stream(array)
>   Arrays.parallel(array)
>
> But we start to see friction in the other places where we surface 
> streams:
>
>   IntStream String.chars()
>   IntStream intRange(int start, int limit)
>   Stream<String> BufferedReader.lines()
>   <T> Stream<T> Streams.iterate(T seed, UnaryOperator<T> op)
>
> All of these are, currently, sequential streams.  But it seems an 
> unfortunate limitation that they are forced to be sequential.
>
> Duplicating these methods with parallel versions (i.e., 
> String.charsAsParallelStream) is possible but seems kind of yucky.
>
>
> Recent improvements in the Stream implementation (including the 
> strictures against "re-use") have created another possibility: an 
> explicit parallel() "op" on Streams.
>
> If this appears at the head of the stream, the performance cost is 
> almost zero (a few extra garbage objects at setup time), and on a 
> stream that is already parallel, it would be a no-op ("if (isParallel) 
> return this").
>
> At the use site, it would look like:
>
>   String.chars().parallel().forEach(...);
> or
>   int[] mapped
>     = Streams.intRange(0, 1000000).parallel().map(...).toArray();
>
> Collections and other things that currently implement Streamable could 
> additionally offer a convenience method to fuse stream+parallel, which 
> is both slightly more compact and slightly more efficient:
>
>   c.stream()...
>   c.parallelStream()...
>
> (The goal here is not to encourage fine-grained "transitions" between 
> parallel and sequential; this is likely to be a performance loss.)
>
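
For concreteness, the use-site shapes quoted above can be sketched against the java.util.stream API as it later shipped in Java 8. That is an assumption: the prototype under discussion used different names (Streams.intRange here becomes IntStream.range), and the class and method names below are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class ParallelUseSites {
    // parallel() as an explicit op at the head of a range stream.
    static int[] doubled(int limit) {
        return IntStream.range(0, limit).parallel().map(i -> i * 2).toArray();
    }

    // chars() starts out sequential; parallel() switches it before the filter.
    static long vowels(String s) {
        return s.chars().parallel().filter(c -> "aeiou".indexOf(c) >= 0).count();
    }

    // stream() versus the fused convenience parallelStream() on a Collection.
    static int sum(List<Integer> c, boolean parallel) {
        return (parallel ? c.parallelStream() : c.stream())
            .mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubled(4)));
        System.out.println(vowels("parallel"));
        System.out.println(sum(Arrays.asList(1, 2, 3), true));
    }
}
```

Both forms give the same results as their sequential counterparts here; the only difference is how the work is scheduled.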

The main issue is that this forces every API developer who wants to 
implement a non-parallel stream to have a good answer for when a user 
calls parallel().
Given that not all sequential streams can be parallelized, what are the 
semantics of Stream.parallel()?
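
To make the question concrete, here is a minimal sketch of a stream whose source is a plain Iterator, i.e. inherently sequential, on which the user then calls parallel(). It assumes the java.util.stream API as it later shipped in Java 8 (StreamSupport, Spliterators), not the prototype discussed in this thread; the class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ParallelOpSketch {
    // Wraps an Iterator, which can only be walked one element at a time,
    // in a sequential Stream.
    static <T> Stream<T> fromIterator(Iterator<T> it) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    // Sums the line lengths, optionally after the caller asks for parallel().
    static int totalLength(List<String> lines, boolean parallel) {
        Stream<String> s = fromIterator(lines.iterator());
        if (parallel) {
            s = s.parallel();
        }
        return s.mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(totalLength(lines, false));
        System.out.println(totalLength(lines, true));
        System.out.println(fromIterator(lines.iterator()).parallel().isParallel());
    }
}
```

In that later API the call is accepted and the result is unchanged; whether any real speedup is possible depends on how well the source can be split, which is exactly what the stream's implementer has to answer for.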

Rémi