stream() / parallel()
Remi Forax
forax at univ-mlv.fr
Tue Nov 27 14:39:43 PST 2012
On 11/27/2012 07:39 PM, Brian Goetz wrote:
> The "bun" operations of stream() and parallel() on Collections and
> friends (which currently live in Streamable, but I am not sure that
> Streamable carries its weight) work acceptably well for collections:
>
> c.stream().ops(...)
> and
> c.parallel().ops(...)
>
> The same thing works OK with the stream factories in Streams (which is
> intended for use by libraries, not end-users):
>
> Stream.stream(spliterator, flags)
> Stream.parallel(spliterator, flags)
> Arrays.stream(array)
> Arrays.parallel(array)
>
> But we start to see friction in the other places where we surface
> streams:
>
> IntStream String.chars()
> IntStream intRange(int start, int limit)
> Stream<String> BufferedReader.lines()
> <T> Stream<T> Streams.iterate(T seed, UnaryOperator<T> op)
>
> All of these are, currently, sequential streams. But it seems an
> unfortunate limitation that they are forced to be sequential.
>
> Duplicating these methods with parallel versions (i.e.,
> String.charsAsParallelStream) is possible but seems kind of yucky.
>
>
> Recent improvements in the Stream implementation (including the
> strictures against "re-use") have created another possibility: an
> explicit parallel() "op" on Streams.
>
> If this appears at the head of the stream, the performance cost is
> almost zero (a few extra garbage objects at setup time), and on a
> stream that is already parallel, it would be a no-op ("if (isParallel)
> return this").
>
> At the use site, it would look like:
>
> String.chars().parallel().forEach(...);
> or
> int[] mapped
> = Streams.intRange(0, 1000000).parallel().map(...).toArray();
>
> Collections and other things that currently implement Streamable could
> additionally offer a convenience method to fuse stream+parallel, which
> is both slightly more compact and slightly more efficient:
>
> c.stream()...
> c.parallelStream()...
>
> (The goal here is not to encourage fine-grained "transitions" between
> parallel and sequential; this is likely to be a performance loss.)
>
The main issue is that you force all API developpers that want to
implement a non parallel stream to have a good answer when user will
call parallel().
Given that not all sequential stream can be parallelized, what is the
semantics of Stream.parallel().
Rémi
More information about the lambda-libs-spec-observers
mailing list