Parallelism cost function

Sam Pullara spullara at gmail.com
Tue Jan 28 18:46:26 PST 2014


Just looking at the Javadocs:

——
Parallelism

Processing elements with an explicit for-loop is inherently serial. Streams facilitate parallel execution by reframing the computation as a pipeline of aggregate operations, rather than as imperative operations on each individual element. All streams operations can execute either in serial or in parallel. The stream implementations in the JDK create serial streams unless parallelism is explicitly requested. For example, Collection has methods Collection.stream() and Collection.parallelStream(), which produce sequential and parallel streams respectively; other stream-bearing methods such as IntStream.range(int, int) produce sequential streams but these streams can be efficiently parallelized by invoking their BaseStream.parallel() method. To execute the prior "sum of weights of widgets" query in parallel, we would do:


     int sumOfWeights = widgets.parallelStream()
                                .filter(b -> b.getColor() == RED)
                                .mapToInt(b -> b.getWeight())
                                .sum();

The only difference between the serial and parallel versions of this example is the creation of the initial stream, using "parallelStream()" instead of "stream()". When the terminal operation is initiated, the stream pipeline is executed sequentially or in parallel depending on the orientation of the stream on which it is invoked. Whether a stream will execute in serial or parallel can be determined with the isParallel() method, and the orientation of a stream can be modified with the BaseStream.sequential() and BaseStream.parallel() operations. When the terminal operation is initiated, the stream pipeline is executed sequentially or in parallel depending on the mode of the stream on which it is invoked.

Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.

Most stream operations accept parameters that describe user-specified behavior, which are often lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering, and in most cases must be stateless. Such parameters are always instances of a functional interface such as Function, and are often lambda expressions or method references.
———
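
To make the quoted passage concrete, here is a minimal sketch of querying and switching a stream's mode and checking that both modes give the same sum. The class name StreamModeDemo, the helper methods, and the 1..1000 "weights" are assumptions made up for illustration; only isParallel(), parallel(), sequential(), mapToInt() and sum() come from the API described above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamModeDemo {
    // Hypothetical data: the integers 1..1000 stand in for widget weights.
    static List<Integer> weights() {
        return IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());
    }

    // Stateless, non-interfering pipeline: the result should not depend on the mode.
    static int sum(Stream<Integer> s) {
        return s.mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> w = weights();

        // stream() is sequential until parallelism is requested.
        System.out.println(w.stream().isParallel());

        // parallel() and sequential() switch the mode of the whole pipeline.
        System.out.println(w.stream().parallel().isParallel());
        System.out.println(w.stream().parallel().sequential().isParallel());

        // Same sum either way, because the lambda carries no shared state.
        System.out.println(sum(w.stream()) == sum(w.stream().parallel()));
    }
}
```

Note that this only shows the two modes agree on the result; it says nothing about which one is faster.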

I think that we should have a lot more information here about when it is appropriate to use the parallelStream() call unless we are going to make sure that it executes inappropriate workloads sequentially. I’d hate to have a generation of Java programmers randomly adding .parallelStream() to all their Streams just because they think it will always be faster.
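
For illustration only, this is roughly the kind of guard a caller could write by hand today: pick the mode from the size of the source. Everything here is an assumption, not something the JDK does or provides. ParallelHeuristic, streamFor and ASSUMED_THRESHOLD are hypothetical names, and the cutoff of 10,000 is an arbitrary placeholder that would have to be measured per workload, since the right answer also depends on the per-element cost and the source's splitting behavior.

```java
import java.util.Collection;
import java.util.stream.Stream;

public class ParallelHeuristic {
    // Hypothetical cutoff, not a JDK constant; tune by measurement.
    static final int ASSUMED_THRESHOLD = 10_000;

    // Returns a parallel stream only when the source has at least
    // `threshold` elements; otherwise stays sequential.
    static <T> Stream<T> streamFor(Collection<T> source, int threshold) {
        Stream<T> s = source.stream();
        return source.size() >= threshold ? s.parallel() : s;
    }

    // Convenience overload using the assumed default cutoff.
    static <T> Stream<T> streamFor(Collection<T> source) {
        return streamFor(source, ASSUMED_THRESHOLD);
    }
}
```

A size check alone is a crude cost function, which is really the point: without guidance in the docs, people will guess.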

Sam

