Regression (b88): OOM on parallel limit operation
Brian Goetz
brian.goetz at oracle.com
Fri May 10 10:43:05 PDT 2013
> public class OOM {
>     public static void main(String... ignored) {
>
>         // prints 200_000
>         System.out.println(java.util.stream.LongStream.iterate(1L, n -> n + 1L)
>                 .filter(l -> l % 100 == 0).limit(200_000).count());
>
>         // prints 100_000
>         System.out.println(java.util.stream.LongStream.iterate(1L, n -> n + 1L)
>                 .parallel()
>                 .filter(l -> l % 100 == 0).limit(100_000).count());
>
>         // immediate OOM
>         System.out.println(java.util.stream.LongStream.iterate(1L, n -> n + 1L)
>                 .parallel()
>                 .filter(l -> l % 100 == 0).limit(200_000).count());
>     }
> }
There are definitely things we need to tweak here. But realistically,
there are also things we just need to document, and count on users not
to do silly things. The above examples are, sadly, silly, though that
is in large part our fault: we haven't documented enough of what is
going on to make the silliness obvious.
There are two interacting factors that make this case silly. The first
is generating a stream with iteration, which is sequential and for which
parallelism can only be reconstructed badly.
If a stream source is iterative, the split operation must be some form
of (some, rest), and has very little chance of being balanced since you
have no idea how many elements are on the right side. This means that
we generate right-heavy computation trees, which in the worst case look
like:
(a, (b, (c, (d, ....))))
which is far, far worse than sequential, because the computation is still
fundamentally sequential, but now you're also multiplying the
per-element work by a lot (because of the task and splitting overhead).
We can do somewhat better, by splitting off more than one at a time,
but ultimately we're still stuck with heavily lopsided and deep trees.
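You can see the (some, rest) split shape directly by asking an iterative source's spliterator to split. A minimal sketch (the class name `IterateSplit` is mine, and the exact prefix batch size is a JDK implementation detail, so don't rely on the specific number): trySplit can only peel a finite prefix off the front, while the remainder still can't report a real size.

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class IterateSplit {
    public static void main(String[] args) {
        // An iterative source has no idea how many elements it holds.
        Spliterator.OfLong rest = LongStream.iterate(1L, n -> n + 1L).spliterator();

        // trySplit peels a prefix batch off the front ("some")...
        Spliterator.OfLong prefix = rest.trySplit();

        // ...leaving "rest", whose size is still unknown (Long.MAX_VALUE).
        System.out.println("prefix: " + prefix.estimateSize());
        System.out.println("rest:   " + rest.estimateSize());
    }
}
```

Every further split peels another prefix off the remainder, which is exactly the right-heavy (a, (b, (c, ...))) shape described above.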
On the other hand, if your stream were generated by a range, that splits
evenly (generating balanced trees) and preserves sizing information.
This generates far better behaved parallel computation trees. It would
be great if the library could "just handle it" (and the library does
just handle a lot), but when things start to interact with challenged
ops like limit(), there's really no answer except "the user has to know
what he's doing."
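The contrast with a range source is easy to demonstrate (class name `RangeSplit` is mine for illustration): a range spliterator hands off an even half, and both halves still report exact sizes.

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class RangeSplit {
    public static void main(String[] args) {
        // A range knows its exact size...
        Spliterator.OfLong right = LongStream.range(0, 1_000_000).spliterator();

        // ...so trySplit can hand off an even half, and both halves
        // still know exactly how big they are.
        Spliterator.OfLong left = right.trySplit();
        System.out.println(left.estimateSize() + " + " + right.estimateSize());
        // prints "500000 + 500000"
    }
}
```

Recursing on both halves yields the balanced tree that the parallel machinery wants.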
The second factor is in the nature of limit(), which is constrained to
yield its results in encounter order. Encounter-order-constrained
parallel operations are always iffy; you want to use parallelism for
cases like:
source.filter(expensiveFn).limit(N)
to do the filtering in parallel, but the constraint to encounter order
(assuming the source is ordered) is a killer, because it means you have
to buffer the whole thing until you've found at least N elements.
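If you don't actually need the *first* N matches in encounter order, you can say so with unordered(), which lets limit() keep any N matches rather than buffering toward the first N. A sketch over a sized source (whether unordered() also rescues an iterate-based pipeline depends on the implementation, so this is illustrative, not a guarantee):

```java
import java.util.stream.LongStream;

public class UnorderedLimit {
    public static void main(String[] args) {
        // Declaring the stream unordered releases limit() from the
        // encounter-order constraint: any 100_000 matches will do.
        long count = LongStream.range(1, 100_000_000)
                .parallel()
                .unordered()
                .filter(l -> l % 100 == 0)
                .limit(100_000)
                .count();
        System.out.println(count); // prints 100000
    }
}
```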
Combining these two -- a pathologically unbalanced source with the need
to buffer the whole result -- is what leads to the OOME.
What's missing is enough of the above explanation of the cost model in
the docs to make it obvious to users why:
iterate(0, f).filter(..).limit(BIG)
is silly, and why adding limit(BIG) to a parallel pipeline changes its
behavior so drastically.
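For what it's worth, the OOM-ing example above can be rewritten over a sized source once you notice that the first 200_000 multiples of 100 all lie in [1, 20_000_000] (class name `RangeRewrite` is mine; the bound is just arithmetic from the example, not a library guarantee):

```java
import java.util.stream.LongStream;

public class RangeRewrite {
    public static void main(String[] args) {
        // Same result as the iterate-based pipeline, but the range
        // splits evenly and knows its size, so parallel limit behaves.
        long count = LongStream.rangeClosed(1, 20_000_000)
                .parallel()
                .filter(l -> l % 100 == 0)
                .limit(200_000)
                .count();
        System.out.println(count); // prints 200000
    }
}
```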
More information about the lambda-dev mailing list