stream.parallel().limit() not usable
Paul Sandoz
paul.sandoz at oracle.com
Mon Oct 7 03:58:09 PDT 2013
On Oct 5, 2013, at 9:49 AM, Brian Goetz <brian.goetz at Oracle.COM> wrote:
> Here's some concrete advice for streams with limit():
>
> If N is smallish, you're generally fine with parallel. Short-circuiting will kick in once you've got your results, and cancel the remaining tasks. If upstream operations are expensive, you'll get value out of the parallelism.
>
Yes, and we ensure the short-circuiting is efficient for various shapes of decomposable input. Previously it was possible to get an OOME in cases where the input was not that large relative to the heap size.
Here is another degenerate case:
Stream.iterate(0, i -> i + i).parallel().limit(N);
The stream has an encounter order but we don't know the size (which is similar to the case where the filter operation clears size information). Also the source decomposes poorly (derived from an iterator). The only way we can decompose is to copy a prefix of elements from the source into an array (thus creating an unbalanced computation tree that is right-heavy).
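The behaviour above can be observed directly: an iterate-based source is infinite and cannot be split, so a parallel pipeline must buffer a prefix of elements to honour the encounter order that limit() requires. A minimal sketch (using `i -> i + 1` rather than the degenerate generator above, so the elements are distinct):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrderedLimitDemo {
    public static void main(String[] args) {
        // The source is derived from an iterator: it has an encounter
        // order, an unknown size, and it decomposes poorly. The parallel
        // machinery can only copy prefixes into arrays, yielding a
        // right-heavy, unbalanced computation tree.
        List<Integer> first = Stream.iterate(0, i -> i + 1)
                                    .parallel()
                                    .limit(5)
                                    .collect(Collectors.toList());

        // limit() on an ordered stream takes the *first* n elements
        // in encounter order, so the result is deterministic.
        System.out.println(first); // [0, 1, 2, 3, 4]
    }
}
```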
Paul.
> If N is large, you're asking for a lot of additional work. Unless generation is obscenely expensive, this will likely not be a good tradeoff. In this case, you have some choices:
> - Use unordered -- when you just want N and don't care about the *first* N. This parallelizes cleanly.
> - Use sequential -- when your stream has unpredictable size+decomposition characteristics (source doesn't decompose perfectly, or pipeline has unpredictable size because of operations like filter)
>
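The unordered alternative Brian describes can be sketched as follows; dropping encounter order means limit() may return *any* n elements, so workers can stop as soon as the global count is reached instead of buffering prefixes (the range source here is a stand-in for an expensive upstream):

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnorderedLimitDemo {
    public static void main(String[] args) {
        // unordered() relaxes the "first N" constraint to "any N",
        // which parallelizes cleanly: no prefix buffering is needed.
        Set<Integer> anyTen = IntStream.range(0, 1_000_000)
                                       .parallel()
                                       .unordered()
                                       .limit(10)
                                       .boxed()
                                       .collect(Collectors.toSet());

        // Which ten elements appear may vary from run to run,
        // but the count is always exactly ten.
        System.out.println(anyTen.size()); // 10
    }
}
```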