8148250 Stream.limit parallel ordered performance
Paul Sandoz
paul.sandoz at oracle.com
Tue Jan 26 17:49:12 UTC 2016
> On 26 Jan 2016, at 16:51, Tagir F. Valeev <amaembo at gmail.com> wrote:
>
> Hello!
>
> Thank you for review! Here's the issue:
> https://bugs.openjdk.java.net/browse/JDK-8148250
> Will post complete webrev later.
>
> PS> Note that it is still the case that in almost all scenarios this
> PS> is likely to be a bad form of parallel stream.
>
> Well not always it's possible to estimate in advance the size of the
> stream. Consider that we have user-specified filter upstream which
> searches over the big collection (we want to return first 10 search
> results, order is important):
>
> data.parallelStream()
> .filter(element -> userQuery.test(element))
> .limit(10).collect(toList());
>
> If user query produces 10-15 results, using parallel stream is very
> reasonable, but if it produces millions of results it should not
> regress very much (at least should not become much slower than
> sequential version which is what we see currently).
I have my doubts that the cost of splitting and filtering a small number of elements concurrently will pay off in the majority of scenarios, hence the “almost all”.
It could work in cases where there is lots of data to be filtered and only a few items are reported that are proportionally spread out, or over small data and the filter operation is costly.
In any case it’s good to avoid the OOME, i am very glad you found a simple way to resolve that.
>
> PS> I think the comment you refer to still applies but now for larger n, so we should refine it.
>
> Should we replace "regardless of the value of n" with "when
> n*parallelismLevel is sufficiently large”?
Yes, when N * P is sufficiently large e.g to pluck a number out of the air > 2^32, say
Paul.
More information about the core-libs-dev
mailing list