stream.parallel().limit() on large streams
Brian Goetz
brian.goetz at oracle.com
Sun Oct 6 12:10:17 PDT 2013
So, the lesson for users is: the library works well when you tell us what you really want.
The lesson for us is: we need the docs to do a better job of helping you understand what you want :)
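For `limit`, "telling the library what you want" comes down to ordering. On an ordered parallel stream, `limit(n)` has to return the first n elements in encounter order, which forces the threads to coordinate. After `.unordered()`, any n elements satisfy the contract. The following is a minimal sketch of that difference. The class and method names are hypothetical, and the even-number filter only stands in for the `condition` discussed in the thread.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderedVsUnorderedLimit {

    // Ordered: limit(n) must keep the first n even numbers in encounter
    // order, so the result is always [0, 2, 4, ...] even when run in parallel.
    static List<Integer> orderedFirstEvens(int range, int n) {
        return IntStream.range(0, range)
                .parallel()
                .filter(i -> i % 2 == 0)
                .limit(n)
                .boxed()
                .collect(Collectors.toList());
    }

    // Unordered: limit(n) may keep ANY n even numbers, so the parallel
    // pipeline does not have to track which elements came first.
    static List<Integer> anyEvens(int range, int n) {
        return IntStream.range(0, range)
                .parallel()
                .unordered()
                .filter(i -> i % 2 == 0)
                .limit(n)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(orderedFirstEvens(1_000_000, 5)); // [0, 2, 4, 6, 8]
        System.out.println(anyEvens(1_000_000, 5).size());   // 5
    }
}
```

Which five elements the unordered version returns is deliberately unspecified; only the count and the filter condition are guaranteed.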
On Oct 6, 2013, at 10:26 AM, Arne Siegel <v.a.ammodytes at googlemail.com> wrote:
> Hi Brian,
>
> I tried out both of your suggestions, and they do indeed work well for the order-independent scenario I was looking into:
>
> IntStream.range(0, maxToGenerate)
> .parallel()
> .unordered()
> .mapToObj(i -> generatorFunction.get())
> .filter(condition)
> .limit(needed)
> .forEach(action);
>
> ==> runs quite well, but is on average a few percent less efficient than the implementation quoted in your mail.
>
> Stream.generate(generatorFunction)
> .parallel()
> .limit(maxToGenerate)
> .filter(condition)
> .limit(needed)
> .forEach(action);
>
> ==> this is a really concise expression of the program's intent, and it performs as well as the cited implementation. Nice!
>
> One note: it's a completely different picture for scenarios where order is important. These need other implementations, and for these I found:
> - parallel streams running much slower than both serial streams and a serial loop;
> - the ExecutorService-based approach running much faster than the serial implementations, most of the time.
>
> Thank you very much for your valuable hints!
>
> Arne
>
>
> 2013/10/5 Brian Goetz <brian.goetz at oracle.com>
>> > For completeness I want to show how I could rewrite the code using a
>> > streams-based implementation:
>> >
>> > final AtomicInteger elementsConsumed = new AtomicInteger(0);
>> > IntStream.range(0, maxToGenerate)
>> > .parallel()
>> > .mapToObj(i -> generatorFunction.get())
>> > .filter(condition::test)
>> > .peek(action::accept)
>> > .mapToInt(element ->
>> > elementsConsumed.incrementAndGet())
>> > .filter(n -> n >= needed)
>> > .findFirst();
>>
>> If this code works for you, then what you're saying is that you don't care about order. Which I believe. In which case, just use .unordered() in your stream and you'll get good parallelism without having to contort your code. Try it and report back?
>>
>> You might also do better with Stream.generate, since it creates an unordered stream:
>>
>> Stream.generate(generatorFunction)
>> .parallel()
>> ...
>
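The `Stream.generate` pipeline from Arne's reply can be turned into a self-contained program roughly as follows. This is a sketch under assumptions: the thread never shows `generatorFunction`, `condition`, or `action`, so a shared counter, an even-number test, and a thread-safe queue stand in for them here, and the class name `GenerateLimitSketch` and method `runUnordered` are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class GenerateLimitSketch {

    // The pipeline from the thread: at most maxToGenerate candidates reach
    // the filter, and at most `needed` matching elements reach the action.
    static <T> void runUnordered(Supplier<T> generatorFunction,
                                 Predicate<? super T> condition,
                                 Consumer<? super T> action,
                                 long maxToGenerate,
                                 long needed) {
        Stream.generate(generatorFunction) // infinite, unordered source
                .parallel()
                .limit(maxToGenerate)      // cap on candidates passed downstream
                .filter(condition)
                .limit(needed)             // any `needed` matches will do
                .forEach(action);          // may be called from several threads
    }

    public static void main(String[] args) {
        // Hypothetical generator: distinct integers from a shared counter.
        AtomicInteger counter = new AtomicInteger();
        Supplier<Integer> generator = counter::getAndIncrement;

        // Thread-safe sink, because forEach may invoke the action concurrently.
        Queue<Integer> results = new ConcurrentLinkedQueue<>();

        runUnordered(generator, i -> i % 2 == 0, results::add, 1_000_000, 1_000);
        System.out.println(results.size());
    }
}
```

Because the source is unordered, which `needed` matches end up in the queue is not specified; the design only relies on the action being safe to call concurrently.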
More information about the lambda-dev mailing list