Concerns about parallel streams

Brian Goetz brian.goetz at oracle.com
Thu Jul 11 13:37:56 PDT 2013


I think this is, to some degree, a victim of our own success!  Before 
streams, you had to choose seq vs par up front, and the code for one was 
massively different from the other.  Now, you can make much cheaper 
course corrections, and can even course-correct at runtime -- this is a 
huge improvement.  But what we don't have is a magic "figure it out for 
me."

On 7/11/2013 4:20 PM, Sam Pullara wrote:
> I think one of the biggest issues is that the programmer is making a compile-time decision when essentially only the runtime environment matters. The biggest factor (absent doing I/O in your stream operations!) is whether you effectively have one core or less actually available on the machine for your task (concurrent requests, competing applications, VMs, etc.).
>
> My guess is that the difference in performance is due to memory usage. From a rough analysis of the GC logs, I found that sequential streams allocate 10-100x more than the for loop, and the parallel stream allocates another 10x on top of that.
>
> How is a developer supposed to make an informed decision when applying .parallel()? Especially in library code...
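
To illustrate the bind that library code is in, a sketch (not Sam's 
benchmark code, and not an API from the thread): the best a library can do 
at call time is guess from static hints such as reported processor count 
and input size, neither of which says anything about competing requests, 
other applications, or VM neighbours. The threshold below is an arbitrary 
assumption.

    import java.util.Collection;
    import java.util.stream.Stream;

    final class MaybeParallel {
        // Arbitrary cutoff for the example; the real break-even point depends
        // on per-element cost, allocation, and what else the machine is doing.
        private static final int SIZE_THRESHOLD = 10_000;

        static <T> Stream<T> streamOf(Collection<T> source) {
            boolean goParallel = Runtime.getRuntime().availableProcessors() > 1
                    && source.size() >= SIZE_THRESHOLD;
            return goParallel ? source.parallelStream() : source.stream();
        }
    }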
>
> Sam
>
> Micro benchmark https://github.com/spullara/parallelstream/blob/master/src/main/java/parallelstreams/Benchmark.java
> Results are in ns per run; the number in parentheses is each variant's time as a fraction of the slowest (1.0)
> Executed on a dual-core 2GHz i7 MacBook Air using today's build from the lambda repository
>
> Elements: 100...
> for loop: 562.63 (0.03572569261668172)
> sequential stream: 2891.91 (0.18362953936887128)
> parallel stream: 15748.61 (1.0)
> Elements: 200...
> for loop: 665.05 (0.031517030475387994)
> sequential stream: 2347.97 (0.11127139620373919)
> parallel stream: 21101.29 (1.0)
> Elements: 400...
> for loop: 1648.85 (0.07297808201110305)
> sequential stream: 4069.37 (0.1801102693353079)
> parallel stream: 22593.77 (1.0)
> Elements: 800...
> for loop: 3351.44 (0.13019962876027844)
> sequential stream: 7899.36 (0.30688114346185313)
> parallel stream: 25740.78 (1.0)
> Elements: 1600...
> for loop: 6792.7 (0.190088701717702)
> sequential stream: 14291.05 (0.39992449845904654)
> parallel stream: 35734.37 (1.0)
> Elements: 3200...
> for loop: 17991.95 (0.3807979678301864)
> sequential stream: 25637.77 (0.5426210452840141)
> parallel stream: 47248.02 (1.0)
> Elements: 6400...
> for loop: 34608.11 (0.48269659279327126)
> sequential stream: 59680.89 (0.8323991763164765)
> parallel stream: 71697.44 (1.0)
> Elements: 12800...
> for loop: 87321.16 (0.7721388554617178)
> sequential stream: 109948.0 (0.9722170763684879)
> parallel stream: 113089.97 (1.0)
> Elements: 25600...
> for loop: 96544.27 (0.4561001524046007)
> parallel stream: 180886.05 (0.854552579587232)
> sequential stream: 211673.4 (1.0)
> Elements: 51200...
> for loop: 223773.95 (0.4661786747968137)
> parallel stream: 397319.45 (0.8277185734621876)
> sequential stream: 480017.56 (1.0)
>
> Process finished with exit code 0
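
For readers who do not follow the link, a rough sketch of the kind of 
comparison such a microbenchmark makes; this is not the actual 
Benchmark.java, and it omits warm-up and timing methodology entirely.

    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class SumComparison {
        public static void main(String[] args) {
            List<Integer> data = IntStream.range(0, 1600)
                    .boxed()
                    .collect(Collectors.toList());

            long forLoop = 0;
            for (int i : data) {                          // plain for loop
                forLoop += i;
            }

            long sequential = data.stream()               // sequential stream
                    .mapToLong(Integer::longValue)
                    .sum();

            long parallel = data.parallelStream()         // parallel stream
                    .mapToLong(Integer::longValue)
                    .sum();

            System.out.println(forLoop + " " + sequential + " " + parallel);
        }
    }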
>
>
> On Jul 11, 2013, at 1:02 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>> One thing on my list of things to document is notes on methods that have particularly bad or surprising parallel performance.  #1 on this list is limit(n) for large n when the stream is neither sized nor unordered.  Other culprits are collecting to maps (since merging the per-chunk maps is expensive).  Others?
>>
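
A hedged illustration of those two pitfalls (an example sketched here, not 
code from the message): if any n elements will do, making the stream 
unordered lets a parallel limit() skip the expensive order-preserving path, 
and a concurrent grouping collector avoids merging per-chunk maps.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class ParallelPitfalls {
        public static void main(String[] args) {
            List<Integer> data = IntStream.range(0, 1_000_000)
                    .boxed()
                    .collect(Collectors.toList());

            // After filter() the stream is no longer sized, so a parallel
            // limit() on an ordered stream must buffer to preserve encounter
            // order; unordered() relaxes that when any 100 matches will do.
            List<Integer> anyHundred = data.parallelStream()
                    .unordered()
                    .filter(i -> i % 7 == 0)
                    .limit(100)
                    .collect(Collectors.toList());

            // Collecting to a map in parallel merges per-chunk maps, which is
            // expensive; a concurrent collector accumulates into one shared map.
            Map<Integer, Long> counts = data.parallelStream()
                    .collect(Collectors.groupingByConcurrent(
                            i -> i % 10, Collectors.counting()));

            System.out.println(anyHundred.size() + " " + counts);
        }
    }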
>> On 7/11/2013 3:20 PM, Sam Pullara wrote:
>>> As it stands, and it seems we are far past changing this API, it is
>>> simply too easy to get a parallel stream without thinking about
>>> whether it is the right thing to do. I think we need to extensively
>>> document when and why you would use parallel streams vs sequential
>>> streams. We should include a cost model, a benchmark that will help
>>> people figure out whether they should use them, and perhaps some
>>> rules of thumb for where parallelism makes sense. Otherwise I think
>>> we are going to see some huge regressions in performance (both
>>> memory and CPU usage) when people call .parallel() on streams that
>>> should be evaluated sequentially. It would have been great to have
>>> a cost model built into the system that would make a good guess as
>>> to whether parallel execution is worthwhile.
>>>
>>> Doug, what are your thoughts? How do you expect people to use it? I
>>> can imagine some heuristics that we could put in that might save us --
>>> maybe a hook, run every N ms with some runtime statistics, that
>>> decides whether to really execute in parallel...
>>>
>>> Sam
>>>
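
As one concrete shape such a rule of thumb could take (a sketch, not a 
proposal from the thread): weigh the element count N against a 
caller-supplied estimate of per-element work Q, and only go parallel when 
the product clears an empirically chosen bar. Calibrating that bar per 
operation and per machine is exactly the hard part being pointed at.

    import java.util.Collection;
    import java.util.stream.Stream;

    final class CostModelGuess {
        // Purely illustrative threshold; a real cost model would need
        // calibration against measured per-element cost on the target machine.
        private static final long MIN_TOTAL_WORK = 100_000;

        /** estimatedCostPerElement is a rough relative weight (Q) supplied by the caller. */
        static <T> Stream<T> chooseStream(Collection<T> source,
                                          long estimatedCostPerElement) {
            long totalWork = (long) source.size() * estimatedCostPerElement;
            return totalWork >= MIN_TOTAL_WORK
                    ? source.parallelStream()
                    : source.stream();
        }
    }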

