Concerns about parallel streams

Thu Jul 11 12:20:29 PDT 2013

As it stands, and it seems we are far past changing this API, it is simply too easy to get a parallel stream without thinking about whether it is the right thing to do. I think we need to document extensively when and why you would use parallel streams versus sequential streams. We should include a cost model, a benchmark that helps people decide whether parallelism pays off for their workload, and perhaps some rules of thumb for where it makes sense. Otherwise I think we are going to see some huge performance regressions (in both memory and CPU usage) when people call .parallel() on streams that should be evaluated sequentially. It would have been great to have a cost model built into the system that makes a good guess about whether to use parallel execution.
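To make the benchmark suggestion concrete, here is a minimal sketch of the kind of comparison people could run themselves. It is deliberately naive (best-of-N wall-clock timing with no JMH harness), the class and method names are hypothetical, and it assumes the `java.util.stream.LongStream` API (`rangeClosed`, `parallel`, `sum`) as shipped in Java 8. The numbers it prints will depend entirely on the machine and the workload.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

// Hypothetical, naive sequential-vs-parallel comparison; not a rigorous benchmark.
public class SeqVsPar {
    // Written to on every run so the JIT cannot discard the work as dead code.
    static volatile long blackhole;

    // Runs the task several times (early runs act as JIT warm-up) and returns the best time in ns.
    public static long bestTimeNanos(LongSupplier task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            blackhole = task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    public static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        // Try a small, a medium and a large input to look for the crossover point.
        for (long n : new long[] {100, 10_000, 10_000_000}) {
            long seq = bestTimeNanos(() -> sequentialSum(n), 20);
            long par = bestTimeNanos(() -> parallelSum(n), 20);
            System.out.printf("n=%,d  sequential=%,d ns  parallel=%,d ns%n", n, seq, par);
        }
    }
}
```

For very small `n` I would expect the sequential version to win, because splitting the range and coordinating fork/join tasks costs more than the summation itself; where the crossover lands for a given workload is exactly what such a benchmark should tell each user.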

Doug, what are your thoughts? How do you expect people to use it? I can imagine some heuristics we could add that might save us, perhaps a hook that runs every N ms and uses collected statistics to decide when to actually execute in parallel...
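A rough sketch of that kind of heuristic, purely hypothetical and not part of any proposed API: keep a moving average of the observed per-element cost, and only go parallel when the estimated total work (size times cost) clears a threshold. The class name, the 100,000 ns threshold, and the smoothing factor are made-up placeholders. This version updates its statistics on every call and is not thread-safe; a real hook would have to sample periodically and concurrently.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.function.LongUnaryOperator;
import java.util.stream.LongStream;

// Hypothetical adaptive decider; the threshold and smoothing factor are placeholders.
public class AdaptiveParallelism {
    // Below this much estimated total work, stay sequential.
    static final double PARALLEL_THRESHOLD_NANOS = 100_000.0;
    // Weight given to the newest observation in the moving average.
    static final double ALPHA = 0.2;

    // Running estimate of the per-element cost in nanoseconds (not thread-safe).
    private double costPerElementNanos;

    public AdaptiveParallelism(double initialCostPerElementNanos) {
        this.costPerElementNanos = initialCostPerElementNanos;
    }

    // Parallelize only if there is more than one worker and the estimated work is large enough.
    public boolean shouldParallelize(long size) {
        return ForkJoinPool.getCommonPoolParallelism() > 1
                && size * costPerElementNanos >= PARALLEL_THRESHOLD_NANOS;
    }

    // Fold a new measurement into the exponentially weighted moving average.
    public void record(long size, long elapsedNanos) {
        if (size <= 0) return;
        double observed = (double) elapsedNanos / size;
        costPerElementNanos = ALPHA * observed + (1 - ALPHA) * costPerElementNanos;
    }

    public double costPerElementNanos() {
        return costPerElementNanos;
    }

    // Runs a map-then-sum pipeline, picking the execution mode from the current estimate.
    public long mapAndSum(long n, LongUnaryOperator f) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (shouldParallelize(n)) {
            s = s.parallel();
        }
        long start = System.nanoTime();
        long result = s.map(f).sum();
        record(n, System.nanoTime() - start);
        return result;
    }
}
```

The result is the same whichever mode is chosen; only the cost changes, which is what makes it plausible for the library (rather than the caller) to make this decision.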

Sam