Concerns about parallel streams

Thu Jul 11 21:26:53 PDT 2013

On Jul 11, 2013, at 8:35 PM, David Holmes <david.holmes at oracle.com> wrote:
> On 12/07/2013 5:20 AM, Sam Pullara wrote:
>> As it stands, and it seems we are far past changing this API, it is simply too easy to get a parallel stream without thinking about whether it is the right thing to do. I think we need to extensively document when and why you would use parallel streams vs sequential streams. We should include a cost model, a benchmark that will help people figure out whether they should use it, and perhaps some rules of thumb for where it makes sense. As it stands I think that we are going to see some huge regressions in performance (both memory and CPU usage) when people call .parallel() on streams that should be evaluated sequentially. It would have been great to have a cost model built into the system that could make a good guess as to whether it should use parallel execution.
> 
> I think we addressed this at the start with the decision to require 
> explicit rather than automatic parallelism. Hence I totally oppose any 
> proposal that we run in sequential mode until we have used up a 
> timeslice - that's the automatic parallelism path.

You misunderstand me. I mean that when you explicitly ask for parallel mode, we should not actually use it until we have verified that you haven't made a big error. I agree with you that parallelism should be explicit, except that I think we should protect users from making a big mistake when parallel mode is enabled but unnecessary.
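
To make the idea concrete, here is a minimal sketch of that kind of guard. It is only an illustration: GuardedParallel, stream, and MIN_PARALLEL_SIZE are hypothetical names, not part of the streams API, and the fixed size cutoff is an assumed stand-in for a real cost model. The caller still opts in to parallelism explicitly; the helper only declines to honor the request when the source is obviously too small.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class GuardedParallel {
    // Hypothetical cutoff; a real value would have to come from measurement.
    static final int MIN_PARALLEL_SIZE = 10_000;

    private GuardedParallel() {}

    // Returns a parallel stream only if the caller asked for one AND the
    // source is at least MIN_PARALLEL_SIZE elements; otherwise sequential.
    static <T> Stream<T> stream(Collection<T> source, boolean parallelRequested) {
        Stream<T> s = source.stream();
        if (parallelRequested && source.size() >= MIN_PARALLEL_SIZE) {
            return s.parallel();
        }
        return s;
    }

    public static void main(String[] args) {
        List<Integer> small = IntStream.range(0, 100).boxed().collect(Collectors.toList());
        List<Integer> large = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());

        System.out.println(stream(small, true).isParallel());  // request declined: too small
        System.out.println(stream(large, true).isParallel());  // request honored
        System.out.println(stream(large, false).isParallel()); // never parallel unless asked
    }
}
```

Element count alone is a crude signal, since the per-element cost of the pipeline matters at least as much, which is why documentation and a benchmark would still be needed alongside any such check.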

Sam