Concerns about parallel streams
David Holmes
david.holmes at oracle.com
Thu Jul 11 20:35:46 PDT 2013
Sam,
On 12/07/2013 5:20 AM, Sam Pullara wrote:
> As it stands, and it seems we are far past changing this API, it is simply too easy to get a parallel stream without thinking about whether it is the right thing to do. I think we need to extensively document when and why you would use parallel streams vs sequential streams. We should include a cost model, a benchmark that will help people figure out whether they should use it, and perhaps some rules of thumb where it makes sense. As it stands I think that we are going to see some huge regressions in performance (both memory and CPU usage) when people call .parallel() on streams that should be evaluated sequentially. It would have been great to have the cost model built into the system that would make a good guess as to whether it should use parallel execution.
I think we addressed this at the start with the decision to require
explicit rather than automatic parallelism. Hence I totally oppose any
proposal that we run in sequential mode until we have used up a
timeslice - that's the automatic parallelism path.
Continuing on that explicit path, just as our libraries require explicit
parallelism selection, so applications should also require/allow it. If
an app chooses to always use parallel() then that is "automatic
parallelism" at the app level - and that is as bad as auto-parallelism
at the library level. Programmers don't have the runtime knowledge
needed to determine whether parallelism will "work" - that is something
that application deployers need to choose.
So my advice for the docs here is twofold:
a) programmers should stick with sequential unless parallel can be shown
to have a significant benefit; and
b) programmers should allow deployers/end-users to opt in to parallelism
where the code supports it, rather than enabling it automatically.
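
As a rough illustration of (b), here is a minimal sketch in which the
deployer, not the programmer, selects the execution mode. The system
property name "app.streams.parallel" and the helper class are hypothetical
names made up for this example; only Boolean.getBoolean, Stream.parallel()
and Stream.sequential() are existing API. Sequential is the default, so
parallelism has to be opted in to explicitly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class OptInParallelism {
    // Hypothetical property name, chosen for this sketch only.
    static final String PARALLEL_PROPERTY = "app.streams.parallel";

    // Reads the deployer's choice; false (sequential) when the property is unset.
    static boolean parallelEnabled() {
        return Boolean.getBoolean(PARALLEL_PROPERTY);
    }

    // Returns the stream in the mode the deployer selected.
    static <T> Stream<T> configured(Stream<T> stream) {
        return parallelEnabled() ? stream.parallel() : stream.sequential();
    }

    // Example pipeline; the result is the same in either mode.
    static long sumOfSquares(List<Integer> values) {
        return configured(values.stream())
                .mapToLong(v -> (long) v * v)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfSquares(data)); // prints 30
    }
}
```

A deployer who has measured a benefit would then run with
-Dapp.streams.parallel=true; everyone else stays sequential.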
My 2c.
Cheers,
David
------
> Doug, what are your thoughts? How do you expect people to use it? I can imagine some heuristics that we could put in that might save us — maybe by having a hook that decides when to really do parallel execution that gets executed every N ms with some statistics...
>
> Sam
>
More information about the lambda-libs-spec-experts
mailing list