Processing-Mode Equality
Brian Goetz
brian.goetz at oracle.com
Sat Feb 8 10:53:51 PST 2014
> My impression is that the Stream interface is parallel only. It does
> allow user to specify parallelism level = 1, a.k.a. the sequential
> mode.
That's strictly true, if only in a lawyer-distorting-the-facts sort of
way.
If you express the problem at a higher level, then your solution can
work either sequentially or in parallel. Streams is aimed at helping
users express problems in this way, guiding them away from the
one-element-at-a-time approach and towards specifying the goal at the
aggregate level.
This turns out to usually result in clearer, less error-prone code, and
often exposes significant opportunities for optimization. (But if you
like the old way, we're not taking your for-loops away.)
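To make the contrast concrete, here is a minimal sketch of the same computation written both ways. The class and method names (AggregateSketch, sumOfEvenSquaresLoop, sumOfEvenSquaresStream) and the "sum of the squares of the even numbers" problem are made up for illustration; only the java.util.stream calls are real API.

```java
import java.util.Arrays;
import java.util.List;

public class AggregateSketch {
    // One element at a time: the loop says *how* to walk the data
    // and mutates an accumulator as it goes.
    static int sumOfEvenSquaresLoop(List<Integer> xs) {
        int sum = 0;
        for (int x : xs) {
            if (x % 2 == 0) {
                sum += x * x;
            }
        }
        return sum;
    }

    // Aggregate level: the pipeline says *what* is wanted
    // (keep the evens, square them, add them up) and leaves
    // the traversal strategy to the library.
    static int sumOfEvenSquaresStream(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)
                 .mapToInt(x -> x * x)
                 .sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfEvenSquaresLoop(xs));   // 20
        System.out.println(sumOfEvenSquaresStream(xs)); // 20
    }
}
```

Both versions give the same answer; the second one simply does not commit to an iteration order, which is what leaves room for the library to choose one.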
We've all got years of "training" that "teaches" us to favor mutability,
which is a behavior that is increasingly unhelpful. Yes, sometimes it's
the only way out, but the other 99% of the time, there's a good -- often
better -- way to express the problem without side-effects.
Blaming this on "parallelism obsession" mostly misses the point.
Parallelism is one benefit you get for free by letting go of
statefulness, but it is far from the only one.
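As a rough illustration of "with side-effects" versus "without", the sketch below upper-cases a list of words both ways. The names (SideEffectSketch, upperWithSideEffects, upperWithoutSideEffects) are hypothetical; the example assumes nothing beyond the standard stream and Collectors API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SideEffectSketch {
    // Stateful version: the lambda reaches out and mutates a shared
    // ArrayList. This happens to work on a sequential stream, but the
    // list is not thread-safe, so the code is only correct as long as
    // nobody ever runs it in parallel.
    static List<String> upperWithSideEffects(List<String> words) {
        List<String> out = new ArrayList<>();
        words.stream().forEach(w -> out.add(w.toUpperCase()));
        return out;
    }

    // Side-effect-free version: each stage only transforms its input,
    // and the collector builds the result. The same code gives the
    // same answer in either processing mode.
    static List<String> upperWithoutSideEffects(List<String> words) {
        return words.stream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        System.out.println(upperWithSideEffects(words));
        System.out.println(upperWithoutSideEffects(words));
    }
}
```

The second form is also shorter and has no intermediate variable to get wrong, which is the kind of benefit that has nothing to do with threads.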
> The main reason for the sequential mode is that a lot of Java
> programmers are imperative-minded, and they'll want to pass in
> functions with side effects.
No, this is wrong. The main reason for the sequential "mode" is that
sometimes the performance characteristics of a solution favor sequential
execution.
Parallelism is strictly an optimization -- in the best case. It should
be obvious that a parallel solution is *always* going to be more work
than a sequential one. In the happy case, the benefits of parallelism
can overcome the overhead of task splitting, inter-thread communication,
etc. to get to the answer faster (as measured in wall-clock time). But
this isn't always true. And it's fairly hard to automatically guess
which will be better, so we factor the problem:
- encourage the user to specify the problem in a way that is amenable
to getting the right answer with either parallel or sequential computations
- give the user explicit control over whether or not to apply
parallelism -- because the user probably has a better idea of whether
the characteristics of his problem are suited to a parallel solution.
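The two points above might look like this in practice: one problem statement, with the processing mode left as an explicit choice for the caller. This is only a sketch; ModeSketch, sumOfSquares and the boolean flag are invented for illustration, while LongStream.rangeClosed, parallel(), map and sum are standard API.

```java
import java.util.stream.LongStream;

public class ModeSketch {
    // The problem is stated once, without side-effects, so the same
    // pipeline yields the same answer in either mode. The caller, who
    // knows the data size and cost per element, picks the mode.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        // Same result either way; which one finishes sooner depends
        // on the workload and the machine.
        System.out.println(sumOfSquares(1000, false));
        System.out.println(sumOfSquares(1000, true));
    }
}
```

For an input this small the sequential run will likely win, since splitting and merging cost more than the arithmetic; that is exactly the kind of judgment the second bullet leaves to the user.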
Yes, users *can* request a sequential calculation and then use
side-effects -- but this is usually a pretty bad idea. This is in the
same category as writing programs with data races in 1998 and saying
"but it's OK, this program will never run on a multiprocessor."