Processing-Mode Equality

Sat Feb 8 10:53:51 PST 2014

> My impression is that the Stream interface is parallel only. It does
> allow user to specify parallelism level = 1, a.k.a. the sequential
> mode.

That's strictly true, if only in a lawyer-distorting-the-facts sort of 
way.

If you express the problem at a higher level, then your solution can 
work either sequentially or in parallel.  Streams is aimed at helping 
users express problems in this way, guiding them away from the 
one-element-at-a-time approach and towards specifying the goal at the 
aggregate level. 
This turns out to usually result in clearer, less error-prone code, and 
often exposes significant opportunities for optimization.  (But if you 
like the old way, we're not taking your for-loops away.)
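
To make the contrast concrete, here is a minimal sketch of the same 
computation written both ways.  The class and method names are made up 
for illustration; only the java.util.stream calls are real API.

```java
import java.util.Arrays;
import java.util.List;

class AggregateSum {
    // One element at a time: the loop spells out how to accumulate,
    // using a mutable local variable.
    static int sumOfEvenSquaresLoop(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                sum += n * n;
            }
        }
        return sum;
    }

    // Aggregate level: the pipeline states what is wanted and leaves
    // the accumulation to the library.
    static int sumOfEvenSquaresStream(List<Integer> numbers) {
        return numbers.stream()
                      .filter(n -> n % 2 == 0)
                      .mapToInt(n -> n * n)
                      .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenSquaresLoop(numbers));   // prints 56
        System.out.println(sumOfEvenSquaresStream(numbers)); // prints 56
    }
}
```

Nothing in the second version depends on the order in which elements 
are visited, which is what leaves the library free to choose how to 
execute it.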

We've all got years of "training" that "teaches" us to favor mutability, 
which is a behavior that is increasingly unhelpful.  Yes, sometimes it's 
the only way out, but the other 99% of the time, there's a good -- often 
better -- way to express the problem without side-effects.
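
As a rough illustration of the difference (the names below are 
hypothetical, chosen only for this sketch), compare a lambda that 
mutates a list outside the pipeline with one that lets the stream 
build the result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NoSideEffects {
    // Side-effecting style: the lambda writes to a list declared outside
    // the pipeline.  This happens to work on a sequential stream, but
    // ArrayList is not thread-safe, so it would be unsafe in parallel.
    static List<String> upperCaseWithMutation(List<String> words) {
        List<String> result = new ArrayList<>();
        words.stream().forEach(w -> result.add(w.toUpperCase()));
        return result;
    }

    // Side-effect-free style: map each element and let collect()
    // assemble the result.  No shared mutable state is involved.
    static List<String> upperCaseWithCollect(List<String> words) {
        return words.stream()
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world");
        System.out.println(upperCaseWithCollect(words)); // prints [HELLO, WORLD]
    }
}
```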

Blaming this on "parallelism obsession" is mostly missing the point. 
Parallelism is one benefit you get for free by letting go of 
statefulness, but it is far from the only one.

> The main reason for the sequential mode is that a lot of Java
> programmers are imperative-minded, and they'll want to pass in
> functions with side effects.

No, this is wrong.  The main reason for the sequential "mode" is that 
sometimes the performance characteristics of a solution favor sequential 
execution.

Parallelism is strictly an optimization -- in the best case.  It should 
be obvious that a parallel solution is *always* going to do more total 
work than a sequential one.  In the happy case, the benefits of parallelism 
can overcome the overhead of task splitting, inter-thread communication, 
etc. to get to the answer faster (as measured in wall-clock time).  But 
this isn't always true.  And it's fairly hard to automatically guess 
which will be better, so we factor the problem:

  - encourage the user to specify the problem in a way that is amenable 
    to getting the right answer with either parallel or sequential 
    computation
  - give the user explicit control over whether or not to apply 
    parallelism -- because the user probably has a better idea of whether 
    the characteristics of their problem are suited to a parallel solution.
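
The two points above can be sketched as follows.  The helper and its 
boolean flag are hypothetical; stream() and parallelStream() are the 
real Collection methods.  The pipeline is written once, and the caller 
decides how it runs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class ChooseMode {
    // The caller picks the execution mode; the pipeline itself is the
    // same either way and has no side effects, so both modes give the
    // same answer.
    static long countLongWords(List<String> words, boolean parallel) {
        Stream<String> source = parallel ? words.parallelStream()
                                         : words.stream();
        return source.filter(w -> w.length() > 5).count();
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("stream", "map", "parallel", "filter", "sum");
        System.out.println(countLongWords(words, false)); // prints 3
        System.out.println(countLongWords(words, true));  // prints 3
    }
}
```

Whether the parallel run is actually faster depends on the data size 
and the cost per element; for a five-element list like this one, the 
overhead would almost certainly dominate.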

Yes, users *can* request a sequential calculation and then use 
side-effects -- but this is usually a pretty bad idea.  This is in the 
same category as writing programs with data races in 1998 and saying 
"but it's OK, this program will never run on a multiprocessor."