Concerns about parallel streams

Thu Jul 11 16:50:26 PDT 2013

On 07/11/13 17:08, Sam Pullara wrote:
> Hoping Doug enters the thread soon….

(It's great to feel needed, but today maybe a little too much :-)

A couple of quick notes:

If you are writing from-scratch ForkJoin programs
rather than stream-based ones, you become immediately
aware that you have to make some decisions about task
granularity and partitioning. The rule of thumb we state in
FJ is if you stay above a thousand or so instructions per
leaf task, you'll have a good chance of success.
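
As a minimal sketch of the granularity decision described above (not code from this thread), a from-scratch ForkJoin sum might look like the following. The class name, the `SumTask` helper, and the `LEAF_SIZE` cutoff of 1,000 elements are all assumptions chosen for illustration. A leaf of that size does roughly a thousand or more instructions of work, in line with the rule of thumb, but the right cutoff for a real workload has to be measured.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class GranularitySketch {
    // Hypothetical cutoff: stop splitting once a task covers this many
    // elements, so each leaf does on the order of a thousand+ instructions.
    static final int LEAF_SIZE = 1_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo, hi; // half-open range [lo, hi)

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= LEAF_SIZE) {
                // Leaf task: small enough to process sequentially.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            // Partition in half: fork the left side, compute the right
            // side in the current thread, then join the forked half.
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();
            long rightSum = right.compute();
            return left.join() + rightSum;
        }
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data));
    }
}
```

The point of the sketch is that the threshold and the split strategy are explicit choices the programmer has to make here, which is exactly what a stream-based version hides.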

The big problem when you automate this via streams is that
most programmers have nearly no idea about any of the
components of this otherwise straightforward guidance.
And as Aleksey explained, there are few prospects for
magically automating in general.
Yet any attempt at providing any form of parameterization
or hinting to control this has been defensibly rejected.
As a result we have a completely opaque cost model.
No sense in pretending otherwise.

Despite this, the easy guidance is:

If you have a lot of data, or very costly per-element computations,
the best practice is to use parallel(). Otherwise, feel free to
experiment with it, but don't expect any miracles.
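
To make that guidance concrete, here is a small sketch, assuming a deliberately expensive per-element function. The class name, the `costly` function, and the loop count of 1,000 are hypothetical stand-ins, not anything proposed in this thread. The only difference between the two pipelines is the call to parallel(); whether it actually helps depends on the element count and per-element cost, so treat it as something to measure rather than a guarantee.

```java
import java.util.stream.LongStream;

public class ParallelStreamSketch {
    // Hypothetical per-element computation: roughly a thousand loop
    // iterations, standing in for "very costly" work per element.
    static long costly(long x) {
        long h = x;
        for (int i = 0; i < 1_000; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    // Sequential pipeline over the range [0, n).
    static long sequential(long n) {
        return LongStream.range(0, n)
                .map(ParallelStreamSketch::costly)
                .sum();
    }

    // Same pipeline with parallel() added; splitting and task
    // granularity are left entirely to the library.
    static long parallel(long n) {
        return LongStream.range(0, n)
                .parallel()
                .map(ParallelStreamSketch::costly)
                .sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        long a = sequential(n);
        long b = parallel(n);
        System.out.println("results match: " + (a == b));
    }
}
```

Both versions return the same value because long addition is associative even when it wraps around. Any timing difference between them is workload- and machine-dependent.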

We could even give factor-of-1000-proof numbers here:
A million elements. A million instructions.

-Doug