Processing-Mode Equality
Vitaly Davidovich
vitalyd at gmail.com
Sun Feb 9 11:34:46 PST 2014
Well, it's decidable at compile time after having done the
profiling/tuning/measurement/etc. for a given use case (one that's not
expected to vary much). I do agree that library code is ill-suited to make
that choice, but then it's also not the library's decision to make in the
first place. The library should let the user pick/configure parallelism,
but that's about it - no automatic decision. This is nothing new, though.
Applications (or rather, probably some small part(s) of a given
application) *are* the place to decide, given that there should be enough
context there (after all, an application should have a feel for what problem
sets it's solving). There may be a need for dynamic selection of a parallel
vs sequential algorithm (based on measurement, profiling, heuristics, etc.)
- e.g. you know the compute kernel's CPU complexity, but you get inputs of
varying size - but that's fine.
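That kind of dynamic selection can be sketched roughly as below; note that
PARALLEL_THRESHOLD is a made-up placeholder for whatever number the
measurement actually yields, and the positive-weight sum is just a stand-in
kernel:

```java
import java.util.stream.IntStream;

public class AdaptiveSum {
    // Hypothetical cutover point; the real value must be measured for
    // this exact kernel on the target hardware, and will vary.
    static final int PARALLEL_THRESHOLD = 1_000_000;

    // Pick the processing mode per call, based on input size.
    static long sum(int[] weights) {
        IntStream s = IntStream.of(weights);
        if (weights.length >= PARALLEL_THRESHOLD) {
            s = s.parallel(); // opt in only above the measured threshold
        }
        return s.filter(w -> w > 0).asLongStream().sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{3, -1, 4, 1, 5})); // prints 13
    }
}
```

The point is only that the choice is made per call from runtime information
(here, size), not baked in at the call site.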
I do agree that docs/guidance should make it very clear what things to
consider when deciding whether to go parallel, but it should also make it
very clear that profiling/measurement is a must. The widgets stream
example should be just a concept/syntax example, not a blanket example of
where to parallelize. I do, however, think that javadoc isn't the place
for an exhaustive exposition on this topic. It should highlight the
general types of things to consider, and then leave it to other sources
(books, articles, blogs, etc.) to provide real use cases where parallel
ends up being better.
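To make the "measurement is a must" point concrete, here's a crude probe
that times both modes across input sizes on the current machine. It's only
a sketch - a real harness (e.g. JMH) handles warmup, variance, and
dead-code elimination properly - and the sizes and the sum kernel are
arbitrary assumptions:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class CrossoverProbe {

    static long sumSeq(int[] a) {
        return IntStream.of(a).asLongStream().sum();
    }

    static long sumPar(int[] a) {
        return IntStream.of(a).parallel().asLongStream().sum();
    }

    static long timeNanos(Runnable r) {
        long t0 = System.nanoTime();
        r.run();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        for (int n : new int[]{10_000, 100_000, 1_000_000, 4_000_000}) {
            int[] data = ThreadLocalRandom.current().ints(n, 0, 100).toArray();
            // warm both paths first; JIT effects otherwise dominate the numbers
            for (int i = 0; i < 5; i++) { sumSeq(data); sumPar(data); }
            long seq = timeNanos(() -> sumSeq(data));
            long par = timeNanos(() -> sumPar(data));
            System.out.printf("n=%,d  sequential=%,dus  parallel=%,dus%n",
                              n, seq / 1_000, par / 1_000);
        }
    }
}
```

Where the two columns cross over is exactly the per-machine, per-kernel
number that no library could have known at compile time.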
Sent from my phone
On Feb 9, 2014 2:15 PM, "Sam Pullara" <spullara at gmail.com> wrote:
> My point is that it is currently not decidable at compile time whether
> parallel execution will be a win or a loss -- but compile time is currently
> where we expose the choice. So it may be a tool, but it isn't one someone
> should use except in extremely constrained circumstances where the runtime
> context is well known, i.e. not libraries, and very few applications.
> However, I feel like it is presented as if you should always use it when
> your stream code is parallelizable. Also, if you look at the javadoc, I'm
> not sure we could have picked a more obviously bad example -- one where it
> will essentially never make sense to run the code in parallel:
>
>     int sumOfWeights = widgets.parallelStream()
>                               .filter(b -> b.getColor() == RED)
>                               .mapToInt(b -> b.getWeight())
>                               .sum();
>
> I'm not sure where the tip over point would be, but my guess is in the
> millions of widgets[1].
>
> Sam
>
> [1] On my machine it turns over somewhere between 1m and 2m widgets. Sadly
> YMMV quite a bit.
>
>
>
> On Sun, Feb 9, 2014 at 10:51 AM, Vitaly Davidovich <vitalyd at gmail.com>wrote:
>
>> In my view, there's no need to characterize the Stream API in terms of
>> sequential vs parallel. It is simply another tool that lets you run
>> parallel (or sequential) code with less boilerplate, and the syntactic
>> difference between the two modes is minimal. The need to measure, tune,
>> and design the algo/data structures for a specific workload doesn't go
>> away, and the YMMV fine print still applies.
>>
>> Sent from my phone
>> On Feb 9, 2014 1:39 PM, "Sam Pullara" <spullara at gmail.com> wrote:
>>
>>> Here are just a few of the things you need to know at runtime to decide
>>> if you want to run something in parallel:
>>>
>>> 1) The cost of executing the per-element code. On current architectures
>>> and JVMs where Java runs, this spans 3+ orders of magnitude in performance.
>>> The variables include CPU, GPU, power constraints, HotSpot effectiveness,
>>> etc. As a library author, if you accept a lambda it is generally impossible
>>> to know anything about the performance of that lambda at compile time, and
>>> therefore impossible to decide whether to run in parallel.
>>>
>>> 2) The number of elements that you are going to process. This is only
>>> forecastable in very degenerate cases. Rarely will you write a Stream-based
>>> function that has any idea how many elements are going to be run through it.
>>>
>>> 3) The cost of copying the input and output data across memory
>>> barriers. The difference between non-NUMA and NUMA architectures is
>>> another couple of orders of magnitude. Even the cost of synchronization
>>> on a real system may be high enough to rule out running some things in
>>> parallel.
>>>
>>> 4) Whether you want to optimize for throughput or latency. This is
>>> especially bad for library authors. As Brian stated, it is always more
>>> expensive in total CPU and memory usage to run in parallel. Systems that
>>> optimize for throughput may not want that additional overhead.
>>>
>>> 5) Runtime context. If you are already running in parallel at a higher
>>> level, additional parallelism in the leaf code will often be just more
>>> noise and will actually interfere with the higher-level splitting of work.
>>>
>>> I think your straw-man argument -- that if only people changed the way
>>> they wrote code, everything would work fine -- is just not true. Runtime
>>> measurement and opportunistically parallelizing operations when it makes
>>> sense no more requires solving the halting problem than runtime
>>> optimization in HotSpot requires solving the halting problem. Obviously
>>> solving it would help, but you can make real progress by characterizing
>>> loads and dispatching them intelligently.
>>>
>>> Sam
>>>
>>> On Feb 9, 2014, at 4:42 AM, Doug Lea <dl at cs.oswego.edu> wrote:
>>>
>>> > One slogan is "Data-centric, parallel-agnostic".
>>> >
>>> > On 02/08/2014 01:36 PM, Sam Pullara wrote:
>>> >> That is one way to think about it, and programming-wise you would be
>>> >> correct. However, if you run everything in parallel that can be, you
>>> >> will likely be disappointed in the performance.
>>> >
>>> > This of course assumes that people will continue to write programs in
>>> > ways that present a high likelihood of disappointment. This is sure to
>>> > sometimes happen: Classic object-oriented programmers will tend to use
>>> > unpartitionable side-effecting methods, classic functional programmers
>>> > will tend to use hopelessly sequential data structures, and classic
>>> > event-driven programmers will tend to use one-by-one vs bulk updates.
>>> > Among those likely to cope best are database programmers, who are
>>> > already comfortable with data-centric bulk updates.
>>> >
>>> >> A pretty special set of conditions needs to
>>> >> be present for it to make sense to run things in parallel.
>>> >
>>> > In some alternative universe, someone is now complaining that a
>>> > pretty special set of conditions needs to be present for it to make
>>> > sense to run things sequentially even when eligible for parallelism:
>>> > the combination of intrinsically sequential data structures,
>>> > fewer than a few thousand elements, and trivially cheap per-element
>>> > functions. If you designed programs so these cases rarely occurred,
>>> > you wouldn't worry about occasional slowdowns when they do.
>>> > But we cannot write such code for you. We cannot even tell you
>>> > with certainty when you do/don't: just trying to figure out the
>>> > cost of per-element functions hits the halting problem.
>>> >
>>> > We will see some bad reactions and experiences along the way
>>> > as people decide how and when to use the Stream framework.
>>> > And we will surely see people deciding never to use it because
>>> > data-centric, parallel-agnostic programming clashes with their
>>> > adopted programming style. Fine. Java (among other JVM languages)
>>> > succeeds because people with different religious views about
>>> > programming can coexist.
>>> >
>>> > -Doug
More information about the lambda-dev mailing list