Processing-Mode Equality

Sam Pullara spullara at gmail.com
Sun Feb 9 10:37:26 PST 2014


Here are just a few of the things you need to know at runtime to decide if you want to run something in parallel:

1) The cost of executing the per-element code. On the current architectures and JVMs where Java runs, this spans 3+ orders of magnitude in performance. The variables include CPU, GPU, power constraints, HotSpot effectiveness, etc. As a library author who accepts a lambda, it is generally impossible to know anything about the performance of that lambda at compile time, and therefore impossible to decide at that point whether to run it in parallel.

2) The number of elements that you are going to process. This is only forecastable in very degenerate cases. Rarely will you write a Stream-based function that has any idea how many elements are going to be run through it.

3) The cost of copying the input and output data across memory barriers. The difference between non-NUMA and NUMA architectures is another gap of a couple of orders of magnitude. Even the cost of synchronization on a real system may be too high to justify running some things in parallel.

4) Whether you want to optimize for throughput or latency. This is especially bad for library authors. As Brian stated, it is always more expensive in total CPU and memory usage to run in parallel. Systems that optimize for throughput may not want this additional deoptimization.

5) Runtime context. If you are already running in parallel at a higher level, additional parallelism in the leaf code will often be just more noise and actually interfere with the high-level splitting of work.
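
As an illustration only, the factors above can be folded into a small runtime check. This is a sketch, not anything the Stream library provides: the class name ParallelHint, the cutoff MIN_TOTAL_NANOS, and the caller-supplied per-element cost estimate are all assumptions made up for the example. Only Spliterator.getExactSizeIfKnown(), Collection.stream() and Collection.parallelStream() are real JDK calls.

```java
import java.util.Collection;
import java.util.Spliterator;
import java.util.stream.Stream;

final class ParallelHint {
    // Hypothetical cutoff: estimated total work (in ns) below which we stay sequential.
    static final long MIN_TOTAL_NANOS = 1_000_000L;

    // Decide from information that is only available at runtime.
    static <T> boolean worthParallel(Spliterator<T> source, long estNanosPerElement,
                                     int cores, boolean alreadyParallel) {
        // Point 5: do not add parallelism under an already-parallel caller.
        if (alreadyParallel || cores < 2) return false;
        // Point 2: the size is often unknown; getExactSizeIfKnown() returns -1 then.
        long size = source.getExactSizeIfKnown();
        if (size < 0 || estNanosPerElement <= 0) return false;
        // The product would overflow, so the total work is certainly large.
        if (size > Long.MAX_VALUE / estNanosPerElement) return true;
        // Point 1: per-element cost times element count must justify the overhead.
        return size * estNanosPerElement >= MIN_TOTAL_NANOS;
    }

    // Convenience wrapper: pick a sequential or parallel stream for a collection.
    static <T> Stream<T> stream(Collection<T> data, long estNanosPerElement,
                                boolean alreadyParallel) {
        int cores = Runtime.getRuntime().availableProcessors();
        boolean parallel =
            worthParallel(data.spliterator(), estNanosPerElement, cores, alreadyParallel);
        return parallel ? data.parallelStream() : data.stream();
    }
}
```

Note that the sketch still needs someone to supply the per-element cost and the "already parallel" flag, which is exactly the information a library author does not have at compile time.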

I think your straw man argument, that everything would work fine if only people changed the way they wrote code, is just not true. Measuring at runtime and opportunistically parallelizing operations when it makes sense no more requires solving the halting problem than runtime optimization in HotSpot does. Obviously solving it would help, but you can make real progress by characterizing loads and dispatching them intelligently.
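
A minimal sketch of what "characterizing loads and dispatching them" could look like: time a small sample sequentially, extrapolate, and only then choose how to process the rest. The class name AdaptiveMap, the sample size, and the threshold are hypothetical values chosen for the example, not anything in the JDK, and the estimate is noisy (the first calls may run before the JIT has compiled the function).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

final class AdaptiveMap {
    static final int SAMPLE = 16;                         // hypothetical sample size
    static final long PARALLEL_THRESHOLD_NANOS = 1_000_000L; // hypothetical cutoff

    // Pure decision function: extrapolate the sample timing to the remaining elements.
    static boolean shouldParallelize(long sampleNanos, int sampleCount, int remaining) {
        if (sampleCount <= 0 || remaining <= 0) return false;
        long perElement = sampleNanos / sampleCount;
        if (perElement <= 0) return false;                // too cheap to measure
        if (remaining > Long.MAX_VALUE / perElement) return true; // would overflow: huge
        return perElement * remaining >= PARALLEL_THRESHOLD_NANOS;
    }

    // Applies fn to every element exactly once and keeps the input order.
    static <T, R> List<R> map(List<T> input, Function<T, R> fn) {
        List<R> out = new ArrayList<>(input.size());
        int sample = Math.min(SAMPLE, input.size());

        // Measure the first few elements sequentially.
        long start = System.nanoTime();
        for (int i = 0; i < sample; i++) {
            out.add(fn.apply(input.get(i)));
        }
        long elapsed = System.nanoTime() - start;

        // Dispatch the remainder based on the measurement.
        List<T> rest = input.subList(sample, input.size());
        Stream<T> s = shouldParallelize(elapsed, sample, rest.size())
                ? rest.parallelStream()
                : rest.stream();
        // forEachOrdered preserves encounter order, so a plain ArrayList is safe here.
        s.map(fn).forEachOrdered(out::add);
        return out;
    }
}
```

The sketch assumes fn is stateless and non-interfering, which is what the Stream API asks of lambdas anyway.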

Sam

On Feb 9, 2014, at 4:42 AM, Doug Lea <dl at cs.oswego.edu> wrote:

> One slogan is "Data-centric, parallel-agnostic".
> 
> On 02/08/2014 01:36 PM, Sam Pullara wrote:
>> That is one way to think about it and programming wise you would be correct.
>> However, if you run everything in parallel that can be you will likely be
>> disappointed in the performance.
> 
> This of course assumes that people will continue to write programs in
> ways that present a high likelihood of disappointment. This is sure to
> sometimes happen: Classic object-oriented programmers will tend to use
> unpartitionable side-effecting methods, classic functional programmers
> will tend to use hopelessly sequential data structures, and classic
> event-driven programmers will tend to use one-by-one vs bulk updates.
> Among those likely to cope best are database programmers, who are
> already comfortable with data-centric bulk updates.
> 
>> A pretty special set of conditions need to
>> be present for it to make sense to run things in parallel.
> 
> In some alternative universe, someone is now complaining that a
> pretty special set of conditions need to be present for it to make
> sense to run things sequentially even if eligible for parallelism:
> The combination of intrinsically sequential data structures,
> less than a few thousand elements, and trivially cheap per-element
> functions. If you designed programs so these cases rarely occurred,
> you wouldn't worry about occasional slowdowns when they do.
> But we cannot write such code for you. We cannot even tell you
> with certainty when you do/don't: Just trying to figure out the
> cost of per-element functions hits the Halting problem.
> 
> We will see some bad reactions and experiences along the way
> as people decide how and when to use the Stream framework.
> And we will surely see people deciding to never use it because
> data-centric, parallel-agnostic programming clashes with their
> adopted programming style. Fine. Java (among other JVM languages)
> succeeds because people with different religious views about
> programming can coexist.
> 
> -Doug



More information about the lambda-dev mailing list