Demo for Parallel Core Collection API

Tue Oct 15 08:20:40 PDT 2013

On Oct 15, 2013, at 4:35 PM, Tristan Yan <tristan.yan at oracle.com> wrote:

> Hi Paul
> you commented: "suggest that all streams are sequential. There is an inconsistency in the use and in some cases it is embedded in other stream usages."
> 
> We do not really understand what exactly is meant; could you elaborate a little? Is it because we want to show people that they should use stream more often than parallelStream?

Going parallel is easy to do but not always the right thing to do. Going parallel almost always requires more total work, with the expectation that this work will complete sooner than the work required to get the same result sequentially. A number of factors affect whether parallel is faster than sequential. Two of them are N, the size of the data, and Q, the cost of processing an element in the pipeline. N * Q is a simple cost model: the larger that product, the better the chances of a parallel speed-up. N is easy to know; Q is not so easy, but it can often be guessed intuitively. (Note that there are other factors, such as the properties of the stream source and operations, that Brian and I talked about in our J1 presentation.)
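To make the cost model concrete, here is a minimal sketch of a pipeline with a very low Q: a sum of squares. The class and method names are made up for illustration. Each element costs only a multiply and an add, so a parallel version can only pay off when N is large; either way, the sequential and parallel forms must produce the same result.

```java
import java.util.stream.LongStream;

/** Sum of squares: a very cheap per-element cost (low Q), so any speed-up needs a large N. */
final class SumOfSquares {

    /** Sums x * x for x in [1, n], either sequentially or in parallel. */
    static long sumOfSquares(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        long seq = sumOfSquares(n, false);
        long par = sumOfSquares(n, true);
        // Only the execution strategy differs; the results must be identical.
        System.out.println(seq + " (parallel agrees: " + (seq == par) + ")");
    }
}
```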

Demo code that just makes everything (or most streams) parallel is sending out the wrong message. 

So I think the demo code should present two general things:

1) various stream functionality, as you have done;

2) parallel vs. sequential for various cases where it is known that parallel is faster on a multi-core system.
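As an illustration of 2), a sketch of a high-Q case: generating random BigInteger probable primes (one of the ideas below). The names ProbablePrimes and generate are hypothetical; the only JDK calls relied on are BigInteger.probablePrime(int, Random) and ThreadLocalRandom.current(), which is called inside the lambda so that each worker thread uses its own generator. The System.nanoTime() timing is a rough smoke test only, not a substitute for a proper benchmark.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Random BigInteger probable primes: a modest N, but each element is expensive (high Q). */
final class ProbablePrimes {

    /** Generates count random probable primes of exactly bitLength bits. */
    static List<BigInteger> generate(int count, int bitLength, boolean parallel) {
        IntStream indices = IntStream.range(0, count);
        if (parallel) {
            indices = indices.parallel();
        }
        return indices
                // ThreadLocalRandom.current() is looked up inside the lambda, on the
                // thread that runs it, so worker threads never share a generator.
                .mapToObj(i -> BigInteger.probablePrime(bitLength, ThreadLocalRandom.current()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            List<BigInteger> primes = generate(100, 512, parallel);
            long millis = (System.nanoTime() - start) / 1_000_000;
            // One cold run per mode is only a smoke test (no warm-up, no repeats).
            System.out.println((parallel ? "parallel" : "sequential") + ": "
                    + primes.size() + " primes in " + millis + " ms");
        }
    }
}
```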

For 2), I strongly recommend measuring with JMH [1]. The data sets you have may or may not be amenable to parallel processing; it's worth investigating, though.

I have ideas for other parallel demos. One is creating probable primes (now that SecureRandom is replaced with ThreadLocalRandom): creating a BigInteger probable prime is a relatively expensive operation, so Q should be high. Another, more advanced demo is a Monte-Carlo calculation of PI using SplittableRandom and a special Spliterator; in this case N should be largish. There are also simpler demonstrations, such as a sum of squares, that get across that N should be large. Another demo could be the calculation of a Mandelbrot set, which is embarrassingly parallel over an area in the complex plane.
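A minimal sketch of the Monte-Carlo PI idea, assuming the "special Spliterator" is something like the hypothetical HitSpliterator below: each element is 1 if a random point in the unit square lands inside the quarter circle and 0 otherwise, and trySplit() gives the split-off half its own generator from SplittableRandom.split(), so no generator is ever shared between threads. PI is then approximately 4 * hits / samples. The split threshold of 1,000 samples is an arbitrary choice.

```java
import java.util.SplittableRandom;
import java.util.Spliterator;
import java.util.function.IntConsumer;
import java.util.stream.StreamSupport;

/** Monte-Carlo estimate of PI driven by a custom, splittable source of samples. */
final class MonteCarloPi {

    /** Emits one int per sample: 1 if the point is inside the quarter circle, else 0. */
    static final class HitSpliterator implements Spliterator.OfInt {
        private final SplittableRandom rng;
        private long remaining; // samples this spliterator still has to produce

        HitSpliterator(SplittableRandom rng, long samples) {
            this.rng = rng;
            this.remaining = samples;
        }

        @Override
        public boolean tryAdvance(IntConsumer action) {
            if (remaining <= 0) {
                return false;
            }
            remaining--;
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            action.accept(x * x + y * y <= 1.0 ? 1 : 0);
            return true;
        }

        @Override
        public Spliterator.OfInt trySplit() {
            long half = remaining / 2;
            if (half < 1_000) {
                return null; // too small to be worth splitting further
            }
            remaining -= half;
            // rng.split() creates a new, independent generator for the other half.
            return new HitSpliterator(rng.split(), half);
        }

        @Override
        public long estimateSize() {
            return remaining;
        }

        @Override
        public int characteristics() {
            return SIZED | SUBSIZED | NONNULL | IMMUTABLE;
        }
    }

    /** Estimates PI from the given number of random points. */
    static double estimatePi(long samples, long seed, boolean parallel) {
        HitSpliterator source = new HitSpliterator(new SplittableRandom(seed), samples);
        long hits = StreamSupport.intStream(source, parallel).asLongStream().sum();
        return 4.0 * hits / samples;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + estimatePi(10_000_000L, 42L, false));
        System.out.println("parallel:   " + estimatePi(10_000_000L, 42L, true));
    }
}
```

Splitting the generator together with the range is what makes the parallel version both safe and cheap: there is no contention on a shared Random, and each sub-task's sequence is still statistically independent.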

So while you should try to fit some parallel vs. sequential execution into your existing demos, I do think it is worth having a separate set of demos that get across the simple N * Q cost model. So feel free to use some of the ideas mentioned above; I find them fun, so perhaps others will too :-)
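Of the ideas above, the Mandelbrot set is perhaps the simplest to sketch. The class and method names, the region [-2, 1] x [-1.5, 1.5] and the iteration limit are arbitrary choices for illustration. Each row of the grid is computed independently of the others, so parallelizing over rows needs no synchronization, and toArray() preserves encounter order, so the parallel result is identical to the sequential one.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Mandelbrot escape counts: every grid point is independent, so the work is embarrassingly parallel. */
final class Mandelbrot {

    /** Iterations performed before |z| exceeded 2 for c = re + im*i, capped at maxIter. */
    static int iterations(double re, double im, int maxIter) {
        double zr = 0.0, zi = 0.0;
        int n = 0;
        while (n < maxIter && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + re; // z = z^2 + c, real part
            zi = 2.0 * zr * zi + im;           // imaginary part
            zr = t;
            n++;
        }
        return n;
    }

    /** Escape counts for a width x height grid over [-2, 1] x [-1.5, 1.5], one task per row. */
    static int[][] render(int width, int height, int maxIter, boolean parallel) {
        IntStream rows = IntStream.range(0, height);
        if (parallel) {
            rows = rows.parallel();
        }
        // toArray() keeps encounter order: row y lands at index y in either mode.
        return rows.mapToObj(y -> {
                    double im = -1.5 + 3.0 * y / height;
                    int[] row = new int[width];
                    for (int x = 0; x < width; x++) {
                        double re = -2.0 + 3.0 * x / width;
                        row[x] = iterations(re, im, maxIter);
                    }
                    return row;
                })
                .toArray(int[][]::new);
    }

    public static void main(String[] args) {
        int maxIter = 1_000;
        int[][] seq = render(800, 600, maxIter, false);
        int[][] par = render(800, 600, maxIter, true);
        // Points that reach the iteration cap are treated as inside the set.
        long inside = Arrays.stream(par).flatMapToInt(Arrays::stream).filter(c -> c == maxIter).count();
        System.out.println("identical: " + Arrays.deepEquals(seq, par) + ", points inside: " + inside);
    }
}
```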

Paul.

[1] http://openjdk.java.net/projects/code-tools/jmh/

On Oct 15, 2013, at 4:37 PM, Tristan Yan <tristan.yan at oracle.com> wrote:

> Also, there is one more question I missed.
> 
> You suggested ""ParallelCore" is not a very descriptive name. Suggest "streams"."
> 1) yes, we agree that this demo is not about parallel computation per se;
> 2) but we do not have a clear demo for parallel computation;
> 3) if we are to rename this, we need to develop another one. Do you have a scenario for that?