Concerns about parallel streams
Joe Bowbeer
joe.bowbeer at gmail.com
Thu Jul 11 13:52:37 PDT 2013
Aleksey: Can you add memory parameters to your model, including both the
memory overhead of parallel streams and the memory working set of each
parallel task?
Some tasks may be embarrassingly parallel -- except for the unfortunate
constraint that only K instances will fit in the available memory.
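One possible mitigation for that constraint, sketched below and not a supported API contract: current JDK builds happen to run a parallel stream on the ForkJoinPool that invokes it, so submitting the stream from a dedicated pool sized K caps the number of tasks in flight at once. The pool size K=2 and the summing workload are illustrative assumptions, not anything from the model.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class BoundedParallel {
    public static void main(String[] args) throws Exception {
        // Hypothetical bound K: only K worker threads, so at most K
        // memory-hungry leaf tasks run concurrently. This leans on the
        // (undocumented) behavior that a parallel stream invoked from
        // inside a ForkJoinPool executes in that pool.
        int k = 2;
        ForkJoinPool pool = new ForkJoinPool(k);
        long sum = pool.submit(() ->
                LongStream.rangeClosed(1, 1_000_000).parallel().sum()
        ).get();
        System.out.println(sum); // 500000500000
        pool.shutdown();
    }
}
```

The result is the same as the sequential sum; only the degree of concurrency is bounded.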
By the way, another concern not mentioned is UI responsiveness. Adding
.parallel() can saturate every core, competing with the UI thread's
ability to respond to user input.
How to model UI degradation?
--Joe
On Thu, Jul 11, 2013 at 1:35 PM, Aleksey Shipilev <
aleksey.shipilev at oracle.com> wrote:
> On 07/11/2013 11:20 PM, Sam Pullara wrote:
> > Doug, what are your thoughts? How do you expect people to use it? I
> > can imagine some heuristics that we could put in that might save us —
> > maybe by having a hook that decides when to really do parallel
> > execution that gets executed every N ms with some statistics...
>
> I am not Doug, but I have been deeply involved in figuring out the
> parallel performance model. In short, it can be formalized with four
> model parameters:
> P - number of processors (loosely, number of FJP workers)
> C - number of concurrent clients (i.e. Stream users)
> N - source size (e.g. collection.size())
> Q - operation cost, per element
>
> Assuming an ideally splittable source and embarrassingly parallel
> operations, we confirmed that the model depends most heavily on N*Q,
> which is exactly the amount of work we are presented with. At this
> point, the break-even against the sequential stream correlates with N*Q
> on the order of 200-400 us, with P in (1, 32) on different machines.
>
> That is, with a simple filter taking around 5 ns per element, the
> break-even is somewhere around 40K-80K elements in the source. (Which is
> not really a good break-even point.)
>
> While N is known in most cases, Q is really hard to estimate. Profiling
> would not really help, since operations can start taking different times
> all of a sudden. Also, we can't easily profile very fast operations with
> both good granularity *and* low overhead.
>
> I am working in the background on a benchmark to easily map out the
> break-even front in (P, C, N, Q) space for a given source and pipeline.
> It should probably be made available to developers within the JDK.
>
> Thanks,
> -Aleksey.
>
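The N*Q break-even rule above can be turned into a rough go/no-go check. A minimal sketch: `worthParallel` is a hypothetical helper, and the 200 us threshold is the lower edge of the range Aleksey reports, taken here as an assumption rather than a guarantee.

```java
public class BreakEven {
    // Rough heuristic from the N*Q model: parallelism is worth trying
    // only when total sequential work N*Q exceeds ~200 us (assumed
    // lower edge of the observed 200-400 us break-even range).
    static boolean worthParallel(long n, long qNanos) {
        return n * qNanos >= 200_000; // 200 us, expressed in nanoseconds
    }

    public static void main(String[] args) {
        // A 5 ns/element filter breaks even around 40K elements:
        System.out.println(worthParallel(40_000, 5)); // true
        System.out.println(worthParallel(1_000, 5));  // false
    }
}
```

As the thread notes, N is usually cheap to obtain, while estimating Q reliably is the hard part.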
More information about the lambda-libs-spec-experts
mailing list