Concerns about parallel streams
Sam Pullara
sam at shv.com
Thu Jul 11 14:08:35 PDT 2013
Hoping Doug enters the thread soon….
How about we don't bother doing any work in parallel until at least a kernel time slice has passed running in sequential mode? On a Linux box on AWS, that amount of time is 16 ms:
--------------------------------------------
#include <sched.h>
#include <stdio.h>

/* Prints the round-robin scheduling interval (the kernel time slice)
   for the current process, in nanoseconds. */
int main(int argc, char* argv[]) {
    struct timespec t;
    sched_rr_get_interval(0, &t);
    printf("%ld\n", t.tv_nsec);  /* tv_nsec is a long, hence %ld */
    return 0;
}
--------------------------------------------
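To make the idea concrete, here's a minimal sketch in Java (the names and the hard-coded 16 ms are just assumptions for illustration, not a proposal for the real API): drain the source sequentially until one slice has elapsed, then hand whatever is left to a parallel stream.

import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

public class TimeSliceGate {
    // Assumed slice length; the real value would come from a probe
    // like the C program above.
    static final long SLICE_NANOS = 16_000_000L;

    static <T> void forEach(List<T> source, Consumer<? super T> action) {
        Spliterator<T> remaining = source.spliterator();
        long deadline = System.nanoTime() + SLICE_NANOS;
        // Sequential phase: one element at a time until the slice expires.
        while (System.nanoTime() < deadline) {
            if (!remaining.tryAdvance(action)) {
                return; // source drained before the slice expired
            }
        }
        // Parallel phase: whatever is left of the spliterator.
        // (The action must be thread-safe from here on.)
        StreamSupport.stream(remaining, true).forEach(action);
    }
}

Anything with small N*Q finishes inside the sequential phase and never pays the fork/join setup cost.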
That would at least stop people from screwing it up when N*Q is very small. I'm very much in favor of making this programmatic with a callback, so that at least the decision to bring the parallel pipeline up can be made by something at runtime. For example, my callback would return false whenever it was called inside a server-side application that accepts concurrent requests. Maybe hang this off of the ForkJoinPool?
import java.util.Optional;

public interface ParallelStreamChecker {
    /**
     * Called when execution starts and then once every kernel time slice
     * until the returned Optional.isPresent(). Returning Optional.of(true)
     * sets up the parallel pipeline, and the rest of the stream is
     * processed in parallel; returning Optional.of(false) keeps the
     * stream sequential.
     */
    Optional<Boolean> check(long startNanoTime, long numberOfElementsProcessed);
}
This also makes sense in the context of an enterprise application server that will probably replace the ForkJoin implementation and wants control over this kind of thing.
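For example, here's what my server-side callback could look like (again just a sketch against the interface above; the ThreadLocal flag is an assumed hook that the container would set around each request):

import java.util.Optional;

public class ServerSideChecker implements ParallelStreamChecker {
    // Assumed hook: set to true by the container around each request.
    static final ThreadLocal<Boolean> IN_REQUEST =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    @Override
    public Optional<Boolean> check(long startNanoTime, long numberOfElementsProcessed) {
        if (IN_REQUEST.get()) {
            // Serving a concurrent request: keep the stream sequential.
            return Optional.of(false);
        }
        // Batch or background work: let the pipeline go parallel.
        return Optional.of(true);
    }
}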
Sam
On Jul 11, 2013, at 1:35 PM, Aleksey Shipilev <aleksey.shipilev at oracle.com> wrote:
> On 07/11/2013 11:20 PM, Sam Pullara wrote:
>> Doug, what are your thoughts? How do you expect people to use it? I
>> can imagine some heuristics that we could put in that might save us —
>> maybe by having a hook that decides when to really do parallel
>> execution that gets executed every N ms with some statistics...
>
> I am not Doug, but I have been deeply involved in figuring out the
> parallel performance model. In short, it can be formalized in terms of
> four model parameters:
> P - number of processors (loosely, number of FJP workers)
> C - number of concurrent clients (i.e. Stream users)
> N - source size (e.g. collection.size())
> Q - operation cost, per element
>
> Assuming an ideally splittable source and embarrassingly parallel
> operations, we confirmed that the model depends most heavily on N*Q,
> which is exactly the amount of work we are presented with. At this
> point, the break-even against the sequential stream occurs with N*Q on
> the order of 200-400 us, with P in (1, 32) on different machines.
>
> That is, with a simple filter taking around 5 ns per element, the
> break-even is somewhere around 40K-80K elements in the source. (Which is
> not really a good break-even point.)
>
> While N is known in most cases, Q is really hard to estimate. Profiling
> would not really help with operations whose cost changes all of a
> sudden. Also, we can't easily profile very fast operations with both
> good granularity *and* low overhead.
>
> I am working in the background to build up a benchmark that can easily
> map out the break-even front in (P, C, N, Q) space for a given source
> and pipeline. It should probably be available to developers within the JDK.
>
> Thanks,
> -Aleksey.