Bulk operations, decomposition functions
Doug Lea
dl at cs.oswego.edu
Wed Jul 25 03:04:08 PDT 2012
On 07/24/12 18:33, Aleksey Shipilev wrote:
> I wanted to check if someone had looked into current decomposition
> functions (from now on, "DF" for brevity) for bulk operations (i.e.
> ParallelIterables.calculateDepth, BaseTask, etc.); or otherwise has the
> rationale behind them?
I'm not sure why the functions in the current prototype
were chosen by Mike et al, but for preliminary versions of
those for the new ConcurrentHashMap and others,
I've been using
targetLeafSize = max(1, N / (p * 8))
(N is estimated size, p is fjp.parallelism)
In the normal case of recursive splitting by two,
the parallel depth is log2(N / targetLeafSize), i.e.,
about log2(8 * p).
The magic number 8 is a hedge factor to cope
with variance in system load. It splits tasks as
if you had 8 times more workers than you actually do,
which improves load balancing: the chance that you run
more than X% slower than optimal under unexpected loads
on one or more cores declines rapidly with X. For small N,
though, the splitting overhead may outweigh any speedup.
We cannot know this in advance, but in general we are
required to err on the side of too many
rather than too few subtasks. Otherwise, people would
be surprised and unhappy that, say, a task computing the
next best chess move on a list of 4 positions did them
all serially because 4 was too small a task size to split.
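The heuristic above can be sketched in a few lines. This is an illustrative rendering of the formula, not code from the prototype; the names targetLeafSize and depth are mine, and the depth computation is approximate:

```java
public class LeafSize {
    // max(1, N / (p * 8)): split as if there were 8x more workers
    // than there really are, as a hedge against load variance.
    static long targetLeafSize(long n, int parallelism) {
        return Math.max(1L, n / (parallelism * 8L));
    }

    // With recursive splitting by two, parallel depth is roughly
    // log2(N / leafSize), computed here by repeated halving.
    static int depth(long n, long leafSize) {
        int d = 0;
        while (n > leafSize) { n >>>= 1; d++; }
        return d;
    }

    public static void main(String[] args) {
        int p = Runtime.getRuntime().availableProcessors();
        long n = 1_000_000L;
        long leaf = targetLeafSize(n, p);
        System.out.println("leaf=" + leaf + " depth=" + depth(n, leaf));
    }
}
```

For p = 8 and N = 1,000,000 this yields a leaf size of 15625 and depth 6, i.e., 64 leaf tasks for 8 workers.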
There is a second strategy in place that almost never
kicks in, so it might not be needed. The FJ
getSurplusQueuedTaskCount method can tell a worker whether
there are too many or too few unstolen tasks, so it can
adjust. (Some of the ideas behind this are in a paper I
coauthored a few years ago:
http://gee.cs.oswego.edu/dl/papers/icpp08.pdf;
see also the internal documentation inside ForkJoinTask:
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166y/ForkJoinTask.java?view=log)
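A minimal sketch of this second strategy, using the real ForkJoinTask.getSurplusQueuedTaskCount method: keep splitting only while this worker has few queued tasks that others could steal. The task itself (a summation) and the threshold of 3 are illustrative assumptions, not from the prototype:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SurplusSum extends RecursiveTask<Long> {
    final long[] a; final int lo, hi;
    SurplusSum(long[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

    @Override protected Long compute() {
        // Split while the range is divisible AND this worker's queue
        // holds few surplus tasks available for stealing.
        if (hi - lo > 1 && getSurplusQueuedTaskCount() <= 3) {
            int mid = (lo + hi) >>> 1;
            SurplusSum right = new SurplusSum(a, mid, hi);
            right.fork();                              // make right half stealable
            long left = new SurplusSum(a, lo, mid).compute();
            return left + right.join();
        }
        long s = 0;                                    // small enough: run serially
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }

    public static void main(String[] args) {
        long[] a = new long[100_000];
        for (int i = 0; i < a.length; i++) a[i] = i;
        long sum = new ForkJoinPool().invoke(new SurplusSum(a, 0, a.length));
        System.out.println(sum); // prints 4999950000
    }
}
```

Because the surplus count is checked dynamically, decomposition stops generating subtasks once enough work is queued for stealing, rather than splitting to a statically chosen leaf size.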
Variations of this work OK for underlying data structures
that are O(1) or even O(log N) splittable. For others
(for example, most Queue classes), it looks like it is better
just to decompose into tiny tasks and submit them than
to repeatedly count your way through the data structure.
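To illustrate the tiny-tasks approach for such structures, here is a hedged sketch (my own names and example computation, not code from the prototype) that submits one small task per element of a Queue instead of recursively splitting it:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

public class TinyTasks {
    // One tiny task per element: a single pass over the queue,
    // avoiding repeated counting to locate split points.
    static long sumOfSquares(Queue<Integer> q, ForkJoinPool pool) {
        List<ForkJoinTask<Long>> tasks = new ArrayList<>();
        for (int x : q) {
            final long v = x;
            tasks.add(pool.submit(() -> v * v));  // submit, don't split
        }
        long total = 0;
        for (ForkJoinTask<Long> t : tasks) total += t.join();
        return total;
    }

    public static void main(String[] args) {
        Queue<Integer> q = new ArrayDeque<>();
        for (int i = 1; i <= 1000; i++) q.add(i);
        System.out.println(sumOfSquares(q, new ForkJoinPool())); // prints 333833500
    }
}
```

The per-task overhead is higher than with recursive splitting, but for structures that are only O(n)-splittable, one linear pass that enqueues work beats repeated linear passes to find midpoints.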
-Doug
More information about the lambda-dev mailing list