Bulk operations, decomposition functions
Brian Goetz
brian.goetz at oracle.com
Wed Jul 25 07:35:19 PDT 2012
Just to add to Doug's comments:
- I had thought we were using the "8x" formula suggested by Doug; if we're not, that's my silly error and it should be fixed.
- We have a to-do item to evaluate the splitting effectiveness of this default formula; right now we're just trying to get all the operations implemented and have them give the same answers in both serial and parallel :)
- This is a dumb one-size-fits-all splitting formula for the general (default-implemented) case where a data structure has offered up a splitting algorithm. Data structures that can do better will likely implement parallel ops themselves. (A rough sketch of what such a formula looks like follows below.)
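
For concreteness, here's a rough sketch of the kind of formula being
discussed. This is not the prototype's actual calculateDepth (I haven't
re-checked it against the code); it just illustrates one plausible reading
of the "8x" rule, namely "aim for roughly 8 * parallelism leaf tasks":

    // Hedged sketch only; not the actual ParallelIterables.calculateDepth.
    // Reads the "8x" rule as "target about 8 leaf tasks per worker", i.e. a
    // target leaf size of sizeEstimate / (8 * parallelism), splitting by
    // halves until each leaf is at or below that target.
    class DecompositionSketch {
        static int calculateDepth(long sizeEstimate, int parallelism) {
            long targetLeafSize = Math.max(1L, sizeEstimate / (8L * parallelism));
            long leafSize = sizeEstimate;
            int depth = 0;
            while (leafSize > targetLeafSize) {
                leafSize = (leafSize + 1) / 2;   // each split halves the chunk
                depth++;                         // 2^depth leaves at this depth
            }
            return depth;
        }
    }

Note that both inputs are estimates: sizeEstimate is whatever the source
reports, and parallelism is just the pool size, which is Aleksey's point I
below.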
On Jul 24, 2012, at 6:33 PM, Aleksey Shipilev wrote:
> (sorry for resending twice, but I suspect the first message got filtered
> out because of the attached GPG signature)
>
> Hi,
>
> I wanted to check whether someone has looked into the current
> decomposition functions (from now on, "DF" for brevity) for bulk
> operations (e.g. ParallelIterables.calculateDepth, BaseTask, etc.), or
> otherwise knows the rationale behind them.
>
> Trying to analyze the decomposition we have now, I coded the DF in
> R [1], built some graphs [2], and have some preliminary questions:
>
> I. The DF depends on both the guess for the task size and the available
> parallelism. This follows naturally from the code and is also backed up
> by chart 1. What if the guess is wrong? It seems dangerous to depend on
> the guess to drive decomposition decisions.
>
> II. The DF always decomposes into 2^n leaf tasks. This seems technically
> plausible, but it brings rounding issues into consideration: a small
> change in problem size can drastically change the subtask size, and with
> it the bulk operation performance. You can see that on chart 2: the only
> z-levels are powers of two, and the transitions between levels are
> abrupt, with significant jitter around the cliff.
>
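To make the rounding cliff concrete under the sketch above (which, again,
may not match the prototype in detail): with parallelism 4, the target
leaf size for a 1,000,000-element source is 1,000,000 / 32 = 31,250,
reached after 5 halvings, so we get 2^5 = 32 leaves of about 31,250
elements. A 1,000,001-element source just misses that target after 5
halvings (31,251 > 31,250) and splits once more, giving 2^6 = 64 leaves of
about 15,626 elements. One extra input element halves the subtask size,
which is exactly the kind of cliff chart 2 shows.
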
> III. The DF seems to favor very fast functions operating on the data. It
> naturally prefers larger subtasks, boosting the performance of short
> operations but limiting the scalability of heavy ones. For example, the
> current decomposition will generate only 64 subtasks for a 100-element
> external task and 100 workers in the FJP. This is partially demonstrated
> by chart 3 at low-end external task sizes: the average number of tasks
> per worker falls below one there.
>
> It is also interesting to see that the "load factor" for the FJP is
> probably too low even in non-pathological cases: there are only 1-2
> tasks per worker, which can inhibit load balancing for heterogeneous
> tasks, although that still needs to be proven empirically.
>
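To spell out the arithmetic behind "falls below one": 64 subtasks spread
over 100 workers is 64 / 100 = 0.64 tasks per worker on average, so at
least 36 workers never receive a leaf at all; for heavy per-element
operations that idle capacity is precisely the lost scalability described
above.
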
> -----
>
> I would like to dive more deeply into this. I think the problem cannot
> be solved without live feedback from both the FJP and the collection
> being processed, which should guide decomposition decisions.
>
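One existing hook in that direction, offered as a hedged sketch rather
than a proposal for the actual implementation:
ForkJoinTask.getSurplusQueuedTaskCount() already exposes live per-worker
queue pressure, and the usual idiom is to keep splitting only while that
surplus is small. Something along these lines (the threshold of 3 and the
index-range/IntConsumer shape are purely illustrative):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.concurrent.ForkJoinTask;
    import java.util.concurrent.RecursiveAction;
    import java.util.function.IntConsumer;

    // Hedged sketch: splitting guided by the pool's live state instead of a
    // precomputed depth; not the lambda prototype's code.
    class AdaptiveApply extends RecursiveAction {
        final int lo, hi;            // half-open index range [lo, hi)
        final IntConsumer action;

        AdaptiveApply(int lo, int hi, IntConsumer action) {
            this.lo = lo; this.hi = hi; this.action = action;
        }

        @Override protected void compute() {
            int l = lo, h = hi;
            Deque<AdaptiveApply> forked = new ArrayDeque<>();
            // Split only while this worker has little surplus work that idle
            // workers could steal; otherwise stop and process locally.
            while (h - l > 1 && ForkJoinTask.getSurplusQueuedTaskCount() <= 3) {
                int mid = (l + h) >>> 1;
                AdaptiveApply right = new AdaptiveApply(mid, h, action);
                right.fork();        // hand the right half to the pool
                forked.push(right);
                h = mid;             // keep the left half on this worker
            }
            for (int i = l; i < h; i++)
                action.accept(i);    // the leaf that feedback told us to keep
            for (AdaptiveApply t : forked)
                t.join();            // join most recently forked first
        }
    }

Invoked as something like new ForkJoinPool().invoke(new AdaptiveApply(0, n,
i -> process(i))), where process stands in for whatever per-element work
you have. That only covers the FJP half of the feedback; the collection
side (how cheaply a given data structure can actually split) would still
need its own signal.
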
> Any thoughts, comments, suggestions, etc. are appreciated.
>
> -Aleksey.
>
> [1] http://shipilev.net/pub/jdk/lambda/decomp-size-1/plot.r
> [2] http://shipilev.net/pub/jdk/lambda/decomp-size-1/plots.pdf
>