Bulk operations, decomposition functions

Doug Lea dl at cs.oswego.edu
Thu Aug 9 03:05:20 PDT 2012


On 07/25/12 06:04, Doug Lea wrote:
> I've been using
> targetLeafSize = max(1, N / (p * 8))
> (N is estimated size, p is fjp.parallelism)

I noticed that this may err in the wrong direction if N is
obtained from the size() method of a map or collection,
because size() is required to saturate at Integer.MAX_VALUE.
(And we know that this happens in some actual collections
out there because several classes have either always done this
or have been fixed over the years to avoid int wraparaound.)

A tactic that avoids this problem when you are splitting
by (approximately) two is to focus on the remaining number
of splits rather than the target sizes. As in:
   b = min(N, p * 8);
   while (b > 1)
     split(b >>>= 1, ...)

This works even if N is an underestimate, assuming
you have fewer than 500 million cores :-)

(I'll send some less generally interesting follow-ups
on particular cases off-list.)

-Doug




More information about the lambda-dev mailing list