Bulk operations: the evidence for threads wasted joining

Aleksey Shipilev aleksey.shipilev at oracle.com
Wed Aug 1 09:13:56 PDT 2012


Hi,

Here's another peculiar thing I'm investigating. This time, the
collection on which my operation is running has enough head-room for
all workers. In my previous notation, that translates to:
  N = 1000 (collection size)
  P = 80   (#CPUs on target machine, FJP parallelism)
  C = 1    (only one external client)
  Q = 10^9 (operation cost; quite a lot, around 6 sec per op)

If you run this on a large machine with a fresh FJP, and trace it with
fjp-trace [1], you will notice two things [2][3].
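
For concreteness, here is a minimal sketch of this kind of setup. It is
not the actual benchmark: the class name, the plain fork/join splitting
and the busy-work loop are made up for illustration, but it matches the
parameters above (N = 1000, P = 80, C = 1, and an expensive per-element op).

  import java.util.concurrent.ForkJoinPool;
  import java.util.concurrent.RecursiveAction;

  public class BulkOpSketch {
      static final int N = 1000;                 // collection size
      static final int P = 80;                   // FJP parallelism
      static final long Q = 1_000_000_000L;      // per-element cost, ~6 sec

      // Plain divide-and-conquer over the index range, one leaf per element,
      // so there is enough head-room for all P workers.
      static class BulkTask extends RecursiveAction {
          final int lo, hi;
          BulkTask(int lo, int hi) { this.lo = lo; this.hi = hi; }

          @Override protected void compute() {
              if (hi - lo <= 1) {
                  expensiveOp(lo);               // the actual per-element work
              } else {
                  int mid = (lo + hi) >>> 1;
                  BulkTask left = new BulkTask(lo, mid);
                  left.fork();
                  new BulkTask(mid, hi).compute();
                  left.join();                   // parent sits here until the left half is done
              }
          }
      }

      static void expensiveOp(int i) {
          long acc = i;
          for (long q = 0; q < Q; q++) acc += q; // burn ~Q units of CPU
          if (acc == 42) System.out.println("unlikely");
      }

      public static void main(String[] args) {
          ForkJoinPool pool = new ForkJoinPool(P);
          pool.invoke(new BulkTask(0, N));       // single external client, C = 1
      }
  }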

First, there are severe "edge effects" towards the end of a particular
bulk operation. That is expected, but it is something to worry about when
doing performance testing. It is mitigated when C >> 1 and the pool has
other things to do in the meantime. I still believe most bulk operation
usages in the wild would have a rather low C.

Second, and more importantly, notice how many threads are wasted joining.
Of course, it is an open question whether these threads would do anything
useful if they were not joining. The evidence shows, though, that all
threads are either running or busy joining, which suggests there *is*
indeed work left to do.
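
As a crude cross-check of the same observation without fjp-trace, one can
sample the workers' stacks from within the JVM and count how many sit in
join-related frames. This is only an illustration: the JoinSampler class
and the frame-name heuristic below are made up, and are nowhere near as
precise as the actual trace.

  import java.lang.management.ManagementFactory;
  import java.lang.management.ThreadInfo;
  import java.lang.management.ThreadMXBean;

  public class JoinSampler implements Runnable {
      @Override public void run() {
          ThreadMXBean mx = ManagementFactory.getThreadMXBean();
          while (!Thread.currentThread().isInterrupted()) {
              int joining = 0, other = 0;
              for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
                  if (!ti.getThreadName().contains("ForkJoinPool")) continue;
                  boolean inJoin = false;
                  for (StackTraceElement e : ti.getStackTrace()) {
                      // crude heuristic: doJoin/awaitJoin/etc. frames mean "busy joining"
                      if (e.getClassName().contains("ForkJoin")
                              && e.getMethodName().contains("Join")) {
                          inJoin = true;
                          break;
                      }
                  }
                  if (inJoin) joining++; else other++;
              }
              System.out.printf("joining=%d, running/other=%d%n", joining, other);
              try { Thread.sleep(100); } catch (InterruptedException e) { return; }
          }
      }
  }

Started in a daemon thread alongside the bulk operation, it prints the
joining vs. running split every 100 ms.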

I would think CountedCompleters [4] should help us out here. With Mike's
pending commit updating FJP, this API will finally be exposed in
lambda/lambda, and we can try it out.
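
For reference, the shape of such a solution, roughly following the forEach
example in the CountedCompleter javadoc: leaves do the work and signal
completion upwards via tryComplete(), so no worker ever blocks in join().
The Op interface is just a stand-in for the actual per-element operation.

  import java.util.concurrent.CountedCompleter; // jsr166y.CountedCompleter with the standalone jsr166y jar

  class ForEach<E> extends CountedCompleter<Void> {
      interface Op<T> { void apply(T t); }       // stand-in for the per-element op

      static <E> void forEach(E[] array, Op<E> op) {
          new ForEach<E>(null, array, op, 0, array.length).invoke();
      }

      final E[] array; final Op<E> op; final int lo, hi;

      ForEach(CountedCompleter<?> parent, E[] array, Op<E> op, int lo, int hi) {
          super(parent);
          this.array = array; this.op = op; this.lo = lo; this.hi = hi;
      }

      @Override public void compute() {
          if (hi - lo >= 2) {
              int mid = (lo + hi) >>> 1;
              setPendingCount(2);                              // two children to wait for
              new ForEach<E>(this, array, op, mid, hi).fork(); // right half
              new ForEach<E>(this, array, op, lo, mid).fork(); // left half
          } else if (hi > lo) {
              op.apply(array[lo]);                             // leaf: do the actual work
          }
          tryComplete(); // completion propagates up the tree, no blocking join anywhere
      }
  }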

-Aleksey.

[1] https://github.com/shipilev/fjp-trace
[2] http://shipilev.net/pub/jdk/lambda/bulk-joins-trouble-1/
[3] http://shipilev.net/pub/jdk/lambda/bulk-joins-trouble-1/trace.png
[4] http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/jsr166y/CountedCompleter.html

