RFR (S): Proper divide-n-conquer in array handling

Aleksey Shipilev shade at redhat.com
Fri Oct 28 16:46:38 UTC 2016


Hi,

It bugs me that our GC workers are following the odd HS tradition of
forking out a piece of work, and pushing the rest of the work back on
the queue. This effectively serializes the balancing work: with N
workers, there would be N consecutive cutout-submit-steal interactions.
Worse, the cost only gets amortized if the chunk work takes more time
than queue stealing (which is a dangerous assumption to make).
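
For contrast, here is roughly what that cutout-and-push scheme looks
like. This is a minimal sketch for illustration only: Task, CHUNK_SIZE,
queue, and scan() are hypothetical stand-ins, not the actual HotSpot
types.

  #include <cstddef>
  #include <cstdio>
  #include <deque>

  struct Task {
    size_t from, to;                    // half-open array range [from, to)
    size_t size() const { return to - from; }
  };

  static const size_t CHUNK_SIZE = 64;  // hypothetical fixed chunk size
  static std::deque<Task> queue;        // stand-in for the worker task queue

  static void scan(const Task& t) {     // placeholder for the actual marking
    std::printf("scan [%zu, %zu)\n", t.from, t.to);
  }

  // The worker keeps one fixed-size chunk for itself and pushes the
  // *entire* remainder back on the queue. A stealer that grabs the
  // remainder repeats the same cut-and-push step, so balancing advances
  // one chunk-sized step per queue interaction.
  static void process(Task t) {
    if (t.size() > CHUNK_SIZE) {
      queue.push_back(Task{t.from + CHUNK_SIZE, t.to});
      t.to = t.from + CHUNK_SIZE;
    }
    scan(t);
  }

  int main() {
    process(Task{0, 1000});             // scans [0, 64), re-queues [64, 1000)
    return 0;
  }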

In the fork/join world, we know how to seed the work queues more
efficiently: divide the work in half, push one half for others to
steal, and keep dividing the other half until you hit a leaf-sized
task, which you then execute. The upside is that stealers always get
the "larger" chunks of work, which they break down for themselves.
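
Here is a rough sketch of that divide-and-conquer shape, with the same
hypothetical Task/queue/scan stand-ins as above; the webrev has the
real code.

  #include <cstddef>
  #include <cstdio>
  #include <deque>

  struct Task {
    size_t from, to;                    // half-open array range [from, to)
    size_t size() const { return to - from; }
  };

  static const size_t LEAF_SIZE = 64;   // hypothetical leaf task size
  static std::deque<Task> queue;        // owner pushes at the back; thieves
                                        // steal from the front, where the
                                        // oldest (largest) halves sit

  static void scan(const Task& t) {     // placeholder for the actual marking
    std::printf("scan [%zu, %zu)\n", t.from, t.to);
  }

  static void process(Task t) {
    while (t.size() > LEAF_SIZE) {
      size_t mid = t.from + t.size() / 2;
      queue.push_back(Task{mid, t.to}); // publish one half for stealing
      t.to = mid;                       // keep dividing the other half
    }
    scan(t);                            // leaf task: do the actual work
  }

  int main() {
    process(Task{0, 1000});
    // Simulate a thief: it takes the oldest task, i.e. the largest
    // remaining half, and breaks it down for itself the same way.
    while (!queue.empty()) {
      Task t = queue.front();
      queue.pop_front();
      process(t);
    }
    return 0;
  }

Since the pushed halves shrink geometrically, whatever a thief takes
first is the largest outstanding chunk, and there is no repeated
cut-and-push ping-pong through the queue.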

This is what this patch does:
 http://cr.openjdk.java.net/~shade/shenandoah/concmark-proper-dnc/webrev.01/
 (it also includes a few renames and comment updates)

There are no clear performance improvements on our workloads, because
mark costs are completely dominated by cache misses on the "marked"
bitset. But there is a chicken-and-egg problem: optimizing for faster
marks would trim down the per-chunk work, and then run into task
imbalance because of this issue. So I prefer to nail this down before
it hits us.

Testing: hs_gc_shenandoah, SPECjvm2008, some microbenchmarks

Thanks,
-Aleksey


