RFR (S): Single-element buffer in thread-local taskqueues

Tue Nov 8 23:26:57 UTC 2016

On Tuesday, 08.11.2016, at 23:01 +0100, Aleksey Shipilev wrote:
> Hi,
> 
> In our work-stealing strategy, we are falling victim to yet another
> non-optimality in the division strategy. The best way to do parallel
> work is to fork out the tasks, submit everything but one task to the
> queue, and recursively execute the remaining task. Pushing the last
> task is redundant, because we will pop it out on the next cycle.
> (Actually, as the array stride example taught us, this can even set us
> up for thrashing the queue, if somebody stole the task from under our
> feet.)
> 
> Now, rewriting the closure-heavy code to this pattern would not be
> easy, but we can solve this in TaskQueues themselves, enter here:
>   http://cr.openjdk.java.net/~shade/shenandoah/taskqueue-buff/webrev.01
>  (includes the clean_queues fix, to be pushed separately)
> 
> It adds a single-entry buffer in front of the queue, which acts almost
> like a local variable for us to pull from. It does not make sense to
> buffer more than one task, because that may impede work stealing.
> Buffering a single task is completely safe, because we are definitely
> going back for it, and nobody else should steal it.
> 
> Testing: hotspot_gc_shenandoah, jcstress-all (quick), microbenchmarks
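
For readers without the webrev at hand, the single-entry buffer idea from
the quoted mail can be sketched roughly as below. This is only an
illustration under assumptions: BufferedQueue and all of its members are
hypothetical names, the backing std::deque is a single-threaded stand-in
for a real work-stealing queue, and none of this is the code from the
webrev.

```cpp
#include <cassert>   // for the usage checks described below
#include <cstddef>
#include <deque>

// Hypothetical sketch: a one-element buffer in front of a task queue.
// The std::deque is NOT thread-safe; a real work-stealing queue needs
// proper synchronization between the owner and the thieves.
template <typename T>
class BufferedQueue {
public:
  BufferedQueue() : buf_(), buf_empty_(true) {}

  // The newest task always lands in the buffer. A previously buffered
  // task is moved to the backing queue, where thieves can see it.
  void push(const T& task) {
    if (!buf_empty_) {
      queue_.push_back(buf_);
    }
    buf_ = task;
    buf_empty_ = false;
  }

  // Owner side: take the buffered task first, then fall back to the
  // owner's end of the backing queue.
  bool pop(T& out) {
    if (!buf_empty_) {
      out = buf_;
      buf_empty_ = true;
      return true;
    }
    if (queue_.empty()) return false;
    out = queue_.back();
    queue_.pop_back();
    return true;
  }

  // Thief side: the buffer is invisible here, so only tasks that
  // reached the backing queue can be stolen (from the opposite end).
  bool steal(T& out) {
    if (queue_.empty()) return false;
    out = queue_.front();
    queue_.pop_front();
    return true;
  }

  bool buffered() const { return !buf_empty_; }
  std::size_t queued() const { return queue_.size(); }

private:
  T buf_;                 // the single buffered task
  bool buf_empty_;        // true when buf_ holds no task
  std::deque<T> queue_;   // stand-in for the shared queue
};
```

With this sketch, pushing three tasks leaves the last one in the buffer
and the first two in the backing queue. The owner's next pop returns the
buffered task without touching the queue, while a thief can still steal
the older ones.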

Great stuff!

> Trims down the mark times for full GC (again, conc GC mark would also
> benefit from this, once we solve other bottlenecks there).

Do you happen to know what those bottlenecks are? Improving full-gc is
great, but ideally Shenandoah should not go there :-)

> Good to go?

Should we maybe put this code into shenandoahTaskQueue.hpp and friends?
Reduce shared code changes, etc ;-)

Other than that: go Alexey, go! :-)

Roman