RFR (S): Single-element buffer in thread-local taskqueues
Zhengyu Gu
zgu at redhat.com
Wed Nov 9 14:14:45 UTC 2016
Hi Aleksey,
Would it make sense to incorporate pop_buffer() inside pop_local()? since you are
doing the push buffer in push()? The buffered element could be easily forgotten
when migrating from OverflowTaskQueue to BufferedOverflowTaskQueue.
Thanks,
-Zhengyu
On 11/08/2016 05:01 PM, Aleksey Shipilev wrote:
> Hi,
>
> In our work-stealing strategy, we are falling victim to yet another
> non-optimality in division strategy. The best way to do parallel work is
> to fork out the tasks, submit everything but one task to the queue,
> recursively execute the remaining task. Pushing the last task is
> redundant, because we will pop it out on the next cycle. (Actually, as
> array stride example taught us, this can even set us up for thrashing
> the queue, if somebody stole the task under our feet).
>
> Now, rewriting the closure-heavy code to this pattern would be not easy,
> but we can solve this in TaskQueues themselves, enter here:
> http://cr.openjdk.java.net/~shade/shenandoah/taskqueue-buff/webrev.01
> (includes the clean_queues fix, to be pushed separately)
>
> It does the single-entry buffer before the queue, which acts almost like
> the local variable for us to pull from. It does not make sense to buffer
> more than one task, because it may impede work stealing. Buffering a
> single task is completely safe, because we are definitely are going back
> for it, and nobody else should steal.
>
> Testing: hotspot_gc_shenandoah, jcstress-all (quick), microbenchmarks
>
> Trims down the mark times for full GC (again, conc GC mark would also
> benefit from this, once we solve other bottlenecks there).
>
> == Baseline
>
> Full GC Times = (avg = 7344.99 ms)
> Mark = (avg = 2802.24 ms)
> Drain Queues = (avg = 2800.25 ms)
> Weak References = (avg = 0.27 ms)
> Class Unloading = (avg = 0.82 ms)
> Calculate Addresses = (avg = 824.15 ms)
> Adjust Pointers = (avg = 1813.09 ms)
> Copy Objects = (avg = 1889.87 ms)
>
> == Patched
>
> Full GC Times = (avg = 6856.22 ms)
> Mark = (avg = 2338.00 ms)
> Drain Queues = (avg = 2336.00 ms)
> Weak References = (avg = 0.27 ms)
> Class Unloading = (avg = 0.80 ms)
> Calculate Addresses = (avg = 819.28 ms)
> Adjust Pointers = (avg = 1806.19 ms)
> Copy Objects = (avg = 1877.28 ms)
>
> Good to go?
>
> Thanks,
> -Aleksey
>
More information about the shenandoah-dev
mailing list