RFR (S): Single-element buffer in thread-local taskqueues

Zhengyu Gu zgu at redhat.com
Wed Nov 9 14:14:45 UTC 2016


Hi Aleksey,

Would it make sense to incorporate pop_buffer() inside pop_local()? since you are
doing the push buffer in push()? The buffered element could be easily forgotten
when migrating from OverflowTaskQueue to BufferedOverflowTaskQueue.


Thanks,

-Zhengyu



On 11/08/2016 05:01 PM, Aleksey Shipilev wrote:
> Hi,
>
> In our work-stealing strategy, we are falling victim to yet another
> non-optimality in division strategy. The best way to do parallel work is
> to fork out the tasks, submit everything but one task to the queue,
> recursively execute the remaining task. Pushing the last task is
> redundant, because we will pop it out on the next cycle. (Actually, as
> array stride example taught us, this can even set us up for thrashing
> the queue, if somebody stole the task under our feet).
>
> Now, rewriting the closure-heavy code to this pattern would be not easy,
> but we can solve this in TaskQueues themselves, enter here:
>    http://cr.openjdk.java.net/~shade/shenandoah/taskqueue-buff/webrev.01
>   (includes the clean_queues fix, to be pushed separately)
>
> It does the single-entry buffer before the queue, which acts almost like
> the local variable for us to pull from. It does not make sense to buffer
> more than one task, because it may impede work stealing. Buffering a
> single task is completely safe, because we are definitely are going back
> for it, and nobody else should steal.
>
> Testing: hotspot_gc_shenandoah, jcstress-all (quick), microbenchmarks
>
> Trims down the mark times for full GC (again, conc GC mark would also
> benefit from this, once we solve other bottlenecks there).
>
> == Baseline
>
> Full GC Times         = (avg =  7344.99 ms)
>    Mark                = (avg =  2802.24 ms)
>      Drain Queues      = (avg =  2800.25 ms)
>      Weak References   = (avg =     0.27 ms)
>      Class Unloading   = (avg =     0.82 ms)
>    Calculate Addresses = (avg =   824.15 ms)
>    Adjust Pointers     = (avg =  1813.09 ms)
>    Copy Objects        = (avg =  1889.87 ms)
>
> == Patched
>
> Full GC Times         = (avg =  6856.22 ms)
>    Mark                = (avg =  2338.00 ms)
>      Drain Queues      = (avg =  2336.00 ms)
>      Weak References   = (avg =     0.27 ms)
>      Class Unloading   = (avg =     0.80 ms)
>    Calculate Addresses = (avg =   819.28 ms)
>    Adjust Pointers     = (avg =  1806.19 ms)
>    Copy Objects        = (avg =  1877.28 ms)
>
> Good to go?
>
> Thanks,
> -Aleksey
>



More information about the shenandoah-dev mailing list