RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection

Thu Jul 5 19:47:49 UTC 2018

Hi,

On Thu, 2018-07-05 at 14:49 -0400, Kim Barrett wrote:
> > On Jul 5, 2018, at 12:08 PM, Zhengyu Gu <zgu at redhat.com> wrote:
> > 
> > Hi Kim and Thomas,
> > 
> > Thanks for reviewing.
> > 
> > On 07/05/2018 03:54 AM, Thomas Schatzl wrote:
> > > Hi,
> > > On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote:
> > > > > On Jun 27, 2018, at 2:39 PM, Zhengyu Gu <zgu at redhat.com>
> > > > > wrote:
> > > > 
> > > > […]
> > > > An alternative that might be better is, whenever a pop_global
> > > > fails, reset the associated last_stolen id to invalid.  This
> > > > will revert to 2 random choices until we find (at least) one
> > > > with something we can steal. Actually, it seems the referenced
> > > > paper does something similar, and the webrev code doesn't match
> > > > the referenced paper.
> > > 
> > > That may explain why my perf results are different to the paper
> > > that I was planning to investigate :) Nice
> > > find.
> > 
> > Sorry, my bad.

I have been looking into this a bit and finally (with some patch from
me that fixes the changes too) and some additional probes (using the
TASKQUEUE_STATS "infrastructure") I am starting to get meaningful
results.

More about that later.

In any case the technique looks like a nice improvement at least in
steal attempts and steal/steal attempts ratio on some bigger tests, but
I need to update my code again it seems :)

I can add the changes to the TASKQUEUE_STATS logging later btw.

> > 
> > […]
> > Updated webrev:
> > 
> > http://cr.openjdk.java.net/~zgu/8205921/webrev.01/index.html
> 
> src/hotspot/share/gc/shared/taskqueue.inline.hpp
>  255     if (sz2 > sz1) {
>  256       sel_k = k2;
>  257       suc = _queues[k2]->pop_global(t);
>  258     } else {
>  259       sel_k = k1;
>  260       suc = _queues[k1]->pop_global(t);
>  261     }
> 
> The paper avoids the steal attempt when both potential victims have a
> size of zero, e.g. insert another clause:
> 
>   } else if (sz1 == 0) {
>     sel_k = k1;  // Might be needed to avoid uninitialized variable
> warnings?
>     suc = false;
>   } else {
>     ...
> 
> There is a race condition between obtaining the size and checking it
> here, but I don't think that's important.  The point is to avoid an
> expensive steal attempt when it is very likely to fail.
> 

There is another bug in the existing code: current Hotspot collectors
all reuse a single task queue set. So since the queue id's are only
initialized once at startup, there will be some initial use of a
suboptimal queue.

I assume Shenandoah does not need a reset because it creates new
taskqueuesets whenever it needs them (and frees them afterwards).

The current design of passing stealing-local information (the seed)
makes it clear that the owner of that variable needs to initialize it.

At this time I have no preference on Kim's suggestion to put these
variables into the queue if you asked me. I would tend to encapsulate
the mechanism as much as possible though.

I do think if Shenandoah wants to put this information into the
GCThreadLocalBlock (for what reason?) it is probably most flexible to
pass these things as kind of context to the steal_best_of_2() method. I
do not think it is desirable to have a second copy of the taskqueue*
code around; I can't see how else one implementation can use the
TaskQueueSet local queue ids and the other use the same information
from somewhere else right now.

Thanks,
  Thomas