RFR(S) 8205921: Optimizing best-of-2 work stealing queue selection

Thu Jul 5 07:54:42 UTC 2018

Hi,

On Thu, 2018-07-05 at 01:15 -0400, Kim Barrett wrote:
> > On Jun 27, 2018, at 2:39 PM, Zhengyu Gu <zgu at redhat.com> wrote:
> > 
> > Hi,
> > 
> > Please review this small enhancement base on paper [1], that keeps
> > the last successfully stolen queue as one of best-of-2 candidates
> > for work stealing.
> > 
> > Based on experiments done by Thomas Schatzl and myself, it shows
> > positive impacts on task termination and average pause time.
> > 
> > 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8205921
> > Webrev: http://cr.openjdk.java.net/~zgu/8205921/webrev.00/index.htm
> > l
> > 
> > 
> > Test:
> >  hotspot_gc on Linux 64 (fastdebug and release)
> > 
> > 
> > [1] Characterizing and Optimizing Hotspot Parallel Garbage
> >    Collection on Multicore Systems
> >    http://ranger.uta.edu/~jrao/papers/EuroSys18.pdf
> > 
> > Thanks,
> > 
> > -Zhengyu
> 
> Once set, _last_stolen_queues entries are never invalidated.  So we
> may as well initialize the entries to queue_num+1 mod num_queues.
> Then get rid of the is_valid test (and the whole notion of validity)
> and the (only used once per queue_num in the webrev change) random
> selection of k1.
> 
> But I think that might not be desirable.  The webrev change's
> behavior is to always use the queue chosen for the last steal attempt
> as one of the two, even if the last steal attempt failed.  And
> because the choice of which of the two to try next prefers that one
> when they are both empty, we may be reduced to searching with only
> one random choice for a while, even though the one we keep using has
> repeatedly failed to yield a result.
> 
> An alternative that might be better is, whenever a pop_global fails,
> reset the associated last_stolen id to invalid.  This will revert to
> 2 random choices until we find (at least) one with something we can
> steal. Actually, it seems the referenced paper does something
> similar, and the webrev code doesn't match the referenced paper.

That may explain why my perf results are different to the paper that I
was planning to investigate :) Nice find.

> Why do the last_queue array entries need to be padded? Why not just
> add a _last_stolen_queue member to TaskQueueSuper?

The _last_stolen_queue is associated to a (stealing) thread, not the
queue. Multiple threads might have the same queue as current steal
target.

One other option I discussed is instead of this array of PaddedQueueId
(which I would rename as TaskQueueThreadLocal or
TaskQueueStealLocals/Context because I can see adding more in the
future) would be passing this around like the seed parameter to
steal_best_of_2 (and actually put the seed parameter in there too).

It's a bit weird to me to pass two different kinds of thread locals
related to work stealing two different ways.

The padding is to avoid potential false sharing issues as otherwise
last_stolen_id's of different threads end up on the same cache line.
And the writes of different threads to disjoint locations would likely
invalidate the cache line all the time. Just to avoid a potential
performance issue here.

> I think it is a pre-existing bug that GenericTaskQueueSet::_n is of
> type uint, but the associated constructor argument is of type int.  I
> think the constructor is wrong in this regard.

- please use CamelCase for the INVALID_QUEUE_ID constant.

- there are some superfluous spaces at the end-of-line, but that would
be flushed out before pushing anyway.

Thanks,
  Thomas