RFR (XS): 8188877: Improper synchronization in offer_termination

Thu Nov 16 08:41:51 UTC 2017

On Mon, 2017-10-09 at 21:44 -0400, Kim Barrett wrote:
> > On Oct 9, 2017, at 8:20 PM, White, Derek <Derek.White at cavium.com>
> > wrote:
> > > src/hotspot/share/gc/shared/taskqueue.cpp
> > > 
> > > The change to make _offered_termination volatile prevents the
> > > compiler
> > > from hoisting it out of the loop, but I'm not sure it is enough
> > > to prevent
> > > unnecessary delays in recognizing the termination condition.
> > > In particular, while offer_termination is spinning, it's not
> > > clear there's
> > > anything to ensure an update by some other thread will become
> > > visible
> > > during the repeated spin cycles.  yield and sleep seem likely to
> > > involve fences
> > > that will make such updates visible, but not so much for
> > > SpinPause.
> > 
> > [Derek] 
> > The concurrent access around _offered_termination is (semi*)-
> > independent of other data accesses. So I'm not sure if more
> > ordering constraints will help much here?
> > 
> > The atomic incr/dec operations provide a fence to publish the
> > change. On aarch64 this is a DMB instruction. A full sync (DSB)
> > will also publish the change, but blocks until the publish has been
> > acked. But it doesn't publish any faster.
> > 
> > The only extra thing we could do is read _offered_termination using
> > OrderAccess::load_acquire(). I think the only thing this adds in
> > this case is ordering between the loads of _offered_termination in
> > different  loop iterations. Is this required or helpful? 
> 
> Consider the case where there is no further work to do, and all
> threads are heading toward offer_termination.  Until the last thread
> makes the offer, the others will be in the spin1...spinN/yield/sleep
> cycle, as expected.  But because there appear to be no memory
> barriers in the spin part of that cycle, a thread in the spin part
> might not see the final offer until it completes the full spin cycle
> and then performs the yield (assuming yield does have sufficient
> memory barriers to acquire the published value).
> 
> So I think the read of _offered_termination for comparison with
> _n_threads probably ought to be an acquire.

  I agree with Derek that adding an acquire for the load of
_offered_termination does not seem to improve termination properties,
and is imho unnecessary (but the volatile declaration of course is).

Thanks,
  Thomas