RFR(S): 8229422: Taskqueue: Outdated selection of weak memory model platforms

Doerr, Martin martin.doerr at sap.com
Wed Jan 22 15:01:01 UTC 2020


Hi Andrew and David,

the scenario for which these barriers are needed is not so trivial:

Thread1: set bottom (push)
Thread2: read age, read bottom, set age (pop_global)
Thread3: read age, read bottom (pop_global)

The requirement is that Thread3 must never read an older bottom value than Thread2 after Thread3 has seen the age value from Thread2.

The age is updated by cmpxchg in pop_global which already implies strict ordering so there's no extra release barrier.
I'd rather choose OrderAccess::loadload in pop_global, but I don't think it's a big deal.

I'll be glad if somebody from GC can double-check if I remembered this stuff correctly.

Best regards,
Martin


> -----Original Message-----
> From: David Holmes <david.holmes at oracle.com>
> Sent: Mittwoch, 22. Januar 2020 12:59
> To: Andrew Haley <aph at redhat.com>; Doerr, Martin
> <martin.doerr at sap.com>; Derek White <derekw at marvell.com>; hotspot-
> gc-dev at openjdk.java.net; Kim Barrett <kim.barrett at oracle.com>
> Subject: Re: RFR(S): 8229422: Taskqueue: Outdated selection of weak
> memory model platforms
> 
> On 22/01/2020 8:52 pm, Andrew Haley wrote:
> > On 1/15/20 1:00 AM, David Holmes wrote:
> >> On 15/01/2020 2:15 am, Andrew Haley wrote:
> >>> On 1/14/20 3:52 PM, Doerr, Martin wrote:
> >>>
> >>>> good catch. I think you're right. A multi-copy-atomic, but weak
> >>>> architecture (e.g. aarch64) needs an instruction which orders both
> >>>> volatile loads.
> >>>
> >>> Good, I thought so.
> >>>
> >>> Given that TSO machines define OrderAccess::acquire() as no more than
> >>> a compiler barrier, I believe that we could do something like
> >>>
> >>> #ifdef CPU_MULTI_COPY_ATOMIC
> >>>      OrderAccess::acquire();
> >>> #else
> >>>      OrderAccess::fence();
> >>> #endif
> >>
> >> "acquire" isn't used to order loads it is used to pair with a "release"
> >> associated with the store of the variable now being loaded.
> >>
> >> If this is the code referred to:
> >>
> >>     Age oldAge = _age.get();
> >>     // Architectures with weak memory model require a barrier here
> >>     // to guarantee that bottom is not older than age,
> >>     // which is crucial for the correctness of the algorithm.
> >> #ifndef CPU_MULTI_COPY_ATOMIC
> >>     OrderAccess::fence();
> >> #endif
> >>     uint localBot = Atomic::load_acquire(&_bottom);
> >>
> >> then I think there is an assumption (perhaps incorrect) that the
> >> load_acquire will prevent reordering as well as performing the necessary
> >> "acquire" semantics.
> >
> > It depends on how _age is written to.
> >
> > As far as I can see there is no ordering between setting _bottom and
> setting
> > _age,
> >
> >    void set_empty() {
> >      _bottom = 0;
> >      _age.set(0);
> >    }
> >
> > so it looks like any kind of fence on the reader side is pointless anyway. In
> > that case, I don't know why we're doing any of this if it doesn't matter
> > what order the reader threads see updates to _age and _bottom.
> >
> > It's all rather baffling. _bottom is declared volatile, as is _age, so I
> > guess there must be some ordering requirements, but no fences on the
> > writing side to enforce it.
> >
> > What actually are the ordering requirements between _bottom and _age?
> 
> I'm assuming the ordering requirement is to preserve the order as
> expressed in the code. There is likely an assumption that by declaring
> both as volatile that the the compiler will not reorder them; and that
> the load_acquire will prevent the hardware from reordering them. I'm not
> sure if either of those assumptions are actually valid.
> 
> But that doesn't explain the complete lack of barriers in set_empty.
> 
> The GC folk will need to chime in on the detailed semantic requirements
> of this algorithm.
> 
> David


More information about the hotspot-gc-dev mailing list