missing memory barrier in acmp with C2

Wed Oct 26 18:48:55 UTC 2016

So I see you took Hans' example, but his example has Thread 1 also reading
some state during construction, which can be modified by Thread 2
concurrently.  That is a problem, but your example was a bit too slimmed
down to illustrate that.
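
To make the pattern concrete, here is a minimal Java sketch of the release/acquire handoff from the quoted snippet. It assumes Java 9+ VarHandles. The names (AcquireReleaseSketch, Box, publisher, consumer, run) are hypothetical, and the spin loop stands in for the single `if` in the original so that the outcome is deterministic. It is only an illustration of the ordering question, not the code C2 generates for acmp.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch: names are illustrative, not taken from HotSpot.
final class AcquireReleaseSketch {
    // Stand-in for the object "x" with a field "blah".
    static final class Box { int blah; }

    private static final VarHandle X_INIT;
    static {
        try {
            X_INIT = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseSketch.class, "xInit", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private boolean xInit;            // only accessed through X_INIT
    private final Box x = new Box();  // reference already known to both threads

    // Thread 1: initialize x, then publish with a release store.
    void publisher() {
        x.blah = 0;                     // <Initialize x>
        X_INIT.setRelease(this, true);  // x_init.store_release(true)
    }

    // Thread 2: acquire load, then store into x.
    void consumer(int y) {
        // Spin instead of a single `if` (assumption, for a deterministic result).
        while (!(boolean) X_INIT.getAcquire(this)) {
            Thread.onSpinWait();
        }
        // The acquire load keeps this store from moving above the flag read.
        // A weaker load followed only by VarHandle.loadLoadFence() orders
        // loads against loads and would not, by itself, constrain this store.
        x.blah = y;
    }

    // Runs both threads on fresh state and returns the final value of x.blah.
    static int run() {
        AcquireReleaseSketch s = new AcquireReleaseSketch();
        Thread t2 = new Thread(() -> s.consumer(42));
        Thread t1 = new Thread(s::publisher);
        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.x.blah;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that x is reachable by both threads from the start here, which matches the "x is already in a register" reading: there is no load of x after the flag check for a loadload fence to order.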

On Wednesday, October 26, 2016, Vitaly Davidovich <vitalyd at gmail.com> wrote:

>
>
> On Wed, Oct 26, 2016 at 2:15 PM, Andrew Haley <aph at redhat.com> wrote:
>
>> On 26/10/16 16:31, Vitaly Davidovich wrote:
>> > On Wednesday, October 26, 2016, Andrew Haley <aph at redhat.com> wrote:
>> >
>> >> On 26/10/16 15:02, Vitaly Davidovich wrote:
>> >>> On Wednesday, October 26, 2016, Andrew Haley <aph at redhat.com> wrote:
>> >>>
>> >>>> On 26/10/16 12:27, Roman Kennke wrote:
>> >>>>> On Wednesday, 26.10.2016, at 13:24 +0200, Roland Westrelin wrote:
>> >>>>>> http://cr.openjdk.java.net/~roland/shenandoah/membar-acmp/webrev.00/
>> >>>>>>
>> >>>>>> The code generated for acmp is missing a memory barrier.
>> >>>>>
>> >>>>> Great!
>> >>>>>
>> >>>>>> Should it be a loadstore + loadload as in
>> >>>>>> ShenandoahBarrierSet::asm_acmp_barrier() on aarch64 or simply a
>> >>>>>> loadload?
>> >>>>>
>> >>>>> I can come up with a reason for loadload, but not for loadstore, I
>> >>>>> think loadstore is not necessary there. I'd go for the less restrictive
>> >>>>> fence unless we come up with a good reason not to.
>> >>>>
>> >>>> The general rule is that you can get away with loadload fences if you
>> >>>> really know what you are doing, but it is exceedingly subtle.
>> >>>>
>> >>>> Imagine this.  We have two variables, a boolean x_init and an oop
>> >>>> x.
>> >>>>
>> >>>> Thread 1:
>> >>>> <Initialize x>
>> >>>> x_init.store_release(true);
>> >>>>
>> >>>> Thread 2:
>> >>>> if (x_init.load_acquire())
>> >>>>     x.blah = y
>> >>>>
>> >>>> If you replace the load acquire with a loadload fence, the store of
>> >>>> x.blah can become visible before the initialization of x.
>> >>>
>> >>> x.blah requires a load of x (which cannot reorder with loadload)
>> >>
>> >> x is just a local, and it's in a register.  Where would you even load
>> >> it from?
>> >
>> > I don't follow - x is an oop, and x.blah is at (addr of x) + (offset of
>> > blah field).  You need to load addr of x
>>
>> Where do you suppose the addr of x is being loaded from?
>>
>> The addr of x is in a register already.  We don't need to read it
>> from a field.  It may be an argument, for example.
>
>
>> > to figure out dest addr of the store.  As written in your snippet,
>> > the load of x is after the loadload.
>>
>> It's not.
>>
> I interpreted your code as pseudocode, but you seem to be implying some
> other context.  So you're saying you constructed x in Thread1,
> store_release'd the initialization, passed the address of x to Thread2
> through memory, Thread2 read it from a field somewhere into a register, and
> now the snippet you're showing is when 'x' is already in a register?
>
>>
>> >>> and it's data dependent; unless you take something like Alpha into
>> >>> account, but that's unsupported anyway.
>> >>
>> >> Please explain.  And, while you're at it, please explain why Hans is
>> >> wrong, or why my interpretation is wrong.
>> >
>> > As mentioned above, to get x.blah address you need a load of x (or have the
>> > address available already) - that's data dependent load.
>>
>> Dependent on what?
>>
> See above - your code looks like pseudocode, and x.blah seemed like
> shorthand/pseudocode for loading x and writing a new value to .blah
>
>>
>> Andrew.
>>
>>
>
