[aarch64-port-dev ] RFR(S): 8087341: C2 doesn't optimize redundant memory operations with G1

Fri Feb 12 13:51:40 UTC 2016

Hi Roland,

On 12/02/16 12:36, Roland Westrelin wrote:
> Hi Andrew,
> 
>> A patch for the AArch64 C2 volatile/CAS generation code which deals
>> with the effects of your proposed C2 patch is available as a
>> webrev
>> 
>> http://cr.openjdk.java.net/~adinn/8087341-aarch64/webrev.00/
> 
> Thanks for putting that together. I didn’t expect that simple change
> to cause so much trouble.

It was my decision to employ back end rule predicates which poke around
in the graph that led to this -- it's not anything to do with your
choice. I think your fix is correct and valuable in its own right, yet
more so because it has simplified that back end code substantially.

>> n.b. I have /not/ created a separate issue for the AArch64 part of
>> this fix. I am not sure whether you want to combine it with your
>> patch or push it as a separate stage.
> 
> I can push everything together and list you as a contributor (in the
> contributed-by field) if that works for you.
> 

Yes please. I think Andrew Haley's responses so far mean that  has
agreed the AArch64 part of this change. Perhaps he can confirm?

> Vladimir, can you take another look at this? Your two objections
> were:
> 
>> Also we have specialized insert_mem_bar_volatile() if we don't want
>> wide memory affect. Why not use it?
> 
> The membar in the change takes the entire memory state as input but
> only changes raw memory. I don’t think that can be achieved with
> insert_mem_bar_volatile(). As explained by Mikael, the membar is here
> to force ordering between the oop store and the card table load.
> That’s why I think the membar’s inputs and outputs should be set up
> that way.

Not that I am an official reviewer but I agree with you here.

>> And we need to keep precedent edge link to oop store in case EA
>> eliminates related allocation.
> 
> Mikael said it’s not ok to eliminate the memory barrier if we leave
> the gc barrier.

Also in agreement with this.

For both G1GC and CMS +CondCardMark a StoreLoad barrier is necessary to
ensure that the StoreX is visible before the LoadB/StoreCM pair which
implement the conditional card mark. For these configurations AArch64
detects any MemBarVolatile associated with the card mark and inserts a
dmb ish instruction (StoreLoad implementation) before the ldrb/strb.

With CMS -CondCardMark the generic code does not insert a memory
barrier. However, for correctness on non-TSO architectures we need a
StoreStore barrier between the StoreX and the StoreCM implementing the
card mark. That ensures that these writes cannot be observed by GC
threads out of order (it might cause the GC to miss the write). This
special case is handled on AArch64 by translating StoreCM to include a
dmb ishst instruction (StoreStore implementation) before the strb.

regards,

Andrew Dinn
-----------