Release store in C2 putfield

Thu Sep 4 10:30:56 UTC 2014

On 09/04/2014 10:30 AM, Bertrand Delsart wrote:
> I'm not a C2 expert but from what I have quickly checked, an issue may 
> be that we need StoreStore ordering on some platforms.
> 
> This should for instance be true for cardmarking (new stored oop
> must be visible before the card is marked).

Okay.  I can live with that.  Is there a corresponding read barrier in
the code which scans the card table?

> This may also be true for the oop stores in general, as initially
> discussed. [IMHO this is related to final fields, which have to be
> visible when the object escape. Barriers are the end of the
> constructors may not be sufficient if objects can can escape before
> the end of their init() method.

I'm pretty sure we do this correctly.  Are you aware of any place
(except unsafe publication, which is a programmer error) where this
might happen?  We generate a barrier at the end of a constructor if
there is a final field and at the end of object creation.

I should explain: I respect all nodes, and generate all barriers,
except when there is a redundant barrier.  So, if a barrier is
immediately preceded by a load acquire I don't emit anything; likewise
for a store release.

So, for something like

    volatile int x;

    void foo() {
        while (true) x = 1;
    }

I get

  ;; membar_release (elided)
  0x0000007fa822f1e4: add	x10, x19, #0xc
  ;; 0x1
  0x0000007fa822f1e8: orr	w12, wzr, #0x1
  0x0000007fa822f1ec: stlr	w12, [x10]
  0x0000007fa822f1f0: dmb	ish             ;*putfield x
                                                ; - VolatileStore::foo at 2 (line 5)

  0x0000007fa822f1f4: adrp	x11, 0x0000007fb7ff7000
                                                ; OopMap{r19=Oop off=120}
                                                ;*goto
                                                ; - VolatileStore::foo at 5 (line 5)
                                                ;   {poll}
  0x0000007fa822f1f8: ldr	wzr, [x11]      ;*goto
                                                ; - VolatileStore::foo at 5 (line 5)
                                                ;   {poll}
  0x0000007fa822f1fc: b	0x0000007fa822f1e4

Which is correct, I think.

> Unfortunately, MemNode only defines higher level 'release' and 
> 'acquire'. This means that if you want to use MemNode to guarantee the 
> StoreStore (instead of using a separate membar), then you need to use 
> MemNode::release... which uselessly adds the LoadStore semantic.
>
> LoadStore may require a stronger barrier (for instance "DMB SY" instead 
> of "DMB ST" on ARM32).
>
> We have the same issue with some other code in hotspot. Some StoreStore 
> constraints just before a write have been implemented using 
> OrderAccess::release_store(). The later is currently slightly more 
> efficient on SPARC/x86... but this is because OrderAccess::storestore() 
> is not fully optimized (we just need to prevent C++ compiler reordering, 
> we should not have to actually generate code for storestore on TSO 
> systems). Unfortunately, on platforms with weaker memory models, 
> release_store() may be less efficient than a storestore() followed by a 
> write :-(

Indeed.  I think the back end should drive this.

> May be we need in OrderAccess and in MemNode a new store operation 
> weaker than release_store, ordering only the store wrt previous stores 
> (e.g. "membar #StoreStore; write"). This will be a simple store on TSO 
> systems (+ something to prevent compiler reordering) but will be 
> expanded to whatever is the most efficient barrier on the other systems.

It would help back-end writers tremendously if there were more kinds
of MemNodes, preferably annotated in such a way that we could tell
what each barrier was for.  That way we could tell whether this is a
barrier at the end of object creation, etc.  And I could simply not
generate anything for barriers that are unneeded because of store
release instructions.

Andrew.