Release store in C2 putfield

Fri Sep 5 14:35:55 UTC 2014

On 09/05/2014 09:57 AM, Andrew Haley wrote:
> On 09/05/2014 02:08 PM, Doug Lea wrote:
>>
>> 1. As far as I can see, the G1 post barrier enforces ordering,
>> but the plain one (GraphKit::write_barrier_post) does not.
>> On the other hand, some GC barrier-related mechanics seem to
>> be strewn elsewhere, so might have this effect. In particular,
>> the release inside Parse::do_put_xxx seems suspicious.
>> I'd expect CMS, but not the other GCs, to have the
>> same constraints as G1...
>
> That G1 code isn't so bad: it is at least conditional in that the card
> is read and the memory barrier is used if and only if the card is not
> young.  The code to which I really object uses a release store for
> every card table write.

I think we (also Mikael) agree that the GC barriers (*write_barrier_post)
ought to be self-contained to reflect their actual constraints, ideally
avoiding any need to deal with them in Parse::do_put_xxx or elsewhere.
Probably this means some changes for CMS (not just G1) vs other collectors.
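
For concreteness, here is a minimal sketch of the two post-barrier shapes
being contrasted, written as standalone C++ with std::atomic rather than
as C2 graph code. The card values, the 512-byte card size, the table size
and all function names are assumptions for illustration, not HotSpot's
definitions. The conditional version only approximates the G1 check
described above (read the card, fence only if it is not young).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical card values and geometry, for illustration only.
constexpr std::uint8_t kCleanCard = 0;
constexpr std::uint8_t kDirtyCard = 1;
constexpr std::uint8_t kYoungCard = 2;
constexpr std::size_t kCardShift = 9;    // 512-byte cards (assumed)
constexpr std::size_t kNumCards = 1024;  // toy table size (assumed)

// Static storage, so every card starts out as kCleanCard (zero).
std::atomic<std::uint8_t> card_table[kNumCards];

inline std::size_t card_index(std::uintptr_t field_addr) {
    return (field_addr >> kCardShift) % kNumCards;
}

// Shape 1: every card mark is a release store, needed or not.
inline void post_barrier_always_release(std::uintptr_t field_addr) {
    card_table[card_index(field_addr)].store(kDirtyCard,
                                             std::memory_order_release);
}

// Shape 2: read the card first and pay for a fence only when the
// card is not young. Returns true if the slow path (fence) was taken.
inline bool post_barrier_conditional(std::uintptr_t field_addr) {
    std::atomic<std::uint8_t>& card = card_table[card_index(field_addr)];
    if (card.load(std::memory_order_relaxed) == kYoungCard) {
        return false;  // young card: no fence, no card write
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (card.load(std::memory_order_relaxed) != kDirtyCard) {
        card.store(kDirtyCard, std::memory_order_relaxed);
    }
    return true;
}
```

In shape 2 the ordering cost sits entirely inside the barrier and is paid
only on the non-young path, which is the sense in which a barrier can be
self-contained.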

> either the card table needs a release or it doesn't.

A related fun fact about release per se is that you have no
assurance about when that store will occur. It could be postponed
for as long as the instruction scheduler working on the further-optimized
graph feels like doing so. I expect/hope not past a safepoint though.

>
>> 4. Reminder to Andrew: We cannot let the VM crash when people
>> write racy/wrong code including unsafe publication.
>
> There's absolutely no way I'd do that

(Of course I didn't mean to accuse you of it, just remind you of it
when contemplating what to do here!)

>
> I am arguing against every oop store being a release store, which
> seems Very Wrong to me. (This seems to be for IA64, which as far as I
> know is a private SAP target.  This code really should be marked
> IA64-ONLY, but it would be even better to reorganize the way this
> stuff is handled in HotSpot.)

> I am wondering whether to give up trying to use acquire and release
> instructions for the time being, and fall back to using explicit
> barriers.  It's rather messy, but it's good enough until this gets
> sorted out properly.
>

I'm not sure.

ARMv8 (also IA64) acquire/release specs bind some effects
to the locations. If you treat the release and store
as separable, and later combine at instruction generation, it's
superficially conceivable that you'd lose something in case you
matched different writes with different fences. (Plus, if you
cannot combine, you'd either choose a plain fence or use a fake
thread-local target for the releasing write; although we've seen
(for x86) that choosing fake targets can be challenging...)
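
The fused and separable forms can be sketched in standalone C++ as
follows. This uses std::atomic, not C2 IR; Node, published and the
function names are hypothetical. On ARMv8 a compiler would typically emit
a store-release instruction (stlr) for the first form and a standalone
barrier plus a plain store for the second. Under the C++ memory model
both forms make the payload visible to a reader that acquire-loads the
pointer; whether C2 can always treat them interchangeably is the open
question here.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int payload = 0;
};

// Hypothetical publication slot shared between writer and reader.
std::atomic<Node*> published{nullptr};

// Fused form: the ordering is attached to this store and this location.
void publish_release_store(Node* n, int value) {
    n->payload = value;
    published.store(n, std::memory_order_release);
}

// Separable form: a standalone release fence, then a relaxed store.
// The fence is not tied to any particular location.
void publish_fence_then_store(Node* n, int value) {
    n->payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    published.store(n, std::memory_order_relaxed);
}

// Reader side: an acquire load pairs with either publishing form.
// Returns false if nothing has been published yet.
bool try_consume(int& out) {
    Node* n = published.load(std::memory_order_acquire);
    if (n == nullptr) {
        return false;
    }
    out = n->payload;
    return true;
}
```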

But I don't know of any cases (i.e., compiler transforms) in which
this could possibly matter wrt any JMM-related guarantees.

-Doug