Release store in C2 putfield
Mikael Gerdin
mikael.gerdin at oracle.com
Fri Sep 5 13:40:38 UTC 2014
I have a short clarification on the card-marking barriers.
On Friday 05 September 2014 09.08.54 Doug Lea wrote:
> I'm trying to disentangle the many interrelated issues here,
> mainly wrt JMM and possible revisions. A few notes/comments:
>
> 1. As far as I can see, the G1 post barrier enforces ordering,
> but the plain one (GraphKit::write_barrier_post) does not.
> On the other hand, some GC barrier-related mechanics seem to
> be strewn elsewhere, so might have this effect. In particular,
> the release inside Parse::do_put_xxx seems suspicious.
> I'd expect CMS, but not the other GCs, to have the
> same constraints as G1. I'd also expect the ordering constraints
> to sometimes have a significant overall performance cost on ARM and
> Power. (The G1 ordering enforcement changes were apparently
> performance tested only on TSO machines; see
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-October/011077.html
> and follow-ups).
> (Aside: Yet more reasons to hate card-marking.)
The StoreLoad was added for G1 because G1 always checks whether the card is
already dirty before putting it on the dirty card queue and writing a 0 to it.
CMS might need a similar StoreLoad when +UseCondCardMark is set, but I'm not
sure.
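
To make the ordering concrete, here is a rough Java model of a G1-style
conditional post barrier. It is a sketch under assumptions, not HotSpot code:
the class name, the flat Object[] "heap", the 512-slot card size and the
clean value of -1 are invented for illustration, and VarHandle.fullFence()
stands in for the StoreLoad.

```java
import java.lang.invoke.VarHandle;
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical model of a conditional (G1-style) post barrier; not HotSpot code.
final class CondCardMarkSketch {
    static final byte DIRTY = 0;      // dirty is written as 0, as described above
    static final byte CLEAN = -1;     // assumed clean value for this sketch
    static final int CARD_SHIFT = 9;  // assumed: one card covers 512 slots

    final Object[] heap;
    final byte[] cards;
    final ArrayDeque<Integer> dirtyCardQueue = new ArrayDeque<>();

    CondCardMarkSketch(int slots) {
        heap = new Object[slots];
        cards = new byte[(slots >> CARD_SHIFT) + 1];
        Arrays.fill(cards, CLEAN);
    }

    void putField(int slot, Object value) {
        heap[slot] = value;            // the reference store itself
        VarHandle.fullFence();         // StoreLoad: the store above must not be
                                       // reordered with the card load below
        int card = slot >> CARD_SHIFT;
        if (cards[card] != DIRTY) {    // conditional check: skip cards already dirty
            cards[card] = DIRTY;
            dirtyCardQueue.add(card);  // hand the card to the (imagined) refinement side
        }
    }
}
```

The sketch is single-threaded, so the fence has no observable effect here; it
only marks where the ordering would matter if another thread were cleaning
cards concurrently.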
With -UseCondCardMark CMS does not need a StoreLoad, but it needs the field
write to be visible by the time the dirty card is visible, so I guess that is
a StoreStore.
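
A rough Java model of the unconditional case, for comparison. Again this is
only a sketch under assumptions, not HotSpot code: the class name, the flat
Object[] "heap", the 512-slot card size and the clean value of -1 are invented
for illustration, and VarHandle.storeStoreFence() stands in for the StoreStore.

```java
import java.lang.invoke.VarHandle;
import java.util.Arrays;

// Hypothetical model of an unconditional card mark; not HotSpot code.
final class PlainCardMarkSketch {
    static final byte DIRTY = 0;      // dirty card value
    static final byte CLEAN = -1;     // assumed clean value for this sketch
    static final int CARD_SHIFT = 9;  // assumed: one card covers 512 slots

    final Object[] heap;
    final byte[] cards;

    PlainCardMarkSketch(int slots) {
        heap = new Object[slots];
        cards = new byte[(slots >> CARD_SHIFT) + 1];
        Arrays.fill(cards, CLEAN);
    }

    void putField(int slot, Object value) {
        heap[slot] = value;                 // the reference store itself
        VarHandle.storeStoreFence();        // field write becomes visible no later
                                            // than the card write below
        cards[slot >> CARD_SHIFT] = DIRTY;  // always written, never read first
    }
}
```

Because the card is never loaded before it is written, there is no
store-then-load pair to order, which is why a StoreLoad is not needed in this
variant.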
/Mikael
>
> 2. Hans Boehm has argued/demonstrated over the years (see for
> example, http://hboehm.info/c++mm/no_write_fences.html), that
> StoreStore fences, as opposed to release==(StoreStore|LoadStore)
> fences, are too delicate and anomaly-filled to expose as a
> programming mode. But there are cases where they may come into
> play, for example as the first fence of a volatile-store
> (that also requires a trailing StoreLoad), that might be
> profitable to separate if any other internal mechanics could
> then be applied to further optimize. And even if not generally
> useful, they seem to apply to the GC post_barrier case.
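
For readers following along, the fence placement around a volatile-style store
can be sketched in Java roughly as follows. This only approximates where the
fences go and is not a replacement for declaring the field volatile; the class
and method names are hypothetical, and the VarHandle fence methods (Java 9 and
later) stand in for the barriers discussed in the thread.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration only; real code would simply declare the field volatile.
final class FencedCell {
    private int value; // plain field, ordered only by the explicit fences below

    void setVolatileLike(int v) {
        VarHandle.releaseFence(); // leading fence: StoreStore (plus LoadStore)
        value = v;
        VarHandle.fullFence();    // trailing fence: supplies the StoreLoad
    }

    int getVolatileLike() {
        int v = value;
        VarHandle.acquireFence(); // LoadLoad plus LoadStore after the load
        return v;
    }
}
```

Splitting the leading and trailing fences like this is what would let a
compiler treat the cheaper leading fence separately from the expensive
StoreLoad.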
>
> 3. We are indeed strongly considering simplifying the revised
> Java Memory Model to require release fencing on construction,
> not just in the presence of final fields. In the ideal implementation,
> this would require multiple fences (i.e., more than the one
> needed anyway to force object header sanity) only if "this"
> escapes within a constructor. But because some of these fences are
> currently hard-wired and so not amenable to elision/optimization,
> carrying this out on non-TSO will probably take some effort.
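
As a rough illustration of what release fencing on construction would mean for
a class with no final fields, the fence can be written out by hand. This is a
sketch under assumptions: the class name is hypothetical, the explicit
VarHandle.releaseFence() (Java 9 and later) merely stands in for a fence the VM
would emit itself, and only the writer side is modelled.

```java
import java.lang.invoke.VarHandle;

// Hypothetical example; the explicit fence models what a revised JMM
// might require the VM to insert at the end of every constructor.
final class FencedPoint {
    int x; // deliberately non-final
    int y; // deliberately non-final

    FencedPoint(int x, int y) {
        this.x = x;
        this.y = y;
        // Field stores above are ordered before any later store that
        // publishes the reference to this object.
        VarHandle.releaseFence();
    }
}
```

If "this" escaped before the fence, for example by being stored into a shared
field inside the constructor, that single trailing fence would no longer cover
the publication, which is the case where additional fences would be needed.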
>
> 4. Reminder to Andrew: We cannot let the VM crash when people
> write racy/wrong code including unsafe publication.
>
> -Doug
>
> On 09/04/2014 09:30 AM, Andrew Haley wrote:
> > On 09/04/2014 01:19 PM, Bertrand Delsart wrote:
> >> On 04/09/14 12:30, Andrew Haley wrote:
> >>> On 09/04/2014 10:30 AM, Bertrand Delsart wrote:
> >>>> I'm not a C2 expert but from what I have quickly checked, an issue may
> >>>> be that we need StoreStore ordering on some platforms.
> >>>>
> >>>> This should for instance be true for cardmarking (new stored oop
> >>>> must be visible before the card is marked).
> >>>
> >>> Okay. I can live with that. Is there a corresponding read barrier in
> >>> the code which scans the card table?
> >>
> >> See for instance the storeload() in
> >> G1SATBCardTableLoggingModRefBS::write_ref_field_work
> >
> > Okay, thanks, that is tremendously helpful. I know what to look for
> > now.
> >
> >> There are in fact a lot of other barriers in concurrent card scanning
> >> and cleaning (some of them being implicit due to compare and swap
> >> operations).
> >>
> >>>> This may also be true for the oop stores in general, as initially
> >>>> discussed. [IMHO this is related to final fields, which have to be
> >>>> visible when the object escapes. Barriers at the end of the
> >>>> constructors may not be sufficient if objects can escape before
> >>>> the end of their init() method.]
> >>>
> >>> I'm pretty sure we do this correctly. Are you aware of any place
> >>> (except unsafe publication, which is a programmer error) where this
> >>> might happen? We generate a barrier at the end of a constructor if
> >>> there is a final field and at the end of object creation.
> >>
> >> I agree that this is not a good programming style but I'm not sure this
> >> can always be considered a programmer error. Do you see anything in the
> >> Java specification that forbids publication before the end of object
> >> creation?
> >
> > No, but there's nothing in the Java spec which says that the language
> > will protect programmers from themselves.
> >
> >> For instance, objects may have to be linked at creation time. In
> >> general, the publication should be safe because it will hit barriers
> >> (because what the object is exported to will often need to be
> >> protected). However, I do not think this is mandatory according to the
> >> specifications.
> >
> > That's right.
> >
> >> Now, the problem is to see what the JMM requires in that case. I'm not
> >> 100% sure that a StoreStore is needed here. This is why I said "may not
> >> be sufficient". The JSR-133 cookbook has several "(outside of
> >> constructor)" statements that might mean it is not needed (if you have a
> >> membar at the end of the constructor). However, while I'm familiar with
> >> barriers because of my runtime, GC and embedded background, I do not
> >> consider myself to be a JMM expert. I will let one chime in.
> >>
> >> Of course, from a support point of view, it may be easier to add a
> >> StoreStore semantic on oop stores (should not be too expensive, taking
> >> into account the cost of GC barriers and the frequency of oop
> >> store) than to investigate the kind of troubles a strange ordering can
> >> lead to and explain to the customer why his Java code must be changed
> >> for platforms with weaker memory models.
> >
> > I think programmers are going to have to get used to it. The issue of
> > safe publication is very well known, especially because of the book
> > _Java Concurrency in Practice_.
> >
> > AIUI the purpose of the JMM is to give a clear definition of the
> > memory semantics of Java that can be efficiently executed on a wide
> > variety of machines. We need Java to scale well on machines with many
> > cores, and the JMM is a good fit to that.
> >
> >> Did you measure the performance regression?
> >
> > Yes. It is high; but I can't provide any numbers.
> >
> > Andrew.