RFR: 8189871: Refactor GC barriers to use declarative semantics
Kim Barrett
kim.barrett at oracle.com
Tue Nov 28 02:48:14 UTC 2017
> On Nov 27, 2017, at 8:31 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>
>> On Nov 24, 2017, at 4:17 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>
>> Hi Kim,
>>
>> On 2017-11-23 21:23, Kim Barrett wrote:
>>> However, while re-exploring that code path, I did notice something
>>> that bothered me.
>>>
>>> ModRefBarrierSet::AccessBarrier<...>::oop_atomic_cmpxchg_in_heap
>>>   bs->template write_ref_field_pre<decorators>(addr);
>>>   oop result = Raw::oop_atomic_cmpxchg(...);
>>>   if (result == compare_value) {
>>>     bs->template write_ref_field_post<decorators>(...);
>>>   }
>>>   return result;
>>>
>>> does an extra load for G1, to apply the pre-barrier. The old code
>>> applied the pre-barrier to the result obtained from the raw cmpxchg.
>>
>> In my Access API, I want the notion of pre- and post-write barriers introduced at the ModRef level to be respected as just that: pre- and post-write barriers. The previous code assumed that the pre-write barrier really has nothing to do with whether it comes before or after the write; instead it was assumed that the pre-write barrier is equivalent to the SATB barrier, to the extent that in some places the pre-write barrier is used as the implementation of Reference.get for G1 (a weak load!), because it happens to be a SATB barrier and has nothing to do with happening before a write.
>>
>> So to assume, at the ModRef level, that the pre-write barrier can be performed *after* the write because we know it is a SATB barrier, is an abstraction crime. That is why I apply it before the write, as we do for stores: because it is a pre-write barrier (and if it wasn't expensive in the context of a store, I see no point in optimizing it in the context of an atomic CAS). If a specific GC wants to apply it afterwards as a micro-optimization, it should override the ModRef implementation of its CAS to do that.
>>
>> Applying the pre-write barrier *after* the write, for CAS specifically, is an optimization that removes one relaxed load from a conservatively fenced atomic operation on a cold runtime path. That did not seem worth micro-optimizing and violating abstractions for. If we really want to optimize that code path, I see more point in applying the xor-shift filter in the runtime path of the post-write barrier, so that the expensive StoreLoad fence and card fiddling are not performed for writes within the same old region, as we already do in compiled code.
>>
>> I hope you agree that performing the pre-write barrier as a post-write barrier for the rather chilly runtime CAS is a bad idea and not an optimization that is worth it.
>
> I think this is a misunderstanding of the API. pre/post here don't
> refer to any strict ordering of operations wrt. the store; they refer
> to the pre/post-store value at the location. (Some of our collectors
> impose a strict ordering requirement between the store and the
> post-barrier, but that isn't a general requirement for all collectors.
> The collectors with such a requirement must generate appropriate code
> to ensure the ordering. There does appear to be a general requirement
> that no safepoints can occur anywhere in there, but that's all.)
>
> I don't think it should be surprising that the pre-barrier got used
> for SATB support in ways that don't match the notional ordering
> constraint, since I don't think that constraint was ever intended. The
> pre-barrier API was added as part of supporting G1, and we have no
> examples of non-SATB collectors that do anything interesting with the
> pre-barrier. To treat the pre-barrier as anything other than SATB
> support is premature generalization without supporting use-cases or
> exemplars.
>
> So I see this change as an unnecessary pessimization based on an
> incorrect over-generalization. Perhaps there are some poorly chosen
> names (actually, I wouldn't want to defend some of the names), but
> that's not a reason to change the semantics and thereby generate worse
> code for the sole user of this functionality.
>
> And while I agree that this specific pessimization probably doesn't
> have any measurable performance impact, I'm now very worried about
> what other similar changes may have been made.
Hm, let me retract some of that.
The above describes how I think it should work. I don't agree with
the strict ordering interpretation of pre/post.
However, it seems the C++ implementation of the G1 pre-barrier has
always (so far as I can tell) read and enqueued the value at the given
address.
The compiled code may be different, with (for example)
GraphKit::pre_barrier having a pre_val argument for the old value, in
case it is already available, so that it doesn't need to be reloaded.
And re-examining the C++ impl of Unsafe_CompareAndExchangeObject, I
misread the old code this last time through. It wasn't applying the
pre-barrier to the result obtained from the raw cmpxchg. In fact, it's
not applying the pre-barrier at all, as I originally thought. Doesn't
that make this and other similar operations formerly broken for G1?
More information about the hotspot-dev mailing list