RFR: 8189871: Refactor GC barriers to use declarative semantics

Erik Österlund erik.osterlund at oracle.com
Tue Nov 28 11:21:28 UTC 2017


Hi Kim,

Since you retracted part of your first email, I will answer this email 
instead.

On 2017-11-28 03:48, Kim Barrett wrote:
>> On Nov 27, 2017, at 8:31 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>
>>> On Nov 24, 2017, at 4:17 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>>
>>> Hi Kim,
>>>
>>> On 2017-11-23 21:23, Kim Barrett wrote:
>>>> However, while re-exploring that code path, I did notice something
>>>> that bothered me.
>>>>
>>>> ModRefBarrierSet::AccessBarrier<...>::oop_atomic_cmpxchg_in_heap
>>>>   bs->template write_ref_field_pre<decorators>(addr);
>>>>   oop result = Raw::oop_atomic_cmpxchg(...);
>>>>   if (result == compare_value) {
>>>>     bs->template write_ref_field_post<decorators>(...);
>>>>   }
>>>>   return result;
>>>>
>>>> does an extra load for G1, to apply the pre-barrier.  The old code
>>>> applied the pre-barrier to the result obtained from the raw cmpxchg.
>>> In my Access API, I want the notion of pre- and post-write barriers introduced at the ModRef level to be respected as just that: pre- and post-write barriers. Previous code assumed that a pre-write barrier really has nothing to do with whether it comes before or after the write - instead it was assumed that a pre-write barrier is equivalent to the SATB barrier, to the extent that in some places the pre-write barrier is used as an implementation of Reference.get for G1 (a weak load!), because it happens to be an SATB barrier, which has nothing to do with happening before a write.
>>>
>>> So to assume, at the ModRef level, that pre-write barriers can be performed *after* the write because we know it is an SATB barrier, is an abstraction crime. That is why I apply it before the write, just as we do for stores - because it is a pre-write barrier (and if it was not expensive in the context of a store, I see no point in optimizing it in the context of an atomic CAS). If a specific GC wants to apply it afterwards as a micro-optimization, then it should override the ModRef implementation of its CAS to do that.
>>>
>>> Applying the pre-write barrier *after* the write for CAS specifically is an optimization that removes one relaxed load from a conservatively fenced atomic operation, in a cold runtime path. That did not seem to me like something worth micro-optimizing and violating abstractions for. If we really want to optimize that code path, I see more point in applying the xor-shift filter in the runtime path of the post-write barrier, so that writes within the same old region skip the expensive StoreLoad fence and card fiddling, as we already do in compiled code.
>>>
>>> I hope you agree that performing the pre-write barrier as a post-write barrier for the rather chilly runtime CAS is a bad idea and not an optimization that is worth it.
>> I think this is a misunderstanding of the API. pre/post here don't
>> refer to any strict ordering of operations w.r.t. the store; they refer
>> to the pre/post-store value at the location. (Some of our collectors
>> impose a strict ordering requirement between the store and the
>> post-barrier, but that isn't a general requirement for all collectors.
>> The collectors with such a requirement must generate appropriate code
>> to ensure the ordering.  There does appear to be a general requirement
>> that no safepoints can occur anywhere in there, but that's all.)
>>
>> I don't think it should be surprising that the pre-barrier got used
>> for SATB support in ways that don't match the notional ordering
>> constraint, since I don't think that constraint was ever intended. The
>> pre-barrier API was added as part of supporting G1, and we have no
>> examples of non-SATB collectors that do anything interesting with the
>> pre-barrier. To treat the pre-barrier as anything other than SATB
>> support is premature generalization without supporting use-cases or
>> exemplars.
>>
>> So I see this change as an unnecessary pessimization based on an
>> incorrect over-generalization. Perhaps there are some poorly chosen
>> names (actually, I wouldn't want to defend some of the names), but
>> that's not a reason to change the semantics and thereby generate worse
>> code for the sole user of this functionality.
>>
>> And while I agree that this specific pessimization probably doesn't
>> have any measurable performance impact, I'm now very worried about
>> what other similar changes may have been made.
> Hm, let me retract some of that.
>
> The above describes how I think it should work.  I don't agree with
> the strict ordering interpretation of pre/post.

Okay.

> However, it seems the C++ implementation of the G1 pre-barrier has
> always (so far as I can tell) read and enqueued the value at the given
> address.

Yes, only the compiled code changes that. More about that in the next 
paragraph...

> The compiled code may be different, with (for example)
> GraphKit::pre_barrier having a pre_val argument for the old value, in
> case it is already available, so that it doesn't need to be reloaded.

The example you mention has built an abstraction that makes reordering 
of pre-write barriers okay. The abstraction is a function called 
can_move_pre_barrier, which checks whether, for a given BarrierSet, it 
is okay to move the pre-write barrier to after the write despite it 
being a "pre"-write barrier. G1 answers yes in that function, and only 
because of that is it okay to move the barrier as a micro-optimization.

In summary, that optimization uses my interpretation of pre-write 
barriers, and has an explicit abstraction for moving it past the write. 
I do not think we want to move the pre-write barrier past the write 
without such an abstraction (we have not before), and I do not think we 
should have such an abstraction in the runtime path.
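To illustrate the distinction, here is a toy standalone sketch (not HotSpot code - the class names, the SATB queue, and the CAS helper are all simplified stand-ins): a barrier set whose can_move_pre_barrier() returns true lets the caller reuse the CAS result as the pre-store value, while a conservative barrier set pays an extra load up front. Either way, the same old value reaches the SATB queue.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

using oop = uintptr_t;  // toy stand-in for a heap reference

struct ToyBarrierSet {
  std::vector<oop> satb_queue;  // stand-in for the SATB enqueue buffer
  virtual ~ToyBarrierSet() {}
  // A collector opts in to having its pre-write barrier applied after the
  // write; an SATB collector can, because it only needs the pre-store
  // *value*, not any ordering of the barrier relative to the store.
  virtual bool can_move_pre_barrier() const { return false; }
  void enqueue_if_not_null(oop v) {
    if (v != 0) satb_queue.push_back(v);
  }
};

struct ToySATBBarrierSet : ToyBarrierSet {
  bool can_move_pre_barrier() const override { return true; }
};

// CAS with a pre-value barrier. When the barrier set allows moving the
// pre-barrier, the CAS result doubles as the pre-store value, saving the
// extra relaxed load the conservative path performs up front.
oop cas_with_barriers(ToyBarrierSet* bs, std::atomic<oop>* addr,
                      oop compare, oop exchange) {
  if (!bs->can_move_pre_barrier()) {
    // Conservative ModRef-style path: extra load before the write.
    bs->enqueue_if_not_null(addr->load(std::memory_order_relaxed));
  }
  oop old = compare;
  addr->compare_exchange_strong(old, exchange);  // 'old' now holds the pre-store value
  if (bs->can_move_pre_barrier()) {
    bs->enqueue_if_not_null(old);  // reuse the CAS result as the pre-store value
  }
  return old;
}
```

The point of the sketch is that the "moved" variant is only legitimate because the barrier set explicitly declared it so; without that declaration the caller must not reorder a pre-write barrier past the write.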

> And re-examining the C++ impl of Unsafe_CompareAndExchangeObject, I
> misread the old code this last time through. It wasn't applying the
> pre-barrier to the result obtained from the raw cmpxchg. In fact, it's
> not applying the pre-barrier at all, as I originally thought.  Doesn't
> that make this and other similar operations formerly broken for G1?

I recall that code was a bit confusing. The 
oopDesc::atomic_compare_exchange_oop operation took a boolean that said 
whether to perform a pre-write barrier inside of the CAS, with an 
implicit default value of false - the opposite of what all uses except 
one callsite required. If you did apply the pre-barrier, it would pass 
the exchange_value to the update_barrier_set_pre function, which might 
look like it would do something, but that value was in fact simply 
discarded by all GCs, and G1 would instead reload the value before the 
store. You then had to manually apply the post-write barrier after the 
CAS call if it was successful. The corresponding operation for oop 
arrays with the same name naturally had completely different semantics. 
There was also a corresponding atomic_exchange_oop operation that was 
indeed broken, but nobody used it. So I believe it did work (true was 
passed in for whether to perform the pre-write barrier in 
Unsafe_CompareAndExchange), but it was confusing indeed, and it was 
very much up to the user of the API to manually make sure barriers were 
executed.
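A rough standalone sketch of the old API shape described above (not the actual HotSpot code; names, types, and the queues are simplified stand-ins, and the CAS is not actually atomic here). Note how the value passed to update_barrier_set_pre is ignored in favor of a reload, and how the post-write barrier is left entirely to the caller:

```cpp
#include <cstdint>
#include <vector>

using oop = uintptr_t;  // toy stand-in for a heap reference

static std::vector<oop> g_satb_queue;    // stand-in for G1's SATB buffer
static std::vector<oop*> g_dirty_cards;  // stand-in for dirtied card entries

// The new_val argument was accepted but dismissed by all GCs; G1 instead
// reloaded the old value from the address and enqueued that.
void update_barrier_set_pre(oop* addr, oop /* new_val, ignored */) {
  oop pre_val = *addr;  // the reload discussed above
  if (pre_val != 0) g_satb_queue.push_back(pre_val);
}

// The post-write barrier the caller had to remember to apply by hand.
void update_barrier_set_post(oop* addr) {
  g_dirty_cards.push_back(addr);  // "dirty the card" covering addr
}

// The trailing bool defaulted to false, although all uses except one
// callsite needed true. No post-write barrier is applied here.
oop atomic_compare_exchange_oop(oop exchange_value, oop* dest,
                                oop compare_value, bool prebarrier = false) {
  if (prebarrier) {
    update_barrier_set_pre(dest, exchange_value);
  }
  oop old = *dest;  // toy compare-and-set; the real code was atomic
  if (old == compare_value) *dest = exchange_value;
  return old;
}
```

A correct caller then looked roughly like: call it with true for the pre-barrier, and call update_barrier_set_post(dest) yourself if the returned value equals compare_value - both steps easy to get wrong, which is the confusion described above.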

Thanks,
/Erik
