OrderAccess Refactoring

Tue Jan 13 06:41:32 UTC 2015

Hi David,

Thank you for your reply! :)

On 13/01/15 03:37, David Holmes wrote:
> The table is not a definition, it is an example of how the
> acquire/release semantics can be implemented on the different architectures:
>
> membar loadStore|loadLoad
>
> is sufficient to implement acquire, but not exactly equivalent -
> acquire/release semantics can not be expressed exactly in terms of
> loadload|loadStore etc, as acquire/release define one-way barriers
> whereas loadload etc are bi-directional.

Understood, but I just think it is a bit more than just an example 
(agreeing it's an approximation of the intended semantics). This is in 
fact how it actually behaves concretely and is implemented on all 
platforms, as opposed to the conceptual description of acquire/release 
which in general can not be relied upon outside certain narrowed 
contexts which might do with being clarified. I personally think it's 
useful to point that context out where acquire release is defined, so 
that anyone in doubt can see what is actually guaranteed by the 
implementation and what is not and hence when such semantics can be 
relied upon and when they can't. If you disagree though, I'm fine with 
leaving the comment as it was. ;)

> Yes conceptually fence is "back-to-back acquire and release" but that
> doesn't mean fence can necessarily be implemented by concatenating the
> implementations of acquire and release on any given platform.

And that is exactly what I believe could be weird to somebody out there. 
If acquire() and release() had the semantics described, you would be 
able to call them both and get a fence, but since they are assumed to 
only be paired with loads and stores in a certain well defined order, 
you never need the StoreLoad, hence making them not a fence, and they 
shouldn't be of course.

 > and you don't need the "dummy" load or store.

Agreed, I got rid of them all. :)

> Or put another way there is no acquire()/release() only
> load_acquire()/store_release() ?

Yes these are the contexts where it gives the intended semantics in the 
implementation. And outside similar contexts, it makes no sense to use 
it. Hopefully this is clear enough to the user.

> Sorry I got the API backwards. What we often see is:
>
> release_store(addr, val) == release(); store(addr,val);
>
> but semantically that is wrong because release() allows stores to move
> ahead of it so the RHS could be executed as:
>
> store(addr, val); release()

Actually that is semantically equivalent by all means both by 
definition, current implementation and intention. And this is why I'm 
stressing that acquire() and release() only make sense together with a 
load and a store. Whether they are separate methods or not is 
irrelevant, it has the same semantics anyway and it has to.

To demonstrate my point, let's take a simple example. If your reasoning 
of splitting barriers from memory accesses is correct, then look what 
happens in the following example:

1.a release store
2.b load acquire
2.c load acquire

...should be equivalent to:

1.a release
1.b store
1.c load
1.d acquire
1.e load
1.f acquire

...and if it wasn't we would need a fence in acquire (cf. 1.d) to 
prevent reordering between 1.b and 1.e which the general definition of 
acquire guarantees if not constrained to its load. Ouch!! ;)

But fear not, because when they are coupled with a store/load, 
regardless if you use release() store() or release_store(), they are 
equivalent and act /as if/ they were joined as a single instruction. And 
as you see, 1.b and 1.c may reorder without violating any rules so we 
don't have to do any expensive full memory barriers.

This is why I'm saying that the description we have of acquire() and 
release() only make sense in a narrowed context: when acquire is only 
used after a load and release only before a store, then the semantics 
hold true to the definition, and whether you use the split or joined 
form you get the same observed results and semantics. And in this 
narrowed context where it makes sense, it's equivalent to the TSO memory 
ordering, and I know you didn't like it but that can be defined as 
acquire = Load(Load|Store) and release = (Load|Store)Store, in that 
narrowed context where it is valid w.r.t. the original definition. And 
these are also the actual guarantees in our implementation which is 
helpful. :)

I know you didn't like this as a definition, but it solves the problem 
of equivalence here and perfectly defines the scope of when acquire() 
and release() can be used. And that is precisely when my definition is 
equal to the current definition. ;)

> (Though I'd
> still prefer not to see it written this way - and that is what I want to
> fix under https://bugs.openjdk.java.net/browse/JDK-7143664 )

I do agree it's better to use the combined variant but maybe for a 
different reason. :) They are actually semantically equivalent so I 
disagree about that. However it is possible to have a better barrier for 
the combined variant. For instance PPC has optimized acquire load, x86 
on Windows don't need compiler barriers because its volatiles have 
acquire/release compiler reordering semantics which is tighter etc.

And then of course, the user of the API wouldn't have to read between 
the lines in the documentation about in which contexts the acquire() and 
release() barriers are actually working as defined. By removing the 
possibility of doing something potentially wrong, less errors might be 
made! ;)

> BTW this is another point of confusion when we have an API like
> release_store() but instructions like st.rel - which side of the barrier
> is the store on, and does it matter?

Yes it does matter. The release is always before the store and the 
acquire after the load - this is well defined. The reason for this is 
that it matches an abstract machine with a store buffer using a FIFO 
queue which is equivalent to TSO which is well understood and portable. 
If they were not in this order it would be useless for this usage:

T1: write stuff
T1: release store x1

T2: load acquire x1
T2: load stuff

The release/acquire store/load to x1 should synchronize so that the 
loading of stuff in T2 will see the writing of stuff in T1. Note that if 
the release and acquire were on the wrong side, this would break.

> Very long, depressing history of issues with volatiles and casting with
> different compilers. So what you see is the result of what needed to be
> done to "make things work".

:)

>> surely all calls to OrderedAccess should be expected to be volatile,
>
> I agree, and I've suggested this in the past too, but was told that
> making variables volatile incurs too much performance penalty in some
> cases, so we only cast them to volatile when needed for these ops.

Performance is a non-argument IMO. Because it's only used for 
store_fence, and it always requires a compiler barrier to work at least 
for the fence. So volatile won't make a difference at all. Even if it 
would (which it doesn't), the real deal is the fence, not the volatile 
memory access (because C++ volatile is not Java volatile).

If that was the only argument, I'd like to make this consistent! :)

>> nothing else would make any sense? And if so, why is store_fence the one
>> exception that is non-volatile while none of the others are? Unless
>> anyone knows there is a good reason for this, I'd like to make it
>> consistent with the rest of the code and make it volatile as well (and
>> hence get rid of some code duplication while at it). Permission granted? ;)
>
> That case would need to be examined in detail. :)

My assessment:
1. Volatile can't impact performance in store_fence (requiring compiler 
barrier anyway and volatile in C++ only does compiler level reordering, 
and fences would kill any other effect even if there was one which there 
is not).
2. Volatile won't break anything and create illegal reorderings, because 
it's stricter than non-volatile. On the contrary it's easy to make 
mistakes leading to trouble on for instance windows without it.
3. It makes the code more consistent, allows to remove several copy 
pastes, and generalize making the code more readable.

If I get your permission, I'll add volatile!

>> Thank you David for a nice discussion about this! :)
>
> I'd like to say it is fun, but it really isn't :) This stuff gives me
> continual headaches.

Sorry about the headaches, I feel a bit guilty! Hopefully with this 
cleanup, at least some future headache will be relieved though. :)

Thanks for the discussion, and hope this email won't make the headache 
worse!

/Erik