OrderAccess Refactoring
Erik Österlund
erik.osterlund at lnu.se
Tue Jan 13 06:41:32 UTC 2015
Hi David,
Thank you for your reply! :)
On 13/01/15 03:37, David Holmes wrote:
> The table is not a definition, it is an example of how the
> acquire/release semantics can be implemented on the different architectures:
>
> membar loadStore|loadLoad
>
> is sufficient to implement acquire, but not exactly equivalent -
> acquire/release semantics can not be expressed exactly in terms of
> loadload|loadStore etc, as acquire/release define one-way barriers
> whereas loadload etc are bi-directional.
Understood, but I just think it is a bit more than just an example
(agreeing it's an approximation of the intended semantics). This is in
fact how it actually behaves concretely and is implemented on all
platforms, as opposed to the conceptual description of acquire/release
which in general can not be relied upon outside certain narrowed
contexts which might do with being clarified. I personally think it's
useful to point that context out where acquire release is defined, so
that anyone in doubt can see what is actually guaranteed by the
implementation and what is not and hence when such semantics can be
relied upon and when they can't. If you disagree though, I'm fine with
leaving the comment as it was. ;)
> Yes conceptually fence is "back-to-back acquire and release" but that
> doesn't mean fence can necessarily be implemented by concatenating the
> implementations of acquire and release on any given platform.
And that is exactly what I believe could be weird to somebody out there.
If acquire() and release() had the semantics described, you would be
able to call them both and get a fence, but since they are assumed to
only be paired with loads and stores in a certain well defined order,
you never need the StoreLoad, hence making them not a fence, and they
shouldn't be of course.
> and you don't need the "dummy" load or store.
Agreed, I got rid of them all. :)
> Or put another way there is no acquire()/release() only
> load_acquire()/store_release() ?
Yes these are the contexts where it gives the intended semantics in the
implementation. And outside similar contexts, it makes no sense to use
it. Hopefully this is clear enough to the user.
> Sorry I got the API backwards. What we often see is:
>
> release_store(addr, val) == release(); store(addr,val);
>
> but semantically that is wrong because release() allows stores to move
> ahead of it so the RHS could be executed as:
>
> store(addr, val); release()
Actually that is semantically equivalent by all means both by
definition, current implementation and intention. And this is why I'm
stressing that acquire() and release() only make sense together with a
load and a store. Whether they are separate methods or not is
irrelevant, it has the same semantics anyway and it has to.
To demonstrate my point, let's take a simple example. If your reasoning
of splitting barriers from memory accesses is correct, then look what
happens in the following example:
1.a release store
2.b load acquire
2.c load acquire
...should be equivalent to:
1.a release
1.b store
1.c load
1.d acquire
1.e load
1.f acquire
...and if it wasn't we would need a fence in acquire (cf. 1.d) to
prevent reordering between 1.b and 1.e which the general definition of
acquire guarantees if not constrained to its load. Ouch!! ;)
But fear not, because when they are coupled with a store/load,
regardless if you use release() store() or release_store(), they are
equivalent and act /as if/ they were joined as a single instruction. And
as you see, 1.b and 1.c may reorder without violating any rules so we
don't have to do any expensive full memory barriers.
This is why I'm saying that the description we have of acquire() and
release() only make sense in a narrowed context: when acquire is only
used after a load and release only before a store, then the semantics
hold true to the definition, and whether you use the split or joined
form you get the same observed results and semantics. And in this
narrowed context where it makes sense, it's equivalent to the TSO memory
ordering, and I know you didn't like it but that can be defined as
acquire = Load(Load|Store) and release = (Load|Store)Store, in that
narrowed context where it is valid w.r.t. the original definition. And
these are also the actual guarantees in our implementation which is
helpful. :)
I know you didn't like this as a definition, but it solves the problem
of equivalence here and perfectly defines the scope of when acquire()
and release() can be used. And that is precisely when my definition is
equal to the current definition. ;)
> (Though I'd
> still prefer not to see it written this way - and that is what I want to
> fix under https://bugs.openjdk.java.net/browse/JDK-7143664 )
I do agree it's better to use the combined variant but maybe for a
different reason. :) They are actually semantically equivalent so I
disagree about that. However it is possible to have a better barrier for
the combined variant. For instance PPC has optimized acquire load, x86
on Windows don't need compiler barriers because its volatiles have
acquire/release compiler reordering semantics which is tighter etc.
And then of course, the user of the API wouldn't have to read between
the lines in the documentation about in which contexts the acquire() and
release() barriers are actually working as defined. By removing the
possibility of doing something potentially wrong, less errors might be
made! ;)
> BTW this is another point of confusion when we have an API like
> release_store() but instructions like st.rel - which side of the barrier
> is the store on, and does it matter?
Yes it does matter. The release is always before the store and the
acquire after the load - this is well defined. The reason for this is
that it matches an abstract machine with a store buffer using a FIFO
queue which is equivalent to TSO which is well understood and portable.
If they were not in this order it would be useless for this usage:
T1: write stuff
T1: release store x1
T2: load acquire x1
T2: load stuff
The release/acquire store/load to x1 should synchronize so that the
loading of stuff in T2 will see the writing of stuff in T1. Note that if
the release and acquire were on the wrong side, this would break.
> Very long, depressing history of issues with volatiles and casting with
> different compilers. So what you see is the result of what needed to be
> done to "make things work".
:)
>> surely all calls to OrderedAccess should be expected to be volatile,
>
> I agree, and I've suggested this in the past too, but was told that
> making variables volatile incurs too much performance penalty in some
> cases, so we only cast them to volatile when needed for these ops.
Performance is a non-argument IMO. Because it's only used for
store_fence, and it always requires a compiler barrier to work at least
for the fence. So volatile won't make a difference at all. Even if it
would (which it doesn't), the real deal is the fence, not the volatile
memory access (because C++ volatile is not Java volatile).
If that was the only argument, I'd like to make this consistent! :)
>> nothing else would make any sense? And if so, why is store_fence the one
>> exception that is non-volatile while none of the others are? Unless
>> anyone knows there is a good reason for this, I'd like to make it
>> consistent with the rest of the code and make it volatile as well (and
>> hence get rid of some code duplication while at it). Permission granted? ;)
>
> That case would need to be examined in detail. :)
My assessment:
1. Volatile can't impact performance in store_fence (requiring compiler
barrier anyway and volatile in C++ only does compiler level reordering,
and fences would kill any other effect even if there was one which there
is not).
2. Volatile won't break anything and create illegal reorderings, because
it's stricter than non-volatile. On the contrary it's easy to make
mistakes leading to trouble on for instance windows without it.
3. It makes the code more consistent, allows to remove several copy
pastes, and generalize making the code more readable.
If I get your permission, I'll add volatile!
>> Thank you David for a nice discussion about this! :)
>
> I'd like to say it is fun, but it really isn't :) This stuff gives me
> continual headaches.
Sorry about the headaches, I feel a bit guilty! Hopefully with this
cleanup, at least some future headache will be relieved though. :)
Thanks for the discussion, and hope this email won't make the headache
worse!
/Erik
More information about the hotspot-dev
mailing list