OrderAccess Refactoring

Fri Jan 9 14:18:15 UTC 2015

Erik,

I am delighted to have you looking at this code in detail to make sure it is accurate and maintainable. 
> On Jan 9, 2015, at 5:32 AM, Erik Österlund <erik.osterlund at lnu.se> wrote:
> 
> Hi,
> 
> I have some proposed changes to OrderAccess, thought I’d check if they are wanted. It comes in two parts: consistency fixing, and refactoring the code to a generalization/specialization approach instead to eliminate lots of redundant platform specific code which is not platform specific. It’s possible to pick only one if both are not wanted.
> 
> 2.a) OrderAccess Maintainability
> Summary: Remove platform specific code which is not platform specific (most of it) and make it shared, and separate different concerns in the code.
> 
> Different concerns are now joined to a big blob: Which barrier to use (acquire, release, fence) and where to put it w.r.t. to memory accesses, how to implement those barriers, how to make atomic memory accesses, how to specialize more efficient variants are all different concerns, now flattened out to be done over and over again for each platform even though the behaviour is very general. I propose to move everything except specialization and barrier implementation to shared code to separate their concerns, i.e. how to make atomic memory accesses and where to put the barriers w.r.t. accesses is the same for everyone except the few specialized variants with inline asm. This removes most of the code for most of the individual platforms and makes it easier to reason about the different concerns in isolation without changing stuff all over the place. The idea is to split the class into 3 classes using inheritance: OrderAccessBase, OrderAccessPlatform and OrderAccess (of course). Most of the stuff goes in OrderAccessBase, but platforms with inline assembly can put their specialized variants in OrderAccessPlatform.
In theory I like refactoring. In practice I hope there is a very clear way to specify the per-platform assumptions and to make it very obvious what the end results are
for a given platform and API.

> 
> 2.b) OrderAccess Correctness
> Summary: Relying on volatile to provide acquire/release on TSO cannot generally be guaranteed, must be fixed on all TSO except x86 linux and windows.
> 
> Correctness concerns are inconsistent over different platforms and dangerous which is my second issue with OrderAccess. Specifically, for TSO architectures like SPARC and x86, OrderAccess relies on C++ volatile for acquire release semantics, which in most cases is wrong. In every compiler I know of except MSVC 2008, this can not be relied upon as the C++ standard section 1.9 is vague whether volatiles may be reordered w.r.t. non-volatiles and has been interpreted differently by compiler implementors. It only really promises reordering constraints between volatiles which is not enough for acquire release semantics. As a matter of fact, GCC explicitly states they may be arbitrarily reordered at will, and therefore we need compiler barriers to prevent this. This problem was observed on linux and fixed by Mikael, on Linux. But the exact same problem (relying on volatile to have acquire release semantics) still lurks around on the other TSO architectures, waiting for the right moment to strike. With the refactoring, correcting such an issue (which I also intend to do because I think the current implementation is flawed) can be easily done with 3 LoC per platform - the barrier selection for memory accesses, instead of for each type of memory access on nearly every TSO platform. 
Having code reviewed (and obviously missed things) earlier rounds of OrderAccess - I just wanted to note that we never thought that C++ volatile would provide
acquire/release semantics. volatile has always been just a compiler directive, and yes volatile-volatile only. Acquire/release are instructions to the hardware - not
what I would call compiler barriers. 

Personally I would do the correctness fixes first - and make sure they are really well tested and carefully code reviewed studying the cookbook and memory model.
I have not been following the more recent java memory model evolution - I believe Aleksey Snipilev has - if you have references to any updated advice
from Doug Lea - that would be very helpful for your reviewers to study. I know the last time I reviewed this I had to make my own small charts and walk it
through based on per-platform guarantees.

I would find it helpful if you are aware of specific bugs - if you could list the actual problems and call those out? It is very hard to get this right and
hard to review - so if you were to have targeted reviews for specific fixes we could give the changes the depth of study that they need.

Doing the refactoring afterwards allows us the easier exercise of ensuring that the refactoring produces the same code that we already have studied
in depth and believe to be correct.

So as you refactor please do find a place to clarify the per-platform assumptions so if we realize our assumptions are wrong we will re-review the code. 
> 
> The refactoring will as a bonus make a later transition to C++11/14 easier when compilers are ready, also with only a few LoC. Since we are now forced to use a compiler barrier which can be a bit too restrictive for the intended semantics, a more modern compiler can instead relax this a bit without sacrificing correctness and give us the exact semantics we want.
> 
> Any idea if this is wanted, and if so both a and b or only b? IMO it’s much easier to fix the correctness issues on different platforms with that extra bit of refactoring too. ;)
thank you for offering and for asking for opinions :-)
Karen
> 
> Thanks,
> /Erik