OrderAccess Refactoring

Fri Jan 9 21:17:59 UTC 2015

[Hoping to fix nesting of comments removed by server so you can actually 
see who said what in previous post :-|]

Hi Karen,

Thanks for the reply!

On 09 Jan 2015, at 15:18, Karen Kinnear 
<karen.kinnear at oracle.com<mailto:karen.kinnear at oracle.com>> wrote:

> Erik,
>
> I am delighted to have you looking at this code in detail to make sure it is accurate and maintainable.

:)

> In theory I like refactoring. In practice I hope there is a very clear way to specify the per-platform assumptions and to make it very obvious what the end results are
> for a given platform and API.

For each platform you only have to specify two things:

1. What is needed for acquire, release and fence for memory accesses, 
for that platform (5 LoC per platform). The pattern of when to use them 
(semantics) is generalized in shared code.

2. To provide any specializations for optimization purposes when 
applicable (certain TSO platforms that can join the access and semantics 
to a single inline asm instruction). Note that this is optional, and 
only done to get faster accesses. Many platforms do not do it at all.

I think it’s very intuitive! Will provide webrev if this seems worthwhile.

> Having code reviewed (and obviously missed things) earlier rounds of OrderAccess - I just wanted to note that we never thought that C++ volatile would provide
> acquire/release semantics. volatile has always been just a compiler directive, and yes volatile-volatile only. Acquire/release are instructions to the hardware - not
> what I would call compiler barriers.

Acquire/release are semantics, not necessarily bound to neither 
compiler/hardware. It gives a certain contract to the user that expects 
these acquire/release semantics. Compiler barriers and fences is merely 
a means of providing these semantics, but the user should not have to 
care about such details.

If what you are saying is that it was intended only as acquire/release 
for hardware but not constrain compiler reorderings to comply with the 
semantics, then the contract is broken (does in fact not promise acquire 
release, only on hardware level but compilers could do whatever with 
non-volatiles which is equally as bad).

A store release should synchronize with an acquire load to the same 
location, like in this example:

T1: store x1
T1: release store x2

T2: load acquire x2
T2: load x1

The semantics should guarantee that if load acquire x2 happens after 
store release x2 in total order, then it is guaranteed that store x1 
will be observed by the load x1, and not some old value that was there 
before store x1. This contract does not currently hold since the normal 
non-volatile memory accesses may be reordered with volatile memory 
accesses; the stores and loads may be reordered in this example by the 
compiler at will.

Therefore acquire release semantics are not guaranteed in general, but 
only w.r.t. volatiles. This is neither clearly documented nor used in 
such a way.

OrderedAccess is used in a way that assumes acquire/release semantics. 
But for certain TSO platforms, the contract only provides volatile 
semantics, not acquire release semantics. On other platforms, it 
properly provides acquire release semantics, which is expected.

It only makes sense to by contract promise acquire/release semantics 
(both for compiler and hardware reorderings) for uses of these methods, 
and consistently do so on all platforms.

> Personally I would do the correctness fixes first - and make sure they are really well tested and carefully code reviewed studying the cookbook and memory model.
> I have not been following the more recent java memory model evolution - I believe Aleksey Snipilev has - if you have references to any updated advice
> from Doug Lea - that would be very helpful for your reviewers to study. I know the last time I reviewed this I had to make my own small charts and walk it
> through based on per-platform guarantees.

There is nothing wrong with the Java memory model. The only problem is 
that OrderAccess promises semantics that are only guaranteed on some 
platforms.

> I would find it helpful if you are aware of specific bugs - if you could list the actual problems and call those out? It is very hard to get this right and
> hard to review - so if you were to have targeted reviews for specific fixes we could give the changes the depth of study that they need.

In for instance https://bugs.openjdk.java.net/browse/JDK-8061964 this 
was observed and caused trouble for G1 card refinement that went boom. 
In the discussion I pointed out the issue of relying on volatiles as a 
cause for the crash, and it was fixed, but not consistently for all 
platforms, but only for gcc on linux where the reordering was observed 
in the generated asm output. But it is equally as dangerous on other 
(TSO) platforms (except windows where the compiler actually explicitly 
promises not to reorder - their volatiles follow acquire/release in 
compiler reorderings). I think hoping for for compilers to not reorder 
without any guarantees, waiting for things to go boom where it has 
already been observed on other platforms seems dangerous. Better safe 
than sorry. :)

One different solution is to redocument the contract clearly as being 
acquire/release w.r.t. volatile accesses, which is currently the actual 
contract provided by the implementation.
But since bugs were already discovered (which is very understandable 
because no other atomic library has such semantics that I know of) and 
its code is still unchanged, it seems dangerous to do so without 
re-evaluating the use of OrderedAccess in hotspot, which seems a bit 
painful. Also such a revised contract would be a disaster when you need 
to copy loads of data and then store release for instance. This would 
force the copying (of potentially lots of data) to either be done using 
volatile accesses rather than in an accelerated fashion (e.g. SSE 
extensions, write combining), or add compiler barriers to such copying 
to give the release store actual release store semantics. This would 
also require fixing G1 where a bug was observed to not rely on the 
intuitive and expected contract, but on an unintuitive and provably 
error prone contract. Ugh!!

The only solution that seems to make sense to me is to fix the semantics 
of OrderedAccess (that were already changed on Linux) to be what is 
expected, and that is acquire/release w.r.t. other accesses (volatile or 
non-volatile).

> Doing the refactoring afterwards allows us the easier exercise of ensuring that the refactoring produces the same code that we already have studied
> in depth and believe to be correct.
>
> So as you refactor please do find a place to clarify the per-platform assumptions so if we realize our assumptions are wrong we will re-review the code.

I was hoping to do both at the same time, since there are many platforms 
to be changed, and making them over and over for every type of access on 
every platform to be changed is a bit tedious (and error prone) compared 
to doing it once per platform, reusing the shared pattern (to be 
reviewed once instead of once per platform) and review the choice of 
barriers once for each platform instead of for every memory access on 
every platform. Of course I don’t mind if we choose to do things in the 
other order though if that is preferred. :)

If you compare the change in semantics I propose, without thinking about 
the refactoring, just compare x86 on bsd and linux, you see that on 
linux a compiler barrier is correctly used, but not on bsd. This is the 
change that needs to be added to all TSO platforms except linux and 
windows to ensure correctness.

thank you for offering and for asking for opinions :-)

Thank you for the good feedback! It looks like this change seems wanted, 
so I’ll go ahead and prepare a webrev with the proposed changes. :)

Thanks,
Erik