OrderAccess Refactoring

Fri Jan 9 16:07:54 UTC 2015

Hi Karen,

Thanks for the reply!

On 09 Jan 2015, at 15:18, Karen Kinnear <karen.kinnear at oracle.com<mailto:karen.kinnear at oracle.com>> wrote:

Erik,

I am delighted to have you looking at this code in detail to make sure it is accurate and maintainable.

:)

In theory I like refactoring. In practice I hope there is a very clear way to specify the per-platform assumptions and to make it very obvious what the end results are
for a given platform and API.

For each platform you only have to specify two things:

1. What is needed for acquire, release and fence for memory accesses, for that platform (5 LoC per platform). The pattern of when to use them (semantics) is generalized in shared code.

2. To provide any specializations for optimization purposes when applicable (certain TSO platforms that can join the access and semantics to a single inline asm instruction). Note that this is optional, and only done to get faster accesses. Many platforms do not do it at all.

I think it’s very intuitive! Will provide webrev if this seems worthwhile.

Having code reviewed (and obviously missed things) earlier rounds of OrderAccess - I just wanted to note that we never thought that C++ volatile would provide
acquire/release semantics. volatile has always been just a compiler directive, and yes volatile-volatile only. Acquire/release are instructions to the hardware - not
what I would call compiler barriers.

Acquire/release are semantics, not necessarily bound to neither compiler/hardware. It gives a certain contract to the user that expects these acquire/release semantics. Compiler barriers and fences is merely a means of providing these semantics, but the user should not have to care about such details.

If what you are saying is that it was intended only as acquire/release for hardware but not constrain compiler reorderings to comply with the semantics, then the contract is broken (does in fact not promise acquire release, only on hardware level but compilers could do whatever with non-volatiles which is equally as bad).

A store release should synchronize with an acquire load to the same location, like in this example:

T1: store x1
T1: release store x2

T2: load acquire x2
T2: load x1

The semantics should guarantee that if load acquire x2 happens after store release x2 in total order, then it is guaranteed that store x1 will be observed by the load x1, and not some old value that was there before store x1. This contract does not currently hold since the normal non-volatile memory accesses may be reordered with volatile memory accesses; the stores and loads may be reordered in this example by the compiler at will.

Therefore acquire release semantics are not guaranteed in general, but only w.r.t. volatiles. This is neither clearly documented nor used in such a way.

OrderedAccess is used in a way that assumes acquire/release semantics. But for certain TSO platforms, the contract only provides volatile semantics, not acquire release semantics. On other platforms, it properly provides acquire release semantics, which is expected.

It only makes sense to by contract promise acquire/release semantics (both for compiler and hardware reorderings) for uses of these methods, and consistently do so on all platforms.

Personally I would do the correctness fixes first - and make sure they are really well tested and carefully code reviewed studying the cookbook and memory model.
I have not been following the more recent java memory model evolution - I believe Aleksey Snipilev has - if you have references to any updated advice
from Doug Lea - that would be very helpful for your reviewers to study. I know the last time I reviewed this I had to make my own small charts and walk it
through based on per-platform guarantees.

There is nothing wrong with the Java memory model. The only problem is that OrderAccess promises semantics that are only guaranteed on some platforms.

I would find it helpful if you are aware of specific bugs - if you could list the actual problems and call those out? It is very hard to get this right and
hard to review - so if you were to have targeted reviews for specific fixes we could give the changes the depth of study that they need.

In for instance https://bugs.openjdk.java.net/browse/JDK-8061964 this was observed and caused trouble for G1 card refinement that went boom. In the discussion I pointed out the issue of relying on volatiles as a cause for the crash, and it was fixed, but not consistently for all platforms, but only for gcc on linux where the reordering was observed in the generated asm output. But it is equally as dangerous on other (TSO) platforms (except windows where the compiler actually explicitly promises not to reorder - their volatiles follow acquire/release in compiler reorderings). I think hoping for for compilers to not reorder without any guarantees, waiting for things to go boom where it has already been observed on other platforms seems dangerous. Better safe than sorry. :)

One different solution is to redocument the contract clearly as being acquire/release w.r.t. volatile accesses, which is currently the actual contract provided by the implementation.
But since bugs were already discovered (which is very understandable because no other atomic library has such semantics that I know of) and its code is still unchanged, it seems dangerous to do so without re-evaluating the use of OrderedAccess in hotspot, which seems a bit painful. Also such a revised contract would be a disaster when you need to copy loads of data and then store release for instance. This would force the copying (of potentially lots of data) to either be done using volatile accesses rather than in an accelerated fashion (e.g. SSE extensions, write combining), or add compiler barriers to such copying to give the release store actual release store semantics. This would also require fixing G1 where a bug was observed to not rely on the intuitive and expected contract, but on an unintuitive and provably error prone contract. Ugh!!

The only solution that seems to make sense to me is to fix the semantics of OrderedAccess (that were already changed on Linux) to be what is expected, and that is acquire/release w.r.t. other accesses (volatile or non-volatile).

Doing the refactoring afterwards allows us the easier exercise of ensuring that the refactoring produces the same code that we already have studied
in depth and believe to be correct.

So as you refactor please do find a place to clarify the per-platform assumptions so if we realize our assumptions are wrong we will re-review the code.

I was hoping to do both at the same time, since there are many platforms to be changed, and making them over and over for every type of access on every platform to be changed is a bit tedious (and error prone) compared to doing it once per platform, reusing the shared pattern (to be reviewed once instead of once per platform) and review the choice of barriers once for each platform instead of for every memory access on every platform. Of course I don’t mind if we choose to do things in the other order though if that is preferred. :)

If you compare the change in semantics I propose, without thinking about the refactoring, just compare x86 on bsd and linux, you see that on linux a compiler barrier is correctly used, but not on bsd. This is the change that needs to be added to all TSO platforms except linux and windows to ensure correctness.

thank you for offering and for asking for opinions :-)

Thank you for the good feedback! It looks like this change seems wanted, so I’ll go ahead and prepare a webrev with the proposed changes. :)

Thanks,
Erik