RFR: 7143664: Clean up OrderAccess implementations and usage

Wed Feb 11 14:32:57 UTC 2015

Hi David,

Thanks a lot for reviewing my changes!
Sorry for the delay, had a few other things I had to take care of.

On 09/02/15 07:14, David Holmes wrote:
> On 29/01/2015 8:48 AM, David Holmes wrote:
>> Updated webrev:
>>
>> http://cr.openjdk.java.net/~dholmes/7143664/webrev.v2/
>
> First let me say that Erik and I had a lot of offline discussions about
> this since he first proposed his reworking. My initial concern was
> ensuring the model changes were appropriate as this is a complex area
> and there can be different views of how things can/should be expressed.
> As per the bug report I was aware of a number of issues with the
> existing expression of semantics, but Erik showed me things were even
> worse than I had thought :)

:)

> src/share/vm/runtime/orderAccess.hpp
>
> Minor nit: I would prefer the term "bound" rather than "joined" when
> referring to load_acquire and store_release.

Fixed.

> In the implementation table I think "sparc" should be "sparc TSO".

Fixed.

> I still think we need to say something about C++ volatile and about
> compiler barriers in general.

How about something like:

C++ volatile semantics prevent compiler re-ordering between volatile 
memory accesses. However, reordering between non-volatile and volatile 
memory accesses is in general undefined. For compiler reordering 
constraints taking non-volatile memory accesses into consideration, a 
compiler barrier has to be used instead. Note also that both volatile 
semantics and compiler barrier do not prevent hardware reordering.

> src/os_cpu/bsd_x86/vm/orderAccess_bsd_x86.inline.hpp
> src/os_cpu/linux_x86/vm/orderAccess_linux_x86.inline.hpp
>
> In OrderAccess::fence doesn't the existing asm already act as a compiler
> barrier; and on uniprocessor we need neither compiler nor hardware
> barriers; so isn't the compiler_barrier() redundant?

I saw this was already clarified. But yeah, even uniprocessor systems 
have concurrency. Therefore I added compiler barriers to properly 
protect even uniprocessor systems, that could previously suffer from 
data races involving unlikely yet possible compiler reorderings and 
context switches.

>
> ---
>
> src/os_cpu/bsd_zero/vm/orderAccess_bsd_zero.inline.hpp
> src/os_cpu/linux_zero/vm/orderAccess_linux_zero.inline.hpp
>
> No comments - The Zero folk will need to approve this change in terminology.

I didn't like talking about "read" and "write" barriers as it says 
little about what it actually means. And I also didn't like defining 
things like storestore/storeload etc. in terms of acquire/release and 
read/write barriers because I think it's weird. ;) That's why I changed 
this. But if the previous terminology is preferred, I'm fine with 
changing that.

> src/os_cpu/solaris_sparc/vm/orderAccess_solaris_sparc.inline.hpp
> src/os_cpu/solaris_sparc/vm/solaris_sparc.il
>
> You've removed the GNU_SOURCE ifdefs which means the new code has to be
> tested using gcc on Solaris. We don't officially support that platform
> so I have no way to verify these particular changes.

Previously solaris studio did not support inline asm syntax. So the 
solaris studio variant used .il assembly instead while the GNU_SOURCE 
variant used inline asm.

But since this is nowadays supported by solaris studio too, I joined 
them into one. Not only because it's nicer to unify, but because I need 
the compiler barrier constraints that inline asm can give us. I don't 
know the compiler reordering constraints are guaranteed when calling 
external assembly. So better make it explicit and safe.

> src/os_cpu/solaris_x86/vm/orderAccess_solaris_x86.inline.hpp
> src/os_cpu/solaris_x86/vm/solaris_x86_32.il
> src/os_cpu/solaris_x86/vm/solaris_x86_64.il
>
> I think it makes sense to use the same code here as the linux x86
> version. But as above can't verify this for gcc on Solaris; also query
> about need for compiler_barrier() in fence().

Same goes here as above. Solaris studio used to not support inline asm. 
So I joined the variants relevant for correctness fixes. But of course, 
it would be possible to add the optimizations from x86 bsd/linux as 
well. But I wanted to keep optimizations out of this changeset as I 
think it is already a lot to review without the addition of new 
optimizations. So if you don't mind, I'd like to defer that until the 
correctness fixes are in place.

> Do you have a reference to the MSVC volatile semantics?

https://msdn.microsoft.com/en-us/library/12a04hfd(v=vs.100).aspx

Release:
"A write to a volatile object (volatile write) has Release semantics; a 
reference to a global or static object that occurs before a write to a 
volatile object in the instruction sequence will occur before that 
volatile write in the compiled binary."

Acquire:
"A read of a volatile object (volatile read) has Acquire semantics; a 
reference to a global or static object that occurs after a read of 
volatile memory in the instruction sequence will occur after that 
volatile read in the compiled binary."

Should fit our model right for compiler reordering.

> Not clear about the 32-bit versus 64-bit changes - mainly because I
> don't know why they were different in the first place.

Seems like x86_64 didn't have good support for inline asm syntax at some 
point. I don't know if this is still true today. It would be nice to 
know because I have seen fishy x86_64 implementations of a lot of atomic 
stuff as well, assuming lack of compiler support that might actually be 
available today.

> Why are windows release_store_fence specializations different to linux
> x86 versions?

How do you mean they are different? To me they look quite equivalent. 
xchg is always implicitly locked on x86, and they both use xchg to get 
the wanted fence semantics.

> Not clear about your change to the float version of release_store_fence:
> a) do we need to change it?
> b) given what we changed it to isn't that the default ?

I didn't want to have default specializations of jfloat/jdouble 
equalling jint/jlong. I thought maybe some architecture might want to 
put it in different registers or something, making a default conversion 
to int/long maybe undesirable. So the default for jfloat/jdouble is to 
use acquire()/release() paired with an atomic memory access. So you have 
to explicitly specialize them to be the same as jint/jlong if wanted, 
which I believe is what is done in the code you refer to.

If you think I'm paranoid about floats, I can of course make it a 
default behaviour to use jint/jlong even if it isn't explicit, in the 
same way I handled unsigned variants.

Rule of thumb: The variants that need specialization are the ones that 
have overrides for OrderAccess::store/OrderAccess::load, the rest is 
handled automatically. Currently jdouble and jfloat are part of that and 
hence need explicit specialization, but if we want to generalize floats 
too, that's fine by me. So is it preferred to generalize jfloat/jdouble 
to equal jint/jlong?

> General comment: copyright dates need updating in most, if not all, files.

Fixed.

Do we want a new webrev? (changed copyright headers and few words in 
comments)

Thanks a lot for taking time to review these changes!

/Erik