RFR: 7143664: Clean up OrderAccess implementations and usage
Erik Österlund
erik.osterlund at lnu.se
Wed Feb 11 14:32:57 UTC 2015
Hi David,
Thanks a lot for reviewing my changes!
Sorry for the delay, had a few other things I had to take care of.
On 09/02/15 07:14, David Holmes wrote:
> On 29/01/2015 8:48 AM, David Holmes wrote:
>> Updated webrev:
>>
>> http://cr.openjdk.java.net/~dholmes/7143664/webrev.v2/
>
> First let me say that Erik and I had a lot of offline discussions about
> this since he first proposed his reworking. My initial concern was
> ensuring the model changes were appropriate as this is a complex area
> and there can be different views of how things can/should be expressed.
> As per the bug report I was aware of a number of issues with the
> existing expression of semantics, but Erik showed me things were even
> worse than I had thought :)
:)
> src/share/vm/runtime/orderAccess.hpp
>
> Minor nit: I would prefer the term "bound" rather than "joined" when
> referring to load_acquire and store_release.
Fixed.
> In the implementation table I think "sparc" should be "sparc TSO".
Fixed.
> I still think we need to say something about C++ volatile and about
> compiler barriers in general.
How about something like:
C++ volatile semantics prevent compiler re-ordering between volatile
memory accesses. However, reordering between non-volatile and volatile
memory accesses is in general undefined. For compiler reordering
constraints taking non-volatile memory accesses into consideration, a
compiler barrier has to be used instead. Note also that both volatile
semantics and compiler barrier do not prevent hardware reordering.
> src/os_cpu/bsd_x86/vm/orderAccess_bsd_x86.inline.hpp
> src/os_cpu/linux_x86/vm/orderAccess_linux_x86.inline.hpp
>
> In OrderAccess::fence doesn't the existing asm already act as a compiler
> barrier; and on uniprocessor we need neither compiler nor hardware
> barriers; so isn't the compiler_barrier() redundant?
I saw this was already clarified. But yeah, even uniprocessor systems
have concurrency. Therefore I added compiler barriers to properly
protect even uniprocessor systems, that could previously suffer from
data races involving unlikely yet possible compiler reorderings and
context switches.
>
> ---
>
> src/os_cpu/bsd_zero/vm/orderAccess_bsd_zero.inline.hpp
> src/os_cpu/linux_zero/vm/orderAccess_linux_zero.inline.hpp
>
> No comments - The Zero folk will need to approve this change in terminology.
I didn't like talking about "read" and "write" barriers as it says
little about what it actually means. And I also didn't like defining
things like storestore/storeload etc. in terms of acquire/release and
read/write barriers because I think it's weird. ;) That's why I changed
this. But if the previous terminology is preferred, I'm fine with
changing that.
> src/os_cpu/solaris_sparc/vm/orderAccess_solaris_sparc.inline.hpp
> src/os_cpu/solaris_sparc/vm/solaris_sparc.il
>
> You've removed the GNU_SOURCE ifdefs which means the new code has to be
> tested using gcc on Solaris. We don't officially support that platform
> so I have no way to verify these particular changes.
Previously solaris studio did not support inline asm syntax. So the
solaris studio variant used .il assembly instead while the GNU_SOURCE
variant used inline asm.
But since this is nowadays supported by solaris studio too, I joined
them into one. Not only because it's nicer to unify, but because I need
the compiler barrier constraints that inline asm can give us. I don't
know the compiler reordering constraints are guaranteed when calling
external assembly. So better make it explicit and safe.
> src/os_cpu/solaris_x86/vm/orderAccess_solaris_x86.inline.hpp
> src/os_cpu/solaris_x86/vm/solaris_x86_32.il
> src/os_cpu/solaris_x86/vm/solaris_x86_64.il
>
> I think it makes sense to use the same code here as the linux x86
> version. But as above can't verify this for gcc on Solaris; also query
> about need for compiler_barrier() in fence().
Same goes here as above. Solaris studio used to not support inline asm.
So I joined the variants relevant for correctness fixes. But of course,
it would be possible to add the optimizations from x86 bsd/linux as
well. But I wanted to keep optimizations out of this changeset as I
think it is already a lot to review without the addition of new
optimizations. So if you don't mind, I'd like to defer that until the
correctness fixes are in place.
> Do you have a reference to the MSVC volatile semantics?
https://msdn.microsoft.com/en-us/library/12a04hfd(v=vs.100).aspx
Release:
"A write to a volatile object (volatile write) has Release semantics; a
reference to a global or static object that occurs before a write to a
volatile object in the instruction sequence will occur before that
volatile write in the compiled binary."
Acquire:
"A read of a volatile object (volatile read) has Acquire semantics; a
reference to a global or static object that occurs after a read of
volatile memory in the instruction sequence will occur after that
volatile read in the compiled binary."
Should fit our model right for compiler reordering.
> Not clear about the 32-bit versus 64-bit changes - mainly because I
> don't know why they were different in the first place.
Seems like x86_64 didn't have good support for inline asm syntax at some
point. I don't know if this is still true today. It would be nice to
know because I have seen fishy x86_64 implementations of a lot of atomic
stuff as well, assuming lack of compiler support that might actually be
available today.
> Why are windows release_store_fence specializations different to linux
> x86 versions?
How do you mean they are different? To me they look quite equivalent.
xchg is always implicitly locked on x86, and they both use xchg to get
the wanted fence semantics.
> Not clear about your change to the float version of release_store_fence:
> a) do we need to change it?
> b) given what we changed it to isn't that the default ?
I didn't want to have default specializations of jfloat/jdouble
equalling jint/jlong. I thought maybe some architecture might want to
put it in different registers or something, making a default conversion
to int/long maybe undesirable. So the default for jfloat/jdouble is to
use acquire()/release() paired with an atomic memory access. So you have
to explicitly specialize them to be the same as jint/jlong if wanted,
which I believe is what is done in the code you refer to.
If you think I'm paranoid about floats, I can of course make it a
default behaviour to use jint/jlong even if it isn't explicit, in the
same way I handled unsigned variants.
Rule of thumb: The variants that need specialization are the ones that
have overrides for OrderAccess::store/OrderAccess::load, the rest is
handled automatically. Currently jdouble and jfloat are part of that and
hence need explicit specialization, but if we want to generalize floats
too, that's fine by me. So is it preferred to generalize jfloat/jdouble
to equal jint/jlong?
> General comment: copyright dates need updating in most, if not all, files.
Fixed.
Do we want a new webrev? (changed copyright headers and few words in
comments)
Thanks a lot for taking time to review these changes!
/Erik
More information about the hotspot-dev
mailing list