RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Tue Nov 26 03:22:15 PST 2013
Hi everybody,
thanks a lot for the detailed reviews!
I'll try to answer to all in one mail.
> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned;
We don't think it's correct if we omit the barrier after initializing
a volatile field. Previously, we discussed this with Aleksey Shipilev
and Doug Lea, and they agreed.
Also, concurrency torture tests
LongVolatileTest
AtomicIntegerInitialValueTest
will fail.
(In addition, observing 0 instead of the inital value of a volatile field would be
very counter-intuitive for Java programmers, especially in AtomicInteger.)
> proposed for PPC64 is to make volatile reads extremely heavyweight
Yes, it costs measurable performance. But else it is wrong. We don't
see a way to implement this cheaper.
> - these algorithms should be expressed using the correct OrderAccess operations
Basically, I agree on this. But you also have to take into account
that due to the different memory ordering instructions on different platforms
just implementing something empty is not sufficient.
An example:
MemBarRelease // means LoadStore, StoreStore barrier
MemBarVolatile // means StoreLoad barrier
If these are consecutively in the code, sparc code looks like this:
MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore)
MemBarVolatile --> membar(Assembler::StoreLoad)
Just doing what is required.
On Power, we get suboptimal code, as there are no comparable,
fine grained operations:
MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad
MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad
obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful,
I need an additional optimization that removes the lwsync. I can not implement
MemBarRelease empty, as it is also used independently.
Back to the IRIW problem. I think here we have a comparable issue.
Doing the MemBarVolatile or the OrderAccess::fence() before the read
is inefficient on platforms that have multiple-read-atomicity.
I would propose to guard the code by
VM_Version::cpu_is_multiple_read_atomic() or even better
OrderAccess::cpu_is_multiple_read_atomic()
Else, David, how would you propose to implement this platform independent?
(Maybe we can also use above method in taskqueue.hpp.)
Best regards,
Goetz.
-- Other ports:
The IRIW issue requires at least 3 processors to be relevant, so it might
not happen on small machines. But I can use PPC_ONLY instead
of PPC64_ONLY if you request so (and if we don't get rid of them).
-- MemBarStoreStore after initialization
I agree we should not change it in the ppc port. If you wish, I can
prepare an extra webrev for hotspot-comp.
-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Tuesday, November 26, 2013 2:49 AM
To: Vladimir Kozlov
Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Okay this is my second attempt at answering this in a reasonable way :)
On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
> I have to ask David to do correctness evaluation.
From what I understand what we see here is an attempt to fix an
existing issue with the implementation of volatiles so that the IRIW
problem is addressed. The solution proposed for PPC64 is to make
volatile reads extremely heavyweight by adding a fence() when doing the
load.
Now if this was purely handled in ppc64 source code then I would be
happy to let them do whatever they like (surely this kills performance
though!). But I do not agree with the changes to the shared code that
allow this solution to be implemented - even with PPC64_ONLY this is
polluting the shared code. My concern is similar to what I said with the
taskQueue changes - these algorithms should be expressed using the
correct OrderAccess operations to guarantee the desired properties
independent of architecture. If such a "barrier" is not needed on a
given architecture then the implementation in OrderAccess should reduce
to a no-op.
And as Vitaly points out the constructor barriers are not needed under
the JMM.
> I am fine with suggested changes because you did not change our current
> code for our platforms (please, do not change do_exits() now).
> But may be it should be done using more general query which is set
> depending on platform:
>
> OrderAccess::needs_support_iriw_ordering()
>
> or similar to what we use now:
>
> VM_Version::needs_support_iriw_ordering()
Every platform has to support IRIW this is simply part of the Java
Memory Model, there should not be any need to call this out explicitly
like this.
Is there some subtlety of the hardware I am missing here? Are there
visibility issues beyond the ordering constraints that the JMM defines?
> From what I understand our ppc port is also affected. David?
We can not discuss that on an OpenJDK mailing list - sorry.
David
-----
> In library_call.cpp can you add {}? New comment should be inside else {}.
>
> I think you should make _wrote_volatile field not ppc64 specific which
> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>
> In do_put_xxx() can you combine your changes:
>
> if (is_vol) {
> // See comment in do_get_xxx().
> #ifndef PPC64
> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
> #else
> if (is_field) {
> // Add MemBarRelease for constructors which write volatile field
> (PPC64).
> set_wrote_volatile(true);
> }
> #endif
> }
>
> Thanks,
> Vladimir
>
> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> I preprared a webrev with fixes for PPC for the VolatileIRIWTest of
>> the torture test suite:
>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>
>> Example:
>> volatile x=0, y=0
>> __________ __________ __________ __________
>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>
>> write(x=1) read(x) write(y=1) read(y)
>> read(y) read(x)
>>
>> Disallowed: x=1, y=0 y=1, x=0
>>
>>
>> Solution: This example requires multiple-copy-atomicity. This is only
>> assured by the sync instruction and if it is executed in the threads
>> doing the loads. Thus we implement volatile read as sync-load-acquire
>> and omit the sync/MemBarVolatile after the volatile store.
>> MemBarVolatile happens to be implemented by sync.
>> We fix this in C2 and the cpp interpreter.
>>
>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>> fails on staxf" for taskqueue.hpp.
>>
>> Further this change contains a fix that assures that volatile fields
>> written in constructors are visible before the reference gets
>> published.
>>
>>
>> Looking at the code, we found a MemBarRelease that to us, seems too
>> strong.
>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>> What do you think?
>>
>> Please review and test this change.
>>
>> Best regards,
>> Goetz.
>>
More information about the ppc-aix-port-dev
mailing list