RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Tue Nov 26 04:51:23 PST 2013
Hi David,
-- Volatile in constuctor
> AFAIK we have not seen those tests fail due to a
> missing constructor barrier.
We see them on PPC64. Our test machines have typically 8-32 processors
and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
-- IRIW issue
> I can not possibly answer to the necessary level of detail with a few
> moments thought.
Sure. There also is no solution as you require for the taskqueue problem yet,
and that's being discussed now for almost a year.
> You are implying there is a problem here that will
> impact numerous platforms (unless you can tell me why ppc is so different?)
No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a
solution with the #defines, and that's correct for all, but not nice, I admit.
(I don't really know about ARM, though).
So if I can write down a nicer solution testing for methods that are evaluated
by the C-compiler I'm happy.
The problem is not that IRIW is not handled by the JMM, the problem
is that
store
sync
does not assure multiple-read-atomicity,
only
sync
load
does so on PPC. And you require multiple-read-atomicity to
pass that test. The JMM is fine. And
store
MemBarVolatile
is fine on x86, sparc etc. as there exist assembler instructions that
do what is required.
So if you are off soon, please let's come to a solution that
might be improvable in the way it's implemented, but that
allows us to implement a correct PPC64 port.
Best regards,
Goetz.
-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Tuesday, November 26, 2013 1:11 PM
To: Lindenmaier, Goetz
Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Hi Goetz,
On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
> Hi everybody,
>
> thanks a lot for the detailed reviews!
> I'll try to answer to all in one mail.
>
>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned;
> We don't think it's correct if we omit the barrier after initializing
> a volatile field. Previously, we discussed this with Aleksey Shipilev
> and Doug Lea, and they agreed.
> Also, concurrency torture tests
> LongVolatileTest
> AtomicIntegerInitialValueTest
> will fail.
> (In addition, observing 0 instead of the inital value of a volatile field would be
> very counter-intuitive for Java programmers, especially in AtomicInteger.)
The affects of unsafe publication are always surprising - volatiles do
not add anything special here. AFAIK there is nothing in the JMM that
requires the constructor barrier - discussions with Doug and Aleksey
notwithstanding. AFAIK we have not seen those tests fail due to a
missing constructor barrier.
>> proposed for PPC64 is to make volatile reads extremely heavyweight
> Yes, it costs measurable performance. But else it is wrong. We don't
> see a way to implement this cheaper.
>
>> - these algorithms should be expressed using the correct OrderAccess operations
> Basically, I agree on this. But you also have to take into account
> that due to the different memory ordering instructions on different platforms
> just implementing something empty is not sufficient.
> An example:
> MemBarRelease // means LoadStore, StoreStore barrier
> MemBarVolatile // means StoreLoad barrier
> If these are consecutively in the code, sparc code looks like this:
> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore)
> MemBarVolatile --> membar(Assembler::StoreLoad)
> Just doing what is required.
> On Power, we get suboptimal code, as there are no comparable,
> fine grained operations:
> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad
> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad
> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful,
> I need an additional optimization that removes the lwsync. I can not implement
> MemBarRelease empty, as it is also used independently.
>
> Back to the IRIW problem. I think here we have a comparable issue.
> Doing the MemBarVolatile or the OrderAccess::fence() before the read
> is inefficient on platforms that have multiple-read-atomicity.
>
> I would propose to guard the code by
> VM_Version::cpu_is_multiple_read_atomic() or even better
> OrderAccess::cpu_is_multiple_read_atomic()
> Else, David, how would you propose to implement this platform independent?
> (Maybe we can also use above method in taskqueue.hpp.)
I can not possibly answer to the necessary level of detail with a few
moments thought. You are implying there is a problem here that will
impact numerous platforms (unless you can tell me why ppc is so
different?) and I can not take that on face value at the moment. The
only reason I can see IRIW not being handled by the JMM requirements for
volatile accesses is if there are global visibility issues that are not
addressed - but even then I would expect heavy barriers at the store
would deal with that, not at the load. (This situation reminds me of the
need for read-barriers on Alpha architecture due to the use of software
cache-coherency rather than hardware cache-coherency - but we don't have
that on ppc!)
Sorry - There is no quick resolution here and in a couple of days I will
be heading out on vacation for two weeks.
David
-----
> Best regards,
> Goetz.
>
> -- Other ports:
> The IRIW issue requires at least 3 processors to be relevant, so it might
> not happen on small machines. But I can use PPC_ONLY instead
> of PPC64_ONLY if you request so (and if we don't get rid of them).
>
> -- MemBarStoreStore after initialization
> I agree we should not change it in the ppc port. If you wish, I can
> prepare an extra webrev for hotspot-comp.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Tuesday, November 26, 2013 2:49 AM
> To: Vladimir Kozlov
> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> Okay this is my second attempt at answering this in a reasonable way :)
>
> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>> I have to ask David to do correctness evaluation.
>
> From what I understand what we see here is an attempt to fix an
> existing issue with the implementation of volatiles so that the IRIW
> problem is addressed. The solution proposed for PPC64 is to make
> volatile reads extremely heavyweight by adding a fence() when doing the
> load.
>
> Now if this was purely handled in ppc64 source code then I would be
> happy to let them do whatever they like (surely this kills performance
> though!). But I do not agree with the changes to the shared code that
> allow this solution to be implemented - even with PPC64_ONLY this is
> polluting the shared code. My concern is similar to what I said with the
> taskQueue changes - these algorithms should be expressed using the
> correct OrderAccess operations to guarantee the desired properties
> independent of architecture. If such a "barrier" is not needed on a
> given architecture then the implementation in OrderAccess should reduce
> to a no-op.
>
> And as Vitaly points out the constructor barriers are not needed under
> the JMM.
>
>> I am fine with suggested changes because you did not change our current
>> code for our platforms (please, do not change do_exits() now).
>> But may be it should be done using more general query which is set
>> depending on platform:
>>
>> OrderAccess::needs_support_iriw_ordering()
>>
>> or similar to what we use now:
>>
>> VM_Version::needs_support_iriw_ordering()
>
> Every platform has to support IRIW this is simply part of the Java
> Memory Model, there should not be any need to call this out explicitly
> like this.
>
> Is there some subtlety of the hardware I am missing here? Are there
> visibility issues beyond the ordering constraints that the JMM defines?
>> From what I understand our ppc port is also affected. David?
>
> We can not discuss that on an OpenJDK mailing list - sorry.
>
> David
> -----
>
>> In library_call.cpp can you add {}? New comment should be inside else {}.
>>
>> I think you should make _wrote_volatile field not ppc64 specific which
>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>
>> In do_put_xxx() can you combine your changes:
>>
>> if (is_vol) {
>> // See comment in do_get_xxx().
>> #ifndef PPC64
>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>> #else
>> if (is_field) {
>> // Add MemBarRelease for constructors which write volatile field
>> (PPC64).
>> set_wrote_volatile(true);
>> }
>> #endif
>> }
>>
>> Thanks,
>> Vladimir
>>
>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> I preprared a webrev with fixes for PPC for the VolatileIRIWTest of
>>> the torture test suite:
>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>
>>> Example:
>>> volatile x=0, y=0
>>> __________ __________ __________ __________
>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>
>>> write(x=1) read(x) write(y=1) read(y)
>>> read(y) read(x)
>>>
>>> Disallowed: x=1, y=0 y=1, x=0
>>>
>>>
>>> Solution: This example requires multiple-copy-atomicity. This is only
>>> assured by the sync instruction and if it is executed in the threads
>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>> and omit the sync/MemBarVolatile after the volatile store.
>>> MemBarVolatile happens to be implemented by sync.
>>> We fix this in C2 and the cpp interpreter.
>>>
>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>> fails on staxf" for taskqueue.hpp.
>>>
>>> Further this change contains a fix that assures that volatile fields
>>> written in constructors are visible before the reference gets
>>> published.
>>>
>>>
>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>> strong.
>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>> What do you think?
>>>
>>> Please review and test this change.
>>>
>>> Best regards,
>>> Goetz.
>>>
More information about the ppc-aix-port-dev
mailing list