RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes

David Holmes david.holmes at oracle.com
Wed Nov 27 03:52:43 PST 2013


Hi Goetz,

On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
> Hi David,
>
> -- Volatile in constuctor
>> AFAIK we have not seen those tests fail due to a
>> missing constructor barrier.
> We see them on PPC64. Our test machines have typically 8-32 processors
> and are Power 5-7.  But see also Aleksey's mail.  (Thanks Aleksey!)

And see follow ups - the tests are invalid.

> -- IRIW issue
>> I can not possibly answer to the necessary level of detail with a few
>> moments thought.
> Sure.  There also is no solution as you require for the taskqueue problem yet,
> and that's being discussed now for almost a year.

It may have started a year ago but work on it has hardly been continuous.

>> You are implying there is a problem here that will
>> impact numerous platforms (unless you can tell me why ppc is so different?)
> No, only PPC does not have 'multiple-read-atomicity'.  Therefore I contributed a
> solution with the #defines, and that's correct for all, but not nice, I admit.
> (I don't really know about ARM, though).
> So if I can write down a nicer solution testing for methods that are evaluated
> by the C-compiler I'm happy.
>
> The problem is not that IRIW is not handled by the JMM, the problem
> is that
>    store
>    sync
> does not assure multiple-read-atomicity,
> only
>    sync
>    load
> does so on PPC.  And you require multiple-read-atomicity to
> pass that test.

What is "multiple-read-atomicity"? I'm not familiar with the term and 
can't find any reference to it.

Thanks,
David

   The JMM is fine. And
>    store
>    MemBarVolatile
> is fine on x86, sparc etc. as there exist assembler instructions that
> do what is required.
>
> So if you are off soon, please let's come to a solution that
> might be improvable in the way it's implemented, but that
> allows us to implement a correct PPC64 port.
>
> Best regards,
>    Goetz.
>
>
>
>
>
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Tuesday, November 26, 2013 1:11 PM
> To: Lindenmaier, Goetz
> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> Hi Goetz,
>
> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>> Hi everybody,
>>
>> thanks a lot for the detailed reviews!
>> I'll try to answer to all in one mail.
>>
>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned;
>> We don't think it's correct if we omit the barrier after initializing
>> a volatile field.  Previously, we discussed this with Aleksey Shipilev
>> and Doug Lea, and they agreed.
>> Also, concurrency torture tests
>>      LongVolatileTest
>>      AtomicIntegerInitialValueTest
>> will fail.
>> (In addition, observing 0 instead of the inital value of a volatile field would be
>> very counter-intuitive for Java programmers, especially in AtomicInteger.)
>
> The affects of unsafe publication are always surprising - volatiles do
> not add anything special here. AFAIK there is nothing in the JMM that
> requires the constructor barrier - discussions with Doug and Aleksey
> notwithstanding. AFAIK we have not seen those tests fail due to a
> missing constructor barrier.
>
>>> proposed for PPC64 is to make volatile reads extremely heavyweight
>> Yes, it costs measurable performance.  But else it is wrong.  We don't
>> see a way to implement this cheaper.
>>
>>> - these algorithms should be expressed using the correct OrderAccess operations
>> Basically, I agree on this.  But you also have to take into account
>> that due to the different memory ordering instructions on different platforms
>> just implementing something empty is not sufficient.
>> An example:
>>     MemBarRelease    // means LoadStore, StoreStore barrier
>>     MemBarVolatile    // means  StoreLoad barrier
>> If these are consecutively in the code, sparc code looks like this:
>>     MemBarRelease  --> membar(Assembler::LoadStore | Assembler::StoreStore)
>>     MemBarVolatile --> membar(Assembler::StoreLoad)
>> Just doing what is required.
>> On Power, we get suboptimal code, as there are no comparable,
>> fine grained operations:
>>      MemBarRelease --> lwsync   // Doing LoadStore, StoreStore, LoadLoad
>>     MemBarVolatile --> sync    // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad
>> obviously, the lwsync is superfluous.  Thus, as PPC operations are more (too) powerful,
>> I need an additional optimization that removes the lwsync.  I can not implement
>> MemBarRelease empty, as it is also used independently.
>>
>> Back to the IRIW problem.  I think here we have a comparable issue.
>> Doing the MemBarVolatile or the OrderAccess::fence() before the read
>> is inefficient on platforms that have multiple-read-atomicity.
>>
>> I would propose to guard the code by
>> VM_Version::cpu_is_multiple_read_atomic() or even better
>> OrderAccess::cpu_is_multiple_read_atomic()
>> Else, David, how would you propose to implement this platform independent?
>> (Maybe we can also use above method in taskqueue.hpp.)
>
> I can not possibly answer to the necessary level of detail with a few
> moments thought. You are implying there is a problem here that will
> impact numerous platforms (unless you can tell me why ppc is so
> different?) and I can not take that on face value at the moment. The
> only reason I can see IRIW not being handled by the JMM requirements for
> volatile accesses is if there are global visibility issues that are not
> addressed - but even then I would expect heavy barriers at the store
> would deal with that, not at the load. (This situation reminds me of the
> need for read-barriers on Alpha architecture due to the use of software
> cache-coherency rather than hardware cache-coherency - but we don't have
> that on ppc!)
>
> Sorry - There is no quick resolution here and in a couple of days I will
> be heading out on vacation for two weeks.
>
> David
> -----
>
>> Best regards,
>>     Goetz.
>>
>> -- Other ports:
>> The IRIW issue requires at least 3 processors to be relevant, so it might
>> not happen on small machines.  But I can use PPC_ONLY instead
>> of PPC64_ONLY if you request so (and if we don't get rid of them).
>>
>> -- MemBarStoreStore after initialization
>> I agree we should not change it in the ppc port.  If you wish, I can
>> prepare an extra webrev for hotspot-comp.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Tuesday, November 26, 2013 2:49 AM
>> To: Vladimir Kozlov
>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Okay this is my second attempt at answering this in a reasonable way :)
>>
>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>> I have to ask David to do correctness evaluation.
>>
>>    From what I understand what we see here is an attempt to fix an
>> existing issue with the implementation of volatiles so that the IRIW
>> problem is addressed. The solution proposed for PPC64 is to make
>> volatile reads extremely heavyweight by adding a fence() when doing the
>> load.
>>
>> Now if this was purely handled in ppc64 source code then I would be
>> happy to let them do whatever they like (surely this kills performance
>> though!). But I do not agree with the changes to the shared code that
>> allow this solution to be implemented - even with PPC64_ONLY this is
>> polluting the shared code. My concern is similar to what I said with the
>> taskQueue changes - these algorithms should be expressed using the
>> correct OrderAccess operations to guarantee the desired properties
>> independent of architecture. If such a "barrier" is not needed on a
>> given architecture then the implementation in OrderAccess should reduce
>> to a no-op.
>>
>> And as Vitaly points out the constructor barriers are not needed under
>> the JMM.
>>
>>> I am fine with suggested changes because you did not change our current
>>> code for our platforms (please, do not change do_exits() now).
>>> But may be it should be done using more general query which is set
>>> depending on platform:
>>>
>>> OrderAccess::needs_support_iriw_ordering()
>>>
>>> or similar to what we use now:
>>>
>>> VM_Version::needs_support_iriw_ordering()
>>
>> Every platform has to support IRIW this is simply part of the Java
>> Memory Model, there should not be any need to call this out explicitly
>> like this.
>>
>> Is there some subtlety of the hardware I am missing here? Are there
>> visibility issues beyond the ordering constraints that the JMM defines?
>>>    From what I understand our ppc port is also affected. David?
>>
>> We can not discuss that on an OpenJDK mailing list - sorry.
>>
>> David
>> -----
>>
>>> In library_call.cpp can you add {}? New comment should be inside  else {}.
>>>
>>> I think you should make _wrote_volatile field not ppc64 specific which
>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>
>>> In do_put_xxx() can you combine your changes:
>>>
>>>       if (is_vol) {
>>>         // See comment in do_get_xxx().
>>> #ifndef PPC64
>>>         insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>> #else
>>>         if (is_field) {
>>>           // Add MemBarRelease for constructors which write volatile field
>>> (PPC64).
>>>           set_wrote_volatile(true);
>>>         }
>>> #endif
>>>       }
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>> Hi,
>>>>
>>>> I preprared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>> the torture test suite:
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>
>>>> Example:
>>>> volatile x=0, y=0
>>>>      __________    __________    __________    __________
>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>
>>>>       write(x=1)    read(x)       write(y=1)    read(y)
>>>>                     read(y)                     read(x)
>>>>
>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>
>>>>
>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>> assured by the sync instruction and if it is executed in the threads
>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>> MemBarVolatile happens to be implemented by sync.
>>>> We fix this in C2 and the cpp interpreter.
>>>>
>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>> fails on staxf" for taskqueue.hpp.
>>>>
>>>> Further this change contains a fix that assures that volatile fields
>>>> written in constructors are visible before the reference gets
>>>> published.
>>>>
>>>>
>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>> strong.
>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>> What do you think?
>>>>
>>>> Please review and test this change.
>>>>
>>>> Best regards,
>>>>      Goetz.
>>>>


More information about the ppc-aix-port-dev mailing list