RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Nov 27 14:15:45 PST 2013


On 11/27/13 9:15 AM, Lindenmaier, Goetz wrote:
> Hi,
>
> I updated the webrev to fix the issues mentioned by Vladimir:
> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/

Changes in library_call.cpp are incorrect, should be NOT_PPC64():

PPC64_ONLY(insert_mem_bar(Op_MemBarVolatile));

You did not merge code in do_put_xxx() as I asked.

Thanks,
Vladimir

>
> I did not yet add the
>     OrderAccess::needs_support_iriw_ordering()
>     VM_Version::needs_support_iriw_ordering()
> or
>    OrderAccess::cpu_is_multiple_copy_atomic()
> to reduce #defined, as I got no further comment on that.
>
>
> WRT to the validity of the tests and the interpretation of the JMM
> I feel not in the position to contribute substantially.
>
> But we would like to pass the torture test suite as we consider
> this a substantial task in implementing a PPC port.  Also we think
> both tests show behavior a programmer would expect.  It's bad if
> Java code runs fine on the more common x86 platform, and then
> fails on ppc.  This will always first be blamed on the VM.
>
> The fixes for the constructor issue are only needed because we
> remove the sync instruction from behind stores (parse3.cpp:320)
> and place it before loads.  Then there is no sync between volatile store
> and publishing the object.  So we add it again in this one case
> (volatile store in constructor).
>
>
> @David
>>> Sure.  There also is no solution as you require for the taskqueue problem yet,
>>> and that's being discussed now for almost a year.
>> It may have started a year ago but work on it has hardly been continuous.
> That's not true, we did a lot of investigation and testing on this issue.
> And we came up with a solution we consider the best possible. If you
> have objections, you should at least give the draft of a better solution,
> we would volunteer to implement and test it.
> Similarly, we invested time in fixing the concurrency torture issues.
>
> @David
>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>> can't find any reference to it.
> We learned about this reading "A Tutorial Introduction to the ARM and
> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for
> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
> and Francesco Zappa Nardelli (PPoPP `13) when analysing the taskqueue problem.
> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>
> I was wrong in one thing, it's called multiple copy atomicity, I used 'read'
> instead.  Sorry for that.  (I also fixed that in the method name above).
>
> Best regards and thanks for all your involvements,
>    Goetz.
>
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Mittwoch, 27. November 2013 12:53
> To: Lindenmaier, Goetz
> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> Hi Goetz,
>
> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>> Hi David,
>>
>> -- Volatile in constuctor
>>> AFAIK we have not seen those tests fail due to a
>>> missing constructor barrier.
>> We see them on PPC64. Our test machines have typically 8-32 processors
>> and are Power 5-7.  But see also Aleksey's mail.  (Thanks Aleksey!)
>
> And see follow ups - the tests are invalid.
>
>> -- IRIW issue
>>> I can not possibly answer to the necessary level of detail with a few
>>> moments thought.
>> Sure.  There also is no solution as you require for the taskqueue problem yet,
>> and that's being discussed now for almost a year.
>
> It may have started a year ago but work on it has hardly been continuous.
>
>>> You are implying there is a problem here that will
>>> impact numerous platforms (unless you can tell me why ppc is so different?)
>> No, only PPC does not have 'multiple-read-atomicity'.  Therefore I contributed a
>> solution with the #defines, and that's correct for all, but not nice, I admit.
>> (I don't really know about ARM, though).
>> So if I can write down a nicer solution testing for methods that are evaluated
>> by the C-compiler I'm happy.
>>
>> The problem is not that IRIW is not handled by the JMM, the problem
>> is that
>>     store
>>     sync
>> does not assure multiple-read-atomicity,
>> only
>>     sync
>>     load
>> does so on PPC.  And you require multiple-read-atomicity to
>> pass that test.
>
> What is "multiple-read-atomicity"? I'm not familiar with the term and
> can't find any reference to it.
>
> Thanks,
> David
>
>     The JMM is fine. And
>>     store
>>     MemBarVolatile
>> is fine on x86, sparc etc. as there exist assembler instructions that
>> do what is required.
>>
>> So if you are off soon, please let's come to a solution that
>> might be improvable in the way it's implemented, but that
>> allows us to implement a correct PPC64 port.
>>
>> Best regards,
>>     Goetz.
>>
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Tuesday, November 26, 2013 1:11 PM
>> To: Lindenmaier, Goetz
>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Hi Goetz,
>>
>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>> Hi everybody,
>>>
>>> thanks a lot for the detailed reviews!
>>> I'll try to answer to all in one mail.
>>>
>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned;
>>> We don't think it's correct if we omit the barrier after initializing
>>> a volatile field.  Previously, we discussed this with Aleksey Shipilev
>>> and Doug Lea, and they agreed.
>>> Also, concurrency torture tests
>>>       LongVolatileTest
>>>       AtomicIntegerInitialValueTest
>>> will fail.
>>> (In addition, observing 0 instead of the inital value of a volatile field would be
>>> very counter-intuitive for Java programmers, especially in AtomicInteger.)
>>
>> The affects of unsafe publication are always surprising - volatiles do
>> not add anything special here. AFAIK there is nothing in the JMM that
>> requires the constructor barrier - discussions with Doug and Aleksey
>> notwithstanding. AFAIK we have not seen those tests fail due to a
>> missing constructor barrier.
>>
>>>> proposed for PPC64 is to make volatile reads extremely heavyweight
>>> Yes, it costs measurable performance.  But else it is wrong.  We don't
>>> see a way to implement this cheaper.
>>>
>>>> - these algorithms should be expressed using the correct OrderAccess operations
>>> Basically, I agree on this.  But you also have to take into account
>>> that due to the different memory ordering instructions on different platforms
>>> just implementing something empty is not sufficient.
>>> An example:
>>>      MemBarRelease    // means LoadStore, StoreStore barrier
>>>      MemBarVolatile    // means  StoreLoad barrier
>>> If these are consecutively in the code, sparc code looks like this:
>>>      MemBarRelease  --> membar(Assembler::LoadStore | Assembler::StoreStore)
>>>      MemBarVolatile --> membar(Assembler::StoreLoad)
>>> Just doing what is required.
>>> On Power, we get suboptimal code, as there are no comparable,
>>> fine grained operations:
>>>       MemBarRelease --> lwsync   // Doing LoadStore, StoreStore, LoadLoad
>>>      MemBarVolatile --> sync    // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad
>>> obviously, the lwsync is superfluous.  Thus, as PPC operations are more (too) powerful,
>>> I need an additional optimization that removes the lwsync.  I can not implement
>>> MemBarRelease empty, as it is also used independently.
>>>
>>> Back to the IRIW problem.  I think here we have a comparable issue.
>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read
>>> is inefficient on platforms that have multiple-read-atomicity.
>>>
>>> I would propose to guard the code by
>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>> OrderAccess::cpu_is_multiple_read_atomic()
>>> Else, David, how would you propose to implement this platform independent?
>>> (Maybe we can also use above method in taskqueue.hpp.)
>>
>> I can not possibly answer to the necessary level of detail with a few
>> moments thought. You are implying there is a problem here that will
>> impact numerous platforms (unless you can tell me why ppc is so
>> different?) and I can not take that on face value at the moment. The
>> only reason I can see IRIW not being handled by the JMM requirements for
>> volatile accesses is if there are global visibility issues that are not
>> addressed - but even then I would expect heavy barriers at the store
>> would deal with that, not at the load. (This situation reminds me of the
>> need for read-barriers on Alpha architecture due to the use of software
>> cache-coherency rather than hardware cache-coherency - but we don't have
>> that on ppc!)
>>
>> Sorry - There is no quick resolution here and in a couple of days I will
>> be heading out on vacation for two weeks.
>>
>> David
>> -----
>>
>>> Best regards,
>>>      Goetz.
>>>
>>> -- Other ports:
>>> The IRIW issue requires at least 3 processors to be relevant, so it might
>>> not happen on small machines.  But I can use PPC_ONLY instead
>>> of PPC64_ONLY if you request so (and if we don't get rid of them).
>>>
>>> -- MemBarStoreStore after initialization
>>> I agree we should not change it in the ppc port.  If you wish, I can
>>> prepare an extra webrev for hotspot-comp.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>> To: Vladimir Kozlov
>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>
>>> Okay this is my second attempt at answering this in a reasonable way :)
>>>
>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>> I have to ask David to do correctness evaluation.
>>>
>>>     From what I understand what we see here is an attempt to fix an
>>> existing issue with the implementation of volatiles so that the IRIW
>>> problem is addressed. The solution proposed for PPC64 is to make
>>> volatile reads extremely heavyweight by adding a fence() when doing the
>>> load.
>>>
>>> Now if this was purely handled in ppc64 source code then I would be
>>> happy to let them do whatever they like (surely this kills performance
>>> though!). But I do not agree with the changes to the shared code that
>>> allow this solution to be implemented - even with PPC64_ONLY this is
>>> polluting the shared code. My concern is similar to what I said with the
>>> taskQueue changes - these algorithms should be expressed using the
>>> correct OrderAccess operations to guarantee the desired properties
>>> independent of architecture. If such a "barrier" is not needed on a
>>> given architecture then the implementation in OrderAccess should reduce
>>> to a no-op.
>>>
>>> And as Vitaly points out the constructor barriers are not needed under
>>> the JMM.
>>>
>>>> I am fine with suggested changes because you did not change our current
>>>> code for our platforms (please, do not change do_exits() now).
>>>> But may be it should be done using more general query which is set
>>>> depending on platform:
>>>>
>>>> OrderAccess::needs_support_iriw_ordering()
>>>>
>>>> or similar to what we use now:
>>>>
>>>> VM_Version::needs_support_iriw_ordering()
>>>
>>> Every platform has to support IRIW this is simply part of the Java
>>> Memory Model, there should not be any need to call this out explicitly
>>> like this.
>>>
>>> Is there some subtlety of the hardware I am missing here? Are there
>>> visibility issues beyond the ordering constraints that the JMM defines?
>>>>     From what I understand our ppc port is also affected. David?
>>>
>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>
>>> David
>>> -----
>>>
>>>> In library_call.cpp can you add {}? New comment should be inside  else {}.
>>>>
>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>
>>>> In do_put_xxx() can you combine your changes:
>>>>
>>>>        if (is_vol) {
>>>>          // See comment in do_get_xxx().
>>>> #ifndef PPC64
>>>>          insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>> #else
>>>>          if (is_field) {
>>>>            // Add MemBarRelease for constructors which write volatile field
>>>> (PPC64).
>>>>            set_wrote_volatile(true);
>>>>          }
>>>> #endif
>>>>        }
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> I preprared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>> the torture test suite:
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>
>>>>> Example:
>>>>> volatile x=0, y=0
>>>>>       __________    __________    __________    __________
>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>
>>>>>        write(x=1)    read(x)       write(y=1)    read(y)
>>>>>                      read(y)                     read(x)
>>>>>
>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>
>>>>>
>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>> assured by the sync instruction and if it is executed in the threads
>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>> MemBarVolatile happens to be implemented by sync.
>>>>> We fix this in C2 and the cpp interpreter.
>>>>>
>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>> fails on staxf" for taskqueue.hpp.
>>>>>
>>>>> Further this change contains a fix that assures that volatile fields
>>>>> written in constructors are visible before the reference gets
>>>>> published.
>>>>>
>>>>>
>>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>>> strong.
>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>> What do you think?
>>>>>
>>>>> Please review and test this change.
>>>>>
>>>>> Best regards,
>>>>>       Goetz.
>>>>>


More information about the ppc-aix-port-dev mailing list