RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes

Lindenmaier, Goetz goetz.lindenmaier at sap.com
Fri Jan 17 00:39:05 PST 2014


Hi,

I tried to come up with a webrev that implements the change as proposed in 
your mails:
http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/

Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use 
support_IRIW_for_not_multiple_copy_atomic_cpu.

I left the definition and handling of _wrote_volatile in the code, without
any protection.
I protected issuing the barrier for volatile in constructors with PPC64_ONLY() ,
and put it on one line.

I removed the comment in library_call.cpp.
I also removed the sentence " Solution: implement volatile read as sync-load-acquire."
from the comments as it's PPC specific.

Wrt. to C1: we plan to port C1 to PPC64, too.  During that task, we will fix these 
issues in C1 if nobody did it by then.

Wrt. to performance: Oracle will soon do heavy testing of the port.  If any
performance problems arise, we still can add #ifdef PPC64 to circumvent this.

Best regards,
  Goetz.



-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Donnerstag, 16. Januar 2014 10:05
To: Vladimir Kozlov
Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes

On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
> On 1/16/14 12:34 AM, David Holmes wrote:
>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>> want to keep it this way, it could be useful to have such info on other
>>> platforms too. But I would suggest to remove PPC64 comments in
>>> parse.hpp.
>>>
>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>>> which could be checked in all places instead of #ifdef:
>>
>> I asked for the ifdef some time back as I find it much preferable to
>> have this as a build-time construct rather than a
>> runtime one. I don't want to have to pay anything for this if we don't
>> use it.
>
> Any decent C++ compiler will optimize expressions with such constants
> defined in header files. I insist to avoid #ifdefs in C2 code. I really
> don't like the code with #ifdef in unsafe.cpp but I can live with it.

If you insist then we may as well do it all the same way. Better to be 
consistent.

My apologies Goetz for wasting your time going back and forth on this.

That aside I have a further concern with this IRIW support - it is 
incomplete as there is no C1 support, as PPC64 isn't using client. If 
this is going on then we (which probably means the Oracle 'we') need to 
add the missing C1 code.

David
-----

> Vladimir
>
>>
>> David
>>
>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>> #else
>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>> #endif
>>>
>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>
>>> and then:
>>>
>>>   #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>     oop p = JNIHandles::resolve(obj); \
>>> +  if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>>> OrderAccess::fence(); \
>>>     volatile type_name v = OrderAccess::load_acquire((volatile
>>> type_name*)index_oop_from_field_offset_long(p, offset));
>>>
>>> And:
>>>
>>> +  if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>>> field->is_volatile()) {
>>> +    insert_mem_bar(Op_MemBarVolatile);   // StoreLoad barrier
>>> +  }
>>>
>>> And so on. The comments will be needed only in globalDefinitions.hpp
>>>
>>> The code in parse1.cpp could be put on one line:
>>>
>>> +  if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>>> method()->is_initializer()) )) {
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>> Hi David,
>>>>>
>>>>> I updated the webrev:
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>
>>>>> - I removed the IRIW example in parse3.cpp
>>>>> - I adapted the comments not to point to that comment, and to
>>>>>    reflect the new flagging.  Also I mention that we support the
>>>>>    volatile constructor issue, but that it's not standard.
>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>    I also think it's better to separate these this way.
>>>>
>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>> please.
>>>>
>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>>> comment doesn't make much sense to me and refers to
>>>> ppc specific stuff in a shared file:
>>>>
>>>>      if (is_volatile) {
>>>> !     if (!is_store) {
>>>>          insert_mem_bar(Op_MemBarAcquire);
>>>> !     } else {
>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>> !       // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>>>>          insert_mem_bar(Op_MemBarVolatile);
>>>> + #endif
>>>> +     }
>>>>
>>>> I don't think the comment is needed.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Thanks for your comments!
>>>>>
>>>>> Best regards,
>>>>>    Goetz.
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>> To: Lindenmaier, Goetz
>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>> 'hotspot-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>> Independent Reads of Independent Writes
>>>>>
>>>>> Hi Goetz,
>>>>>
>>>>> Sorry for the delay in getting back to this.
>>>>>
>>>>> The general changes to the volatile barriers to support IRIW are okay.
>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>>>> specifically it is
>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>>>> the commentary excessive, particularly for shared code. In particular
>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>>>> explanation and I don't think we need it to that level of detail.
>>>>> Seems
>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>>>>
>>>>> The changes related to volatile writes in the constructor, as
>>>>> discussed
>>>>> are not required by the Java Memory Model. If you want to keep these
>>>>> then I think they should all be guarded with PPC64 because it is not
>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>>>>> PPC64 porters.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I updated this webrev.  I detected a small flaw I made when editing
>>>>>> this version.
>>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>>> I also based the webrev on the latest version of the stage repo.
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>
>>>>>> Best regards,
>>>>>>     Goetz.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lindenmaier, Goetz
>>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>>> To: David Holmes
>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>> Independent Reads of Independent Writes
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>>> So we can at least undo #4 now we have established those tests were
>>>>>>> not
>>>>>>> required to pass.
>>>>>> We would prefer if we could keep this in.  We want to avoid that it's
>>>>>> blamed on the VM if java programs are failing on PPC after they
>>>>>> worked
>>>>>> on x86.  To clearly mark it as overfulfilling the spec I would guard
>>>>>> it by
>>>>>> a flag as proposed.  But if you insist I will remove it.  Also, this
>>>>>> part is
>>>>>> not that performance relevant.
>>>>>>
>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>> think
>>>>>> I added a compile-time guard in this new webrev:
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC.  This introduces
>>>>>> several double negations I don't like, (#ifNdef
>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>>> but this way I only have to change the ppc platform.
>>>>>>
>>>>>> Best regards,
>>>>>>     Goetz
>>>>>>
>>>>>> P.S.:  I will also be available over the Christmas period.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>>> To: Lindenmaier, Goetz
>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>> Independent Reads of Independent Writes
>>>>>>
>>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>>>> emails.
>>>>>>
>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> ok, I understand the tests are wrong.  It's good this issue is
>>>>>>> settled.
>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>>>
>>>>>>> About our change:  David, the causality is the other way round.
>>>>>>> The change is about IRIW.
>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>
>>>>>> This is the part I still have some question marks over as the
>>>>>> implications are not nice for performance on non-TSO platforms.
>>>>>> But I'm
>>>>>> no further along in processing that paper I'm afraid.
>>>>>>
>>>>>>> 2. If we do syncs before loads, we don't need to do them after
>>>>>>> stores.
>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>> constructor tests.
>>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>>> after stores
>>>>>>>         to pass the volatile constructor tests.
>>>>>>
>>>>>> So we can at least undo #4 now we have established those tests
>>>>>> were not
>>>>>> required to pass.
>>>>>>
>>>>>>> We originally passed the constructor tests because the ppc memory
>>>>>>> order
>>>>>>> instructions are not as find-granular as the
>>>>>>> operations in the IR.  MemBarVolatile is specified as StoreLoad.
>>>>>>> The only instruction
>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>>>> therefore the
>>>>>>> MemBarVolatile after the store fixes the constructor tests.  The
>>>>>>> proper representation
>>>>>>> of the fix in the IR would be adding a MemBarStoreStore.  But now
>>>>>>> it's pointless
>>>>>>> anyways.
>>>>>>>
>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>> I'd be happy to add a property
>>>>>>>      OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>
>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>> think
>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>>> based not architecture based) as that will allows for turning this
>>>>>> on/off for any architecture for testing purposes.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> or the like to guard the customization.  I'd like that much better.
>>>>>>> Or also
>>>>>>>       OrderAccess::needs_support_iriw_ordering()
>>>>>>>       VM_Version::needs_support_iriw_ordering()
>>>>>>>
>>>>>>>
>>>>>>> Best regards,
>>>>>>>      Goetz.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>>> To: Lindenmaier, Goetz
>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> TL;DR version:
>>>>>>>
>>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>>> constructor-barrier
>>>>>>> for volatiles is not required as part of the JMM specification. It
>>>>>>> *may*
>>>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>>>> ensure
>>>>>>> you can't see uninitialized fields. So the tests for this are
>>>>>>> invalid
>>>>>>> and this part of the patch is not needed in general (ppc64 may
>>>>>>> need it
>>>>>>> due to other factors).
>>>>>>>
>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>>>> term :)
>>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>>
>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>>> visible to
>>>>>>> all other hardware threads at the same time point; these
>>>>>>> architectures
>>>>>>> are not multiple-copy atomic."
>>>>>>>
>>>>>>> This is the visibility issue that I referred to and affects both
>>>>>>> ARM and
>>>>>>> PPC. But of course it is normally handled by using suitable barriers
>>>>>>> after the stores that need to be visible. I think the crux of the
>>>>>>> current issue is what you wrote below:
>>>>>>>
>>>>>>>     > The fixes for the constructor issue are only needed because we
>>>>>>>     > remove the sync instruction from behind stores
>>>>>>> (parse3.cpp:320)
>>>>>>>     > and place it before loads.
>>>>>>>
>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync
>>>>>>> after
>>>>>>> the store then you have to do something around the loads to get the
>>>>>>> same
>>>>>>> results! I still don't know what lead you to the conclusion that the
>>>>>>> only way to fix the IRIW issue was to put the fence before the
>>>>>>> load -
>>>>>>> maybe when I get the chance to read that paper in full it will be
>>>>>>> clearer.
>>>>>>>
>>>>>>> So ... the basic problem is that the current structure in the VM has
>>>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>>>> variables. You now want to customize that but not all the requisite
>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>> volatile_store were factored out so that they could be
>>>>>>> implemented as
>>>>>>> desired per-platform. Alternatively there could be pre- and post-
>>>>>>> hooks
>>>>>>> that could then be customized per platform. Otherwise you need
>>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>>
>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think
>>>>>>> this
>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>
>>>>>>>> I did not yet add the
>>>>>>>>        OrderAccess::needs_support_iriw_ordering()
>>>>>>>>        VM_Version::needs_support_iriw_ordering()
>>>>>>>> or
>>>>>>>>       OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>>
>>>>>>>>
>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>>
>>>>>>>> But we would like to pass the torture test suite as we consider
>>>>>>>> this a substantial task in implementing a PPC port.  Also we think
>>>>>>>> both tests show behavior a programmer would expect.  It's bad if
>>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>>> fails on ppc.  This will always first be blamed on the VM.
>>>>>>>>
>>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>> and place it before loads.  Then there is no sync between volatile
>>>>>>>> store
>>>>>>>> and publishing the object.  So we add it again in this one case
>>>>>>>> (volatile store in constructor).
>>>>>>>>
>>>>>>>>
>>>>>>>> @David
>>>>>>>>>> Sure.  There also is no solution as you require for the
>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>> continuous.
>>>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>>>> issue.
>>>>>>>> And we came up with a solution we consider the best possible. If
>>>>>>>> you
>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>> solution,
>>>>>>>> we would volunteer to implement and test it.
>>>>>>>> Similarly, we invested time in fixing the concurrency torture
>>>>>>>> issues.
>>>>>>>>
>>>>>>>> @David
>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>>>> and
>>>>>>>>> can't find any reference to it.
>>>>>>>> We learned about this reading "A Tutorial Introduction to the
>>>>>>>> ARM and
>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>> Work-Stealing for
>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>>> taskqueue problem.
>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>
>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>>> used 'read'
>>>>>>>> instead.  Sorry for that.  (I also fixed that in the method name
>>>>>>>> above).
>>>>>>>>
>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>       Goetz.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>
>>>>>>>> Hi Goetz,
>>>>>>>>
>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> -- Volatile in constuctor
>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>> missing constructor barrier.
>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>> processors
>>>>>>>>> and are Power 5-7.  But see also Aleksey's mail.  (Thanks
>>>>>>>>> Aleksey!)
>>>>>>>>
>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>
>>>>>>>>> -- IRIW issue
>>>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>>>> a few
>>>>>>>>>> moments thought.
>>>>>>>>> Sure.  There also is no solution as you require for the taskqueue
>>>>>>>>> problem yet,
>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>
>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>> continuous.
>>>>>>>>
>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>> different?)
>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'.  Therefore
>>>>>>>>> I contributed a
>>>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>>>> nice, I admit.
>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>>>> are evaluated
>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>
>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the
>>>>>>>>> problem
>>>>>>>>> is that
>>>>>>>>>        store
>>>>>>>>>        sync
>>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>>> only
>>>>>>>>>        sync
>>>>>>>>>        load
>>>>>>>>> does so on PPC.  And you require multiple-read-atomicity to
>>>>>>>>> pass that test.
>>>>>>>>
>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>> term and
>>>>>>>> can't find any reference to it.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>        The JMM is fine. And
>>>>>>>>>        store
>>>>>>>>>        MemBarVolatile
>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions
>>>>>>>>> that
>>>>>>>>> do what is required.
>>>>>>>>>
>>>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>        Goetz.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>>> 'hotspot-dev at openjdk.java.net';
>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>
>>>>>>>>> Hi Goetz,
>>>>>>>>>
>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>> Hi everybody,
>>>>>>>>>>
>>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>
>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>> initializing
>>>>>>>>>> a volatile field.  Previously, we discussed this with Aleksey
>>>>>>>>>> Shipilev
>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>          LongVolatileTest
>>>>>>>>>>          AtomicIntegerInitialValueTest
>>>>>>>>>> will fail.
>>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>>> volatile field would be
>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>> AtomicInteger.)
>>>>>>>>>
>>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>>> volatiles do
>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>>>> that
>>>>>>>>> requires the constructor barrier - discussions with Doug and
>>>>>>>>> Aleksey
>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>>> missing constructor barrier.
>>>>>>>>>
>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>> heavyweight
>>>>>>>>>> Yes, it costs measurable performance.  But else it is wrong.  We
>>>>>>>>>> don't
>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>
>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>> OrderAccess operations
>>>>>>>>>> Basically, I agree on this.  But you also have to take into
>>>>>>>>>> account
>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>> different platforms
>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>> An example:
>>>>>>>>>>         MemBarRelease    // means LoadStore, StoreStore barrier
>>>>>>>>>>         MemBarVolatile    // means  StoreLoad barrier
>>>>>>>>>> If these are consecutively in the code, sparc code looks like
>>>>>>>>>> this:
>>>>>>>>>>         MemBarRelease  --> membar(Assembler::LoadStore |
>>>>>>>>>> Assembler::StoreStore)
>>>>>>>>>>         MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>>>> Just doing what is required.
>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>>>> fine grained operations:
>>>>>>>>>>          MemBarRelease --> lwsync   // Doing LoadStore,
>>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>>>         MemBarVolatile --> sync    // // Doing LoadStore,
>>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>>> obviously, the lwsync is superfluous.  Thus, as PPC operations
>>>>>>>>>> are more (too) powerful,
>>>>>>>>>> I need an additional optimization that removes the lwsync.  I
>>>>>>>>>> can not implement
>>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>>
>>>>>>>>>> Back to the IRIW problem.  I think here we have a comparable
>>>>>>>>>> issue.
>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the
>>>>>>>>>> read
>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>>>>
>>>>>>>>>> I would propose to guard the code by
>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>>>> independent?
>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>>
>>>>>>>>> I can not possibly answer to the necessary level of detail with a
>>>>>>>>> few
>>>>>>>>> moments thought. You are implying there is a problem here that
>>>>>>>>> will
>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>> different?) and I can not take that on face value at the
>>>>>>>>> moment. The
>>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>>> requirements for
>>>>>>>>> volatile accesses is if there are global visibility issues that
>>>>>>>>> are not
>>>>>>>>> addressed - but even then I would expect heavy barriers at the
>>>>>>>>> store
>>>>>>>>> would deal with that, not at the load. (This situation reminds me
>>>>>>>>> of the
>>>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>>>> software
>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>>>> don't have
>>>>>>>>> that on ppc!)
>>>>>>>>>
>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days
>>>>>>>>> I will
>>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>         Goetz.
>>>>>>>>>>
>>>>>>>>>> -- Other ports:
>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so
>>>>>>>>>> it might
>>>>>>>>>> not happen on small machines.  But I can use PPC_ONLY instead
>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of
>>>>>>>>>> them).
>>>>>>>>>>
>>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>>> I agree we should not change it in the ppc port.  If you wish, I
>>>>>>>>>> can
>>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>
>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable
>>>>>>>>>> way :)
>>>>>>>>>>
>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>>
>>>>>>>>>>        From what I understand what we see here is an attempt to
>>>>>>>>>> fix an
>>>>>>>>>> existing issue with the implementation of volatiles so that the
>>>>>>>>>> IRIW
>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>>>> doing the
>>>>>>>>>> load.
>>>>>>>>>>
>>>>>>>>>> Now if this was purely handled in ppc64 source code then I
>>>>>>>>>> would be
>>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>>> performance
>>>>>>>>>> though!). But I do not agree with the changes to the shared code
>>>>>>>>>> that
>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>>>> this is
>>>>>>>>>> polluting the shared code. My concern is similar to what I said
>>>>>>>>>> with the
>>>>>>>>>> taskQueue changes - these algorithms should be expressed using
>>>>>>>>>> the
>>>>>>>>>> correct OrderAccess operations to guarantee the desired
>>>>>>>>>> properties
>>>>>>>>>> independent of architecture. If such a "barrier" is not needed
>>>>>>>>>> on a
>>>>>>>>>> given architecture then the implementation in OrderAccess should
>>>>>>>>>> reduce
>>>>>>>>>> to a no-op.
>>>>>>>>>>
>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed
>>>>>>>>>> under
>>>>>>>>>> the JMM.
>>>>>>>>>>
>>>>>>>>>>> I am fine with suggested changes because you did not change our
>>>>>>>>>>> current
>>>>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>>>>> But may be it should be done using more general query which
>>>>>>>>>>> is set
>>>>>>>>>>> depending on platform:
>>>>>>>>>>>
>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>
>>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>>
>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>
>>>>>>>>>> Every platform has to support IRIW this is simply part of the
>>>>>>>>>> Java
>>>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>>>> explicitly
>>>>>>>>>> like this.
>>>>>>>>>>
>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are
>>>>>>>>>> there
>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM
>>>>>>>>>> defines?
>>>>>>>>>>>        From what I understand our ppc port is also affected.
>>>>>>>>>>> David?
>>>>>>>>>>
>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>>> inside  else {}.
>>>>>>>>>>>
>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>>> specific which
>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many
>>>>>>>>>>> #ifdefs.
>>>>>>>>>>>
>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>
>>>>>>>>>>>           if (is_vol) {
>>>>>>>>>>>             // See comment in do_get_xxx().
>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>             insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>> #else
>>>>>>>>>>>             if (is_field) {
>>>>>>>>>>>               // Add MemBarRelease for constructors which write
>>>>>>>>>>> volatile field
>>>>>>>>>>> (PPC64).
>>>>>>>>>>>               set_wrote_volatile(true);
>>>>>>>>>>>             }
>>>>>>>>>>> #endif
>>>>>>>>>>>           }
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>
>>>>>>>>>>>> Example:
>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>          __________    __________    __________    __________
>>>>>>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>>>>>>
>>>>>>>>>>>>           write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>>>>>>                         read(y)                     read(x)
>>>>>>>>>>>>
>>>>>>>>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This
>>>>>>>>>>>> is only
>>>>>>>>>>>> assured by the sync instruction and if it is executed in the
>>>>>>>>>>>> threads
>>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>
>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple
>>>>>>>>>>>> SIGSEGVs
>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>
>>>>>>>>>>>> Further this change contains a fix that assures that volatile
>>>>>>>>>>>> fields
>>>>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>>>>> published.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>>>> seems too
>>>>>>>>>>>> strong.
>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>>>> suffice.
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>          Goetz.
>>>>>>>>>>>>


More information about the ppc-aix-port-dev mailing list