RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads	of Independent Writes
    Vladimir Kozlov 
    vladimir.kozlov at oracle.com
       
    Wed Jan 15 23:13:27 PST 2014
    
    
  
This is becoming ugly #ifdef mess. In compiler code we are trying to avoid them. I suggested to have _wrote_volatile 
without #ifdef and I want to keep it this way, it could be useful to have such info on other platforms too. But I would 
suggest to remove PPC64 comments in parse.hpp.
In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value which could be checked in all places instead of 
#ifdef:
#ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
#else
const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
#endif
or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
and then:
  #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
    oop p = JNIHandles::resolve(obj); \
+  if (support_IRIW_for_not_multiple_copy_atomic_cpu) OrderAccess::fence(); \
    volatile type_name v = OrderAccess::load_acquire((volatile type_name*)index_oop_from_field_offset_long(p, offset));
And:
+  if (support_IRIW_for_not_multiple_copy_atomic_cpu && field->is_volatile()) {
+    insert_mem_bar(Op_MemBarVolatile);   // StoreLoad barrier
+  }
And so on. The comments will be needed only in globalDefinitions.hpp
The code in parse1.cpp could be put on one line:
+  if (wrote_final() PPC64_ONLY( || (wrote_volatile() && method()->is_initializer()) )) {
Thanks,
Vladimir
On 1/15/14 9:25 PM, David Holmes wrote:
> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>> Hi David,
>>
>> I updated the webrev:
>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>
>> - I removed the IRIW example in parse3.cpp
>> - I adapted the comments not to point to that comment, and to
>>    reflect the new flagging.  Also I mention that we support the
>>    volatile constructor issue, but that it's not standard.
>> - I protected issuing the barrier for the constructor by PPC64.
>>    I also think it's better to separate these this way.
>
> Sorry if I wasn't clear but I'd like the wrote_volatile field declaration and all uses to be guarded by ifdef PPC64 too
> please.
>
> One nit I missed before. In src/share/vm/opto/library_call.cpp this comment doesn't make much sense to me and refers to
> ppc specific stuff in a shared file:
>
>      if (is_volatile) {
> !     if (!is_store) {
>          insert_mem_bar(Op_MemBarAcquire);
> !     } else {
> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
> !       // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>          insert_mem_bar(Op_MemBarVolatile);
> + #endif
> +     }
>
> I don't think the comment is needed.
>
> Thanks,
> David
>
>> Thanks for your comments!
>>
>> Best regards,
>>    Goetz.
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Mittwoch, 15. Januar 2014 01:55
>> To: Lindenmaier, Goetz
>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Hi Goetz,
>>
>> Sorry for the delay in getting back to this.
>>
>> The general changes to the volatile barriers to support IRIW are okay.
>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>> specifically it is
>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>> the commentary excessive, particularly for shared code. In particular
>> the IRIW example in parse3.cpp - it seems a strange place to give the
>> explanation and I don't think we need it to that level of detail. Seems
>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>
>> The changes related to volatile writes in the constructor, as discussed
>> are not required by the Java Memory Model. If you want to keep these
>> then I think they should all be guarded with PPC64 because it is not
>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>> PPC64 porters.
>>
>> Thanks,
>> David
>>
>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> I updated this webrev.  I detected a small flaw I made when editing this version.
>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>> I also based the webrev on the latest version of the stage repo.
>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>
>>> Best regards,
>>>     Goetz.
>>>
>>> -----Original Message-----
>>> From: Lindenmaier, Goetz
>>> Sent: Freitag, 20. Dezember 2013 13:47
>>> To: David Holmes
>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>
>>> Hi David,
>>>
>>>> So we can at least undo #4 now we have established those tests were not
>>>> required to pass.
>>> We would prefer if we could keep this in.  We want to avoid that it's
>>> blamed on the VM if java programs are failing on PPC after they worked
>>> on x86.  To clearly mark it as overfulfilling the spec I would guard it by
>>> a flag as proposed.  But if you insist I will remove it.  Also, this part is
>>> not that performance relevant.
>>>
>>>> A compile-time guard (ifdef) would be better than a runtime one I think
>>> I added a compile-time guard in this new webrev:
>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC.  This introduces
>>> several double negations I don't like, (#ifNdef CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>> but this way I only have to change the ppc platform.
>>>
>>> Best regards,
>>>     Goetz
>>>
>>> P.S.:  I will also be available over the Christmas period.
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Freitag, 20. Dezember 2013 05:58
>>> To: Lindenmaier, Goetz
>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>
>>> Sorry for the delay, it takes a while to catch up after two weeks
>>> vacation :) Next vacation (ie next two weeks) I'll continue to check emails.
>>>
>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>> Hi,
>>>>
>>>> ok, I understand the tests are wrong.  It's good this issue is settled.
>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>
>>>> About our change:  David, the causality is the other way round.
>>>> The change is about IRIW.
>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>
>>> This is the part I still have some question marks over as the
>>> implications are not nice for performance on non-TSO platforms. But I'm
>>> no further along in processing that paper I'm afraid.
>>>
>>>> 2. If we do syncs before loads, we don't need to do them after stores.
>>>> 3. If we don't do them after stores, we fail the volatile constructor tests.
>>>> 4. So finally we added them again at the end of the constructor after stores
>>>>         to pass the volatile constructor tests.
>>>
>>> So we can at least undo #4 now we have established those tests were not
>>> required to pass.
>>>
>>>> We originally passed the constructor tests because the ppc memory order
>>>> instructions are not as find-granular as the
>>>> operations in the IR.  MemBarVolatile is specified as StoreLoad.  The only instruction
>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the
>>>> MemBarVolatile after the store fixes the constructor tests.  The proper representation
>>>> of the fix in the IR would be adding a MemBarStoreStore.  But now it's pointless
>>>> anyways.
>>>>
>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>> I'd be happy to add a property
>>>>      OrderAccess::cpu_is_multiple_copy_atomic()
>>>
>>> A compile-time guard (ifdef) would be better than a runtime one I think
>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>> based not architecture based) as that will allows for turning this
>>> on/off for any architecture for testing purposes.
>>>
>>> Thanks,
>>> David
>>>
>>>> or the like to guard the customization.  I'd like that much better. Or also
>>>>       OrderAccess::needs_support_iriw_ordering()
>>>>       VM_Version::needs_support_iriw_ordering()
>>>>
>>>>
>>>> Best regards,
>>>>      Goetz.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>> To: Lindenmaier, Goetz
>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>
>>>> TL;DR version:
>>>>
>>>> Discussion on the c-i list has now confirmed that a constructor-barrier
>>>> for volatiles is not required as part of the JMM specification. It *may*
>>>> be required in an implementation that doesn't pre-zero memory to ensure
>>>> you can't see uninitialized fields. So the tests for this are invalid
>>>> and this part of the patch is not needed in general (ppc64 may need it
>>>> due to other factors).
>>>>
>>>> Re: "multiple copy atomicity" - first thanks for correcting the term :)
>>>> Second thanks for the reference to that paper! For reference:
>>>>
>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>> complex interconnect) does not guarantee that a write becomes visible to
>>>> all other hardware threads at the same time point; these architectures
>>>> are not multiple-copy atomic."
>>>>
>>>> This is the visibility issue that I referred to and affects both ARM and
>>>> PPC. But of course it is normally handled by using suitable barriers
>>>> after the stores that need to be visible. I think the crux of the
>>>> current issue is what you wrote below:
>>>>
>>>>     > The fixes for the constructor issue are only needed because we
>>>>     > remove the sync instruction from behind stores (parse3.cpp:320)
>>>>     > and place it before loads.
>>>>
>>>> I hadn't grasped this part. Obviously if you fail to do the sync after
>>>> the store then you have to do something around the loads to get the same
>>>> results! I still don't know what lead you to the conclusion that the
>>>> only way to fix the IRIW issue was to put the fence before the load -
>>>> maybe when I get the chance to read that paper in full it will be clearer.
>>>>
>>>> So ... the basic problem is that the current structure in the VM has
>>>> hard-wired one choice of how to get the right semantics for volatile
>>>> variables. You now want to customize that but not all the requisite
>>>> hooks are present. It would be better if volatile_load and
>>>> volatile_store were factored out so that they could be implemented as
>>>> desired per-platform. Alternatively there could be pre- and post- hooks
>>>> that could then be customized per platform. Otherwise you need
>>>> platform-specific ifdef's to handle it as per your patch.
>>>>
>>>> I'm not happy with the ifdef approach but I won't block it. I think this
>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>> abstractions are a confused mess in my opinion.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>
>>>>> I did not yet add the
>>>>>        OrderAccess::needs_support_iriw_ordering()
>>>>>        VM_Version::needs_support_iriw_ordering()
>>>>> or
>>>>>       OrderAccess::cpu_is_multiple_copy_atomic()
>>>>> to reduce #defined, as I got no further comment on that.
>>>>>
>>>>>
>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>> I feel not in the position to contribute substantially.
>>>>>
>>>>> But we would like to pass the torture test suite as we consider
>>>>> this a substantial task in implementing a PPC port.  Also we think
>>>>> both tests show behavior a programmer would expect.  It's bad if
>>>>> Java code runs fine on the more common x86 platform, and then
>>>>> fails on ppc.  This will always first be blamed on the VM.
>>>>>
>>>>> The fixes for the constructor issue are only needed because we
>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>> and place it before loads.  Then there is no sync between volatile store
>>>>> and publishing the object.  So we add it again in this one case
>>>>> (volatile store in constructor).
>>>>>
>>>>>
>>>>> @David
>>>>>>> Sure.  There also is no solution as you require for the taskqueue problem yet,
>>>>>>> and that's being discussed now for almost a year.
>>>>>> It may have started a year ago but work on it has hardly been continuous.
>>>>> That's not true, we did a lot of investigation and testing on this issue.
>>>>> And we came up with a solution we consider the best possible. If you
>>>>> have objections, you should at least give the draft of a better solution,
>>>>> we would volunteer to implement and test it.
>>>>> Similarly, we invested time in fixing the concurrency torture issues.
>>>>>
>>>>> @David
>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>>> can't find any reference to it.
>>>>> We learned about this reading "A Tutorial Introduction to the ARM and
>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for
>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the taskqueue problem.
>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>
>>>>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read'
>>>>> instead.  Sorry for that.  (I also fixed that in the method name above).
>>>>>
>>>>> Best regards and thanks for all your involvements,
>>>>>       Goetz.
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>> To: Lindenmaier, Goetz
>>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>>
>>>>> Hi Goetz,
>>>>>
>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> -- Volatile in constuctor
>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>> missing constructor barrier.
>>>>>> We see them on PPC64. Our test machines have typically 8-32 processors
>>>>>> and are Power 5-7.  But see also Aleksey's mail.  (Thanks Aleksey!)
>>>>>
>>>>> And see follow ups - the tests are invalid.
>>>>>
>>>>>> -- IRIW issue
>>>>>>> I can not possibly answer to the necessary level of detail with a few
>>>>>>> moments thought.
>>>>>> Sure.  There also is no solution as you require for the taskqueue problem yet,
>>>>>> and that's being discussed now for almost a year.
>>>>>
>>>>> It may have started a year ago but work on it has hardly been continuous.
>>>>>
>>>>>>> You are implying there is a problem here that will
>>>>>>> impact numerous platforms (unless you can tell me why ppc is so different?)
>>>>>> No, only PPC does not have 'multiple-read-atomicity'.  Therefore I contributed a
>>>>>> solution with the #defines, and that's correct for all, but not nice, I admit.
>>>>>> (I don't really know about ARM, though).
>>>>>> So if I can write down a nicer solution testing for methods that are evaluated
>>>>>> by the C-compiler I'm happy.
>>>>>>
>>>>>> The problem is not that IRIW is not handled by the JMM, the problem
>>>>>> is that
>>>>>>        store
>>>>>>        sync
>>>>>> does not assure multiple-read-atomicity,
>>>>>> only
>>>>>>        sync
>>>>>>        load
>>>>>> does so on PPC.  And you require multiple-read-atomicity to
>>>>>> pass that test.
>>>>>
>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>> can't find any reference to it.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>        The JMM is fine. And
>>>>>>        store
>>>>>>        MemBarVolatile
>>>>>> is fine on x86, sparc etc. as there exist assembler instructions that
>>>>>> do what is required.
>>>>>>
>>>>>> So if you are off soon, please let's come to a solution that
>>>>>> might be improvable in the way it's implemented, but that
>>>>>> allows us to implement a correct PPC64 port.
>>>>>>
>>>>>> Best regards,
>>>>>>        Goetz.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>> To: Lindenmaier, Goetz
>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>>>
>>>>>> Hi Goetz,
>>>>>>
>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>> Hi everybody,
>>>>>>>
>>>>>>> thanks a lot for the detailed reviews!
>>>>>>> I'll try to answer to all in one mail.
>>>>>>>
>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned;
>>>>>>> We don't think it's correct if we omit the barrier after initializing
>>>>>>> a volatile field.  Previously, we discussed this with Aleksey Shipilev
>>>>>>> and Doug Lea, and they agreed.
>>>>>>> Also, concurrency torture tests
>>>>>>>          LongVolatileTest
>>>>>>>          AtomicIntegerInitialValueTest
>>>>>>> will fail.
>>>>>>> (In addition, observing 0 instead of the inital value of a volatile field would be
>>>>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.)
>>>>>>
>>>>>> The affects of unsafe publication are always surprising - volatiles do
>>>>>> not add anything special here. AFAIK there is nothing in the JMM that
>>>>>> requires the constructor barrier - discussions with Doug and Aleksey
>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>> missing constructor barrier.
>>>>>>
>>>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight
>>>>>>> Yes, it costs measurable performance.  But else it is wrong.  We don't
>>>>>>> see a way to implement this cheaper.
>>>>>>>
>>>>>>>> - these algorithms should be expressed using the correct OrderAccess operations
>>>>>>> Basically, I agree on this.  But you also have to take into account
>>>>>>> that due to the different memory ordering instructions on different platforms
>>>>>>> just implementing something empty is not sufficient.
>>>>>>> An example:
>>>>>>>         MemBarRelease    // means LoadStore, StoreStore barrier
>>>>>>>         MemBarVolatile    // means  StoreLoad barrier
>>>>>>> If these are consecutively in the code, sparc code looks like this:
>>>>>>>         MemBarRelease  --> membar(Assembler::LoadStore | Assembler::StoreStore)
>>>>>>>         MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>> Just doing what is required.
>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>> fine grained operations:
>>>>>>>          MemBarRelease --> lwsync   // Doing LoadStore, StoreStore, LoadLoad
>>>>>>>         MemBarVolatile --> sync    // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad
>>>>>>> obviously, the lwsync is superfluous.  Thus, as PPC operations are more (too) powerful,
>>>>>>> I need an additional optimization that removes the lwsync.  I can not implement
>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>
>>>>>>> Back to the IRIW problem.  I think here we have a comparable issue.
>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read
>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>
>>>>>>> I would propose to guard the code by
>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>> Else, David, how would you propose to implement this platform independent?
>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>
>>>>>> I can not possibly answer to the necessary level of detail with a few
>>>>>> moments thought. You are implying there is a problem here that will
>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>> different?) and I can not take that on face value at the moment. The
>>>>>> only reason I can see IRIW not being handled by the JMM requirements for
>>>>>> volatile accesses is if there are global visibility issues that are not
>>>>>> addressed - but even then I would expect heavy barriers at the store
>>>>>> would deal with that, not at the load. (This situation reminds me of the
>>>>>> need for read-barriers on Alpha architecture due to the use of software
>>>>>> cache-coherency rather than hardware cache-coherency - but we don't have
>>>>>> that on ppc!)
>>>>>>
>>>>>> Sorry - There is no quick resolution here and in a couple of days I will
>>>>>> be heading out on vacation for two weeks.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Best regards,
>>>>>>>         Goetz.
>>>>>>>
>>>>>>> -- Other ports:
>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so it might
>>>>>>> not happen on small machines.  But I can use PPC_ONLY instead
>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them).
>>>>>>>
>>>>>>> -- MemBarStoreStore after initialization
>>>>>>> I agree we should not change it in the ppc port.  If you wish, I can
>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>> To: Vladimir Kozlov
>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Okay this is my second attempt at answering this in a reasonable way :)
>>>>>>>
>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>
>>>>>>>        From what I understand what we see here is an attempt to fix an
>>>>>>> existing issue with the implementation of volatiles so that the IRIW
>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>> volatile reads extremely heavyweight by adding a fence() when doing the
>>>>>>> load.
>>>>>>>
>>>>>>> Now if this was purely handled in ppc64 source code then I would be
>>>>>>> happy to let them do whatever they like (surely this kills performance
>>>>>>> though!). But I do not agree with the changes to the shared code that
>>>>>>> allow this solution to be implemented - even with PPC64_ONLY this is
>>>>>>> polluting the shared code. My concern is similar to what I said with the
>>>>>>> taskQueue changes - these algorithms should be expressed using the
>>>>>>> correct OrderAccess operations to guarantee the desired properties
>>>>>>> independent of architecture. If such a "barrier" is not needed on a
>>>>>>> given architecture then the implementation in OrderAccess should reduce
>>>>>>> to a no-op.
>>>>>>>
>>>>>>> And as Vitaly points out the constructor barriers are not needed under
>>>>>>> the JMM.
>>>>>>>
>>>>>>>> I am fine with suggested changes because you did not change our current
>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>> But may be it should be done using more general query which is set
>>>>>>>> depending on platform:
>>>>>>>>
>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>
>>>>>>>> or similar to what we use now:
>>>>>>>>
>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>
>>>>>>> Every platform has to support IRIW this is simply part of the Java
>>>>>>> Memory Model, there should not be any need to call this out explicitly
>>>>>>> like this.
>>>>>>>
>>>>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>>>>> visibility issues beyond the ordering constraints that the JMM defines?
>>>>>>>>        From what I understand our ppc port is also affected. David?
>>>>>>>
>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> In library_call.cpp can you add {}? New comment should be inside  else {}.
>>>>>>>>
>>>>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>>>
>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>
>>>>>>>>           if (is_vol) {
>>>>>>>>             // See comment in do_get_xxx().
>>>>>>>> #ifndef PPC64
>>>>>>>>             insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>> #else
>>>>>>>>             if (is_field) {
>>>>>>>>               // Add MemBarRelease for constructors which write volatile field
>>>>>>>> (PPC64).
>>>>>>>>               set_wrote_volatile(true);
>>>>>>>>             }
>>>>>>>> #endif
>>>>>>>>           }
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I preprared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>>>> the torture test suite:
>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>
>>>>>>>>> Example:
>>>>>>>>> volatile x=0, y=0
>>>>>>>>>          __________    __________    __________    __________
>>>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>>>
>>>>>>>>>           write(x=1)    read(x)       write(y=1)    read(y)
>>>>>>>>>                         read(y)                     read(x)
>>>>>>>>>
>>>>>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>
>>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>
>>>>>>>>> Further this change contains a fix that assures that volatile fields
>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>> published.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>>>>>>> strong.
>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Please review and test this change.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>          Goetz.
>>>>>>>>>
    
    
More information about the ppc-aix-port-dev
mailing list