RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Jan 16 00:54:57 PST 2014
On 1/16/14 12:34 AM, David Holmes wrote:
> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>> want to keep it this way, it could be useful to have such info on other
>> platforms too. But I would suggest to remove PPC64 comments in parse.hpp.
>>
>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>> which could be checked in all places instead of #ifdef:
>
> I asked for the ifdef some time back as I find it much preferable to have this as a build-time construct rather than a
> runtime one. I don't want to have to pay anything for this if we don't use it.
Any decent C++ compiler will optimize expressions with such constants defined in header files. I insist to avoid #ifdefs
in C2 code. I really don't like the code with #ifdef in unsafe.cpp but I can live with it.
Vladimir
>
> David
>
>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>> #else
>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>> #endif
>>
>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>
>> and then:
>>
>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>> oop p = JNIHandles::resolve(obj); \
>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>> OrderAccess::fence(); \
>> volatile type_name v = OrderAccess::load_acquire((volatile
>> type_name*)index_oop_from_field_offset_long(p, offset));
>>
>> And:
>>
>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>> field->is_volatile()) {
>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
>> + }
>>
>> And so on. The comments will be needed only in globalDefinitions.hpp
>>
>> The code in parse1.cpp could be put on one line:
>>
>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>> method()->is_initializer()) )) {
>>
>> Thanks,
>> Vladimir
>>
>> On 1/15/14 9:25 PM, David Holmes wrote:
>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>> Hi David,
>>>>
>>>> I updated the webrev:
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>
>>>> - I removed the IRIW example in parse3.cpp
>>>> - I adapted the comments not to point to that comment, and to
>>>> reflect the new flagging. Also I mention that we support the
>>>> volatile constructor issue, but that it's not standard.
>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>> I also think it's better to separate these this way.
>>>
>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>> declaration and all uses to be guarded by ifdef PPC64 too
>>> please.
>>>
>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>> comment doesn't make much sense to me and refers to
>>> ppc specific stuff in a shared file:
>>>
>>> if (is_volatile) {
>>> ! if (!is_store) {
>>> insert_mem_bar(Op_MemBarAcquire);
>>> ! } else {
>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>>> insert_mem_bar(Op_MemBarVolatile);
>>> + #endif
>>> + }
>>>
>>> I don't think the comment is needed.
>>>
>>> Thanks,
>>> David
>>>
>>>> Thanks for your comments!
>>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>> To: Lindenmaier, Goetz
>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>> Independent Reads of Independent Writes
>>>>
>>>> Hi Goetz,
>>>>
>>>> Sorry for the delay in getting back to this.
>>>>
>>>> The general changes to the volatile barriers to support IRIW are okay.
>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>>> specifically it is
>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>>> the commentary excessive, particularly for shared code. In particular
>>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>>> explanation and I don't think we need it to that level of detail. Seems
>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>>>
>>>> The changes related to volatile writes in the constructor, as discussed
>>>> are not required by the Java Memory Model. If you want to keep these
>>>> then I think they should all be guarded with PPC64 because it is not
>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>>>> PPC64 porters.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> I updated this webrev. I detected a small flaw I made when editing
>>>>> this version.
>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>> I also based the webrev on the latest version of the stage repo.
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>
>>>>> Best regards,
>>>>> Goetz.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Lindenmaier, Goetz
>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>> To: David Holmes
>>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>> Independent Reads of Independent Writes
>>>>>
>>>>> Hi David,
>>>>>
>>>>>> So we can at least undo #4 now we have established those tests were
>>>>>> not
>>>>>> required to pass.
>>>>> We would prefer if we could keep this in. We want to avoid that it's
>>>>> blamed on the VM if java programs are failing on PPC after they worked
>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard
>>>>> it by
>>>>> a flag as proposed. But if you insist I will remove it. Also, this
>>>>> part is
>>>>> not that performance relevant.
>>>>>
>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>> think
>>>>> I added a compile-time guard in this new webrev:
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>>>>> several double negations I don't like, (#ifNdef
>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>> but this way I only have to change the ppc platform.
>>>>>
>>>>> Best regards,
>>>>> Goetz
>>>>>
>>>>> P.S.: I will also be available over the Christmas period.
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>> To: Lindenmaier, Goetz
>>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>> Independent Reads of Independent Writes
>>>>>
>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>>> emails.
>>>>>
>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> ok, I understand the tests are wrong. It's good this issue is
>>>>>> settled.
>>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>>
>>>>>> About our change: David, the causality is the other way round.
>>>>>> The change is about IRIW.
>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>
>>>>> This is the part I still have some question marks over as the
>>>>> implications are not nice for performance on non-TSO platforms. But I'm
>>>>> no further along in processing that paper I'm afraid.
>>>>>
>>>>>> 2. If we do syncs before loads, we don't need to do them after stores.
>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>> constructor tests.
>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>> after stores
>>>>>> to pass the volatile constructor tests.
>>>>>
>>>>> So we can at least undo #4 now we have established those tests were not
>>>>> required to pass.
>>>>>
>>>>>> We originally passed the constructor tests because the ppc memory
>>>>>> order
>>>>>> instructions are not as find-granular as the
>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>>> The only instruction
>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>>> therefore the
>>>>>> MemBarVolatile after the store fixes the constructor tests. The
>>>>>> proper representation
>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now
>>>>>> it's pointless
>>>>>> anyways.
>>>>>>
>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>> I'd be happy to add a property
>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>
>>>>> A compile-time guard (ifdef) would be better than a runtime one I think
>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>> based not architecture based) as that will allows for turning this
>>>>> on/off for any architecture for testing purposes.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> or the like to guard the customization. I'd like that much better.
>>>>>> Or also
>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>> To: Lindenmaier, Goetz
>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>> Independent Reads of Independent Writes
>>>>>>
>>>>>> TL;DR version:
>>>>>>
>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>> constructor-barrier
>>>>>> for volatiles is not required as part of the JMM specification. It
>>>>>> *may*
>>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>>> ensure
>>>>>> you can't see uninitialized fields. So the tests for this are invalid
>>>>>> and this part of the patch is not needed in general (ppc64 may need it
>>>>>> due to other factors).
>>>>>>
>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>>> term :)
>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>
>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>> visible to
>>>>>> all other hardware threads at the same time point; these architectures
>>>>>> are not multiple-copy atomic."
>>>>>>
>>>>>> This is the visibility issue that I referred to and affects both
>>>>>> ARM and
>>>>>> PPC. But of course it is normally handled by using suitable barriers
>>>>>> after the stores that need to be visible. I think the crux of the
>>>>>> current issue is what you wrote below:
>>>>>>
>>>>>> > The fixes for the constructor issue are only needed because we
>>>>>> > remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>> > and place it before loads.
>>>>>>
>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync after
>>>>>> the store then you have to do something around the loads to get the
>>>>>> same
>>>>>> results! I still don't know what lead you to the conclusion that the
>>>>>> only way to fix the IRIW issue was to put the fence before the load -
>>>>>> maybe when I get the chance to read that paper in full it will be
>>>>>> clearer.
>>>>>>
>>>>>> So ... the basic problem is that the current structure in the VM has
>>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>>> variables. You now want to customize that but not all the requisite
>>>>>> hooks are present. It would be better if volatile_load and
>>>>>> volatile_store were factored out so that they could be implemented as
>>>>>> desired per-platform. Alternatively there could be pre- and post-
>>>>>> hooks
>>>>>> that could then be customized per platform. Otherwise you need
>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>
>>>>>> I'm not happy with the ifdef approach but I won't block it. I think
>>>>>> this
>>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>>> abstractions are a confused mess in my opinion.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>
>>>>>>> I did not yet add the
>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>> or
>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>
>>>>>>>
>>>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>
>>>>>>> But we would like to pass the torture test suite as we consider
>>>>>>> this a substantial task in implementing a PPC port. Also we think
>>>>>>> both tests show behavior a programmer would expect. It's bad if
>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>>
>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>> and place it before loads. Then there is no sync between volatile
>>>>>>> store
>>>>>>> and publishing the object. So we add it again in this one case
>>>>>>> (volatile store in constructor).
>>>>>>>
>>>>>>>
>>>>>>> @David
>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>> taskqueue problem yet,
>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>> continuous.
>>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>>> issue.
>>>>>>> And we came up with a solution we consider the best possible. If you
>>>>>>> have objections, you should at least give the draft of a better
>>>>>>> solution,
>>>>>>> we would volunteer to implement and test it.
>>>>>>> Similarly, we invested time in fixing the concurrency torture issues.
>>>>>>>
>>>>>>> @David
>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>>> and
>>>>>>>> can't find any reference to it.
>>>>>>> We learned about this reading "A Tutorial Introduction to the ARM and
>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>> Work-Stealing for
>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>> taskqueue problem.
>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>
>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>> used 'read'
>>>>>>> instead. Sorry for that. (I also fixed that in the method name
>>>>>>> above).
>>>>>>>
>>>>>>> Best regards and thanks for all your involvements,
>>>>>>> Goetz.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>> To: Lindenmaier, Goetz
>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Hi Goetz,
>>>>>>>
>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> -- Volatile in constuctor
>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>> missing constructor barrier.
>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>> processors
>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
>>>>>>>
>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>
>>>>>>>> -- IRIW issue
>>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>>> a few
>>>>>>>>> moments thought.
>>>>>>>> Sure. There also is no solution as you require for the taskqueue
>>>>>>>> problem yet,
>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>
>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>> continuous.
>>>>>>>
>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>> different?)
>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore
>>>>>>>> I contributed a
>>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>>> nice, I admit.
>>>>>>>> (I don't really know about ARM, though).
>>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>>> are evaluated
>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>
>>>>>>>> The problem is not that IRIW is not handled by the JMM, the problem
>>>>>>>> is that
>>>>>>>> store
>>>>>>>> sync
>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>> only
>>>>>>>> sync
>>>>>>>> load
>>>>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>>>>> pass that test.
>>>>>>>
>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>>>> can't find any reference to it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> The JMM is fine. And
>>>>>>>> store
>>>>>>>> MemBarVolatile
>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions
>>>>>>>> that
>>>>>>>> do what is required.
>>>>>>>>
>>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>> 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>
>>>>>>>> Hi Goetz,
>>>>>>>>
>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi everybody,
>>>>>>>>>
>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>
>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>> initializing
>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey
>>>>>>>>> Shipilev
>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>> Also, concurrency torture tests
>>>>>>>>> LongVolatileTest
>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>> will fail.
>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>> volatile field would be
>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>> AtomicInteger.)
>>>>>>>>
>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>> volatiles do
>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>>> that
>>>>>>>> requires the constructor barrier - discussions with Doug and Aleksey
>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>> missing constructor barrier.
>>>>>>>>
>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight
>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We
>>>>>>>>> don't
>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>
>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>> OrderAccess operations
>>>>>>>>> Basically, I agree on this. But you also have to take into account
>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>> different platforms
>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>> An example:
>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier
>>>>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>>>>> If these are consecutively in the code, sparc code looks like this:
>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore |
>>>>>>>>> Assembler::StoreStore)
>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>>> Just doing what is required.
>>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>>> fine grained operations:
>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore,
>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore,
>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations
>>>>>>>>> are more (too) powerful,
>>>>>>>>> I need an additional optimization that removes the lwsync. I
>>>>>>>>> can not implement
>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>
>>>>>>>>> Back to the IRIW problem. I think here we have a comparable issue.
>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the
>>>>>>>>> read
>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>>>
>>>>>>>>> I would propose to guard the code by
>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>>> independent?
>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>
>>>>>>>> I can not possibly answer to the necessary level of detail with a
>>>>>>>> few
>>>>>>>> moments thought. You are implying there is a problem here that will
>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>> different?) and I can not take that on face value at the moment. The
>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>> requirements for
>>>>>>>> volatile accesses is if there are global visibility issues that
>>>>>>>> are not
>>>>>>>> addressed - but even then I would expect heavy barriers at the store
>>>>>>>> would deal with that, not at the load. (This situation reminds me
>>>>>>>> of the
>>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>>> software
>>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>>> don't have
>>>>>>>> that on ppc!)
>>>>>>>>
>>>>>>>> Sorry - There is no quick resolution here and in a couple of days
>>>>>>>> I will
>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Goetz.
>>>>>>>>>
>>>>>>>>> -- Other ports:
>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so
>>>>>>>>> it might
>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them).
>>>>>>>>>
>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I
>>>>>>>>> can
>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>
>>>>>>>>> Okay this is my second attempt at answering this in a reasonable
>>>>>>>>> way :)
>>>>>>>>>
>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>
>>>>>>>>> From what I understand what we see here is an attempt to
>>>>>>>>> fix an
>>>>>>>>> existing issue with the implementation of volatiles so that the
>>>>>>>>> IRIW
>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>>> doing the
>>>>>>>>> load.
>>>>>>>>>
>>>>>>>>> Now if this was purely handled in ppc64 source code then I would be
>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>> performance
>>>>>>>>> though!). But I do not agree with the changes to the shared code
>>>>>>>>> that
>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>>> this is
>>>>>>>>> polluting the shared code. My concern is similar to what I said
>>>>>>>>> with the
>>>>>>>>> taskQueue changes - these algorithms should be expressed using the
>>>>>>>>> correct OrderAccess operations to guarantee the desired properties
>>>>>>>>> independent of architecture. If such a "barrier" is not needed on a
>>>>>>>>> given architecture then the implementation in OrderAccess should
>>>>>>>>> reduce
>>>>>>>>> to a no-op.
>>>>>>>>>
>>>>>>>>> And as Vitaly points out the constructor barriers are not needed
>>>>>>>>> under
>>>>>>>>> the JMM.
>>>>>>>>>
>>>>>>>>>> I am fine with suggested changes because you did not change our
>>>>>>>>>> current
>>>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>>>> But may be it should be done using more general query which is set
>>>>>>>>>> depending on platform:
>>>>>>>>>>
>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>
>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>
>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>
>>>>>>>>> Every platform has to support IRIW this is simply part of the Java
>>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>>> explicitly
>>>>>>>>> like this.
>>>>>>>>>
>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>>>>>>> visibility issues beyond the ordering constraints that the JMM
>>>>>>>>> defines?
>>>>>>>>>> From what I understand our ppc port is also affected.
>>>>>>>>>> David?
>>>>>>>>>
>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>> inside else {}.
>>>>>>>>>>
>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>> specific which
>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>>>>>
>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>
>>>>>>>>>> if (is_vol) {
>>>>>>>>>> // See comment in do_get_xxx().
>>>>>>>>>> #ifndef PPC64
>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>> #else
>>>>>>>>>> if (is_field) {
>>>>>>>>>> // Add MemBarRelease for constructors which write
>>>>>>>>>> volatile field
>>>>>>>>>> (PPC64).
>>>>>>>>>> set_wrote_volatile(true);
>>>>>>>>>> }
>>>>>>>>>> #endif
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>> the torture test suite:
>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>
>>>>>>>>>>> Example:
>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>> __________ __________ __________ __________
>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>>>>
>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y)
>>>>>>>>>>> read(y) read(x)
>>>>>>>>>>>
>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This
>>>>>>>>>>> is only
>>>>>>>>>>> assured by the sync instruction and if it is executed in the
>>>>>>>>>>> threads
>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>
>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>
>>>>>>>>>>> Further this change contains a fix that assures that volatile
>>>>>>>>>>> fields
>>>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>>>> published.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>>> seems too
>>>>>>>>>>> strong.
>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>>> suffice.
>>>>>>>>>>> What do you think?
>>>>>>>>>>>
>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Goetz.
>>>>>>>>>>>
More information about the ppc-aix-port-dev
mailing list