RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
David Holmes
david.holmes at oracle.com
Thu Jan 16 00:34:10 PST 2014
On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
> This is becoming ugly #ifdef mess. In compiler code we are trying to
> avoid them. I suggested to have _wrote_volatile without #ifdef and I
> want to keep it this way, it could be useful to have such info on other
> platforms too. But I would suggest to remove PPC64 comments in parse.hpp.
>
> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
> which could be checked in all places instead of #ifdef:
I asked for the ifdef some time back as I find it much preferable to
have this as a build-time construct rather than a runtime one. I don't
want to have to pay anything for this if we don't use it.
David
> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
> #else
> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
> #endif
>
> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>
> and then:
>
> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
> oop p = JNIHandles::resolve(obj); \
> + if (support_IRIW_for_not_multiple_copy_atomic_cpu)
> OrderAccess::fence(); \
> volatile type_name v = OrderAccess::load_acquire((volatile
> type_name*)index_oop_from_field_offset_long(p, offset));
>
> And:
>
> + if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
> field->is_volatile()) {
> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
> + }
>
> And so on. The comments will be needed only in globalDefinitions.hpp
>
> The code in parse1.cpp could be put on one line:
>
> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
> method()->is_initializer()) )) {
>
> Thanks,
> Vladimir
>
> On 1/15/14 9:25 PM, David Holmes wrote:
>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>> Hi David,
>>>
>>> I updated the webrev:
>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>
>>> - I removed the IRIW example in parse3.cpp
>>> - I adapted the comments not to point to that comment, and to
>>> reflect the new flagging. Also I mention that we support the
>>> volatile constructor issue, but that it's not standard.
>>> - I protected issuing the barrier for the constructor by PPC64.
>>> I also think it's better to separate these this way.
>>
>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>> declaration and all uses to be guarded by ifdef PPC64 too
>> please.
>>
>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>> comment doesn't make much sense to me and refers to
>> ppc specific stuff in a shared file:
>>
>> if (is_volatile) {
>> ! if (!is_store) {
>> insert_mem_bar(Op_MemBarAcquire);
>> ! } else {
>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>> insert_mem_bar(Op_MemBarVolatile);
>> + #endif
>> + }
>>
>> I don't think the comment is needed.
>>
>> Thanks,
>> David
>>
>>> Thanks for your comments!
>>>
>>> Best regards,
>>> Goetz.
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>> To: Lindenmaier, Goetz
>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>> Independent Reads of Independent Writes
>>>
>>> Hi Goetz,
>>>
>>> Sorry for the delay in getting back to this.
>>>
>>> The general changes to the volatile barriers to support IRIW are okay.
>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>> specifically it is
>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>> the commentary excessive, particularly for shared code. In particular
>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>> explanation and I don't think we need it to that level of detail. Seems
>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>>
>>> The changes related to volatile writes in the constructor, as discussed
>>> are not required by the Java Memory Model. If you want to keep these
>>> then I think they should all be guarded with PPC64 because it is not
>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>>> PPC64 porters.
>>>
>>> Thanks,
>>> David
>>>
>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>> Hi,
>>>>
>>>> I updated this webrev. I detected a small flaw I made when editing
>>>> this version.
>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>> I also based the webrev on the latest version of the stage repo.
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>> -----Original Message-----
>>>> From: Lindenmaier, Goetz
>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>> To: David Holmes
>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>> Independent Reads of Independent Writes
>>>>
>>>> Hi David,
>>>>
>>>>> So we can at least undo #4 now we have established those tests were
>>>>> not
>>>>> required to pass.
>>>> We would prefer if we could keep this in. We want to avoid that it's
>>>> blamed on the VM if java programs are failing on PPC after they worked
>>>> on x86. To clearly mark it as overfulfilling the spec I would guard
>>>> it by
>>>> a flag as proposed. But if you insist I will remove it. Also, this
>>>> part is
>>>> not that performance relevant.
>>>>
>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>> think
>>>> I added a compile-time guard in this new webrev:
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>>>> several double negations I don't like, (#ifNdef
>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>> but this way I only have to change the ppc platform.
>>>>
>>>> Best regards,
>>>> Goetz
>>>>
>>>> P.S.: I will also be available over the Christmas period.
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>> To: Lindenmaier, Goetz
>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>> Independent Reads of Independent Writes
>>>>
>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>> emails.
>>>>
>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> ok, I understand the tests are wrong. It's good this issue is
>>>>> settled.
>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>
>>>>> About our change: David, the causality is the other way round.
>>>>> The change is about IRIW.
>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>
>>>> This is the part I still have some question marks over as the
>>>> implications are not nice for performance on non-TSO platforms. But I'm
>>>> no further along in processing that paper I'm afraid.
>>>>
>>>>> 2. If we do syncs before loads, we don't need to do them after stores.
>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>> constructor tests.
>>>>> 4. So finally we added them again at the end of the constructor
>>>>> after stores
>>>>> to pass the volatile constructor tests.
>>>>
>>>> So we can at least undo #4 now we have established those tests were not
>>>> required to pass.
>>>>
>>>>> We originally passed the constructor tests because the ppc memory
>>>>> order
>>>>> instructions are not as find-granular as the
>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>> The only instruction
>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>> therefore the
>>>>> MemBarVolatile after the store fixes the constructor tests. The
>>>>> proper representation
>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now
>>>>> it's pointless
>>>>> anyways.
>>>>>
>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>> I'd be happy to add a property
>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>
>>>> A compile-time guard (ifdef) would be better than a runtime one I think
>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>> based not architecture based) as that will allows for turning this
>>>> on/off for any architecture for testing purposes.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> or the like to guard the customization. I'd like that much better.
>>>>> Or also
>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Goetz.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>> To: Lindenmaier, Goetz
>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>> Independent Reads of Independent Writes
>>>>>
>>>>> TL;DR version:
>>>>>
>>>>> Discussion on the c-i list has now confirmed that a
>>>>> constructor-barrier
>>>>> for volatiles is not required as part of the JMM specification. It
>>>>> *may*
>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>> ensure
>>>>> you can't see uninitialized fields. So the tests for this are invalid
>>>>> and this part of the patch is not needed in general (ppc64 may need it
>>>>> due to other factors).
>>>>>
>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>> term :)
>>>>> Second thanks for the reference to that paper! For reference:
>>>>>
>>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>>> complex interconnect) does not guarantee that a write becomes
>>>>> visible to
>>>>> all other hardware threads at the same time point; these architectures
>>>>> are not multiple-copy atomic."
>>>>>
>>>>> This is the visibility issue that I referred to and affects both
>>>>> ARM and
>>>>> PPC. But of course it is normally handled by using suitable barriers
>>>>> after the stores that need to be visible. I think the crux of the
>>>>> current issue is what you wrote below:
>>>>>
>>>>> > The fixes for the constructor issue are only needed because we
>>>>> > remove the sync instruction from behind stores (parse3.cpp:320)
>>>>> > and place it before loads.
>>>>>
>>>>> I hadn't grasped this part. Obviously if you fail to do the sync after
>>>>> the store then you have to do something around the loads to get the
>>>>> same
>>>>> results! I still don't know what lead you to the conclusion that the
>>>>> only way to fix the IRIW issue was to put the fence before the load -
>>>>> maybe when I get the chance to read that paper in full it will be
>>>>> clearer.
>>>>>
>>>>> So ... the basic problem is that the current structure in the VM has
>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>> variables. You now want to customize that but not all the requisite
>>>>> hooks are present. It would be better if volatile_load and
>>>>> volatile_store were factored out so that they could be implemented as
>>>>> desired per-platform. Alternatively there could be pre- and post-
>>>>> hooks
>>>>> that could then be customized per platform. Otherwise you need
>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>
>>>>> I'm not happy with the ifdef approach but I won't block it. I think
>>>>> this
>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>> abstractions are a confused mess in my opinion.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>>>
>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>
>>>>>> I did not yet add the
>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>> or
>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>
>>>>>>
>>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>>> I feel not in the position to contribute substantially.
>>>>>>
>>>>>> But we would like to pass the torture test suite as we consider
>>>>>> this a substantial task in implementing a PPC port. Also we think
>>>>>> both tests show behavior a programmer would expect. It's bad if
>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>
>>>>>> The fixes for the constructor issue are only needed because we
>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>> and place it before loads. Then there is no sync between volatile
>>>>>> store
>>>>>> and publishing the object. So we add it again in this one case
>>>>>> (volatile store in constructor).
>>>>>>
>>>>>>
>>>>>> @David
>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>> taskqueue problem yet,
>>>>>>>> and that's being discussed now for almost a year.
>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>> continuous.
>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>> issue.
>>>>>> And we came up with a solution we consider the best possible. If you
>>>>>> have objections, you should at least give the draft of a better
>>>>>> solution,
>>>>>> we would volunteer to implement and test it.
>>>>>> Similarly, we invested time in fixing the concurrency torture issues.
>>>>>>
>>>>>> @David
>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>> and
>>>>>>> can't find any reference to it.
>>>>>> We learned about this reading "A Tutorial Introduction to the ARM and
>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>> Work-Stealing for
>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>> taskqueue problem.
>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>
>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>> used 'read'
>>>>>> instead. Sorry for that. (I also fixed that in the method name
>>>>>> above).
>>>>>>
>>>>>> Best regards and thanks for all your involvements,
>>>>>> Goetz.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>> To: Lindenmaier, Goetz
>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>> Independent Reads of Independent Writes
>>>>>>
>>>>>> Hi Goetz,
>>>>>>
>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> -- Volatile in constuctor
>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>> missing constructor barrier.
>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>> processors
>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
>>>>>>
>>>>>> And see follow ups - the tests are invalid.
>>>>>>
>>>>>>> -- IRIW issue
>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>> a few
>>>>>>>> moments thought.
>>>>>>> Sure. There also is no solution as you require for the taskqueue
>>>>>>> problem yet,
>>>>>>> and that's being discussed now for almost a year.
>>>>>>
>>>>>> It may have started a year ago but work on it has hardly been
>>>>>> continuous.
>>>>>>
>>>>>>>> You are implying there is a problem here that will
>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>> different?)
>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore
>>>>>>> I contributed a
>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>> nice, I admit.
>>>>>>> (I don't really know about ARM, though).
>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>> are evaluated
>>>>>>> by the C-compiler I'm happy.
>>>>>>>
>>>>>>> The problem is not that IRIW is not handled by the JMM, the problem
>>>>>>> is that
>>>>>>> store
>>>>>>> sync
>>>>>>> does not assure multiple-read-atomicity,
>>>>>>> only
>>>>>>> sync
>>>>>>> load
>>>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>>>> pass that test.
>>>>>>
>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>>> can't find any reference to it.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> The JMM is fine. And
>>>>>>> store
>>>>>>> MemBarVolatile
>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions
>>>>>>> that
>>>>>>> do what is required.
>>>>>>>
>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>> To: Lindenmaier, Goetz
>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>> 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Hi Goetz,
>>>>>>>
>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi everybody,
>>>>>>>>
>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>
>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>> to occur before the reference is assigned;
>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>> initializing
>>>>>>>> a volatile field. Previously, we discussed this with Aleksey
>>>>>>>> Shipilev
>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>> Also, concurrency torture tests
>>>>>>>> LongVolatileTest
>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>> will fail.
>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>> volatile field would be
>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>> AtomicInteger.)
>>>>>>>
>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>> volatiles do
>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>> that
>>>>>>> requires the constructor barrier - discussions with Doug and Aleksey
>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>> missing constructor barrier.
>>>>>>>
>>>>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight
>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We
>>>>>>>> don't
>>>>>>>> see a way to implement this cheaper.
>>>>>>>>
>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>> OrderAccess operations
>>>>>>>> Basically, I agree on this. But you also have to take into account
>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>> different platforms
>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>> An example:
>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier
>>>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>>>> If these are consecutively in the code, sparc code looks like this:
>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore |
>>>>>>>> Assembler::StoreStore)
>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>> Just doing what is required.
>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>> fine grained operations:
>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore,
>>>>>>>> StoreStore, LoadLoad
>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore,
>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations
>>>>>>>> are more (too) powerful,
>>>>>>>> I need an additional optimization that removes the lwsync. I
>>>>>>>> can not implement
>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>
>>>>>>>> Back to the IRIW problem. I think here we have a comparable issue.
>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the
>>>>>>>> read
>>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>>
>>>>>>>> I would propose to guard the code by
>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>> independent?
>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>
>>>>>>> I can not possibly answer to the necessary level of detail with a
>>>>>>> few
>>>>>>> moments thought. You are implying there is a problem here that will
>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>> different?) and I can not take that on face value at the moment. The
>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>> requirements for
>>>>>>> volatile accesses is if there are global visibility issues that
>>>>>>> are not
>>>>>>> addressed - but even then I would expect heavy barriers at the store
>>>>>>> would deal with that, not at the load. (This situation reminds me
>>>>>>> of the
>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>> software
>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>> don't have
>>>>>>> that on ppc!)
>>>>>>>
>>>>>>> Sorry - There is no quick resolution here and in a couple of days
>>>>>>> I will
>>>>>>> be heading out on vacation for two weeks.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz.
>>>>>>>>
>>>>>>>> -- Other ports:
>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so
>>>>>>>> it might
>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them).
>>>>>>>>
>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>> I agree we should not change it in the ppc port. If you wish, I
>>>>>>>> can
>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>> To: Vladimir Kozlov
>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>
>>>>>>>> Okay this is my second attempt at answering this in a reasonable
>>>>>>>> way :)
>>>>>>>>
>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>
>>>>>>>> From what I understand what we see here is an attempt to
>>>>>>>> fix an
>>>>>>>> existing issue with the implementation of volatiles so that the
>>>>>>>> IRIW
>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>> doing the
>>>>>>>> load.
>>>>>>>>
>>>>>>>> Now if this was purely handled in ppc64 source code then I would be
>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>> performance
>>>>>>>> though!). But I do not agree with the changes to the shared code
>>>>>>>> that
>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>> this is
>>>>>>>> polluting the shared code. My concern is similar to what I said
>>>>>>>> with the
>>>>>>>> taskQueue changes - these algorithms should be expressed using the
>>>>>>>> correct OrderAccess operations to guarantee the desired properties
>>>>>>>> independent of architecture. If such a "barrier" is not needed on a
>>>>>>>> given architecture then the implementation in OrderAccess should
>>>>>>>> reduce
>>>>>>>> to a no-op.
>>>>>>>>
>>>>>>>> And as Vitaly points out the constructor barriers are not needed
>>>>>>>> under
>>>>>>>> the JMM.
>>>>>>>>
>>>>>>>>> I am fine with suggested changes because you did not change our
>>>>>>>>> current
>>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>>> But may be it should be done using more general query which is set
>>>>>>>>> depending on platform:
>>>>>>>>>
>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>
>>>>>>>>> or similar to what we use now:
>>>>>>>>>
>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>
>>>>>>>> Every platform has to support IRIW this is simply part of the Java
>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>> explicitly
>>>>>>>> like this.
>>>>>>>>
>>>>>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>>>>>> visibility issues beyond the ordering constraints that the JMM
>>>>>>>> defines?
>>>>>>>>> From what I understand our ppc port is also affected.
>>>>>>>>> David?
>>>>>>>>
>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>> inside else {}.
>>>>>>>>>
>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>> specific which
>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>> PPC64_ONLY()
>>>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>>>>
>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>
>>>>>>>>> if (is_vol) {
>>>>>>>>> // See comment in do_get_xxx().
>>>>>>>>> #ifndef PPC64
>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>> #else
>>>>>>>>> if (is_field) {
>>>>>>>>> // Add MemBarRelease for constructors which write
>>>>>>>>> volatile field
>>>>>>>>> (PPC64).
>>>>>>>>> set_wrote_volatile(true);
>>>>>>>>> }
>>>>>>>>> #endif
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>> the torture test suite:
>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>
>>>>>>>>>> Example:
>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>> __________ __________ __________ __________
>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>>>
>>>>>>>>>> write(x=1) read(x) write(y=1) read(y)
>>>>>>>>>> read(y) read(x)
>>>>>>>>>>
>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This
>>>>>>>>>> is only
>>>>>>>>>> assured by the sync instruction and if it is executed in the
>>>>>>>>>> threads
>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>> sync-load-acquire
>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>
>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>
>>>>>>>>>> Further this change contains a fix that assures that volatile
>>>>>>>>>> fields
>>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>>> published.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>> seems too
>>>>>>>>>> strong.
>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>> suffice.
>>>>>>>>>> What do you think?
>>>>>>>>>>
>>>>>>>>>> Please review and test this change.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Goetz.
>>>>>>>>>>
More information about the hotspot-dev
mailing list