RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Jan 22 22:55:57 PST 2014
Okay, lets resolve it.
David, are you talking about this change?:
http://hg.openjdk.java.net/ppc-aix-port/stage/hotspot/diff/3205e78d8193/src/share/vm/runtime/thread.hpp
Which may penalize our binaries without need. I think Goetz suggested already to put it under #ifdef PPC64 but I may be
misremember. Would #ifdef solution satisfy you?
Thanks,
Vladimir
On 1/22/14 10:10 PM, David Holmes wrote:
> On 23/01/2014 3:20 AM, Lindenmaier, Goetz wrote:
>> Hi Vladimir,
>>
>>> Yes, I will backport it.
>> That's good, thank you!
>>
>>> I will build bundles and give them to SQE for final testing.
>> __final__ That's even better, that's great!!!
>
> Hmmm I still have reservations concerning the _thread_state changes that were pushed.
>
> David
> -----
>
>> Best regards,
>> Goetz.
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Mittwoch, 22. Januar 2014 18:02
>> To: Lindenmaier, Goetz
>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Hi Goetz
>>
>> On 1/22/14 1:20 AM, Lindenmaier, Goetz wrote:
>>> Hi Vladimir,
>>>
>>> Thanks for testing and pushing!
>>>
>>> Will you push this also to stage? I assume we can handle this
>>> as the other three hotspot changes, without a new bug-id?
>>
>> Yes, I will backport it.
>>
>> What about JDK changes Volker pushed: 8028537, 8031134, 8031997 and new one from today 8031581?
>> Should I backport all of them into 8u stage? From conversion between Volker and Alan some of them need backport a fix
>> from jdk9. Or I am mistaking?
>>
>>>
>>> Also, when do you think we (you unfortunately) should update
>>> the repos again? Stage-9 maybe after Volkers last change is submitted?
>>
>> After I test and push 8031581 I will do sync with latest jdk9 sources (b01).
>> I will build bundles and give them to SQE for final testing.
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Best regards,
>>> Goetz
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: hotspot-dev-bounces at openjdk.java.net [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov
>>> Sent: Dienstag, 21. Januar 2014 21:00
>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>
>>> Thanks. I am pushing it.
>>>
>>> Vladimir
>>>
>>> On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote:
>>>> Sorry, I missed that. fixed.
>>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Dienstag, 21. Januar 2014 12:53
>>>> To: Lindenmaier, Goetz; Vladimir Kozlov
>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>
>>>> Thanks Goetz!
>>>>
>>>> This typo still exists:
>>>>
>>>> + bool _wrote_volatile; // Did we write a final field?
>>>>
>>>> s/final/volatile/
>>>>
>>>> Otherwise no further comments from me.
>>>>
>>>> David
>>>>
>>>> On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> I made a new webrev
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/
>>>>> differing from
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>>>> only in the comments.
>>>>>
>>>>> I removed
>>>>> // Support ordering of "Independent Reads of Independent Writes".
>>>>> everywhere, and edited the comments in the globalDefinition*.hpp
>>>>> files.
>>>>>
>>>>> Best regards,
>>>>> Goetz.
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Dienstag, 21. Januar 2014 05:55
>>>>> To: Lindenmaier, Goetz; Vladimir Kozlov
>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>>
>>>>> Hi Goetz,
>>>>>
>>>>> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I tried to come up with a webrev that implements the change as proposed in
>>>>>> your mails:
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>>>>>
>>>>>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use
>>>>>> support_IRIW_for_not_multiple_copy_atomic_cpu.
>>>>>
>>>>> Given the flag name the commentary eg:
>>>>>
>>>>> + // Support ordering of "Independent Reads of Independent Writes".
>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
>>>>>
>>>>> seems somewhat redundant.
>>>>>
>>>>>> I left the definition and handling of _wrote_volatile in the code, without
>>>>>> any protection.
>>>>>
>>>>> + bool _wrote_volatile; // Did we write a final field?
>>>>>
>>>>> s/final/volatile
>>>>>
>>>>>> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() ,
>>>>>> and put it on one line.
>>>>>>
>>>>>> I removed the comment in library_call.cpp.
>>>>>> I also removed the sentence " Solution: implement volatile read as sync-load-acquire."
>>>>>> from the comments as it's PPC specific.
>>>>>
>>>>> I think the primary IRIW comment/explanation should go in
>>>>> globalDefinitions.hpp where
>>>>> support_IRIW_for_not_multiple_copy_atomic_cpu is defined.
>>>>>
>>>>>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these
>>>>>> issues in C1 if nobody did it by then.
>>>>>
>>>>> I've filed:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8032366
>>>>>
>>>>> "Implement C1 support for IRIW conformance on non-multiple-copy-atomic
>>>>> platforms"
>>>>>
>>>>> to cover this task, as it may be needed sooner rather than later.
>>>>>
>>>>>> Wrt. to performance: Oracle will soon do heavy testing of the port. If any
>>>>>> performance problems arise, we still can add #ifdef PPC64 to circumvent this.
>>>>>
>>>>> Ok.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Donnerstag, 16. Januar 2014 10:05
>>>>>> To: Vladimir Kozlov
>>>>>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>>>
>>>>>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
>>>>>>> On 1/16/14 12:34 AM, David Holmes wrote:
>>>>>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>>>>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>>>>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>>>>>>>> want to keep it this way, it could be useful to have such info on other
>>>>>>>>> platforms too. But I would suggest to remove PPC64 comments in
>>>>>>>>> parse.hpp.
>>>>>>>>>
>>>>>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>>>>>>>>> which could be checked in all places instead of #ifdef:
>>>>>>>>
>>>>>>>> I asked for the ifdef some time back as I find it much preferable to
>>>>>>>> have this as a build-time construct rather than a
>>>>>>>> runtime one. I don't want to have to pay anything for this if we don't
>>>>>>>> use it.
>>>>>>>
>>>>>>> Any decent C++ compiler will optimize expressions with such constants
>>>>>>> defined in header files. I insist to avoid #ifdefs in C2 code. I really
>>>>>>> don't like the code with #ifdef in unsafe.cpp but I can live with it.
>>>>>>
>>>>>> If you insist then we may as well do it all the same way. Better to be
>>>>>> consistent.
>>>>>>
>>>>>> My apologies Goetz for wasting your time going back and forth on this.
>>>>>>
>>>>>> That aside I have a further concern with this IRIW support - it is
>>>>>> incomplete as there is no C1 support, as PPC64 isn't using client. If
>>>>>> this is going on then we (which probably means the Oracle 'we') need to
>>>>>> add the missing C1 code.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Vladimir
>>>>>>>
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>>>>>>>> #else
>>>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>>>>>>>> #endif
>>>>>>>>>
>>>>>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>>>>>>>
>>>>>>>>> and then:
>>>>>>>>>
>>>>>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>>>>>>> oop p = JNIHandles::resolve(obj); \
>>>>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>>>>>>>>> OrderAccess::fence(); \
>>>>>>>>> volatile type_name v = OrderAccess::load_acquire((volatile
>>>>>>>>> type_name*)index_oop_from_field_offset_long(p, offset));
>>>>>>>>>
>>>>>>>>> And:
>>>>>>>>>
>>>>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>>>>>>>>> field->is_volatile()) {
>>>>>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
>>>>>>>>> + }
>>>>>>>>>
>>>>>>>>> And so on. The comments will be needed only in globalDefinitions.hpp
>>>>>>>>>
>>>>>>>>> The code in parse1.cpp could be put on one line:
>>>>>>>>>
>>>>>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>>>>>>>>> method()->is_initializer()) )) {
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>>>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>> I updated the webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>>
>>>>>>>>>>> - I removed the IRIW example in parse3.cpp
>>>>>>>>>>> - I adapted the comments not to point to that comment, and to
>>>>>>>>>>> reflect the new flagging. Also I mention that we support the
>>>>>>>>>>> volatile constructor issue, but that it's not standard.
>>>>>>>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>>>>>>> I also think it's better to separate these this way.
>>>>>>>>>>
>>>>>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>>>>>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>>>>>>>> please.
>>>>>>>>>>
>>>>>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>>>>>>>>> comment doesn't make much sense to me and refers to
>>>>>>>>>> ppc specific stuff in a shared file:
>>>>>>>>>>
>>>>>>>>>> if (is_volatile) {
>>>>>>>>>> ! if (!is_store) {
>>>>>>>>>> insert_mem_bar(Op_MemBarAcquire);
>>>>>>>>>> ! } else {
>>>>>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile);
>>>>>>>>>> + #endif
>>>>>>>>>> + }
>>>>>>>>>>
>>>>>>>>>> I don't think the comment is needed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> Thanks for your comments!
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Goetz.
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>
>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the delay in getting back to this.
>>>>>>>>>>>
>>>>>>>>>>> The general changes to the volatile barriers to support IRIW are okay.
>>>>>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>>>>>>>>>> specifically it is
>>>>>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>>>>>>>>>> the commentary excessive, particularly for shared code. In particular
>>>>>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>>>>>>>>>> explanation and I don't think we need it to that level of detail.
>>>>>>>>>>> Seems
>>>>>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>>>>>>>>>>
>>>>>>>>>>> The changes related to volatile writes in the constructor, as
>>>>>>>>>>> discussed
>>>>>>>>>>> are not required by the Java Memory Model. If you want to keep these
>>>>>>>>>>> then I think they should all be guarded with PPC64 because it is not
>>>>>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>>>>>>>>>>> PPC64 porters.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I updated this webrev. I detected a small flaw I made when editing
>>>>>>>>>>>> this version.
>>>>>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>>>>>>>>> I also based the webrev on the latest version of the stage repo.
>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Lindenmaier, Goetz
>>>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>>>>>>>>> To: David Holmes
>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>>> So we can at least undo #4 now we have established those tests were
>>>>>>>>>>>>> not
>>>>>>>>>>>>> required to pass.
>>>>>>>>>>>> We would prefer if we could keep this in. We want to avoid that it's
>>>>>>>>>>>> blamed on the VM if java programs are failing on PPC after they
>>>>>>>>>>>> worked
>>>>>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard
>>>>>>>>>>>> it by
>>>>>>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this
>>>>>>>>>>>> part is
>>>>>>>>>>>> not that performance relevant.
>>>>>>>>>>>>
>>>>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>>>>>>> think
>>>>>>>>>>>> I added a compile-time guard in this new webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>>>>>>>>>>>> several double negations I don't like, (#ifNdef
>>>>>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>>>>>>>>> but this way I only have to change the ppc platform.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Goetz
>>>>>>>>>>>>
>>>>>>>>>>>> P.S.: I will also be available over the Christmas period.
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>>>>>>>>>> emails.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> ok, I understand the tests are wrong. It's good this issue is
>>>>>>>>>>>>> settled.
>>>>>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>>>>>>>>>
>>>>>>>>>>>>> About our change: David, the causality is the other way round.
>>>>>>>>>>>>> The change is about IRIW.
>>>>>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>>>>>>>
>>>>>>>>>>>> This is the part I still have some question marks over as the
>>>>>>>>>>>> implications are not nice for performance on non-TSO platforms.
>>>>>>>>>>>> But I'm
>>>>>>>>>>>> no further along in processing that paper I'm afraid.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. If we do syncs before loads, we don't need to do them after
>>>>>>>>>>>>> stores.
>>>>>>>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>>>>>>>> constructor tests.
>>>>>>>>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>>>>>>>>> after stores
>>>>>>>>>>>>> to pass the volatile constructor tests.
>>>>>>>>>>>>
>>>>>>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>>>>>>> were not
>>>>>>>>>>>> required to pass.
>>>>>>>>>>>>
>>>>>>>>>>>>> We originally passed the constructor tests because the ppc memory
>>>>>>>>>>>>> order
>>>>>>>>>>>>> instructions are not as find-granular as the
>>>>>>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>>>>>>>>>> The only instruction
>>>>>>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>>>>>>>>>> therefore the
>>>>>>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The
>>>>>>>>>>>>> proper representation
>>>>>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now
>>>>>>>>>>>>> it's pointless
>>>>>>>>>>>>> anyways.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>>>>>>> I'd be happy to add a property
>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>>>
>>>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>>>>>> think
>>>>>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>>>>>>>>> based not architecture based) as that will allows for turning this
>>>>>>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> or the like to guard the customization. I'd like that much better.
>>>>>>>>>>>>> Or also
>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>
>>>>>>>>>>>>> TL;DR version:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>>>>>>>>> constructor-barrier
>>>>>>>>>>>>> for volatiles is not required as part of the JMM specification. It
>>>>>>>>>>>>> *may*
>>>>>>>>>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>>>>>>>>>> ensure
>>>>>>>>>>>>> you can't see uninitialized fields. So the tests for this are
>>>>>>>>>>>>> invalid
>>>>>>>>>>>>> and this part of the patch is not needed in general (ppc64 may
>>>>>>>>>>>>> need it
>>>>>>>>>>>>> due to other factors).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>>>>>>>>>> term :)
>>>>>>>>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>>>>>>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>>>>>>>>> visible to
>>>>>>>>>>>>> all other hardware threads at the same time point; these
>>>>>>>>>>>>> architectures
>>>>>>>>>>>>> are not multiple-copy atomic."
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the visibility issue that I referred to and affects both
>>>>>>>>>>>>> ARM and
>>>>>>>>>>>>> PPC. But of course it is normally handled by using suitable barriers
>>>>>>>>>>>>> after the stores that need to be visible. I think the crux of the
>>>>>>>>>>>>> current issue is what you wrote below:
>>>>>>>>>>>>>
>>>>>>>>>>>>> > The fixes for the constructor issue are only needed because we
>>>>>>>>>>>>> > remove the sync instruction from behind stores
>>>>>>>>>>>>> (parse3.cpp:320)
>>>>>>>>>>>>> > and place it before loads.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync
>>>>>>>>>>>>> after
>>>>>>>>>>>>> the store then you have to do something around the loads to get the
>>>>>>>>>>>>> same
>>>>>>>>>>>>> results! I still don't know what lead you to the conclusion that the
>>>>>>>>>>>>> only way to fix the IRIW issue was to put the fence before the
>>>>>>>>>>>>> load -
>>>>>>>>>>>>> maybe when I get the chance to read that paper in full it will be
>>>>>>>>>>>>> clearer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So ... the basic problem is that the current structure in the VM has
>>>>>>>>>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>>>>>>>>>> variables. You now want to customize that but not all the requisite
>>>>>>>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>>>>>>>> volatile_store were factored out so that they could be
>>>>>>>>>>>>> implemented as
>>>>>>>>>>>>> desired per-platform. Alternatively there could be pre- and post-
>>>>>>>>>>>>> hooks
>>>>>>>>>>>>> that could then be customized per platform. Otherwise you need
>>>>>>>>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think
>>>>>>>>>>>>> this
>>>>>>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>>>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did not yet add the
>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>> or
>>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>>>>>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But we would like to pass the torture test suite as we consider
>>>>>>>>>>>>>> this a substantial task in implementing a PPC port. Also we think
>>>>>>>>>>>>>> both tests show behavior a programmer would expect. It's bad if
>>>>>>>>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>>>>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>>>>>>>> and place it before loads. Then there is no sync between volatile
>>>>>>>>>>>>>> store
>>>>>>>>>>>>>> and publishing the object. So we add it again in this one case
>>>>>>>>>>>>>> (volatile store in constructor).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @David
>>>>>>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>>>>>> continuous.
>>>>>>>>>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>> And we came up with a solution we consider the best possible. If
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>>>>>>>> solution,
>>>>>>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture
>>>>>>>>>>>>>> issues.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @David
>>>>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the
>>>>>>>>>>>>>> ARM and
>>>>>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>>>>>>> Work-Stealing for
>>>>>>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>>>>>>>>> taskqueue problem.
>>>>>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>>>>>>>>> used 'read'
>>>>>>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name
>>>>>>>>>>>>>> above).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Volatile in constuctor
>>>>>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>>>>>>>> processors
>>>>>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks
>>>>>>>>>>>>>>> Aleksey!)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- IRIW issue
>>>>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>>>>>>>>>> a few
>>>>>>>>>>>>>>>> moments thought.
>>>>>>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue
>>>>>>>>>>>>>>> problem yet,
>>>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>>>>> continuous.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>>>>>>> different?)
>>>>>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore
>>>>>>>>>>>>>>> I contributed a
>>>>>>>>>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>>>>>>>>>> nice, I admit.
>>>>>>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>>>>>>>>>> are evaluated
>>>>>>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the
>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>> is that
>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>> load
>>>>>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>>>>>>>>>>>> pass that test.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>>>>>>>> term and
>>>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The JMM is fine. And
>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>> MemBarVolatile
>>>>>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> do what is required.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>>>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>>>>>>>>> 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>>>>>>>> initializing
>>>>>>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey
>>>>>>>>>>>>>>>> Shipilev
>>>>>>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>>>>>>> LongVolatileTest
>>>>>>>>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>>>>>>>>> will fail.
>>>>>>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>>>>>>>>> volatile field would be
>>>>>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>>>>>>>>> volatiles do
>>>>>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> requires the constructor barrier - discussions with Doug and
>>>>>>>>>>>>>>> Aleksey
>>>>>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>>>>>>> heavyweight
>>>>>>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We
>>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>>>>>>> Basically, I agree on this. But you also have to take into
>>>>>>>>>>>>>>>> account
>>>>>>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>>>>>>> different platforms
>>>>>>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>>>>>> An example:
>>>>>>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier
>>>>>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like
>>>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore |
>>>>>>>>>>>>>>>> Assembler::StoreStore)
>>>>>>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>>>>>>>>>> Just doing what is required.
>>>>>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>>>>>>>>>> fine grained operations:
>>>>>>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore,
>>>>>>>>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore,
>>>>>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations
>>>>>>>>>>>>>>>> are more (too) powerful,
>>>>>>>>>>>>>>>> I need an additional optimization that removes the lwsync. I
>>>>>>>>>>>>>>>> can not implement
>>>>>>>>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable
>>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the
>>>>>>>>>>>>>>>> read
>>>>>>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would propose to guard the code by
>>>>>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>>>>>>>>>> independent?
>>>>>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a
>>>>>>>>>>>>>>> few
>>>>>>>>>>>>>>> moments thought. You are implying there is a problem here that
>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>>>>>> different?) and I can not take that on face value at the
>>>>>>>>>>>>>>> moment. The
>>>>>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>>>>>>>>> requirements for
>>>>>>>>>>>>>>> volatile accesses is if there are global visibility issues that
>>>>>>>>>>>>>>> are not
>>>>>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the
>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>> would deal with that, not at the load. (This situation reminds me
>>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>>>>>>>>>> software
>>>>>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>>>>>>>>>> don't have
>>>>>>>>>>>>>>> that on ppc!)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days
>>>>>>>>>>>>>>> I will
>>>>>>>>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- Other ports:
>>>>>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so
>>>>>>>>>>>>>>>> it might
>>>>>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>>>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of
>>>>>>>>>>>>>>>> them).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I
>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable
>>>>>>>>>>>>>>>> way :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> From what I understand what we see here is an attempt to
>>>>>>>>>>>>>>>> fix an
>>>>>>>>>>>>>>>> existing issue with the implementation of volatiles so that the
>>>>>>>>>>>>>>>> IRIW
>>>>>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>>>>>>>>>> doing the
>>>>>>>>>>>>>>>> load.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I
>>>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>>> though!). But I do not agree with the changes to the shared code
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>>>>>>>>>> this is
>>>>>>>>>>>>>>>> polluting the shared code. My concern is similar to what I said
>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired
>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed
>>>>>>>>>>>>>>>> on a
>>>>>>>>>>>>>>>> given architecture then the implementation in OrderAccess should
>>>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>>>> to a no-op.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed
>>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>>> the JMM.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am fine with suggested changes because you did not change our
>>>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>>>>>>>>>>> But may be it should be done using more general query which
>>>>>>>>>>>>>>>>> is set
>>>>>>>>>>>>>>>>> depending on platform:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the
>>>>>>>>>>>>>>>> Java
>>>>>>>>>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>>>>>>>>>> explicitly
>>>>>>>>>>>>>>>> like this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are
>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM
>>>>>>>>>>>>>>>> defines?
>>>>>>>>>>>>>>>>> From what I understand our ppc port is also affected.
>>>>>>>>>>>>>>>>> David?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>>>>>>>>> inside else {}.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>>>>>>>>> specific which
>>>>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many
>>>>>>>>>>>>>>>>> #ifdefs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> if (is_vol) {
>>>>>>>>>>>>>>>>> // See comment in do_get_xxx().
>>>>>>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>>>> if (is_field) {
>>>>>>>>>>>>>>>>> // Add MemBarRelease for constructors which write
>>>>>>>>>>>>>>>>> volatile field
>>>>>>>>>>>>>>>>> (PPC64).
>>>>>>>>>>>>>>>>> set_wrote_volatile(true);
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>>>>>>> __________ __________ __________ __________
>>>>>>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y)
>>>>>>>>>>>>>>>>>> read(y) read(x)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This
>>>>>>>>>>>>>>>>>> is only
>>>>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the
>>>>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple
>>>>>>>>>>>>>>>>>> SIGSEGVs
>>>>>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Further this change contains a fix that assures that volatile
>>>>>>>>>>>>>>>>>> fields
>>>>>>>>>>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>>>>>>>>>>> published.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>>>>>>>>>> seems too
>>>>>>>>>>>>>>>>>> strong.
>>>>>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>>>>>>>>>> suffice.
>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>>>>
More information about the ppc-aix-port-dev
mailing list