RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
David Holmes
david.holmes at oracle.com
Thu Jan 23 23:28:12 PST 2014
On 23/01/2014 4:55 PM, Vladimir Kozlov wrote:
> Okay, lets resolve it.
>
> David, are you talking about this change?:
>
> http://hg.openjdk.java.net/ppc-aix-port/stage/hotspot/diff/3205e78d8193/src/share/vm/runtime/thread.hpp
Yes.
> Which may penalize our binaries without need. I think Goetz suggested
> already to put it under #ifdef PPC64 but I may be misremember. Would
> #ifdef solution satisfy you?
Based on other discussions this ties in with the use, or not, of
UseMembar with non-TSO systems. But an ifdef PPC64 would be okay pending
more complete resolution of these matters sometime in the future.
Thanks,
David
> Thanks,
> Vladimir
>
> On 1/22/14 10:10 PM, David Holmes wrote:
>> On 23/01/2014 3:20 AM, Lindenmaier, Goetz wrote:
>>> Hi Vladimir,
>>>
>>>> Yes, I will backport it.
>>> That's good, thank you!
>>>
>>>> I will build bundles and give them to SQE for final testing.
>>> __final__ That's even better, that's great!!!
>>
>> Hmmm I still have reservations concerning the _thread_state changes
>> that were pushed.
>>
>> David
>> -----
>>
>>> Best regards,
>>> Goetz.
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Mittwoch, 22. Januar 2014 18:02
>>> To: Lindenmaier, Goetz
>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>> Independent Reads of Independent Writes
>>>
>>> Hi Goetz
>>>
>>> On 1/22/14 1:20 AM, Lindenmaier, Goetz wrote:
>>>> Hi Vladimir,
>>>>
>>>> Thanks for testing and pushing!
>>>>
>>>> Will you push this also to stage? I assume we can handle this
>>>> as the other three hotspot changes, without a new bug-id?
>>>
>>> Yes, I will backport it.
>>>
>>> What about JDK changes Volker pushed: 8028537, 8031134, 8031997 and
>>> new one from today 8031581?
>>> Should I backport all of them into 8u stage? From conversion between
>>> Volker and Alan some of them need backport a fix
>>> from jdk9. Or I am mistaking?
>>>
>>>>
>>>> Also, when do you think we (you unfortunately) should update
>>>> the repos again? Stage-9 maybe after Volkers last change is submitted?
>>>
>>> After I test and push 8031581 I will do sync with latest jdk9 sources
>>> (b01).
>>> I will build bundles and give them to SQE for final testing.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Best regards,
>>>> Goetz
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-dev-bounces at openjdk.java.net
>>>> [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Vladimir
>>>> Kozlov
>>>> Sent: Dienstag, 21. Januar 2014 21:00
>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>> Independent Reads of Independent Writes
>>>>
>>>> Thanks. I am pushing it.
>>>>
>>>> Vladimir
>>>>
>>>> On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote:
>>>>> Sorry, I missed that. fixed.
>>>>>
>>>>> Best regards,
>>>>> Goetz.
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Dienstag, 21. Januar 2014 12:53
>>>>> To: Lindenmaier, Goetz; Vladimir Kozlov
>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>> 'hotspot-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>> Independent Reads of Independent Writes
>>>>>
>>>>> Thanks Goetz!
>>>>>
>>>>> This typo still exists:
>>>>>
>>>>> + bool _wrote_volatile; // Did we write a final field?
>>>>>
>>>>> s/final/volatile/
>>>>>
>>>>> Otherwise no further comments from me.
>>>>>
>>>>> David
>>>>>
>>>>> On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I made a new webrev
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/
>>>>>> differing from
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>>>>> only in the comments.
>>>>>>
>>>>>> I removed
>>>>>> // Support ordering of "Independent Reads of Independent Writes".
>>>>>> everywhere, and edited the comments in the globalDefinition*.hpp
>>>>>> files.
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Dienstag, 21. Januar 2014 05:55
>>>>>> To: Lindenmaier, Goetz; Vladimir Kozlov
>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>> Independent Reads of Independent Writes
>>>>>>
>>>>>> Hi Goetz,
>>>>>>
>>>>>> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried to come up with a webrev that implements the change as
>>>>>>> proposed in
>>>>>>> your mails:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>>>>>>
>>>>>>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use
>>>>>>> support_IRIW_for_not_multiple_copy_atomic_cpu.
>>>>>>
>>>>>> Given the flag name the commentary eg:
>>>>>>
>>>>>> + // Support ordering of "Independent Reads of Independent
>>>>>> Writes".
>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
>>>>>>
>>>>>> seems somewhat redundant.
>>>>>>
>>>>>>> I left the definition and handling of _wrote_volatile in the
>>>>>>> code, without
>>>>>>> any protection.
>>>>>>
>>>>>> + bool _wrote_volatile; // Did we write a final field?
>>>>>>
>>>>>> s/final/volatile
>>>>>>
>>>>>>> I protected issuing the barrier for volatile in constructors with
>>>>>>> PPC64_ONLY() ,
>>>>>>> and put it on one line.
>>>>>>>
>>>>>>> I removed the comment in library_call.cpp.
>>>>>>> I also removed the sentence " Solution: implement volatile read
>>>>>>> as sync-load-acquire."
>>>>>>> from the comments as it's PPC specific.
>>>>>>
>>>>>> I think the primary IRIW comment/explanation should go in
>>>>>> globalDefinitions.hpp where
>>>>>> support_IRIW_for_not_multiple_copy_atomic_cpu is defined.
>>>>>>
>>>>>>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task,
>>>>>>> we will fix these
>>>>>>> issues in C1 if nobody did it by then.
>>>>>>
>>>>>> I've filed:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8032366
>>>>>>
>>>>>> "Implement C1 support for IRIW conformance on
>>>>>> non-multiple-copy-atomic
>>>>>> platforms"
>>>>>>
>>>>>> to cover this task, as it may be needed sooner rather than later.
>>>>>>
>>>>>>> Wrt. to performance: Oracle will soon do heavy testing of the
>>>>>>> port. If any
>>>>>>> performance problems arise, we still can add #ifdef PPC64 to
>>>>>>> circumvent this.
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Donnerstag, 16. Januar 2014 10:05
>>>>>>> To: Vladimir Kozlov
>>>>>>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net';
>>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
>>>>>>>> On 1/16/14 12:34 AM, David Holmes wrote:
>>>>>>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>>>>>>>>> This is becoming ugly #ifdef mess. In compiler code we are
>>>>>>>>>> trying to
>>>>>>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef
>>>>>>>>>> and I
>>>>>>>>>> want to keep it this way, it could be useful to have such info
>>>>>>>>>> on other
>>>>>>>>>> platforms too. But I would suggest to remove PPC64 comments in
>>>>>>>>>> parse.hpp.
>>>>>>>>>>
>>>>>>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp
>>>>>>>>>> define a value
>>>>>>>>>> which could be checked in all places instead of #ifdef:
>>>>>>>>>
>>>>>>>>> I asked for the ifdef some time back as I find it much
>>>>>>>>> preferable to
>>>>>>>>> have this as a build-time construct rather than a
>>>>>>>>> runtime one. I don't want to have to pay anything for this if
>>>>>>>>> we don't
>>>>>>>>> use it.
>>>>>>>>
>>>>>>>> Any decent C++ compiler will optimize expressions with such
>>>>>>>> constants
>>>>>>>> defined in header files. I insist to avoid #ifdefs in C2 code. I
>>>>>>>> really
>>>>>>>> don't like the code with #ifdef in unsafe.cpp but I can live
>>>>>>>> with it.
>>>>>>>
>>>>>>> If you insist then we may as well do it all the same way. Better
>>>>>>> to be
>>>>>>> consistent.
>>>>>>>
>>>>>>> My apologies Goetz for wasting your time going back and forth on
>>>>>>> this.
>>>>>>>
>>>>>>> That aside I have a further concern with this IRIW support - it is
>>>>>>> incomplete as there is no C1 support, as PPC64 isn't using
>>>>>>> client. If
>>>>>>> this is going on then we (which probably means the Oracle 'we')
>>>>>>> need to
>>>>>>> add the missing C1 code.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>>>>>>>>> #else
>>>>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>>>>>>>>> #endif
>>>>>>>>>>
>>>>>>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>>>>>>>>
>>>>>>>>>> and then:
>>>>>>>>>>
>>>>>>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>>>>>>>> oop p = JNIHandles::resolve(obj); \
>>>>>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>>>>>>>>>> OrderAccess::fence(); \
>>>>>>>>>> volatile type_name v =
>>>>>>>>>> OrderAccess::load_acquire((volatile
>>>>>>>>>> type_name*)index_oop_from_field_offset_long(p, offset));
>>>>>>>>>>
>>>>>>>>>> And:
>>>>>>>>>>
>>>>>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>>>>>>>>>> field->is_volatile()) {
>>>>>>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
>>>>>>>>>> + }
>>>>>>>>>>
>>>>>>>>>> And so on. The comments will be needed only in
>>>>>>>>>> globalDefinitions.hpp
>>>>>>>>>>
>>>>>>>>>> The code in parse1.cpp could be put on one line:
>>>>>>>>>>
>>>>>>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>>>>>>>>>> method()->is_initializer()) )) {
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>>>>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>
>>>>>>>>>>>> I updated the webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>>>
>>>>>>>>>>>> - I removed the IRIW example in parse3.cpp
>>>>>>>>>>>> - I adapted the comments not to point to that comment, and to
>>>>>>>>>>>> reflect the new flagging. Also I mention that we
>>>>>>>>>>>> support the
>>>>>>>>>>>> volatile constructor issue, but that it's not standard.
>>>>>>>>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>>>>>>>> I also think it's better to separate these this way.
>>>>>>>>>>>
>>>>>>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>>>>>>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>>>>>>>>> please.
>>>>>>>>>>>
>>>>>>>>>>> One nit I missed before. In
>>>>>>>>>>> src/share/vm/opto/library_call.cpp this
>>>>>>>>>>> comment doesn't make much sense to me and refers to
>>>>>>>>>>> ppc specific stuff in a shared file:
>>>>>>>>>>>
>>>>>>>>>>> if (is_volatile) {
>>>>>>>>>>> ! if (!is_store) {
>>>>>>>>>>> insert_mem_bar(Op_MemBarAcquire);
>>>>>>>>>>> ! } else {
>>>>>>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>>>>>> ! // Changed volatiles/Unsafe: lwsync-store,
>>>>>>>>>>> sync-load-acquire.
>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile);
>>>>>>>>>>> + #endif
>>>>>>>>>>> + }
>>>>>>>>>>>
>>>>>>>>>>> I don't think the comment is needed.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your comments!
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>>>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for the delay in getting back to this.
>>>>>>>>>>>>
>>>>>>>>>>>> The general changes to the volatile barriers to support IRIW
>>>>>>>>>>>> are okay.
>>>>>>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this
>>>>>>>>>>>> (though more
>>>>>>>>>>>> specifically it is
>>>>>>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I
>>>>>>>>>>>> find much of
>>>>>>>>>>>> the commentary excessive, particularly for shared code. In
>>>>>>>>>>>> particular
>>>>>>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to
>>>>>>>>>>>> give the
>>>>>>>>>>>> explanation and I don't think we need it to that level of
>>>>>>>>>>>> detail.
>>>>>>>>>>>> Seems
>>>>>>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite
>>>>>>>>>>>> adequate.
>>>>>>>>>>>>
>>>>>>>>>>>> The changes related to volatile writes in the constructor, as
>>>>>>>>>>>> discussed
>>>>>>>>>>>> are not required by the Java Memory Model. If you want to
>>>>>>>>>>>> keep these
>>>>>>>>>>>> then I think they should all be guarded with PPC64 because
>>>>>>>>>>>> it is not
>>>>>>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being
>>>>>>>>>>>> made by the
>>>>>>>>>>>> PPC64 porters.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I updated this webrev. I detected a small flaw I made when
>>>>>>>>>>>>> editing
>>>>>>>>>>>>> this version.
>>>>>>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>>>>>>>>>> I also based the webrev on the latest version of the stage
>>>>>>>>>>>>> repo.
>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Lindenmaier, Goetz
>>>>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>>>>>>>>>> To: David Holmes
>>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So we can at least undo #4 now we have established those
>>>>>>>>>>>>>> tests were
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>> required to pass.
>>>>>>>>>>>>> We would prefer if we could keep this in. We want to avoid
>>>>>>>>>>>>> that it's
>>>>>>>>>>>>> blamed on the VM if java programs are failing on PPC after
>>>>>>>>>>>>> they
>>>>>>>>>>>>> worked
>>>>>>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I
>>>>>>>>>>>>> would guard
>>>>>>>>>>>>> it by
>>>>>>>>>>>>> a flag as proposed. But if you insist I will remove it.
>>>>>>>>>>>>> Also, this
>>>>>>>>>>>>> part is
>>>>>>>>>>>>> not that performance relevant.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> A compile-time guard (ifdef) would be better than a
>>>>>>>>>>>>>> runtime one I
>>>>>>>>>>>>>> think
>>>>>>>>>>>>> I added a compile-time guard in this new webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>>>>>>>>>>>>> several double negations I don't like, (#ifNdef
>>>>>>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>>>>>>>>>> but this way I only have to change the ppc platform.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Goetz
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S.: I will also be available over the Christmas period.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the delay, it takes a while to catch up after two
>>>>>>>>>>>>> weeks
>>>>>>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue
>>>>>>>>>>>>> to check
>>>>>>>>>>>>> emails.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ok, I understand the tests are wrong. It's good this
>>>>>>>>>>>>>> issue is
>>>>>>>>>>>>>> settled.
>>>>>>>>>>>>>> Thanks Aleksey and Andreas for going into the details of
>>>>>>>>>>>>>> the proof!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> About our change: David, the causality is the other way
>>>>>>>>>>>>>> round.
>>>>>>>>>>>>>> The change is about IRIW.
>>>>>>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the part I still have some question marks over as the
>>>>>>>>>>>>> implications are not nice for performance on non-TSO
>>>>>>>>>>>>> platforms.
>>>>>>>>>>>>> But I'm
>>>>>>>>>>>>> no further along in processing that paper I'm afraid.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. If we do syncs before loads, we don't need to do them
>>>>>>>>>>>>>> after
>>>>>>>>>>>>>> stores.
>>>>>>>>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>>>>>>>>> constructor tests.
>>>>>>>>>>>>>> 4. So finally we added them again at the end of the
>>>>>>>>>>>>>> constructor
>>>>>>>>>>>>>> after stores
>>>>>>>>>>>>>> to pass the volatile constructor tests.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>>>>>>>> were not
>>>>>>>>>>>>> required to pass.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We originally passed the constructor tests because the ppc
>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>> order
>>>>>>>>>>>>>> instructions are not as find-granular as the
>>>>>>>>>>>>>> operations in the IR. MemBarVolatile is specified as
>>>>>>>>>>>>>> StoreLoad.
>>>>>>>>>>>>>> The only instruction
>>>>>>>>>>>>>> on PPC that does StoreLoad is sync. But sync also does
>>>>>>>>>>>>>> StoreStore,
>>>>>>>>>>>>>> therefore the
>>>>>>>>>>>>>> MemBarVolatile after the store fixes the constructor
>>>>>>>>>>>>>> tests. The
>>>>>>>>>>>>>> proper representation
>>>>>>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore.
>>>>>>>>>>>>>> But now
>>>>>>>>>>>>>> it's pointless
>>>>>>>>>>>>>> anyways.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>>>>>>>> I'd be happy to add a property
>>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>>>>
>>>>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime
>>>>>>>>>>>>> one I
>>>>>>>>>>>>> think
>>>>>>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization
>>>>>>>>>>>>> (something semantic
>>>>>>>>>>>>> based not architecture based) as that will allows for
>>>>>>>>>>>>> turning this
>>>>>>>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> or the like to guard the customization. I'd like that
>>>>>>>>>>>>>> much better.
>>>>>>>>>>>>>> Or also
>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> TL;DR version:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>>>>>>>>>> constructor-barrier
>>>>>>>>>>>>>> for volatiles is not required as part of the JMM
>>>>>>>>>>>>>> specification. It
>>>>>>>>>>>>>> *may*
>>>>>>>>>>>>>> be required in an implementation that doesn't pre-zero
>>>>>>>>>>>>>> memory to
>>>>>>>>>>>>>> ensure
>>>>>>>>>>>>>> you can't see uninitialized fields. So the tests for this are
>>>>>>>>>>>>>> invalid
>>>>>>>>>>>>>> and this part of the patch is not needed in general (ppc64
>>>>>>>>>>>>>> may
>>>>>>>>>>>>>> need it
>>>>>>>>>>>>>> due to other factors).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Re: "multiple copy atomicity" - first thanks for
>>>>>>>>>>>>>> correcting the
>>>>>>>>>>>>>> term :)
>>>>>>>>>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "The memory system (perhaps involving a hierarchy of
>>>>>>>>>>>>>> buffers and a
>>>>>>>>>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>>>>>>>>>> visible to
>>>>>>>>>>>>>> all other hardware threads at the same time point; these
>>>>>>>>>>>>>> architectures
>>>>>>>>>>>>>> are not multiple-copy atomic."
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is the visibility issue that I referred to and
>>>>>>>>>>>>>> affects both
>>>>>>>>>>>>>> ARM and
>>>>>>>>>>>>>> PPC. But of course it is normally handled by using
>>>>>>>>>>>>>> suitable barriers
>>>>>>>>>>>>>> after the stores that need to be visible. I think the crux
>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>> current issue is what you wrote below:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > The fixes for the constructor issue are only
>>>>>>>>>>>>>> needed because we
>>>>>>>>>>>>>> > remove the sync instruction from behind stores
>>>>>>>>>>>>>> (parse3.cpp:320)
>>>>>>>>>>>>>> > and place it before loads.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do
>>>>>>>>>>>>>> the sync
>>>>>>>>>>>>>> after
>>>>>>>>>>>>>> the store then you have to do something around the loads
>>>>>>>>>>>>>> to get the
>>>>>>>>>>>>>> same
>>>>>>>>>>>>>> results! I still don't know what lead you to the
>>>>>>>>>>>>>> conclusion that the
>>>>>>>>>>>>>> only way to fix the IRIW issue was to put the fence before
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> load -
>>>>>>>>>>>>>> maybe when I get the chance to read that paper in full it
>>>>>>>>>>>>>> will be
>>>>>>>>>>>>>> clearer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So ... the basic problem is that the current structure in
>>>>>>>>>>>>>> the VM has
>>>>>>>>>>>>>> hard-wired one choice of how to get the right semantics
>>>>>>>>>>>>>> for volatile
>>>>>>>>>>>>>> variables. You now want to customize that but not all the
>>>>>>>>>>>>>> requisite
>>>>>>>>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>>>>>>>>> volatile_store were factored out so that they could be
>>>>>>>>>>>>>> implemented as
>>>>>>>>>>>>>> desired per-platform. Alternatively there could be pre-
>>>>>>>>>>>>>> and post-
>>>>>>>>>>>>>> hooks
>>>>>>>>>>>>>> that could then be customized per platform. Otherwise you
>>>>>>>>>>>>>> need
>>>>>>>>>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block
>>>>>>>>>>>>>> it. I think
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>> is an area where a lot of clean up is needed in the VM.
>>>>>>>>>>>>>> The barrier
>>>>>>>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I updated the webrev to fix the issues mentioned by
>>>>>>>>>>>>>>> Vladimir:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did not yet add the
>>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WRT to the validity of the tests and the interpretation
>>>>>>>>>>>>>>> of the JMM
>>>>>>>>>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But we would like to pass the torture test suite as we
>>>>>>>>>>>>>>> consider
>>>>>>>>>>>>>>> this a substantial task in implementing a PPC port. Also
>>>>>>>>>>>>>>> we think
>>>>>>>>>>>>>>> both tests show behavior a programmer would expect. It's
>>>>>>>>>>>>>>> bad if
>>>>>>>>>>>>>>> Java code runs fine on the more common x86 platform, and
>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The fixes for the constructor issue are only needed
>>>>>>>>>>>>>>> because we
>>>>>>>>>>>>>>> remove the sync instruction from behind stores
>>>>>>>>>>>>>>> (parse3.cpp:320)
>>>>>>>>>>>>>>> and place it before loads. Then there is no sync between
>>>>>>>>>>>>>>> volatile
>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>> and publishing the object. So we add it again in this
>>>>>>>>>>>>>>> one case
>>>>>>>>>>>>>>> (volatile store in constructor).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @David
>>>>>>>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>>>>> It may have started a year ago but work on it has hardly
>>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>> continuous.
>>>>>>>>>>>>>>> That's not true, we did a lot of investigation and
>>>>>>>>>>>>>>> testing on this
>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>> And we came up with a solution we consider the best
>>>>>>>>>>>>>>> possible. If
>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> have objections, you should at least give the draft of a
>>>>>>>>>>>>>>> better
>>>>>>>>>>>>>>> solution,
>>>>>>>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>>>>>>>> Similarly, we invested time in fixing the concurrency
>>>>>>>>>>>>>>> torture
>>>>>>>>>>>>>>> issues.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> @David
>>>>>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with
>>>>>>>>>>>>>>>> the term
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>>>>> We learned about this reading "A Tutorial Introduction to
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> ARM and
>>>>>>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit
>>>>>>>>>>>>>>> Sarkar and
>>>>>>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>>>>>>>> Work-Stealing for
>>>>>>>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert
>>>>>>>>>>>>>>> Cohen
>>>>>>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>>>>>>>>>> taskqueue problem.
>>>>>>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was wrong in one thing, it's called multiple copy
>>>>>>>>>>>>>>> atomicity, I
>>>>>>>>>>>>>>> used 'read'
>>>>>>>>>>>>>>> instead. Sorry for that. (I also fixed that in the
>>>>>>>>>>>>>>> method name
>>>>>>>>>>>>>>> above).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- Volatile in constuctor
>>>>>>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>>>>>>>>> processors
>>>>>>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks
>>>>>>>>>>>>>>>> Aleksey!)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- IRIW issue
>>>>>>>>>>>>>>>>> I can not possibly answer to the necessary level of
>>>>>>>>>>>>>>>>> detail with
>>>>>>>>>>>>>>>>> a few
>>>>>>>>>>>>>>>>> moments thought.
>>>>>>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>>>>>>> taskqueue
>>>>>>>>>>>>>>>> problem yet,
>>>>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It may have started a year ago but work on it has hardly
>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>> continuous.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why
>>>>>>>>>>>>>>>>> ppc is so
>>>>>>>>>>>>>>>>> different?)
>>>>>>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'.
>>>>>>>>>>>>>>>> Therefore
>>>>>>>>>>>>>>>> I contributed a
>>>>>>>>>>>>>>>> solution with the #defines, and that's correct for all,
>>>>>>>>>>>>>>>> but not
>>>>>>>>>>>>>>>> nice, I admit.
>>>>>>>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>>>>>>>> So if I can write down a nicer solution testing for
>>>>>>>>>>>>>>>> methods that
>>>>>>>>>>>>>>>> are evaluated
>>>>>>>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the
>>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>>> is that
>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>>> load
>>>>>>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>>>>>>>>>>>>> pass that test.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>>>>>>>>> term and
>>>>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The JMM is fine. And
>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>> MemBarVolatile
>>>>>>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler
>>>>>>>>>>>>>>>> instructions
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> do what is required.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So if you are off soon, please let's come to a solution
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>>>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>>>>>>>>>> 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Volatile fields written in constructor aren't
>>>>>>>>>>>>>>>>>> guaranteed by JMM
>>>>>>>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>>>>>>>>> initializing
>>>>>>>>>>>>>>>>> a volatile field. Previously, we discussed this with
>>>>>>>>>>>>>>>>> Aleksey
>>>>>>>>>>>>>>>>> Shipilev
>>>>>>>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>>>>>>>> LongVolatileTest
>>>>>>>>>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>>>>>>>>>> will fail.
>>>>>>>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>>>>>>>>>> volatile field would be
>>>>>>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>>>>>>>>>> volatiles do
>>>>>>>>>>>>>>>> not add anything special here. AFAIK there is nothing in
>>>>>>>>>>>>>>>> the JMM
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> requires the constructor barrier - discussions with Doug
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> Aleksey
>>>>>>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail
>>>>>>>>>>>>>>>> due to a
>>>>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>>>>>>>> heavyweight
>>>>>>>>>>>>>>>>> Yes, it costs measurable performance. But else it is
>>>>>>>>>>>>>>>>> wrong. We
>>>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>>>>>>>> Basically, I agree on this. But you also have to take
>>>>>>>>>>>>>>>>> into
>>>>>>>>>>>>>>>>> account
>>>>>>>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>>>>>>>> different platforms
>>>>>>>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>>>>>>> An example:
>>>>>>>>>>>>>>>>> MemBarRelease // means LoadStore,
>>>>>>>>>>>>>>>>> StoreStore barrier
>>>>>>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>>>>>>>>>>>>> If these are consecutively in the code, sparc code
>>>>>>>>>>>>>>>>> looks like
>>>>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>>>>> MemBarRelease -->
>>>>>>>>>>>>>>>>> membar(Assembler::LoadStore |
>>>>>>>>>>>>>>>>> Assembler::StoreStore)
>>>>>>>>>>>>>>>>> MemBarVolatile -->
>>>>>>>>>>>>>>>>> membar(Assembler::StoreLoad)
>>>>>>>>>>>>>>>>> Just doing what is required.
>>>>>>>>>>>>>>>>> On Power, we get suboptimal code, as there are no
>>>>>>>>>>>>>>>>> comparable,
>>>>>>>>>>>>>>>>> fine grained operations:
>>>>>>>>>>>>>>>>> MemBarRelease --> lwsync // Doing
>>>>>>>>>>>>>>>>> LoadStore,
>>>>>>>>>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>>>>>>>>>> MemBarVolatile --> sync // // Doing
>>>>>>>>>>>>>>>>> LoadStore,
>>>>>>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC
>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>> are more (too) powerful,
>>>>>>>>>>>>>>>>> I need an additional optimization that removes the
>>>>>>>>>>>>>>>>> lwsync. I
>>>>>>>>>>>>>>>>> can not implement
>>>>>>>>>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Back to the IRIW problem. I think here we have a
>>>>>>>>>>>>>>>>> comparable
>>>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence()
>>>>>>>>>>>>>>>>> before the
>>>>>>>>>>>>>>>>> read
>>>>>>>>>>>>>>>>> is inefficient on platforms that have
>>>>>>>>>>>>>>>>> multiple-read-atomicity.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would propose to guard the code by
>>>>>>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>>>>>>>>>> Else, David, how would you propose to implement this
>>>>>>>>>>>>>>>>> platform
>>>>>>>>>>>>>>>>> independent?
>>>>>>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I can not possibly answer to the necessary level of
>>>>>>>>>>>>>>>> detail with a
>>>>>>>>>>>>>>>> few
>>>>>>>>>>>>>>>> moments thought. You are implying there is a problem
>>>>>>>>>>>>>>>> here that
>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why
>>>>>>>>>>>>>>>> ppc is so
>>>>>>>>>>>>>>>> different?) and I can not take that on face value at the
>>>>>>>>>>>>>>>> moment. The
>>>>>>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>>>>>>>>>> requirements for
>>>>>>>>>>>>>>>> volatile accesses is if there are global visibility
>>>>>>>>>>>>>>>> issues that
>>>>>>>>>>>>>>>> are not
>>>>>>>>>>>>>>>> addressed - but even then I would expect heavy barriers
>>>>>>>>>>>>>>>> at the
>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>> would deal with that, not at the load. (This situation
>>>>>>>>>>>>>>>> reminds me
>>>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>>>> need for read-barriers on Alpha architecture due to the
>>>>>>>>>>>>>>>> use of
>>>>>>>>>>>>>>>> software
>>>>>>>>>>>>>>>> cache-coherency rather than hardware cache-coherency -
>>>>>>>>>>>>>>>> but we
>>>>>>>>>>>>>>>> don't have
>>>>>>>>>>>>>>>> that on ppc!)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sorry - There is no quick resolution here and in a
>>>>>>>>>>>>>>>> couple of days
>>>>>>>>>>>>>>>> I will
>>>>>>>>>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -- Other ports:
>>>>>>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be
>>>>>>>>>>>>>>>>> relevant, so
>>>>>>>>>>>>>>>>> it might
>>>>>>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY
>>>>>>>>>>>>>>>>> instead
>>>>>>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get
>>>>>>>>>>>>>>>>> rid of
>>>>>>>>>>>>>>>>> them).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>>>>>>>>>> I agree we should not change it in the ppc port. If
>>>>>>>>>>>>>>>>> you wish, I
>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211):
>>>>>>>>>>>>>>>>> ordering of
>>>>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Okay this is my second attempt at answering this in a
>>>>>>>>>>>>>>>>> reasonable
>>>>>>>>>>>>>>>>> way :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> From what I understand what we see here is
>>>>>>>>>>>>>>>>> an attempt to
>>>>>>>>>>>>>>>>> fix an
>>>>>>>>>>>>>>>>> existing issue with the implementation of volatiles so
>>>>>>>>>>>>>>>>> that the
>>>>>>>>>>>>>>>>> IRIW
>>>>>>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64
>>>>>>>>>>>>>>>>> is to make
>>>>>>>>>>>>>>>>> volatile reads extremely heavyweight by adding a
>>>>>>>>>>>>>>>>> fence() when
>>>>>>>>>>>>>>>>> doing the
>>>>>>>>>>>>>>>>> load.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I
>>>>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>>>> though!). But I do not agree with the changes to the
>>>>>>>>>>>>>>>>> shared code
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> allow this solution to be implemented - even with
>>>>>>>>>>>>>>>>> PPC64_ONLY
>>>>>>>>>>>>>>>>> this is
>>>>>>>>>>>>>>>>> polluting the shared code. My concern is similar to
>>>>>>>>>>>>>>>>> what I said
>>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>>> taskQueue changes - these algorithms should be
>>>>>>>>>>>>>>>>> expressed using
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired
>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>> independent of architecture. If such a "barrier" is not
>>>>>>>>>>>>>>>>> needed
>>>>>>>>>>>>>>>>> on a
>>>>>>>>>>>>>>>>> given architecture then the implementation in
>>>>>>>>>>>>>>>>> OrderAccess should
>>>>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>>>>> to a no-op.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And as Vitaly points out the constructor barriers are
>>>>>>>>>>>>>>>>> not needed
>>>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>>>> the JMM.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am fine with suggested changes because you did not
>>>>>>>>>>>>>>>>>> change our
>>>>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>>>>> code for our platforms (please, do not change
>>>>>>>>>>>>>>>>>> do_exits() now).
>>>>>>>>>>>>>>>>>> But may be it should be done using more general query
>>>>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>>>>> is set
>>>>>>>>>>>>>>>>>> depending on platform:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Every platform has to support IRIW this is simply part
>>>>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>>>>> Java
>>>>>>>>>>>>>>>>> Memory Model, there should not be any need to call this
>>>>>>>>>>>>>>>>> out
>>>>>>>>>>>>>>>>> explicitly
>>>>>>>>>>>>>>>>> like this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there some subtlety of the hardware I am missing
>>>>>>>>>>>>>>>>> here? Are
>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>> visibility issues beyond the ordering constraints that
>>>>>>>>>>>>>>>>> the JMM
>>>>>>>>>>>>>>>>> defines?
>>>>>>>>>>>>>>>>>> From what I understand our ppc port is
>>>>>>>>>>>>>>>>>> also affected.
>>>>>>>>>>>>>>>>>> David?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list -
>>>>>>>>>>>>>>>>> sorry.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>>>>>>>>>> inside else {}.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>>>>>>>>>> specific which
>>>>>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not
>>>>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many
>>>>>>>>>>>>>>>>>> #ifdefs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> if (is_vol) {
>>>>>>>>>>>>>>>>>> // See comment in do_get_xxx().
>>>>>>>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); //
>>>>>>>>>>>>>>>>>> Use fat membar
>>>>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>>>>> if (is_field) {
>>>>>>>>>>>>>>>>>> // Add MemBarRelease for
>>>>>>>>>>>>>>>>>> constructors which write
>>>>>>>>>>>>>>>>>> volatile field
>>>>>>>>>>>>>>>>>> (PPC64).
>>>>>>>>>>>>>>>>>> set_wrote_volatile(true);
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>>>>>>>> __________ __________
>>>>>>>>>>>>>>>>>>> __________ __________
>>>>>>>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> write(x=1) read(x)
>>>>>>>>>>>>>>>>>>> write(y=1) read(y)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> read(y) read(x)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Solution: This example requires
>>>>>>>>>>>>>>>>>>> multiple-copy-atomicity. This
>>>>>>>>>>>>>>>>>>> is only
>>>>>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed
>>>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile
>>>>>>>>>>>>>>>>>>> store.
>>>>>>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple
>>>>>>>>>>>>>>>>>>> SIGSEGVs
>>>>>>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Further this change contains a fix that assures that
>>>>>>>>>>>>>>>>>>> volatile
>>>>>>>>>>>>>>>>>>> fields
>>>>>>>>>>>>>>>>>>> written in constructors are visible before the
>>>>>>>>>>>>>>>>>>> reference gets
>>>>>>>>>>>>>>>>>>> published.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to
>>>>>>>>>>>>>>>>>>> us,
>>>>>>>>>>>>>>>>>>> seems too
>>>>>>>>>>>>>>>>>>> strong.
>>>>>>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore
>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>> suffice.
>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>>>>>
More information about the ppc-aix-port-dev
mailing list