RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes

Volker Simonis volker.simonis at gmail.com
Wed Jan 22 09:42:18 PST 2014


On Wed, Jan 22, 2014 at 6:02 PM, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> Hi Goetz
>
>
> On 1/22/14 1:20 AM, Lindenmaier, Goetz wrote:
>>
>> Hi Vladimir,
>>
>> Thanks for testing and pushing!
>>
>> Will you push this also to stage?  I assume we can handle this
>> as the other three hotspot changes, without a new bug-id?
>
>
> Yes, I will backport it.
>
> What about JDK changes Volker pushed: 8028537, 8031134, 8031997 and new one
> from today 8031581?
> Should I backport all of them into 8u stage? From conversion between Volker
> and Alan some of them need backport a fix from jdk9. Or I am mistaking?
>

It would be great if you could push 8028537, 8031134, 8031997 and
8031581 into 8u stage.

There is no real source-code level dependency between these changes
and the change "7133499: (fc) FileChannel.read not preempted by
asynchronous close on OS X)" which is still to be pushed by Alan to
jdk9. However the AIX-port will automatically benefit from 7133499
once it will be pushed.

>
>>
>> Also, when do you think we (you unfortunately) should update
>> the repos again?  Stage-9 maybe after Volkers last change is submitted?
>
>
> After I test and push 8031581 I will do sync with latest jdk9 sources (b01).

It would be great if you could push the change from:

http://cr.openjdk.java.net/~simonis/webrevs/8031581_3

It is exactly the same like the one from
http://cr.openjdk.java.net/~simonis/webrevs/8031581_2 with only one
comment line changed to fix the typo "legel"->"legal" in
src/share/native/java/util/zip/zip_util.c .

Thanks a lot,
Volker

> I will build bundles and give them to SQE for final testing.
>
> Thanks,
> Vladimir
>
>
>>
>> Best regards,
>>    Goetz
>>
>>
>>
>> -----Original Message-----
>> From: hotspot-dev-bounces at openjdk.java.net
>> [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov
>> Sent: Dienstag, 21. Januar 2014 21:00
>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent
>> Reads of Independent Writes
>>
>> Thanks. I am pushing it.
>>
>> Vladimir
>>
>> On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote:
>>>
>>> Sorry, I missed that.  fixed.
>>>
>>> Best regards,
>>>     Goetz.
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Dienstag, 21. Januar 2014 12:53
>>> To: Lindenmaier, Goetz; Vladimir Kozlov
>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent
>>> Reads of Independent Writes
>>>
>>> Thanks Goetz!
>>>
>>> This typo still exists:
>>>
>>> +   bool          _wrote_volatile;     // Did we write a final field?
>>>
>>> s/final/volatile/
>>>
>>> Otherwise no further comments from me.
>>>
>>> David
>>>
>>> On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote:
>>>>
>>>> Hi,
>>>>
>>>> I made a new webrev
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/
>>>> differing from
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>>> only in the comments.
>>>>
>>>> I removed
>>>> // Support ordering of "Independent Reads of Independent Writes".
>>>> everywhere, and edited the comments in the globalDefinition*.hpp
>>>> files.
>>>>
>>>> Best regards,
>>>>      Goetz.
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Dienstag, 21. Januar 2014 05:55
>>>> To: Lindenmaier, Goetz; Vladimir Kozlov
>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent
>>>> Reads of Independent Writes
>>>>
>>>> Hi Goetz,
>>>>
>>>> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to come up with a webrev that implements the change as proposed
>>>>> in
>>>>> your mails:
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>>>>
>>>>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use
>>>>> support_IRIW_for_not_multiple_copy_atomic_cpu.
>>>>
>>>>
>>>> Given the flag name the commentary eg:
>>>>
>>>> +       // Support ordering of "Independent Reads of Independent
>>>> Writes".
>>>> +       if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
>>>>
>>>> seems somewhat redundant.
>>>>
>>>>> I left the definition and handling of _wrote_volatile in the code,
>>>>> without
>>>>> any protection.
>>>>
>>>>
>>>> +   bool          _wrote_volatile;     // Did we write a final field?
>>>>
>>>> s/final/volatile
>>>>
>>>>> I protected issuing the barrier for volatile in constructors with
>>>>> PPC64_ONLY() ,
>>>>> and put it on one line.
>>>>>
>>>>> I removed the comment in library_call.cpp.
>>>>> I also removed the sentence " Solution: implement volatile read as
>>>>> sync-load-acquire."
>>>>> from the comments as it's PPC specific.
>>>>
>>>>
>>>> I think the primary IRIW comment/explanation should go in
>>>> globalDefinitions.hpp where
>>>> support_IRIW_for_not_multiple_copy_atomic_cpu is defined.
>>>>
>>>>> Wrt. to C1: we plan to port C1 to PPC64, too.  During that task, we
>>>>> will fix these
>>>>> issues in C1 if nobody did it by then.
>>>>
>>>>
>>>> I've filed:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8032366
>>>>
>>>> "Implement C1 support for IRIW conformance on non-multiple-copy-atomic
>>>> platforms"
>>>>
>>>> to cover this task, as it may be needed sooner rather than later.
>>>>
>>>>> Wrt. to performance: Oracle will soon do heavy testing of the port.  If
>>>>> any
>>>>> performance problems arise, we still can add #ifdef PPC64 to circumvent
>>>>> this.
>>>>
>>>>
>>>> Ok.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Best regards,
>>>>>       Goetz.
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Donnerstag, 16. Januar 2014 10:05
>>>>> To: Vladimir Kozlov
>>>>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net';
>>>>> 'hotspot-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent
>>>>> Reads of Independent Writes
>>>>>
>>>>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
>>>>>>
>>>>>> On 1/16/14 12:34 AM, David Holmes wrote:
>>>>>>>
>>>>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>>>>>>>
>>>>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>>>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>>>>>>> want to keep it this way, it could be useful to have such info on
>>>>>>>> other
>>>>>>>> platforms too. But I would suggest to remove PPC64 comments in
>>>>>>>> parse.hpp.
>>>>>>>>
>>>>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a
>>>>>>>> value
>>>>>>>> which could be checked in all places instead of #ifdef:
>>>>>>>
>>>>>>>
>>>>>>> I asked for the ifdef some time back as I find it much preferable to
>>>>>>> have this as a build-time construct rather than a
>>>>>>> runtime one. I don't want to have to pay anything for this if we
>>>>>>> don't
>>>>>>> use it.
>>>>>>
>>>>>>
>>>>>> Any decent C++ compiler will optimize expressions with such constants
>>>>>> defined in header files. I insist to avoid #ifdefs in C2 code. I
>>>>>> really
>>>>>> don't like the code with #ifdef in unsafe.cpp but I can live with it.
>>>>>
>>>>>
>>>>> If you insist then we may as well do it all the same way. Better to be
>>>>> consistent.
>>>>>
>>>>> My apologies Goetz for wasting your time going back and forth on this.
>>>>>
>>>>> That aside I have a further concern with this IRIW support - it is
>>>>> incomplete as there is no C1 support, as PPC64 isn't using client. If
>>>>> this is going on then we (which probably means the Oracle 'we') need to
>>>>> add the missing C1 code.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>>>>>>> #else
>>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>>>>>>> #endif
>>>>>>>>
>>>>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>>>>>>
>>>>>>>> and then:
>>>>>>>>
>>>>>>>>       #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>>>>>>         oop p = JNIHandles::resolve(obj); \
>>>>>>>> +  if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>>>>>>>> OrderAccess::fence(); \
>>>>>>>>         volatile type_name v = OrderAccess::load_acquire((volatile
>>>>>>>> type_name*)index_oop_from_field_offset_long(p, offset));
>>>>>>>>
>>>>>>>> And:
>>>>>>>>
>>>>>>>> +  if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>>>>>>>> field->is_volatile()) {
>>>>>>>> +    insert_mem_bar(Op_MemBarVolatile);   // StoreLoad barrier
>>>>>>>> +  }
>>>>>>>>
>>>>>>>> And so on. The comments will be needed only in globalDefinitions.hpp
>>>>>>>>
>>>>>>>> The code in parse1.cpp could be put on one line:
>>>>>>>>
>>>>>>>> +  if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>>>>>>>> method()->is_initializer()) )) {
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>>>>>>>
>>>>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> I updated the webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>
>>>>>>>>>> - I removed the IRIW example in parse3.cpp
>>>>>>>>>> - I adapted the comments not to point to that comment, and to
>>>>>>>>>>        reflect the new flagging.  Also I mention that we support
>>>>>>>>>> the
>>>>>>>>>>        volatile constructor issue, but that it's not standard.
>>>>>>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>>>>>>        I also think it's better to separate these this way.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>>>>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>>>>>>> please.
>>>>>>>>>
>>>>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>>>>>>>> comment doesn't make much sense to me and refers to
>>>>>>>>> ppc specific stuff in a shared file:
>>>>>>>>>
>>>>>>>>>          if (is_volatile) {
>>>>>>>>> !     if (!is_store) {
>>>>>>>>>              insert_mem_bar(Op_MemBarAcquire);
>>>>>>>>> !     } else {
>>>>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>>>> !       // Changed volatiles/Unsafe: lwsync-store,
>>>>>>>>> sync-load-acquire.
>>>>>>>>>              insert_mem_bar(Op_MemBarVolatile);
>>>>>>>>> + #endif
>>>>>>>>> +     }
>>>>>>>>>
>>>>>>>>> I don't think the comment is needed.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> Thanks for your comments!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>        Goetz.
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>
>>>>>>>>>> Hi Goetz,
>>>>>>>>>>
>>>>>>>>>> Sorry for the delay in getting back to this.
>>>>>>>>>>
>>>>>>>>>> The general changes to the volatile barriers to support IRIW are
>>>>>>>>>> okay.
>>>>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though
>>>>>>>>>> more
>>>>>>>>>> specifically it is
>>>>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much
>>>>>>>>>> of
>>>>>>>>>> the commentary excessive, particularly for shared code. In
>>>>>>>>>> particular
>>>>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give
>>>>>>>>>> the
>>>>>>>>>> explanation and I don't think we need it to that level of detail.
>>>>>>>>>> Seems
>>>>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite
>>>>>>>>>> adequate.
>>>>>>>>>>
>>>>>>>>>> The changes related to volatile writes in the constructor, as
>>>>>>>>>> discussed
>>>>>>>>>> are not required by the Java Memory Model. If you want to keep
>>>>>>>>>> these
>>>>>>>>>> then I think they should all be guarded with PPC64 because it is
>>>>>>>>>> not
>>>>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by
>>>>>>>>>> the
>>>>>>>>>> PPC64 porters.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I updated this webrev.  I detected a small flaw I made when
>>>>>>>>>>> editing
>>>>>>>>>>> this version.
>>>>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>>>>>>>> I also based the webrev on the latest version of the stage repo.
>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>         Goetz.
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Lindenmaier, Goetz
>>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>>>>>>>> To: David Holmes
>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>>>>>>> were
>>>>>>>>>>>> not
>>>>>>>>>>>> required to pass.
>>>>>>>>>>>
>>>>>>>>>>> We would prefer if we could keep this in.  We want to avoid that
>>>>>>>>>>> it's
>>>>>>>>>>> blamed on the VM if java programs are failing on PPC after they
>>>>>>>>>>> worked
>>>>>>>>>>> on x86.  To clearly mark it as overfulfilling the spec I would
>>>>>>>>>>> guard
>>>>>>>>>>> it by
>>>>>>>>>>> a flag as proposed.  But if you insist I will remove it.  Also,
>>>>>>>>>>> this
>>>>>>>>>>> part is
>>>>>>>>>>> not that performance relevant.
>>>>>>>>>>>
>>>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one
>>>>>>>>>>>> I
>>>>>>>>>>>> think
>>>>>>>>>>>
>>>>>>>>>>> I added a compile-time guard in this new webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC.  This introduces
>>>>>>>>>>> several double negations I don't like, (#ifNdef
>>>>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>>>>>>>> but this way I only have to change the ppc platform.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>         Goetz
>>>>>>>>>>>
>>>>>>>>>>> P.S.:  I will also be available over the Christmas period.
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to
>>>>>>>>>>> check
>>>>>>>>>>> emails.
>>>>>>>>>>>
>>>>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> ok, I understand the tests are wrong.  It's good this issue is
>>>>>>>>>>>> settled.
>>>>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the
>>>>>>>>>>>> proof!
>>>>>>>>>>>>
>>>>>>>>>>>> About our change:  David, the causality is the other way round.
>>>>>>>>>>>> The change is about IRIW.
>>>>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is the part I still have some question marks over as the
>>>>>>>>>>> implications are not nice for performance on non-TSO platforms.
>>>>>>>>>>> But I'm
>>>>>>>>>>> no further along in processing that paper I'm afraid.
>>>>>>>>>>>
>>>>>>>>>>>> 2. If we do syncs before loads, we don't need to do them after
>>>>>>>>>>>> stores.
>>>>>>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>>>>>>> constructor tests.
>>>>>>>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>>>>>>>> after stores
>>>>>>>>>>>>             to pass the volatile constructor tests.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>>>>>> were not
>>>>>>>>>>> required to pass.
>>>>>>>>>>>
>>>>>>>>>>>> We originally passed the constructor tests because the ppc
>>>>>>>>>>>> memory
>>>>>>>>>>>> order
>>>>>>>>>>>> instructions are not as find-granular as the
>>>>>>>>>>>> operations in the IR.  MemBarVolatile is specified as StoreLoad.
>>>>>>>>>>>> The only instruction
>>>>>>>>>>>> on PPC that does StoreLoad is sync. But sync also does
>>>>>>>>>>>> StoreStore,
>>>>>>>>>>>> therefore the
>>>>>>>>>>>> MemBarVolatile after the store fixes the constructor tests.  The
>>>>>>>>>>>> proper representation
>>>>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore.  But
>>>>>>>>>>>> now
>>>>>>>>>>>> it's pointless
>>>>>>>>>>>> anyways.
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd be happy to add a property
>>>>>>>>>>>>          OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>>>>> think
>>>>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something
>>>>>>>>>>> semantic
>>>>>>>>>>> based not architecture based) as that will allows for turning
>>>>>>>>>>> this
>>>>>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> or the like to guard the customization.  I'd like that much
>>>>>>>>>>>> better.
>>>>>>>>>>>> Or also
>>>>>>>>>>>>           OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>           VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>          Goetz.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>
>>>>>>>>>>>> TL;DR version:
>>>>>>>>>>>>
>>>>>>>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>>>>>>>> constructor-barrier
>>>>>>>>>>>> for volatiles is not required as part of the JMM specification.
>>>>>>>>>>>> It
>>>>>>>>>>>> *may*
>>>>>>>>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>>>>>>>>> ensure
>>>>>>>>>>>> you can't see uninitialized fields. So the tests for this are
>>>>>>>>>>>> invalid
>>>>>>>>>>>> and this part of the patch is not needed in general (ppc64 may
>>>>>>>>>>>> need it
>>>>>>>>>>>> due to other factors).
>>>>>>>>>>>>
>>>>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>>>>>>>>> term :)
>>>>>>>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>>>>>>>
>>>>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and
>>>>>>>>>>>> a
>>>>>>>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>>>>>>>> visible to
>>>>>>>>>>>> all other hardware threads at the same time point; these
>>>>>>>>>>>> architectures
>>>>>>>>>>>> are not multiple-copy atomic."
>>>>>>>>>>>>
>>>>>>>>>>>> This is the visibility issue that I referred to and affects both
>>>>>>>>>>>> ARM and
>>>>>>>>>>>> PPC. But of course it is normally handled by using suitable
>>>>>>>>>>>> barriers
>>>>>>>>>>>> after the stores that need to be visible. I think the crux of
>>>>>>>>>>>> the
>>>>>>>>>>>> current issue is what you wrote below:
>>>>>>>>>>>>
>>>>>>>>>>>>         > The fixes for the constructor issue are only needed
>>>>>>>>>>>> because we
>>>>>>>>>>>>         > remove the sync instruction from behind stores
>>>>>>>>>>>> (parse3.cpp:320)
>>>>>>>>>>>>         > and place it before loads.
>>>>>>>>>>>>
>>>>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync
>>>>>>>>>>>> after
>>>>>>>>>>>> the store then you have to do something around the loads to get
>>>>>>>>>>>> the
>>>>>>>>>>>> same
>>>>>>>>>>>> results! I still don't know what lead you to the conclusion that
>>>>>>>>>>>> the
>>>>>>>>>>>> only way to fix the IRIW issue was to put the fence before the
>>>>>>>>>>>> load -
>>>>>>>>>>>> maybe when I get the chance to read that paper in full it will
>>>>>>>>>>>> be
>>>>>>>>>>>> clearer.
>>>>>>>>>>>>
>>>>>>>>>>>> So ... the basic problem is that the current structure in the VM
>>>>>>>>>>>> has
>>>>>>>>>>>> hard-wired one choice of how to get the right semantics for
>>>>>>>>>>>> volatile
>>>>>>>>>>>> variables. You now want to customize that but not all the
>>>>>>>>>>>> requisite
>>>>>>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>>>>>>> volatile_store were factored out so that they could be
>>>>>>>>>>>> implemented as
>>>>>>>>>>>> desired per-platform. Alternatively there could be pre- and
>>>>>>>>>>>> post-
>>>>>>>>>>>> hooks
>>>>>>>>>>>> that could then be customized per platform. Otherwise you need
>>>>>>>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I
>>>>>>>>>>>> think
>>>>>>>>>>>> this
>>>>>>>>>>>> is an area where a lot of clean up is needed in the VM. The
>>>>>>>>>>>> barrier
>>>>>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did not yet add the
>>>>>>>>>>>>>            OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>            VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>> or
>>>>>>>>>>>>>           OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> WRT to the validity of the tests and the interpretation of the
>>>>>>>>>>>>> JMM
>>>>>>>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But we would like to pass the torture test suite as we consider
>>>>>>>>>>>>> this a substantial task in implementing a PPC port.  Also we
>>>>>>>>>>>>> think
>>>>>>>>>>>>> both tests show behavior a programmer would expect.  It's bad
>>>>>>>>>>>>> if
>>>>>>>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>>>>>>>> fails on ppc.  This will always first be blamed on the VM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>>>>>>> and place it before loads.  Then there is no sync between
>>>>>>>>>>>>> volatile
>>>>>>>>>>>>> store
>>>>>>>>>>>>> and publishing the object.  So we add it again in this one case
>>>>>>>>>>>>> (volatile store in constructor).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> @David
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sure.  There also is no solution as you require for the
>>>>>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>>>>> continuous.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's not true, we did a lot of investigation and testing on
>>>>>>>>>>>>> this
>>>>>>>>>>>>> issue.
>>>>>>>>>>>>> And we came up with a solution we consider the best possible.
>>>>>>>>>>>>> If
>>>>>>>>>>>>> you
>>>>>>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>>>>>>> solution,
>>>>>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture
>>>>>>>>>>>>> issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>>>>>>>> term
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the
>>>>>>>>>>>>> ARM and
>>>>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>>>>>> Work-Stealing for
>>>>>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>>>>>>>> taskqueue problem.
>>>>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity,
>>>>>>>>>>>>> I
>>>>>>>>>>>>> used 'read'
>>>>>>>>>>>>> instead.  Sorry for that.  (I also fixed that in the method
>>>>>>>>>>>>> name
>>>>>>>>>>>>> above).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>>>>>>           Goetz.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Volatile in constuctor
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>>>>>>> processors
>>>>>>>>>>>>>> and are Power 5-7.  But see also Aleksey's mail.  (Thanks
>>>>>>>>>>>>>> Aleksey!)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- IRIW issue
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> a few
>>>>>>>>>>>>>>> moments thought.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sure.  There also is no solution as you require for the
>>>>>>>>>>>>>> taskqueue
>>>>>>>>>>>>>> problem yet,
>>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>>>> continuous.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is
>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>> different?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'.
>>>>>>>>>>>>>> Therefore
>>>>>>>>>>>>>> I contributed a
>>>>>>>>>>>>>> solution with the #defines, and that's correct for all, but
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>> nice, I admit.
>>>>>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>>>>>> So if I can write down a nicer solution testing for methods
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> are evaluated
>>>>>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the
>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>> is that
>>>>>>>>>>>>>>            store
>>>>>>>>>>>>>>            sync
>>>>>>>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>            sync
>>>>>>>>>>>>>>            load
>>>>>>>>>>>>>> does so on PPC.  And you require multiple-read-atomicity to
>>>>>>>>>>>>>> pass that test.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>>>>>>> term and
>>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>            The JMM is fine. And
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>            store
>>>>>>>>>>>>>>            MemBarVolatile
>>>>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler
>>>>>>>>>>>>>> instructions
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> do what is required.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>            Goetz.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>>>>>>>> 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by
>>>>>>>>>>>>>>>> JMM
>>>>>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>>>>>>> initializing
>>>>>>>>>>>>>>> a volatile field.  Previously, we discussed this with Aleksey
>>>>>>>>>>>>>>> Shipilev
>>>>>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>>>>>>              LongVolatileTest
>>>>>>>>>>>>>>>              AtomicIntegerInitialValueTest
>>>>>>>>>>>>>>> will fail.
>>>>>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>>>>>>>> volatile field would be
>>>>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>>>>>>>> volatiles do
>>>>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the
>>>>>>>>>>>>>> JMM
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> requires the constructor barrier - discussions with Doug and
>>>>>>>>>>>>>> Aleksey
>>>>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due
>>>>>>>>>>>>>> to a
>>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>>>>>> heavyweight
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, it costs measurable performance.  But else it is wrong.
>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Basically, I agree on this.  But you also have to take into
>>>>>>>>>>>>>>> account
>>>>>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>>>>>> different platforms
>>>>>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>>>>> An example:
>>>>>>>>>>>>>>>             MemBarRelease    // means LoadStore, StoreStore
>>>>>>>>>>>>>>> barrier
>>>>>>>>>>>>>>>             MemBarVolatile    // means  StoreLoad barrier
>>>>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like
>>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>>>             MemBarRelease  --> membar(Assembler::LoadStore |
>>>>>>>>>>>>>>> Assembler::StoreStore)
>>>>>>>>>>>>>>>             MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>>>>>>>>> Just doing what is required.
>>>>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>>>>>>>>> fine grained operations:
>>>>>>>>>>>>>>>              MemBarRelease --> lwsync   // Doing LoadStore,
>>>>>>>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>>>>>>>>             MemBarVolatile --> sync    // // Doing LoadStore,
>>>>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>>>>>>>> obviously, the lwsync is superfluous.  Thus, as PPC
>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>> are more (too) powerful,
>>>>>>>>>>>>>>> I need an additional optimization that removes the lwsync.  I
>>>>>>>>>>>>>>> can not implement
>>>>>>>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Back to the IRIW problem.  I think here we have a comparable
>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> read
>>>>>>>>>>>>>>> is inefficient on platforms that have
>>>>>>>>>>>>>>> multiple-read-atomicity.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would propose to guard the code by
>>>>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>>>>>>>>> independent?
>>>>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail
>>>>>>>>>>>>>> with a
>>>>>>>>>>>>>> few
>>>>>>>>>>>>>> moments thought. You are implying there is a problem here that
>>>>>>>>>>>>>> will
>>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is
>>>>>>>>>>>>>> so
>>>>>>>>>>>>>> different?) and I can not take that on face value at the
>>>>>>>>>>>>>> moment. The
>>>>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>>>>>>>> requirements for
>>>>>>>>>>>>>> volatile accesses is if there are global visibility issues
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> are not
>>>>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the
>>>>>>>>>>>>>> store
>>>>>>>>>>>>>> would deal with that, not at the load. (This situation reminds
>>>>>>>>>>>>>> me
>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>>>>>>>>> software
>>>>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>>>>>>>>> don't have
>>>>>>>>>>>>>> that on ppc!)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of
>>>>>>>>>>>>>> days
>>>>>>>>>>>>>> I will
>>>>>>>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>             Goetz.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Other ports:
>>>>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant,
>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>> it might
>>>>>>>>>>>>>>> not happen on small machines.  But I can use PPC_ONLY instead
>>>>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of
>>>>>>>>>>>>>>> them).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>>>>>>>> I agree we should not change it in the ppc port.  If you
>>>>>>>>>>>>>>> wish, I
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Okay this is my second attempt at answering this in a
>>>>>>>>>>>>>>> reasonable
>>>>>>>>>>>>>>> way :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>            From what I understand what we see here is an
>>>>>>>>>>>>>>> attempt to
>>>>>>>>>>>>>>> fix an
>>>>>>>>>>>>>>> existing issue with the implementation of volatiles so that
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> IRIW
>>>>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>>>>>>>>> doing the
>>>>>>>>>>>>>>> load.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I
>>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>> though!). But I do not agree with the changes to the shared
>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>>>>>>>>> this is
>>>>>>>>>>>>>>> polluting the shared code. My concern is similar to what I
>>>>>>>>>>>>>>> said
>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed
>>>>>>>>>>>>>>> using
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired
>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>> independent of architecture. If such a "barrier" is not
>>>>>>>>>>>>>>> needed
>>>>>>>>>>>>>>> on a
>>>>>>>>>>>>>>> given architecture then the implementation in OrderAccess
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>>> to a no-op.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not
>>>>>>>>>>>>>>> needed
>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>> the JMM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am fine with suggested changes because you did not change
>>>>>>>>>>>>>>>> our
>>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>>> code for our platforms (please, do not change do_exits()
>>>>>>>>>>>>>>>> now).
>>>>>>>>>>>>>>>> But may be it should be done using more general query which
>>>>>>>>>>>>>>>> is set
>>>>>>>>>>>>>>>> depending on platform:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the
>>>>>>>>>>>>>>> Java
>>>>>>>>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>>>>>>>>> explicitly
>>>>>>>>>>>>>>> like this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are
>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>> visibility issues beyond the ordering constraints that the
>>>>>>>>>>>>>>> JMM
>>>>>>>>>>>>>>> defines?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>            From what I understand our ppc port is also
>>>>>>>>>>>>>>>> affected.
>>>>>>>>>>>>>>>> David?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>>>>>>>> inside  else {}.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>>>>>>>> specific which
>>>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many
>>>>>>>>>>>>>>>> #ifdefs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>               if (is_vol) {
>>>>>>>>>>>>>>>>                 // See comment in do_get_xxx().
>>>>>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>>>>>                 insert_mem_bar(Op_MemBarVolatile); // Use
>>>>>>>>>>>>>>>> fat membar
>>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>>>                 if (is_field) {
>>>>>>>>>>>>>>>>                   // Add MemBarRelease for constructors
>>>>>>>>>>>>>>>> which write
>>>>>>>>>>>>>>>> volatile field
>>>>>>>>>>>>>>>> (PPC64).
>>>>>>>>>>>>>>>>                   set_wrote_volatile(true);
>>>>>>>>>>>>>>>>                 }
>>>>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>>>>>>              __________    __________    __________
>>>>>>>>>>>>>>>>> __________
>>>>>>>>>>>>>>>>> | Thread 0 |  | Thread 1 |  | Thread 2 |  | Thread 3 |
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>               write(x=1)    read(x)       write(y=1)
>>>>>>>>>>>>>>>>> read(y)
>>>>>>>>>>>>>>>>>                             read(y)
>>>>>>>>>>>>>>>>> read(x)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Disallowed:     x=1, y=0                    y=1, x=0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity.
>>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> is only
>>>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple
>>>>>>>>>>>>>>>>> SIGSEGVs
>>>>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Further this change contains a fix that assures that
>>>>>>>>>>>>>>>>> volatile
>>>>>>>>>>>>>>>>> fields
>>>>>>>>>>>>>>>>> written in constructors are visible before the reference
>>>>>>>>>>>>>>>>> gets
>>>>>>>>>>>>>>>>> published.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>>>>>>>>> seems too
>>>>>>>>>>>>>>>>> strong.
>>>>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>>>>>>>>> suffice.
>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>>              Goetz.
>>>>>>>>>>>>>>>>>
>


More information about the hotspot-dev mailing list