RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Jan 22 09:02:04 PST 2014
Hi Goetz
On 1/22/14 1:20 AM, Lindenmaier, Goetz wrote:
> Hi Vladimir,
>
> Thanks for testing and pushing!
>
> Will you push this also to stage? I assume we can handle this
> as the other three hotspot changes, without a new bug-id?
Yes, I will backport it.
What about JDK changes Volker pushed: 8028537, 8031134, 8031997 and new one from today 8031581?
Should I backport all of them into 8u stage? From conversion between Volker and Alan some of them need backport a fix
from jdk9. Or I am mistaking?
>
> Also, when do you think we (you unfortunately) should update
> the repos again? Stage-9 maybe after Volkers last change is submitted?
After I test and push 8031581 I will do sync with latest jdk9 sources (b01).
I will build bundles and give them to SQE for final testing.
Thanks,
Vladimir
>
> Best regards,
> Goetz
>
>
>
> -----Original Message-----
> From: hotspot-dev-bounces at openjdk.java.net [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Kozlov
> Sent: Dienstag, 21. Januar 2014 21:00
> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> Thanks. I am pushing it.
>
> Vladimir
>
> On 1/21/14 5:19 AM, Lindenmaier, Goetz wrote:
>> Sorry, I missed that. fixed.
>>
>> Best regards,
>> Goetz.
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Dienstag, 21. Januar 2014 12:53
>> To: Lindenmaier, Goetz; Vladimir Kozlov
>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Thanks Goetz!
>>
>> This typo still exists:
>>
>> + bool _wrote_volatile; // Did we write a final field?
>>
>> s/final/volatile/
>>
>> Otherwise no further comments from me.
>>
>> David
>>
>> On 21/01/2014 7:22 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> I made a new webrev
>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/
>>> differing from
>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>> only in the comments.
>>>
>>> I removed
>>> // Support ordering of "Independent Reads of Independent Writes".
>>> everywhere, and edited the comments in the globalDefinition*.hpp
>>> files.
>>>
>>> Best regards,
>>> Goetz.
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Dienstag, 21. Januar 2014 05:55
>>> To: Lindenmaier, Goetz; Vladimir Kozlov
>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>
>>> Hi Goetz,
>>>
>>> On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote:
>>>> Hi,
>>>>
>>>> I tried to come up with a webrev that implements the change as proposed in
>>>> your mails:
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>>>
>>>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use
>>>> support_IRIW_for_not_multiple_copy_atomic_cpu.
>>>
>>> Given the flag name the commentary eg:
>>>
>>> + // Support ordering of "Independent Reads of Independent Writes".
>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
>>>
>>> seems somewhat redundant.
>>>
>>>> I left the definition and handling of _wrote_volatile in the code, without
>>>> any protection.
>>>
>>> + bool _wrote_volatile; // Did we write a final field?
>>>
>>> s/final/volatile
>>>
>>>> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() ,
>>>> and put it on one line.
>>>>
>>>> I removed the comment in library_call.cpp.
>>>> I also removed the sentence " Solution: implement volatile read as sync-load-acquire."
>>>> from the comments as it's PPC specific.
>>>
>>> I think the primary IRIW comment/explanation should go in
>>> globalDefinitions.hpp where
>>> support_IRIW_for_not_multiple_copy_atomic_cpu is defined.
>>>
>>>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these
>>>> issues in C1 if nobody did it by then.
>>>
>>> I've filed:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8032366
>>>
>>> "Implement C1 support for IRIW conformance on non-multiple-copy-atomic
>>> platforms"
>>>
>>> to cover this task, as it may be needed sooner rather than later.
>>>
>>>> Wrt. to performance: Oracle will soon do heavy testing of the port. If any
>>>> performance problems arise, we still can add #ifdef PPC64 to circumvent this.
>>>
>>> Ok.
>>>
>>> Thanks,
>>> David
>>>
>>>> Best regards,
>>>> Goetz.
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Donnerstag, 16. Januar 2014 10:05
>>>> To: Vladimir Kozlov
>>>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>
>>>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
>>>>> On 1/16/14 12:34 AM, David Holmes wrote:
>>>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>>>>>> want to keep it this way, it could be useful to have such info on other
>>>>>>> platforms too. But I would suggest to remove PPC64 comments in
>>>>>>> parse.hpp.
>>>>>>>
>>>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>>>>>>> which could be checked in all places instead of #ifdef:
>>>>>>
>>>>>> I asked for the ifdef some time back as I find it much preferable to
>>>>>> have this as a build-time construct rather than a
>>>>>> runtime one. I don't want to have to pay anything for this if we don't
>>>>>> use it.
>>>>>
>>>>> Any decent C++ compiler will optimize expressions with such constants
>>>>> defined in header files. I insist to avoid #ifdefs in C2 code. I really
>>>>> don't like the code with #ifdef in unsafe.cpp but I can live with it.
>>>>
>>>> If you insist then we may as well do it all the same way. Better to be
>>>> consistent.
>>>>
>>>> My apologies Goetz for wasting your time going back and forth on this.
>>>>
>>>> That aside I have a further concern with this IRIW support - it is
>>>> incomplete as there is no C1 support, as PPC64 isn't using client. If
>>>> this is going on then we (which probably means the Oracle 'we') need to
>>>> add the missing C1 code.
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>>>>>> #else
>>>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>>>>>> #endif
>>>>>>>
>>>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>>>>>
>>>>>>> and then:
>>>>>>>
>>>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>>>>> oop p = JNIHandles::resolve(obj); \
>>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>>>>>>> OrderAccess::fence(); \
>>>>>>> volatile type_name v = OrderAccess::load_acquire((volatile
>>>>>>> type_name*)index_oop_from_field_offset_long(p, offset));
>>>>>>>
>>>>>>> And:
>>>>>>>
>>>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>>>>>>> field->is_volatile()) {
>>>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
>>>>>>> + }
>>>>>>>
>>>>>>> And so on. The comments will be needed only in globalDefinitions.hpp
>>>>>>>
>>>>>>> The code in parse1.cpp could be put on one line:
>>>>>>>
>>>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>>>>>>> method()->is_initializer()) )) {
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> I updated the webrev:
>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>
>>>>>>>>> - I removed the IRIW example in parse3.cpp
>>>>>>>>> - I adapted the comments not to point to that comment, and to
>>>>>>>>> reflect the new flagging. Also I mention that we support the
>>>>>>>>> volatile constructor issue, but that it's not standard.
>>>>>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>>>>> I also think it's better to separate these this way.
>>>>>>>>
>>>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>>>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>>>>>> please.
>>>>>>>>
>>>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>>>>>>> comment doesn't make much sense to me and refers to
>>>>>>>> ppc specific stuff in a shared file:
>>>>>>>>
>>>>>>>> if (is_volatile) {
>>>>>>>> ! if (!is_store) {
>>>>>>>> insert_mem_bar(Op_MemBarAcquire);
>>>>>>>> ! } else {
>>>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>>>>>>>> insert_mem_bar(Op_MemBarVolatile);
>>>>>>>> + #endif
>>>>>>>> + }
>>>>>>>>
>>>>>>>> I don't think the comment is needed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Thanks for your comments!
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Goetz.
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>
>>>>>>>>> Hi Goetz,
>>>>>>>>>
>>>>>>>>> Sorry for the delay in getting back to this.
>>>>>>>>>
>>>>>>>>> The general changes to the volatile barriers to support IRIW are okay.
>>>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>>>>>>>> specifically it is
>>>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>>>>>>>> the commentary excessive, particularly for shared code. In particular
>>>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>>>>>>>> explanation and I don't think we need it to that level of detail.
>>>>>>>>> Seems
>>>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>>>>>>>>
>>>>>>>>> The changes related to volatile writes in the constructor, as
>>>>>>>>> discussed
>>>>>>>>> are not required by the Java Memory Model. If you want to keep these
>>>>>>>>> then I think they should all be guarded with PPC64 because it is not
>>>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>>>>>>>>> PPC64 porters.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I updated this webrev. I detected a small flaw I made when editing
>>>>>>>>>> this version.
>>>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>>>>>>> I also based the webrev on the latest version of the stage repo.
>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Goetz.
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Lindenmaier, Goetz
>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>>>>>>> To: David Holmes
>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>>> So we can at least undo #4 now we have established those tests were
>>>>>>>>>>> not
>>>>>>>>>>> required to pass.
>>>>>>>>>> We would prefer if we could keep this in. We want to avoid that it's
>>>>>>>>>> blamed on the VM if java programs are failing on PPC after they
>>>>>>>>>> worked
>>>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard
>>>>>>>>>> it by
>>>>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this
>>>>>>>>>> part is
>>>>>>>>>> not that performance relevant.
>>>>>>>>>>
>>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>>>>> think
>>>>>>>>>> I added a compile-time guard in this new webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>>>>>>>>>> several double negations I don't like, (#ifNdef
>>>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>>>>>>> but this way I only have to change the ppc platform.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Goetz
>>>>>>>>>>
>>>>>>>>>> P.S.: I will also be available over the Christmas period.
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>
>>>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>>>>>>>> emails.
>>>>>>>>>>
>>>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> ok, I understand the tests are wrong. It's good this issue is
>>>>>>>>>>> settled.
>>>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>>>>>>>
>>>>>>>>>>> About our change: David, the causality is the other way round.
>>>>>>>>>>> The change is about IRIW.
>>>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>>>>>
>>>>>>>>>> This is the part I still have some question marks over as the
>>>>>>>>>> implications are not nice for performance on non-TSO platforms.
>>>>>>>>>> But I'm
>>>>>>>>>> no further along in processing that paper I'm afraid.
>>>>>>>>>>
>>>>>>>>>>> 2. If we do syncs before loads, we don't need to do them after
>>>>>>>>>>> stores.
>>>>>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>>>>>> constructor tests.
>>>>>>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>>>>>>> after stores
>>>>>>>>>>> to pass the volatile constructor tests.
>>>>>>>>>>
>>>>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>>>>> were not
>>>>>>>>>> required to pass.
>>>>>>>>>>
>>>>>>>>>>> We originally passed the constructor tests because the ppc memory
>>>>>>>>>>> order
>>>>>>>>>>> instructions are not as find-granular as the
>>>>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>>>>>>>> The only instruction
>>>>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>>>>>>>> therefore the
>>>>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The
>>>>>>>>>>> proper representation
>>>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now
>>>>>>>>>>> it's pointless
>>>>>>>>>>> anyways.
>>>>>>>>>>>
>>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>>>>> I'd be happy to add a property
>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>
>>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>>>> think
>>>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>>>>>>> based not architecture based) as that will allows for turning this
>>>>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> or the like to guard the customization. I'd like that much better.
>>>>>>>>>>> Or also
>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Goetz.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>
>>>>>>>>>>> TL;DR version:
>>>>>>>>>>>
>>>>>>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>>>>>>> constructor-barrier
>>>>>>>>>>> for volatiles is not required as part of the JMM specification. It
>>>>>>>>>>> *may*
>>>>>>>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>>>>>>>> ensure
>>>>>>>>>>> you can't see uninitialized fields. So the tests for this are
>>>>>>>>>>> invalid
>>>>>>>>>>> and this part of the patch is not needed in general (ppc64 may
>>>>>>>>>>> need it
>>>>>>>>>>> due to other factors).
>>>>>>>>>>>
>>>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>>>>>>>> term :)
>>>>>>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>>>>>>
>>>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>>>>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>>>>>>> visible to
>>>>>>>>>>> all other hardware threads at the same time point; these
>>>>>>>>>>> architectures
>>>>>>>>>>> are not multiple-copy atomic."
>>>>>>>>>>>
>>>>>>>>>>> This is the visibility issue that I referred to and affects both
>>>>>>>>>>> ARM and
>>>>>>>>>>> PPC. But of course it is normally handled by using suitable barriers
>>>>>>>>>>> after the stores that need to be visible. I think the crux of the
>>>>>>>>>>> current issue is what you wrote below:
>>>>>>>>>>>
>>>>>>>>>>> > The fixes for the constructor issue are only needed because we
>>>>>>>>>>> > remove the sync instruction from behind stores
>>>>>>>>>>> (parse3.cpp:320)
>>>>>>>>>>> > and place it before loads.
>>>>>>>>>>>
>>>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync
>>>>>>>>>>> after
>>>>>>>>>>> the store then you have to do something around the loads to get the
>>>>>>>>>>> same
>>>>>>>>>>> results! I still don't know what lead you to the conclusion that the
>>>>>>>>>>> only way to fix the IRIW issue was to put the fence before the
>>>>>>>>>>> load -
>>>>>>>>>>> maybe when I get the chance to read that paper in full it will be
>>>>>>>>>>> clearer.
>>>>>>>>>>>
>>>>>>>>>>> So ... the basic problem is that the current structure in the VM has
>>>>>>>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>>>>>>>> variables. You now want to customize that but not all the requisite
>>>>>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>>>>>> volatile_store were factored out so that they could be
>>>>>>>>>>> implemented as
>>>>>>>>>>> desired per-platform. Alternatively there could be pre- and post-
>>>>>>>>>>> hooks
>>>>>>>>>>> that could then be customized per platform. Otherwise you need
>>>>>>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>>>>>>
>>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think
>>>>>>>>>>> this
>>>>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>
>>>>>>>>>>>> I did not yet add the
>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>> or
>>>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>>>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>>>>>>
>>>>>>>>>>>> But we would like to pass the torture test suite as we consider
>>>>>>>>>>>> this a substantial task in implementing a PPC port. Also we think
>>>>>>>>>>>> both tests show behavior a programmer would expect. It's bad if
>>>>>>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>>>>>>>
>>>>>>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>>>>>> and place it before loads. Then there is no sync between volatile
>>>>>>>>>>>> store
>>>>>>>>>>>> and publishing the object. So we add it again in this one case
>>>>>>>>>>>> (volatile store in constructor).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> @David
>>>>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>>>> continuous.
>>>>>>>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>>>>>>>> issue.
>>>>>>>>>>>> And we came up with a solution we consider the best possible. If
>>>>>>>>>>>> you
>>>>>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>>>>>> solution,
>>>>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture
>>>>>>>>>>>> issues.
>>>>>>>>>>>>
>>>>>>>>>>>> @David
>>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>>>>>>>> and
>>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the
>>>>>>>>>>>> ARM and
>>>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>>>>> Work-Stealing for
>>>>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>>>>>>> taskqueue problem.
>>>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>>>>
>>>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>>>>>>> used 'read'
>>>>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name
>>>>>>>>>>>> above).
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>
>>>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>> Hi David,
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Volatile in constuctor
>>>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>>>>>> processors
>>>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks
>>>>>>>>>>>>> Aleksey!)
>>>>>>>>>>>>
>>>>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>>>>
>>>>>>>>>>>>> -- IRIW issue
>>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>>>>>>>> a few
>>>>>>>>>>>>>> moments thought.
>>>>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue
>>>>>>>>>>>>> problem yet,
>>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>>>
>>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>>> continuous.
>>>>>>>>>>>>
>>>>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>>>>> different?)
>>>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore
>>>>>>>>>>>>> I contributed a
>>>>>>>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>>>>>>>> nice, I admit.
>>>>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>>>>>>>> are evaluated
>>>>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the
>>>>>>>>>>>>> problem
>>>>>>>>>>>>> is that
>>>>>>>>>>>>> store
>>>>>>>>>>>>> sync
>>>>>>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>>>>>>> only
>>>>>>>>>>>>> sync
>>>>>>>>>>>>> load
>>>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>>>>>>>>>> pass that test.
>>>>>>>>>>>>
>>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>>>>>> term and
>>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>> The JMM is fine. And
>>>>>>>>>>>>> store
>>>>>>>>>>>>> MemBarVolatile
>>>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions
>>>>>>>>>>>>> that
>>>>>>>>>>>>> do what is required.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>>>>>>> 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>>>>>> initializing
>>>>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey
>>>>>>>>>>>>>> Shipilev
>>>>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>>>>> LongVolatileTest
>>>>>>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>>>>>>> will fail.
>>>>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>>>>>>> volatile field would be
>>>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>>>>>>> volatiles do
>>>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>>>>>>>> that
>>>>>>>>>>>>> requires the constructor barrier - discussions with Doug and
>>>>>>>>>>>>> Aleksey
>>>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>>>>> heavyweight
>>>>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We
>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>>>>> Basically, I agree on this. But you also have to take into
>>>>>>>>>>>>>> account
>>>>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>>>>> different platforms
>>>>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>>>> An example:
>>>>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier
>>>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like
>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore |
>>>>>>>>>>>>>> Assembler::StoreStore)
>>>>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>>>>>>>> Just doing what is required.
>>>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>>>>>>>> fine grained operations:
>>>>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore,
>>>>>>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore,
>>>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations
>>>>>>>>>>>>>> are more (too) powerful,
>>>>>>>>>>>>>> I need an additional optimization that removes the lwsync. I
>>>>>>>>>>>>>> can not implement
>>>>>>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable
>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the
>>>>>>>>>>>>>> read
>>>>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would propose to guard the code by
>>>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>>>>>>>> independent?
>>>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a
>>>>>>>>>>>>> few
>>>>>>>>>>>>> moments thought. You are implying there is a problem here that
>>>>>>>>>>>>> will
>>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>>>> different?) and I can not take that on face value at the
>>>>>>>>>>>>> moment. The
>>>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>>>>>>> requirements for
>>>>>>>>>>>>> volatile accesses is if there are global visibility issues that
>>>>>>>>>>>>> are not
>>>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the
>>>>>>>>>>>>> store
>>>>>>>>>>>>> would deal with that, not at the load. (This situation reminds me
>>>>>>>>>>>>> of the
>>>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>>>>>>>> software
>>>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>>>>>>>> don't have
>>>>>>>>>>>>> that on ppc!)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days
>>>>>>>>>>>>> I will
>>>>>>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Other ports:
>>>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so
>>>>>>>>>>>>>> it might
>>>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of
>>>>>>>>>>>>>> them).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable
>>>>>>>>>>>>>> way :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From what I understand what we see here is an attempt to
>>>>>>>>>>>>>> fix an
>>>>>>>>>>>>>> existing issue with the implementation of volatiles so that the
>>>>>>>>>>>>>> IRIW
>>>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>>>>>>>> doing the
>>>>>>>>>>>>>> load.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I
>>>>>>>>>>>>>> would be
>>>>>>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>> though!). But I do not agree with the changes to the shared code
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>>>>>>>> this is
>>>>>>>>>>>>>> polluting the shared code. My concern is similar to what I said
>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired
>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed
>>>>>>>>>>>>>> on a
>>>>>>>>>>>>>> given architecture then the implementation in OrderAccess should
>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>> to a no-op.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed
>>>>>>>>>>>>>> under
>>>>>>>>>>>>>> the JMM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am fine with suggested changes because you did not change our
>>>>>>>>>>>>>>> current
>>>>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>>>>>>>>> But may be it should be done using more general query which
>>>>>>>>>>>>>>> is set
>>>>>>>>>>>>>>> depending on platform:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the
>>>>>>>>>>>>>> Java
>>>>>>>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>>>>>>>> explicitly
>>>>>>>>>>>>>> like this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are
>>>>>>>>>>>>>> there
>>>>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM
>>>>>>>>>>>>>> defines?
>>>>>>>>>>>>>>> From what I understand our ppc port is also affected.
>>>>>>>>>>>>>>> David?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>>>>>>> inside else {}.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>>>>>>> specific which
>>>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many
>>>>>>>>>>>>>>> #ifdefs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> if (is_vol) {
>>>>>>>>>>>>>>> // See comment in do_get_xxx().
>>>>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>>>>>> #else
>>>>>>>>>>>>>>> if (is_field) {
>>>>>>>>>>>>>>> // Add MemBarRelease for constructors which write
>>>>>>>>>>>>>>> volatile field
>>>>>>>>>>>>>>> (PPC64).
>>>>>>>>>>>>>>> set_wrote_volatile(true);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>>>>> __________ __________ __________ __________
>>>>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y)
>>>>>>>>>>>>>>>> read(y) read(x)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This
>>>>>>>>>>>>>>>> is only
>>>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the
>>>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple
>>>>>>>>>>>>>>>> SIGSEGVs
>>>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Further this change contains a fix that assures that volatile
>>>>>>>>>>>>>>>> fields
>>>>>>>>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>>>>>>>>> published.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>>>>>>>> seems too
>>>>>>>>>>>>>>>> strong.
>>>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>>>>>>>> suffice.
>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>>>
More information about the hotspot-dev
mailing list