RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Tue Jan 21 01:22:08 PST 2014
Hi,
I made a new webrev
http://cr.openjdk.java.net/~goetz/webrevs/8029101-3-raw/
differing from
http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
only in the comments.
I removed
// Support ordering of "Independent Reads of Independent Writes".
everywhere, and edited the comments in the globalDefinition*.hpp
files.
Best regards,
Goetz.
-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Dienstag, 21. Januar 2014 05:55
To: Lindenmaier, Goetz; Vladimir Kozlov
Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Hi Goetz,
On 17/01/2014 6:39 PM, Lindenmaier, Goetz wrote:
> Hi,
>
> I tried to come up with a webrev that implements the change as proposed in
> your mails:
> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>
> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use
> support_IRIW_for_not_multiple_copy_atomic_cpu.
Given the flag name the commentary eg:
+ // Support ordering of "Independent Reads of Independent Writes".
+ if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
seems somewhat redundant.
> I left the definition and handling of _wrote_volatile in the code, without
> any protection.
+ bool _wrote_volatile; // Did we write a final field?
s/final/volatile
> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() ,
> and put it on one line.
>
> I removed the comment in library_call.cpp.
> I also removed the sentence " Solution: implement volatile read as sync-load-acquire."
> from the comments as it's PPC specific.
I think the primary IRIW comment/explanation should go in
globalDefinitions.hpp where
support_IRIW_for_not_multiple_copy_atomic_cpu is defined.
> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these
> issues in C1 if nobody did it by then.
I've filed:
https://bugs.openjdk.java.net/browse/JDK-8032366
"Implement C1 support for IRIW conformance on non-multiple-copy-atomic
platforms"
to cover this task, as it may be needed sooner rather than later.
> Wrt. to performance: Oracle will soon do heavy testing of the port. If any
> performance problems arise, we still can add #ifdef PPC64 to circumvent this.
Ok.
Thanks,
David
> Best regards,
> Goetz.
>
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Donnerstag, 16. Januar 2014 10:05
> To: Vladimir Kozlov
> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
>> On 1/16/14 12:34 AM, David Holmes wrote:
>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>>> want to keep it this way, it could be useful to have such info on other
>>>> platforms too. But I would suggest to remove PPC64 comments in
>>>> parse.hpp.
>>>>
>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>>>> which could be checked in all places instead of #ifdef:
>>>
>>> I asked for the ifdef some time back as I find it much preferable to
>>> have this as a build-time construct rather than a
>>> runtime one. I don't want to have to pay anything for this if we don't
>>> use it.
>>
>> Any decent C++ compiler will optimize expressions with such constants
>> defined in header files. I insist to avoid #ifdefs in C2 code. I really
>> don't like the code with #ifdef in unsafe.cpp but I can live with it.
>
> If you insist then we may as well do it all the same way. Better to be
> consistent.
>
> My apologies Goetz for wasting your time going back and forth on this.
>
> That aside I have a further concern with this IRIW support - it is
> incomplete as there is no C1 support, as PPC64 isn't using client. If
> this is going on then we (which probably means the Oracle 'we') need to
> add the missing C1 code.
>
> David
> -----
>
>> Vladimir
>>
>>>
>>> David
>>>
>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>>> #else
>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>>> #endif
>>>>
>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>>
>>>> and then:
>>>>
>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>> oop p = JNIHandles::resolve(obj); \
>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>>>> OrderAccess::fence(); \
>>>> volatile type_name v = OrderAccess::load_acquire((volatile
>>>> type_name*)index_oop_from_field_offset_long(p, offset));
>>>>
>>>> And:
>>>>
>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>>>> field->is_volatile()) {
>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
>>>> + }
>>>>
>>>> And so on. The comments will be needed only in globalDefinitions.hpp
>>>>
>>>> The code in parse1.cpp could be put on one line:
>>>>
>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>>>> method()->is_initializer()) )) {
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> I updated the webrev:
>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>
>>>>>> - I removed the IRIW example in parse3.cpp
>>>>>> - I adapted the comments not to point to that comment, and to
>>>>>> reflect the new flagging. Also I mention that we support the
>>>>>> volatile constructor issue, but that it's not standard.
>>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>> I also think it's better to separate these this way.
>>>>>
>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>>> please.
>>>>>
>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>>>> comment doesn't make much sense to me and refers to
>>>>> ppc specific stuff in a shared file:
>>>>>
>>>>> if (is_volatile) {
>>>>> ! if (!is_store) {
>>>>> insert_mem_bar(Op_MemBarAcquire);
>>>>> ! } else {
>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>>>>> insert_mem_bar(Op_MemBarVolatile);
>>>>> + #endif
>>>>> + }
>>>>>
>>>>> I don't think the comment is needed.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thanks for your comments!
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>>> To: Lindenmaier, Goetz
>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>> Independent Reads of Independent Writes
>>>>>>
>>>>>> Hi Goetz,
>>>>>>
>>>>>> Sorry for the delay in getting back to this.
>>>>>>
>>>>>> The general changes to the volatile barriers to support IRIW are okay.
>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>>>>> specifically it is
>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>>>>> the commentary excessive, particularly for shared code. In particular
>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>>>>> explanation and I don't think we need it to that level of detail.
>>>>>> Seems
>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>>>>>
>>>>>> The changes related to volatile writes in the constructor, as
>>>>>> discussed
>>>>>> are not required by the Java Memory Model. If you want to keep these
>>>>>> then I think they should all be guarded with PPC64 because it is not
>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>>>>>> PPC64 porters.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I updated this webrev. I detected a small flaw I made when editing
>>>>>>> this version.
>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>>>> I also based the webrev on the latest version of the stage repo.
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Lindenmaier, Goetz
>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>>>> To: David Holmes
>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>>> So we can at least undo #4 now we have established those tests were
>>>>>>>> not
>>>>>>>> required to pass.
>>>>>>> We would prefer if we could keep this in. We want to avoid that it's
>>>>>>> blamed on the VM if java programs are failing on PPC after they
>>>>>>> worked
>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard
>>>>>>> it by
>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this
>>>>>>> part is
>>>>>>> not that performance relevant.
>>>>>>>
>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>> think
>>>>>>> I added a compile-time guard in this new webrev:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>>>>>>> several double negations I don't like, (#ifNdef
>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>>>> but this way I only have to change the ppc platform.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz
>>>>>>>
>>>>>>> P.S.: I will also be available over the Christmas period.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>>>> To: Lindenmaier, Goetz
>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>>>>> emails.
>>>>>>>
>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> ok, I understand the tests are wrong. It's good this issue is
>>>>>>>> settled.
>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>>>>
>>>>>>>> About our change: David, the causality is the other way round.
>>>>>>>> The change is about IRIW.
>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>>
>>>>>>> This is the part I still have some question marks over as the
>>>>>>> implications are not nice for performance on non-TSO platforms.
>>>>>>> But I'm
>>>>>>> no further along in processing that paper I'm afraid.
>>>>>>>
>>>>>>>> 2. If we do syncs before loads, we don't need to do them after
>>>>>>>> stores.
>>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>>> constructor tests.
>>>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>>>> after stores
>>>>>>>> to pass the volatile constructor tests.
>>>>>>>
>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>> were not
>>>>>>> required to pass.
>>>>>>>
>>>>>>>> We originally passed the constructor tests because the ppc memory
>>>>>>>> order
>>>>>>>> instructions are not as find-granular as the
>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>>>>> The only instruction
>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>>>>> therefore the
>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The
>>>>>>>> proper representation
>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now
>>>>>>>> it's pointless
>>>>>>>> anyways.
>>>>>>>>
>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>> I'd be happy to add a property
>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>
>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>> think
>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>>>> based not architecture based) as that will allows for turning this
>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> or the like to guard the customization. I'd like that much better.
>>>>>>>> Or also
>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>
>>>>>>>> TL;DR version:
>>>>>>>>
>>>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>>>> constructor-barrier
>>>>>>>> for volatiles is not required as part of the JMM specification. It
>>>>>>>> *may*
>>>>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>>>>> ensure
>>>>>>>> you can't see uninitialized fields. So the tests for this are
>>>>>>>> invalid
>>>>>>>> and this part of the patch is not needed in general (ppc64 may
>>>>>>>> need it
>>>>>>>> due to other factors).
>>>>>>>>
>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>>>>> term :)
>>>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>>>
>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>>>> visible to
>>>>>>>> all other hardware threads at the same time point; these
>>>>>>>> architectures
>>>>>>>> are not multiple-copy atomic."
>>>>>>>>
>>>>>>>> This is the visibility issue that I referred to and affects both
>>>>>>>> ARM and
>>>>>>>> PPC. But of course it is normally handled by using suitable barriers
>>>>>>>> after the stores that need to be visible. I think the crux of the
>>>>>>>> current issue is what you wrote below:
>>>>>>>>
>>>>>>>> > The fixes for the constructor issue are only needed because we
>>>>>>>> > remove the sync instruction from behind stores
>>>>>>>> (parse3.cpp:320)
>>>>>>>> > and place it before loads.
>>>>>>>>
>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync
>>>>>>>> after
>>>>>>>> the store then you have to do something around the loads to get the
>>>>>>>> same
>>>>>>>> results! I still don't know what lead you to the conclusion that the
>>>>>>>> only way to fix the IRIW issue was to put the fence before the
>>>>>>>> load -
>>>>>>>> maybe when I get the chance to read that paper in full it will be
>>>>>>>> clearer.
>>>>>>>>
>>>>>>>> So ... the basic problem is that the current structure in the VM has
>>>>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>>>>> variables. You now want to customize that but not all the requisite
>>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>>> volatile_store were factored out so that they could be
>>>>>>>> implemented as
>>>>>>>> desired per-platform. Alternatively there could be pre- and post-
>>>>>>>> hooks
>>>>>>>> that could then be customized per platform. Otherwise you need
>>>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>>>
>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think
>>>>>>>> this
>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>
>>>>>>>>> I did not yet add the
>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>> or
>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>>>
>>>>>>>>> But we would like to pass the torture test suite as we consider
>>>>>>>>> this a substantial task in implementing a PPC port. Also we think
>>>>>>>>> both tests show behavior a programmer would expect. It's bad if
>>>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>>>>
>>>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>>> and place it before loads. Then there is no sync between volatile
>>>>>>>>> store
>>>>>>>>> and publishing the object. So we add it again in this one case
>>>>>>>>> (volatile store in constructor).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> @David
>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>> continuous.
>>>>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>>>>> issue.
>>>>>>>>> And we came up with a solution we consider the best possible. If
>>>>>>>>> you
>>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>>> solution,
>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>> Similarly, we invested time in fixing the concurrency torture
>>>>>>>>> issues.
>>>>>>>>>
>>>>>>>>> @David
>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>>>>> and
>>>>>>>>>> can't find any reference to it.
>>>>>>>>> We learned about this reading "A Tutorial Introduction to the
>>>>>>>>> ARM and
>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>> Work-Stealing for
>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>>>> taskqueue problem.
>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>
>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>>>> used 'read'
>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name
>>>>>>>>> above).
>>>>>>>>>
>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>> Goetz.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>
>>>>>>>>> Hi Goetz,
>>>>>>>>>
>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>> Hi David,
>>>>>>>>>>
>>>>>>>>>> -- Volatile in constuctor
>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>>> processors
>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks
>>>>>>>>>> Aleksey!)
>>>>>>>>>
>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>
>>>>>>>>>> -- IRIW issue
>>>>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>>>>> a few
>>>>>>>>>>> moments thought.
>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue
>>>>>>>>>> problem yet,
>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>
>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>> continuous.
>>>>>>>>>
>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>> different?)
>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore
>>>>>>>>>> I contributed a
>>>>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>>>>> nice, I admit.
>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>>>>> are evaluated
>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>>
>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the
>>>>>>>>>> problem
>>>>>>>>>> is that
>>>>>>>>>> store
>>>>>>>>>> sync
>>>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>>>> only
>>>>>>>>>> sync
>>>>>>>>>> load
>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>>>>>>> pass that test.
>>>>>>>>>
>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>>> term and
>>>>>>>>> can't find any reference to it.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> The JMM is fine. And
>>>>>>>>>> store
>>>>>>>>>> MemBarVolatile
>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions
>>>>>>>>>> that
>>>>>>>>>> do what is required.
>>>>>>>>>>
>>>>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Goetz.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>>>> 'hotspot-dev at openjdk.java.net';
>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>
>>>>>>>>>> Hi Goetz,
>>>>>>>>>>
>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>
>>>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>>
>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>>> initializing
>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey
>>>>>>>>>>> Shipilev
>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>> LongVolatileTest
>>>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>>>> will fail.
>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>>>> volatile field would be
>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>
>>>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>>>> volatiles do
>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>>>>> that
>>>>>>>>>> requires the constructor barrier - discussions with Doug and
>>>>>>>>>> Aleksey
>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>
>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>> heavyweight
>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We
>>>>>>>>>>> don't
>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>
>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>> Basically, I agree on this. But you also have to take into
>>>>>>>>>>> account
>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>> different platforms
>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>> An example:
>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier
>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>>>>>>> If these are consecutively in the code, sparc code looks like
>>>>>>>>>>> this:
>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore |
>>>>>>>>>>> Assembler::StoreStore)
>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>>>>> Just doing what is required.
>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>>>>> fine grained operations:
>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore,
>>>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore,
>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations
>>>>>>>>>>> are more (too) powerful,
>>>>>>>>>>> I need an additional optimization that removes the lwsync. I
>>>>>>>>>>> can not implement
>>>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>>>
>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable
>>>>>>>>>>> issue.
>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the
>>>>>>>>>>> read
>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>>>>>
>>>>>>>>>>> I would propose to guard the code by
>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>>>>> independent?
>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>>>
>>>>>>>>>> I can not possibly answer to the necessary level of detail with a
>>>>>>>>>> few
>>>>>>>>>> moments thought. You are implying there is a problem here that
>>>>>>>>>> will
>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>> different?) and I can not take that on face value at the
>>>>>>>>>> moment. The
>>>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>>>> requirements for
>>>>>>>>>> volatile accesses is if there are global visibility issues that
>>>>>>>>>> are not
>>>>>>>>>> addressed - but even then I would expect heavy barriers at the
>>>>>>>>>> store
>>>>>>>>>> would deal with that, not at the load. (This situation reminds me
>>>>>>>>>> of the
>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>>>>> software
>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>>>>> don't have
>>>>>>>>>> that on ppc!)
>>>>>>>>>>
>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days
>>>>>>>>>> I will
>>>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Goetz.
>>>>>>>>>>>
>>>>>>>>>>> -- Other ports:
>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so
>>>>>>>>>>> it might
>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of
>>>>>>>>>>> them).
>>>>>>>>>>>
>>>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I
>>>>>>>>>>> can
>>>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>
>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable
>>>>>>>>>>> way :)
>>>>>>>>>>>
>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>>>
>>>>>>>>>>> From what I understand what we see here is an attempt to
>>>>>>>>>>> fix an
>>>>>>>>>>> existing issue with the implementation of volatiles so that the
>>>>>>>>>>> IRIW
>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>>>>> doing the
>>>>>>>>>>> load.
>>>>>>>>>>>
>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I
>>>>>>>>>>> would be
>>>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>>>> performance
>>>>>>>>>>> though!). But I do not agree with the changes to the shared code
>>>>>>>>>>> that
>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>>>>> this is
>>>>>>>>>>> polluting the shared code. My concern is similar to what I said
>>>>>>>>>>> with the
>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using
>>>>>>>>>>> the
>>>>>>>>>>> correct OrderAccess operations to guarantee the desired
>>>>>>>>>>> properties
>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed
>>>>>>>>>>> on a
>>>>>>>>>>> given architecture then the implementation in OrderAccess should
>>>>>>>>>>> reduce
>>>>>>>>>>> to a no-op.
>>>>>>>>>>>
>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed
>>>>>>>>>>> under
>>>>>>>>>>> the JMM.
>>>>>>>>>>>
>>>>>>>>>>>> I am fine with suggested changes because you did not change our
>>>>>>>>>>>> current
>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>>>>>> But may be it should be done using more general query which
>>>>>>>>>>>> is set
>>>>>>>>>>>> depending on platform:
>>>>>>>>>>>>
>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>
>>>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>>>
>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>
>>>>>>>>>>> Every platform has to support IRIW this is simply part of the
>>>>>>>>>>> Java
>>>>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>>>>> explicitly
>>>>>>>>>>> like this.
>>>>>>>>>>>
>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are
>>>>>>>>>>> there
>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM
>>>>>>>>>>> defines?
>>>>>>>>>>>> From what I understand our ppc port is also affected.
>>>>>>>>>>>> David?
>>>>>>>>>>>
>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>>>> inside else {}.
>>>>>>>>>>>>
>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>>>> specific which
>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many
>>>>>>>>>>>> #ifdefs.
>>>>>>>>>>>>
>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>
>>>>>>>>>>>> if (is_vol) {
>>>>>>>>>>>> // See comment in do_get_xxx().
>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>>> #else
>>>>>>>>>>>> if (is_field) {
>>>>>>>>>>>> // Add MemBarRelease for constructors which write
>>>>>>>>>>>> volatile field
>>>>>>>>>>>> (PPC64).
>>>>>>>>>>>> set_wrote_volatile(true);
>>>>>>>>>>>> }
>>>>>>>>>>>> #endif
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Example:
>>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>> __________ __________ __________ __________
>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>>>>>>
>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y)
>>>>>>>>>>>>> read(y) read(x)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This
>>>>>>>>>>>>> is only
>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the
>>>>>>>>>>>>> threads
>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple
>>>>>>>>>>>>> SIGSEGVs
>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Further this change contains a fix that assures that volatile
>>>>>>>>>>>>> fields
>>>>>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>>>>>> published.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>>>>> seems too
>>>>>>>>>>>>> strong.
>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>>>>> suffice.
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>
More information about the ppc-aix-port-dev
mailing list