RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
Vladimir Kozlov
vladimir.kozlov at oracle.com
Sat Jan 18 09:36:04 PST 2014
Good.
Thanks,
Vladimir
On 1/18/14 2:36 AM, Lindenmaier, Goetz wrote:
> Hi,
>
> I updated the comments in the webrev accordingly.
> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>
> Best regards,
> Goetz.
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Saturday, January 18, 2014 2:06 AM
> To: Lindenmaier, Goetz; David Holmes
> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> Goetz,
>
> I asked to remove both comments in parse.hpp, comments in parse1.cpp and
> parse3.cpp is enough. It should be similar to _wrote_final:
>
> bool _wrote_volatile; // Did we write a final field?
>
>
> I think the next comment in parse3.cpp should be modified:
>
> + // But remember we wrote a volatile field so that a barrier can be
> issued
> + // in constructors. See do_exits() in parse1.cpp.
>
> // Remember we wrote a volatile field.
> // For not multiple copy atomic cpu (ppc64) a barrier should be issued
> // in constructors which have such stores. See do_exits() in parse1.cpp.
>
> Thanks,
> Vladimir
>
> On 1/17/14 12:39 AM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> I tried to come up with a webrev that implements the change as proposed in
>> your mails:
>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-2-raw/
>>
>> Wherever I used CPU_NOT_MULTIPLE_COPY_ATOMIC, I use
>> support_IRIW_for_not_multiple_copy_atomic_cpu.
>>
>> I left the definition and handling of _wrote_volatile in the code, without
>> any protection.
>> I protected issuing the barrier for volatile in constructors with PPC64_ONLY() ,
>> and put it on one line.
>>
>> I removed the comment in library_call.cpp.
>> I also removed the sentence " Solution: implement volatile read as sync-load-acquire."
>> from the comments as it's PPC specific.
>>
>> Wrt. to C1: we plan to port C1 to PPC64, too. During that task, we will fix these
>> issues in C1 if nobody did it by then.
>>
>> Wrt. to performance: Oracle will soon do heavy testing of the port. If any
>> performance problems arise, we still can add #ifdef PPC64 to circumvent this.
>>
>> Best regards,
>> Goetz.
>>
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Donnerstag, 16. Januar 2014 10:05
>> To: Vladimir Kozlov
>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> On 16/01/2014 6:54 PM, Vladimir Kozlov wrote:
>>> On 1/16/14 12:34 AM, David Holmes wrote:
>>>> On 16/01/2014 5:13 PM, Vladimir Kozlov wrote:
>>>>> This is becoming ugly #ifdef mess. In compiler code we are trying to
>>>>> avoid them. I suggested to have _wrote_volatile without #ifdef and I
>>>>> want to keep it this way, it could be useful to have such info on other
>>>>> platforms too. But I would suggest to remove PPC64 comments in
>>>>> parse.hpp.
>>>>>
>>>>> In globalDefinitions.hpp after globalDefinitions_ppc.hpp define a value
>>>>> which could be checked in all places instead of #ifdef:
>>>>
>>>> I asked for the ifdef some time back as I find it much preferable to
>>>> have this as a build-time construct rather than a
>>>> runtime one. I don't want to have to pay anything for this if we don't
>>>> use it.
>>>
>>> Any decent C++ compiler will optimize expressions with such constants
>>> defined in header files. I insist to avoid #ifdefs in C2 code. I really
>>> don't like the code with #ifdef in unsafe.cpp but I can live with it.
>>
>> If you insist then we may as well do it all the same way. Better to be
>> consistent.
>>
>> My apologies Goetz for wasting your time going back and forth on this.
>>
>> That aside I have a further concern with this IRIW support - it is
>> incomplete as there is no C1 support, as PPC64 isn't using client. If
>> this is going on then we (which probably means the Oracle 'we') need to
>> add the missing C1 code.
>>
>> David
>> -----
>>
>>> Vladimir
>>>
>>>>
>>>> David
>>>>
>>>>> #ifdef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = true;
>>>>> #else
>>>>> const bool support_IRIW_for_not_multiple_copy_atomic_cpu = false;
>>>>> #endif
>>>>>
>>>>> or support_IRIW_for_not_multiple_copy_atomic_cpu, whatever
>>>>>
>>>>> and then:
>>>>>
>>>>> #define GET_FIELD_VOLATILE(obj, offset, type_name, v) \
>>>>> oop p = JNIHandles::resolve(obj); \
>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu)
>>>>> OrderAccess::fence(); \
>>>>> volatile type_name v = OrderAccess::load_acquire((volatile
>>>>> type_name*)index_oop_from_field_offset_long(p, offset));
>>>>>
>>>>> And:
>>>>>
>>>>> + if (support_IRIW_for_not_multiple_copy_atomic_cpu &&
>>>>> field->is_volatile()) {
>>>>> + insert_mem_bar(Op_MemBarVolatile); // StoreLoad barrier
>>>>> + }
>>>>>
>>>>> And so on. The comments will be needed only in globalDefinitions.hpp
>>>>>
>>>>> The code in parse1.cpp could be put on one line:
>>>>>
>>>>> + if (wrote_final() PPC64_ONLY( || (wrote_volatile() &&
>>>>> method()->is_initializer()) )) {
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 1/15/14 9:25 PM, David Holmes wrote:
>>>>>> On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> I updated the webrev:
>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>
>>>>>>> - I removed the IRIW example in parse3.cpp
>>>>>>> - I adapted the comments not to point to that comment, and to
>>>>>>> reflect the new flagging. Also I mention that we support the
>>>>>>> volatile constructor issue, but that it's not standard.
>>>>>>> - I protected issuing the barrier for the constructor by PPC64.
>>>>>>> I also think it's better to separate these this way.
>>>>>>
>>>>>> Sorry if I wasn't clear but I'd like the wrote_volatile field
>>>>>> declaration and all uses to be guarded by ifdef PPC64 too
>>>>>> please.
>>>>>>
>>>>>> One nit I missed before. In src/share/vm/opto/library_call.cpp this
>>>>>> comment doesn't make much sense to me and refers to
>>>>>> ppc specific stuff in a shared file:
>>>>>>
>>>>>> if (is_volatile) {
>>>>>> ! if (!is_store) {
>>>>>> insert_mem_bar(Op_MemBarAcquire);
>>>>>> ! } else {
>>>>>> ! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
>>>>>> ! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
>>>>>> insert_mem_bar(Op_MemBarVolatile);
>>>>>> + #endif
>>>>>> + }
>>>>>>
>>>>>> I don't think the comment is needed.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> Thanks for your comments!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Goetz.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>> Sent: Mittwoch, 15. Januar 2014 01:55
>>>>>>> To: Lindenmaier, Goetz
>>>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net';
>>>>>>> 'hotspot-dev at openjdk.java.net'
>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>> Independent Reads of Independent Writes
>>>>>>>
>>>>>>> Hi Goetz,
>>>>>>>
>>>>>>> Sorry for the delay in getting back to this.
>>>>>>>
>>>>>>> The general changes to the volatile barriers to support IRIW are okay.
>>>>>>> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
>>>>>>> specifically it is
>>>>>>> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
>>>>>>> the commentary excessive, particularly for shared code. In particular
>>>>>>> the IRIW example in parse3.cpp - it seems a strange place to give the
>>>>>>> explanation and I don't think we need it to that level of detail.
>>>>>>> Seems
>>>>>>> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>>>>>>>
>>>>>>> The changes related to volatile writes in the constructor, as
>>>>>>> discussed
>>>>>>> are not required by the Java Memory Model. If you want to keep these
>>>>>>> then I think they should all be guarded with PPC64 because it is not
>>>>>>> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
>>>>>>> PPC64 porters.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I updated this webrev. I detected a small flaw I made when editing
>>>>>>>> this version.
>>>>>>>> The #endif in line 322, parse3.cpp was in the wrong line.
>>>>>>>> I also based the webrev on the latest version of the stage repo.
>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Lindenmaier, Goetz
>>>>>>>> Sent: Freitag, 20. Dezember 2013 13:47
>>>>>>>> To: David Holmes
>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>>> So we can at least undo #4 now we have established those tests were
>>>>>>>>> not
>>>>>>>>> required to pass.
>>>>>>>> We would prefer if we could keep this in. We want to avoid that it's
>>>>>>>> blamed on the VM if java programs are failing on PPC after they
>>>>>>>> worked
>>>>>>>> on x86. To clearly mark it as overfulfilling the spec I would guard
>>>>>>>> it by
>>>>>>>> a flag as proposed. But if you insist I will remove it. Also, this
>>>>>>>> part is
>>>>>>>> not that performance relevant.
>>>>>>>>
>>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>>> think
>>>>>>>> I added a compile-time guard in this new webrev:
>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>>>>>>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>>>>>>>> several double negations I don't like, (#ifNdef
>>>>>>>> CPU_NOT_MULTIPLE_COPY_ATOMIC)
>>>>>>>> but this way I only have to change the ppc platform.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz
>>>>>>>>
>>>>>>>> P.S.: I will also be available over the Christmas period.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>> Sent: Freitag, 20. Dezember 2013 05:58
>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>
>>>>>>>> Sorry for the delay, it takes a while to catch up after two weeks
>>>>>>>> vacation :) Next vacation (ie next two weeks) I'll continue to check
>>>>>>>> emails.
>>>>>>>>
>>>>>>>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> ok, I understand the tests are wrong. It's good this issue is
>>>>>>>>> settled.
>>>>>>>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>>>>>>>
>>>>>>>>> About our change: David, the causality is the other way round.
>>>>>>>>> The change is about IRIW.
>>>>>>>>> 1. To pass IRIW, we must use sync instructions before loads.
>>>>>>>>
>>>>>>>> This is the part I still have some question marks over as the
>>>>>>>> implications are not nice for performance on non-TSO platforms.
>>>>>>>> But I'm
>>>>>>>> no further along in processing that paper I'm afraid.
>>>>>>>>
>>>>>>>>> 2. If we do syncs before loads, we don't need to do them after
>>>>>>>>> stores.
>>>>>>>>> 3. If we don't do them after stores, we fail the volatile
>>>>>>>>> constructor tests.
>>>>>>>>> 4. So finally we added them again at the end of the constructor
>>>>>>>>> after stores
>>>>>>>>> to pass the volatile constructor tests.
>>>>>>>>
>>>>>>>> So we can at least undo #4 now we have established those tests
>>>>>>>> were not
>>>>>>>> required to pass.
>>>>>>>>
>>>>>>>>> We originally passed the constructor tests because the ppc memory
>>>>>>>>> order
>>>>>>>>> instructions are not as find-granular as the
>>>>>>>>> operations in the IR. MemBarVolatile is specified as StoreLoad.
>>>>>>>>> The only instruction
>>>>>>>>> on PPC that does StoreLoad is sync. But sync also does StoreStore,
>>>>>>>>> therefore the
>>>>>>>>> MemBarVolatile after the store fixes the constructor tests. The
>>>>>>>>> proper representation
>>>>>>>>> of the fix in the IR would be adding a MemBarStoreStore. But now
>>>>>>>>> it's pointless
>>>>>>>>> anyways.
>>>>>>>>>
>>>>>>>>>> I'm not happy with the ifdef approach but I won't block it.
>>>>>>>>> I'd be happy to add a property
>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>
>>>>>>>> A compile-time guard (ifdef) would be better than a runtime one I
>>>>>>>> think
>>>>>>>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>>>>>>>> based not architecture based) as that will allows for turning this
>>>>>>>> on/off for any architecture for testing purposes.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> or the like to guard the customization. I'd like that much better.
>>>>>>>>> Or also
>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Goetz.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>> Sent: Donnerstag, 28. November 2013 00:34
>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>
>>>>>>>>> TL;DR version:
>>>>>>>>>
>>>>>>>>> Discussion on the c-i list has now confirmed that a
>>>>>>>>> constructor-barrier
>>>>>>>>> for volatiles is not required as part of the JMM specification. It
>>>>>>>>> *may*
>>>>>>>>> be required in an implementation that doesn't pre-zero memory to
>>>>>>>>> ensure
>>>>>>>>> you can't see uninitialized fields. So the tests for this are
>>>>>>>>> invalid
>>>>>>>>> and this part of the patch is not needed in general (ppc64 may
>>>>>>>>> need it
>>>>>>>>> due to other factors).
>>>>>>>>>
>>>>>>>>> Re: "multiple copy atomicity" - first thanks for correcting the
>>>>>>>>> term :)
>>>>>>>>> Second thanks for the reference to that paper! For reference:
>>>>>>>>>
>>>>>>>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>>>>>>>> complex interconnect) does not guarantee that a write becomes
>>>>>>>>> visible to
>>>>>>>>> all other hardware threads at the same time point; these
>>>>>>>>> architectures
>>>>>>>>> are not multiple-copy atomic."
>>>>>>>>>
>>>>>>>>> This is the visibility issue that I referred to and affects both
>>>>>>>>> ARM and
>>>>>>>>> PPC. But of course it is normally handled by using suitable barriers
>>>>>>>>> after the stores that need to be visible. I think the crux of the
>>>>>>>>> current issue is what you wrote below:
>>>>>>>>>
>>>>>>>>> > The fixes for the constructor issue are only needed because we
>>>>>>>>> > remove the sync instruction from behind stores
>>>>>>>>> (parse3.cpp:320)
>>>>>>>>> > and place it before loads.
>>>>>>>>>
>>>>>>>>> I hadn't grasped this part. Obviously if you fail to do the sync
>>>>>>>>> after
>>>>>>>>> the store then you have to do something around the loads to get the
>>>>>>>>> same
>>>>>>>>> results! I still don't know what lead you to the conclusion that the
>>>>>>>>> only way to fix the IRIW issue was to put the fence before the
>>>>>>>>> load -
>>>>>>>>> maybe when I get the chance to read that paper in full it will be
>>>>>>>>> clearer.
>>>>>>>>>
>>>>>>>>> So ... the basic problem is that the current structure in the VM has
>>>>>>>>> hard-wired one choice of how to get the right semantics for volatile
>>>>>>>>> variables. You now want to customize that but not all the requisite
>>>>>>>>> hooks are present. It would be better if volatile_load and
>>>>>>>>> volatile_store were factored out so that they could be
>>>>>>>>> implemented as
>>>>>>>>> desired per-platform. Alternatively there could be pre- and post-
>>>>>>>>> hooks
>>>>>>>>> that could then be customized per platform. Otherwise you need
>>>>>>>>> platform-specific ifdef's to handle it as per your patch.
>>>>>>>>>
>>>>>>>>> I'm not happy with the ifdef approach but I won't block it. I think
>>>>>>>>> this
>>>>>>>>> is an area where a lot of clean up is needed in the VM. The barrier
>>>>>>>>> abstractions are a confused mess in my opinion.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>
>>>>>>>>>> I did not yet add the
>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>> or
>>>>>>>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>>>>>>>> to reduce #defined, as I got no further comment on that.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>>>>>>>> I feel not in the position to contribute substantially.
>>>>>>>>>>
>>>>>>>>>> But we would like to pass the torture test suite as we consider
>>>>>>>>>> this a substantial task in implementing a PPC port. Also we think
>>>>>>>>>> both tests show behavior a programmer would expect. It's bad if
>>>>>>>>>> Java code runs fine on the more common x86 platform, and then
>>>>>>>>>> fails on ppc. This will always first be blamed on the VM.
>>>>>>>>>>
>>>>>>>>>> The fixes for the constructor issue are only needed because we
>>>>>>>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>>>>>>>> and place it before loads. Then there is no sync between volatile
>>>>>>>>>> store
>>>>>>>>>> and publishing the object. So we add it again in this one case
>>>>>>>>>> (volatile store in constructor).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> @David
>>>>>>>>>>>> Sure. There also is no solution as you require for the
>>>>>>>>>>>> taskqueue problem yet,
>>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>>> continuous.
>>>>>>>>>> That's not true, we did a lot of investigation and testing on this
>>>>>>>>>> issue.
>>>>>>>>>> And we came up with a solution we consider the best possible. If
>>>>>>>>>> you
>>>>>>>>>> have objections, you should at least give the draft of a better
>>>>>>>>>> solution,
>>>>>>>>>> we would volunteer to implement and test it.
>>>>>>>>>> Similarly, we invested time in fixing the concurrency torture
>>>>>>>>>> issues.
>>>>>>>>>>
>>>>>>>>>> @David
>>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term
>>>>>>>>>>> and
>>>>>>>>>>> can't find any reference to it.
>>>>>>>>>> We learned about this reading "A Tutorial Introduction to the
>>>>>>>>>> ARM and
>>>>>>>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>>>>>>>> Peter Sewell, which is cited in "Correct and Efficient
>>>>>>>>>> Work-Stealing for
>>>>>>>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>>>>>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the
>>>>>>>>>> taskqueue problem.
>>>>>>>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>>>>>
>>>>>>>>>> I was wrong in one thing, it's called multiple copy atomicity, I
>>>>>>>>>> used 'read'
>>>>>>>>>> instead. Sorry for that. (I also fixed that in the method name
>>>>>>>>>> above).
>>>>>>>>>>
>>>>>>>>>> Best regards and thanks for all your involvements,
>>>>>>>>>> Goetz.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>> Cc: 'hotspot-dev at openjdk.java.net';
>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>
>>>>>>>>>> Hi Goetz,
>>>>>>>>>>
>>>>>>>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>> Hi David,
>>>>>>>>>>>
>>>>>>>>>>> -- Volatile in constuctor
>>>>>>>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>> We see them on PPC64. Our test machines have typically 8-32
>>>>>>>>>>> processors
>>>>>>>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks
>>>>>>>>>>> Aleksey!)
>>>>>>>>>>
>>>>>>>>>> And see follow ups - the tests are invalid.
>>>>>>>>>>
>>>>>>>>>>> -- IRIW issue
>>>>>>>>>>>> I can not possibly answer to the necessary level of detail with
>>>>>>>>>>>> a few
>>>>>>>>>>>> moments thought.
>>>>>>>>>>> Sure. There also is no solution as you require for the taskqueue
>>>>>>>>>>> problem yet,
>>>>>>>>>>> and that's being discussed now for almost a year.
>>>>>>>>>>
>>>>>>>>>> It may have started a year ago but work on it has hardly been
>>>>>>>>>> continuous.
>>>>>>>>>>
>>>>>>>>>>>> You are implying there is a problem here that will
>>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>>> different?)
>>>>>>>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore
>>>>>>>>>>> I contributed a
>>>>>>>>>>> solution with the #defines, and that's correct for all, but not
>>>>>>>>>>> nice, I admit.
>>>>>>>>>>> (I don't really know about ARM, though).
>>>>>>>>>>> So if I can write down a nicer solution testing for methods that
>>>>>>>>>>> are evaluated
>>>>>>>>>>> by the C-compiler I'm happy.
>>>>>>>>>>>
>>>>>>>>>>> The problem is not that IRIW is not handled by the JMM, the
>>>>>>>>>>> problem
>>>>>>>>>>> is that
>>>>>>>>>>> store
>>>>>>>>>>> sync
>>>>>>>>>>> does not assure multiple-read-atomicity,
>>>>>>>>>>> only
>>>>>>>>>>> sync
>>>>>>>>>>> load
>>>>>>>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>>>>>>>> pass that test.
>>>>>>>>>>
>>>>>>>>>> What is "multiple-read-atomicity"? I'm not familiar with the
>>>>>>>>>> term and
>>>>>>>>>> can't find any reference to it.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> The JMM is fine. And
>>>>>>>>>>> store
>>>>>>>>>>> MemBarVolatile
>>>>>>>>>>> is fine on x86, sparc etc. as there exist assembler instructions
>>>>>>>>>>> that
>>>>>>>>>>> do what is required.
>>>>>>>>>>>
>>>>>>>>>>> So if you are off soon, please let's come to a solution that
>>>>>>>>>>> might be improvable in the way it's implemented, but that
>>>>>>>>>>> allows us to implement a correct PPC64 port.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Goetz.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>>>>>>>> To: Lindenmaier, Goetz
>>>>>>>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich';
>>>>>>>>>>> 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>
>>>>>>>>>>> Hi Goetz,
>>>>>>>>>>>
>>>>>>>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>> Hi everybody,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks a lot for the detailed reviews!
>>>>>>>>>>>> I'll try to answer to all in one mail.
>>>>>>>>>>>>
>>>>>>>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM
>>>>>>>>>>>>> to occur before the reference is assigned;
>>>>>>>>>>>> We don't think it's correct if we omit the barrier after
>>>>>>>>>>>> initializing
>>>>>>>>>>>> a volatile field. Previously, we discussed this with Aleksey
>>>>>>>>>>>> Shipilev
>>>>>>>>>>>> and Doug Lea, and they agreed.
>>>>>>>>>>>> Also, concurrency torture tests
>>>>>>>>>>>> LongVolatileTest
>>>>>>>>>>>> AtomicIntegerInitialValueTest
>>>>>>>>>>>> will fail.
>>>>>>>>>>>> (In addition, observing 0 instead of the inital value of a
>>>>>>>>>>>> volatile field would be
>>>>>>>>>>>> very counter-intuitive for Java programmers, especially in
>>>>>>>>>>>> AtomicInteger.)
>>>>>>>>>>>
>>>>>>>>>>> The affects of unsafe publication are always surprising -
>>>>>>>>>>> volatiles do
>>>>>>>>>>> not add anything special here. AFAIK there is nothing in the JMM
>>>>>>>>>>> that
>>>>>>>>>>> requires the constructor barrier - discussions with Doug and
>>>>>>>>>>> Aleksey
>>>>>>>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>>>>>>>> missing constructor barrier.
>>>>>>>>>>>
>>>>>>>>>>>>> proposed for PPC64 is to make volatile reads extremely
>>>>>>>>>>>>> heavyweight
>>>>>>>>>>>> Yes, it costs measurable performance. But else it is wrong. We
>>>>>>>>>>>> don't
>>>>>>>>>>>> see a way to implement this cheaper.
>>>>>>>>>>>>
>>>>>>>>>>>>> - these algorithms should be expressed using the correct
>>>>>>>>>>>>> OrderAccess operations
>>>>>>>>>>>> Basically, I agree on this. But you also have to take into
>>>>>>>>>>>> account
>>>>>>>>>>>> that due to the different memory ordering instructions on
>>>>>>>>>>>> different platforms
>>>>>>>>>>>> just implementing something empty is not sufficient.
>>>>>>>>>>>> An example:
>>>>>>>>>>>> MemBarRelease // means LoadStore, StoreStore barrier
>>>>>>>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>>>>>>>> If these are consecutively in the code, sparc code looks like
>>>>>>>>>>>> this:
>>>>>>>>>>>> MemBarRelease --> membar(Assembler::LoadStore |
>>>>>>>>>>>> Assembler::StoreStore)
>>>>>>>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>>>>>>>> Just doing what is required.
>>>>>>>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>>>>>>>> fine grained operations:
>>>>>>>>>>>> MemBarRelease --> lwsync // Doing LoadStore,
>>>>>>>>>>>> StoreStore, LoadLoad
>>>>>>>>>>>> MemBarVolatile --> sync // // Doing LoadStore,
>>>>>>>>>>>> StoreStore, LoadLoad, StoreLoad
>>>>>>>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations
>>>>>>>>>>>> are more (too) powerful,
>>>>>>>>>>>> I need an additional optimization that removes the lwsync. I
>>>>>>>>>>>> can not implement
>>>>>>>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>>>>>>>
>>>>>>>>>>>> Back to the IRIW problem. I think here we have a comparable
>>>>>>>>>>>> issue.
>>>>>>>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the
>>>>>>>>>>>> read
>>>>>>>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>>>>>>>
>>>>>>>>>>>> I would propose to guard the code by
>>>>>>>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>>>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>>>>>>>> Else, David, how would you propose to implement this platform
>>>>>>>>>>>> independent?
>>>>>>>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>>>>>>>
>>>>>>>>>>> I can not possibly answer to the necessary level of detail with a
>>>>>>>>>>> few
>>>>>>>>>>> moments thought. You are implying there is a problem here that
>>>>>>>>>>> will
>>>>>>>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>>>>>>>> different?) and I can not take that on face value at the
>>>>>>>>>>> moment. The
>>>>>>>>>>> only reason I can see IRIW not being handled by the JMM
>>>>>>>>>>> requirements for
>>>>>>>>>>> volatile accesses is if there are global visibility issues that
>>>>>>>>>>> are not
>>>>>>>>>>> addressed - but even then I would expect heavy barriers at the
>>>>>>>>>>> store
>>>>>>>>>>> would deal with that, not at the load. (This situation reminds me
>>>>>>>>>>> of the
>>>>>>>>>>> need for read-barriers on Alpha architecture due to the use of
>>>>>>>>>>> software
>>>>>>>>>>> cache-coherency rather than hardware cache-coherency - but we
>>>>>>>>>>> don't have
>>>>>>>>>>> that on ppc!)
>>>>>>>>>>>
>>>>>>>>>>> Sorry - There is no quick resolution here and in a couple of days
>>>>>>>>>>> I will
>>>>>>>>>>> be heading out on vacation for two weeks.
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Other ports:
>>>>>>>>>>>> The IRIW issue requires at least 3 processors to be relevant, so
>>>>>>>>>>>> it might
>>>>>>>>>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>>>>>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of
>>>>>>>>>>>> them).
>>>>>>>>>>>>
>>>>>>>>>>>> -- MemBarStoreStore after initialization
>>>>>>>>>>>> I agree we should not change it in the ppc port. If you wish, I
>>>>>>>>>>>> can
>>>>>>>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>>>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>>>>>>>> To: Vladimir Kozlov
>>>>>>>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net';
>>>>>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'
>>>>>>>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of
>>>>>>>>>>>> Independent Reads of Independent Writes
>>>>>>>>>>>>
>>>>>>>>>>>> Okay this is my second attempt at answering this in a reasonable
>>>>>>>>>>>> way :)
>>>>>>>>>>>>
>>>>>>>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>>>>>>>
>>>>>>>>>>>> From what I understand what we see here is an attempt to
>>>>>>>>>>>> fix an
>>>>>>>>>>>> existing issue with the implementation of volatiles so that the
>>>>>>>>>>>> IRIW
>>>>>>>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>>>>>>>> volatile reads extremely heavyweight by adding a fence() when
>>>>>>>>>>>> doing the
>>>>>>>>>>>> load.
>>>>>>>>>>>>
>>>>>>>>>>>> Now if this was purely handled in ppc64 source code then I
>>>>>>>>>>>> would be
>>>>>>>>>>>> happy to let them do whatever they like (surely this kills
>>>>>>>>>>>> performance
>>>>>>>>>>>> though!). But I do not agree with the changes to the shared code
>>>>>>>>>>>> that
>>>>>>>>>>>> allow this solution to be implemented - even with PPC64_ONLY
>>>>>>>>>>>> this is
>>>>>>>>>>>> polluting the shared code. My concern is similar to what I said
>>>>>>>>>>>> with the
>>>>>>>>>>>> taskQueue changes - these algorithms should be expressed using
>>>>>>>>>>>> the
>>>>>>>>>>>> correct OrderAccess operations to guarantee the desired
>>>>>>>>>>>> properties
>>>>>>>>>>>> independent of architecture. If such a "barrier" is not needed
>>>>>>>>>>>> on a
>>>>>>>>>>>> given architecture then the implementation in OrderAccess should
>>>>>>>>>>>> reduce
>>>>>>>>>>>> to a no-op.
>>>>>>>>>>>>
>>>>>>>>>>>> And as Vitaly points out the constructor barriers are not needed
>>>>>>>>>>>> under
>>>>>>>>>>>> the JMM.
>>>>>>>>>>>>
>>>>>>>>>>>>> I am fine with suggested changes because you did not change our
>>>>>>>>>>>>> current
>>>>>>>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>>>>>>>> But may be it should be done using more general query which
>>>>>>>>>>>>> is set
>>>>>>>>>>>>> depending on platform:
>>>>>>>>>>>>>
>>>>>>>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>>>>>>>
>>>>>>>>>>>>> or similar to what we use now:
>>>>>>>>>>>>>
>>>>>>>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>>>>>>>
>>>>>>>>>>>> Every platform has to support IRIW this is simply part of the
>>>>>>>>>>>> Java
>>>>>>>>>>>> Memory Model, there should not be any need to call this out
>>>>>>>>>>>> explicitly
>>>>>>>>>>>> like this.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there some subtlety of the hardware I am missing here? Are
>>>>>>>>>>>> there
>>>>>>>>>>>> visibility issues beyond the ordering constraints that the JMM
>>>>>>>>>>>> defines?
>>>>>>>>>>>>> From what I understand our ppc port is also affected.
>>>>>>>>>>>>> David?
>>>>>>>>>>>>
>>>>>>>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>> In library_call.cpp can you add {}? New comment should be
>>>>>>>>>>>>> inside else {}.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you should make _wrote_volatile field not ppc64
>>>>>>>>>>>>> specific which
>>>>>>>>>>>>> will be set to 'true' only on ppc64. Then you will not need
>>>>>>>>>>>>> PPC64_ONLY()
>>>>>>>>>>>>> except in do_put_xxx() where it is set to true. Too many
>>>>>>>>>>>>> #ifdefs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (is_vol) {
>>>>>>>>>>>>> // See comment in do_get_xxx().
>>>>>>>>>>>>> #ifndef PPC64
>>>>>>>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>>>>>>>> #else
>>>>>>>>>>>>> if (is_field) {
>>>>>>>>>>>>> // Add MemBarRelease for constructors which write
>>>>>>>>>>>>> volatile field
>>>>>>>>>>>>> (PPC64).
>>>>>>>>>>>>> set_wrote_volatile(true);
>>>>>>>>>>>>> }
>>>>>>>>>>>>> #endif
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I preprared a webrev with fixes for PPC for the
>>>>>>>>>>>>>> VolatileIRIWTest of
>>>>>>>>>>>>>> the torture test suite:
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Example:
>>>>>>>>>>>>>> volatile x=0, y=0
>>>>>>>>>>>>>> __________ __________ __________ __________
>>>>>>>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> write(x=1) read(x) write(y=1) read(y)
>>>>>>>>>>>>>> read(y) read(x)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Solution: This example requires multiple-copy-atomicity. This
>>>>>>>>>>>>>> is only
>>>>>>>>>>>>>> assured by the sync instruction and if it is executed in the
>>>>>>>>>>>>>> threads
>>>>>>>>>>>>>> doing the loads. Thus we implement volatile read as
>>>>>>>>>>>>>> sync-load-acquire
>>>>>>>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This addresses a similar issue as fix "8012144: multiple
>>>>>>>>>>>>>> SIGSEGVs
>>>>>>>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Further this change contains a fix that assures that volatile
>>>>>>>>>>>>>> fields
>>>>>>>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>>>>>>>> published.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking at the code, we found a MemBarRelease that to us,
>>>>>>>>>>>>>> seems too
>>>>>>>>>>>>>> strong.
>>>>>>>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should
>>>>>>>>>>>>>> suffice.
>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please review and test this change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Goetz.
>>>>>>>>>>>>>>
More information about the hotspot-dev
mailing list