RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
David Holmes
david.holmes at oracle.com
Wed Jan 15 21:25:36 PST 2014
On 16/01/2014 1:28 AM, Lindenmaier, Goetz wrote:
> Hi David,
>
> I updated the webrev:
> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>
> - I removed the IRIW example in parse3.cpp
> - I adapted the comments not to point to that comment, and to
> reflect the new flagging. Also I mention that we support the
> volatile constructor issue, but that it's not standard.
> - I protected issuing the barrier for the constructor by PPC64.
> I also think it's better to separate these this way.
Sorry if I wasn't clear but I'd like the wrote_volatile field
declaration and all uses to be guarded by ifdef PPC64 too please.
One nit I missed before. In src/share/vm/opto/library_call.cpp this
comment doesn't make much sense to me and refers to ppc specific stuff
in a shared file:
if (is_volatile) {
! if (!is_store) {
insert_mem_bar(Op_MemBarAcquire);
! } else {
! #ifndef CPU_NOT_MULTIPLE_COPY_ATOMIC
! // Changed volatiles/Unsafe: lwsync-store, sync-load-acquire.
insert_mem_bar(Op_MemBarVolatile);
+ #endif
+ }
I don't think the comment is needed.
Thanks,
David
> Thanks for your comments!
>
> Best regards,
> Goetz.
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Mittwoch, 15. Januar 2014 01:55
> To: Lindenmaier, Goetz
> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>
> Hi Goetz,
>
> Sorry for the delay in getting back to this.
>
> The general changes to the volatile barriers to support IRIW are okay.
> The guard of CPU_NOT_MULTIPLE_COPY_ATOMIC works for this (though more
> specifically it is
> not-multiple-copy-atomic-and-chooses-to-support-IRIW). I find much of
> the commentary excessive, particularly for shared code. In particular
> the IRIW example in parse3.cpp - it seems a strange place to give the
> explanation and I don't think we need it to that level of detail. Seems
> to me that is present is globalDefinitions_ppc.hpp is quite adequate.
>
> The changes related to volatile writes in the constructor, as discussed
> are not required by the Java Memory Model. If you want to keep these
> then I think they should all be guarded with PPC64 because it is not
> related to CPU_NOT_MULTIPLE_COPY_ATOMIC but a choice being made by the
> PPC64 porters.
>
> Thanks,
> David
>
> On 14/01/2014 11:52 PM, Lindenmaier, Goetz wrote:
>> Hi,
>>
>> I updated this webrev. I detected a small flaw I made when editing this version.
>> The #endif in line 322, parse3.cpp was in the wrong line.
>> I also based the webrev on the latest version of the stage repo.
>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>>
>> Best regards,
>> Goetz.
>>
>> -----Original Message-----
>> From: Lindenmaier, Goetz
>> Sent: Freitag, 20. Dezember 2013 13:47
>> To: David Holmes
>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>> Subject: RE: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Hi David,
>>
>>> So we can at least undo #4 now we have established those tests were not
>>> required to pass.
>> We would prefer if we could keep this in. We want to avoid that it's
>> blamed on the VM if java programs are failing on PPC after they worked
>> on x86. To clearly mark it as overfulfilling the spec I would guard it by
>> a flag as proposed. But if you insist I will remove it. Also, this part is
>> not that performance relevant.
>>
>>> A compile-time guard (ifdef) would be better than a runtime one I think
>> I added a compile-time guard in this new webrev:
>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-1-raw/
>> I've chosen CPU_NOT_MULTIPLE_COPY_ATOMIC. This introduces
>> several double negations I don't like, (#ifNdef CPU_NOT_MULTIPLE_COPY_ATOMIC)
>> but this way I only have to change the ppc platform.
>>
>> Best regards,
>> Goetz
>>
>> P.S.: I will also be available over the Christmas period.
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Freitag, 20. Dezember 2013 05:58
>> To: Lindenmaier, Goetz
>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>
>> Sorry for the delay, it takes a while to catch up after two weeks
>> vacation :) Next vacation (ie next two weeks) I'll continue to check emails.
>>
>> On 2/12/2013 6:33 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> ok, I understand the tests are wrong. It's good this issue is settled.
>>> Thanks Aleksey and Andreas for going into the details of the proof!
>>>
>>> About our change: David, the causality is the other way round.
>>> The change is about IRIW.
>>> 1. To pass IRIW, we must use sync instructions before loads.
>>
>> This is the part I still have some question marks over as the
>> implications are not nice for performance on non-TSO platforms. But I'm
>> no further along in processing that paper I'm afraid.
>>
>>> 2. If we do syncs before loads, we don't need to do them after stores.
>>> 3. If we don't do them after stores, we fail the volatile constructor tests.
>>> 4. So finally we added them again at the end of the constructor after stores
>>> to pass the volatile constructor tests.
>>
>> So we can at least undo #4 now we have established those tests were not
>> required to pass.
>>
>>> We originally passed the constructor tests because the ppc memory order
>>> instructions are not as find-granular as the
>>> operations in the IR. MemBarVolatile is specified as StoreLoad. The only instruction
>>> on PPC that does StoreLoad is sync. But sync also does StoreStore, therefore the
>>> MemBarVolatile after the store fixes the constructor tests. The proper representation
>>> of the fix in the IR would be adding a MemBarStoreStore. But now it's pointless
>>> anyways.
>>>
>>>> I'm not happy with the ifdef approach but I won't block it.
>>> I'd be happy to add a property
>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>
>> A compile-time guard (ifdef) would be better than a runtime one I think
>> - similar to the SUPPORTS_NATIVE_CX8 optimization (something semantic
>> based not architecture based) as that will allows for turning this
>> on/off for any architecture for testing purposes.
>>
>> Thanks,
>> David
>>
>>> or the like to guard the customization. I'd like that much better. Or also
>>> OrderAccess::needs_support_iriw_ordering()
>>> VM_Version::needs_support_iriw_ordering()
>>>
>>>
>>> Best regards,
>>> Goetz.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Donnerstag, 28. November 2013 00:34
>>> To: Lindenmaier, Goetz
>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>
>>> TL;DR version:
>>>
>>> Discussion on the c-i list has now confirmed that a constructor-barrier
>>> for volatiles is not required as part of the JMM specification. It *may*
>>> be required in an implementation that doesn't pre-zero memory to ensure
>>> you can't see uninitialized fields. So the tests for this are invalid
>>> and this part of the patch is not needed in general (ppc64 may need it
>>> due to other factors).
>>>
>>> Re: "multiple copy atomicity" - first thanks for correcting the term :)
>>> Second thanks for the reference to that paper! For reference:
>>>
>>> "The memory system (perhaps involving a hierarchy of buffers and a
>>> complex interconnect) does not guarantee that a write becomes visible to
>>> all other hardware threads at the same time point; these architectures
>>> are not multiple-copy atomic."
>>>
>>> This is the visibility issue that I referred to and affects both ARM and
>>> PPC. But of course it is normally handled by using suitable barriers
>>> after the stores that need to be visible. I think the crux of the
>>> current issue is what you wrote below:
>>>
>>> > The fixes for the constructor issue are only needed because we
>>> > remove the sync instruction from behind stores (parse3.cpp:320)
>>> > and place it before loads.
>>>
>>> I hadn't grasped this part. Obviously if you fail to do the sync after
>>> the store then you have to do something around the loads to get the same
>>> results! I still don't know what lead you to the conclusion that the
>>> only way to fix the IRIW issue was to put the fence before the load -
>>> maybe when I get the chance to read that paper in full it will be clearer.
>>>
>>> So ... the basic problem is that the current structure in the VM has
>>> hard-wired one choice of how to get the right semantics for volatile
>>> variables. You now want to customize that but not all the requisite
>>> hooks are present. It would be better if volatile_load and
>>> volatile_store were factored out so that they could be implemented as
>>> desired per-platform. Alternatively there could be pre- and post- hooks
>>> that could then be customized per platform. Otherwise you need
>>> platform-specific ifdef's to handle it as per your patch.
>>>
>>> I'm not happy with the ifdef approach but I won't block it. I think this
>>> is an area where a lot of clean up is needed in the VM. The barrier
>>> abstractions are a confused mess in my opinion.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>> On 28/11/2013 3:15 AM, Lindenmaier, Goetz wrote:
>>>> Hi,
>>>>
>>>> I updated the webrev to fix the issues mentioned by Vladimir:
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>
>>>> I did not yet add the
>>>> OrderAccess::needs_support_iriw_ordering()
>>>> VM_Version::needs_support_iriw_ordering()
>>>> or
>>>> OrderAccess::cpu_is_multiple_copy_atomic()
>>>> to reduce #defined, as I got no further comment on that.
>>>>
>>>>
>>>> WRT to the validity of the tests and the interpretation of the JMM
>>>> I feel not in the position to contribute substantially.
>>>>
>>>> But we would like to pass the torture test suite as we consider
>>>> this a substantial task in implementing a PPC port. Also we think
>>>> both tests show behavior a programmer would expect. It's bad if
>>>> Java code runs fine on the more common x86 platform, and then
>>>> fails on ppc. This will always first be blamed on the VM.
>>>>
>>>> The fixes for the constructor issue are only needed because we
>>>> remove the sync instruction from behind stores (parse3.cpp:320)
>>>> and place it before loads. Then there is no sync between volatile store
>>>> and publishing the object. So we add it again in this one case
>>>> (volatile store in constructor).
>>>>
>>>>
>>>> @David
>>>>>> Sure. There also is no solution as you require for the taskqueue problem yet,
>>>>>> and that's being discussed now for almost a year.
>>>>> It may have started a year ago but work on it has hardly been continuous.
>>>> That's not true, we did a lot of investigation and testing on this issue.
>>>> And we came up with a solution we consider the best possible. If you
>>>> have objections, you should at least give the draft of a better solution,
>>>> we would volunteer to implement and test it.
>>>> Similarly, we invested time in fixing the concurrency torture issues.
>>>>
>>>> @David
>>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>>> can't find any reference to it.
>>>> We learned about this reading "A Tutorial Introduction to the ARM and
>>>> POWER Relaxed Memory Models" by Luc Maranget, Susmit Sarkar and
>>>> Peter Sewell, which is cited in "Correct and Efficient Work-Stealing for
>>>> Weak Memory Models" by Nhat Minh Lê, Antoniu Pop, Albert Cohen
>>>> and Francesco Zappa Nardelli (PPoPP `13) when analysing the taskqueue problem.
>>>> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>
>>>> I was wrong in one thing, it's called multiple copy atomicity, I used 'read'
>>>> instead. Sorry for that. (I also fixed that in the method name above).
>>>>
>>>> Best regards and thanks for all your involvements,
>>>> Goetz.
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Mittwoch, 27. November 2013 12:53
>>>> To: Lindenmaier, Goetz
>>>> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>
>>>> Hi Goetz,
>>>>
>>>> On 26/11/2013 10:51 PM, Lindenmaier, Goetz wrote:
>>>>> Hi David,
>>>>>
>>>>> -- Volatile in constuctor
>>>>>> AFAIK we have not seen those tests fail due to a
>>>>>> missing constructor barrier.
>>>>> We see them on PPC64. Our test machines have typically 8-32 processors
>>>>> and are Power 5-7. But see also Aleksey's mail. (Thanks Aleksey!)
>>>>
>>>> And see follow ups - the tests are invalid.
>>>>
>>>>> -- IRIW issue
>>>>>> I can not possibly answer to the necessary level of detail with a few
>>>>>> moments thought.
>>>>> Sure. There also is no solution as you require for the taskqueue problem yet,
>>>>> and that's being discussed now for almost a year.
>>>>
>>>> It may have started a year ago but work on it has hardly been continuous.
>>>>
>>>>>> You are implying there is a problem here that will
>>>>>> impact numerous platforms (unless you can tell me why ppc is so different?)
>>>>> No, only PPC does not have 'multiple-read-atomicity'. Therefore I contributed a
>>>>> solution with the #defines, and that's correct for all, but not nice, I admit.
>>>>> (I don't really know about ARM, though).
>>>>> So if I can write down a nicer solution testing for methods that are evaluated
>>>>> by the C-compiler I'm happy.
>>>>>
>>>>> The problem is not that IRIW is not handled by the JMM, the problem
>>>>> is that
>>>>> store
>>>>> sync
>>>>> does not assure multiple-read-atomicity,
>>>>> only
>>>>> sync
>>>>> load
>>>>> does so on PPC. And you require multiple-read-atomicity to
>>>>> pass that test.
>>>>
>>>> What is "multiple-read-atomicity"? I'm not familiar with the term and
>>>> can't find any reference to it.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> The JMM is fine. And
>>>>> store
>>>>> MemBarVolatile
>>>>> is fine on x86, sparc etc. as there exist assembler instructions that
>>>>> do what is required.
>>>>>
>>>>> So if you are off soon, please let's come to a solution that
>>>>> might be improvable in the way it's implemented, but that
>>>>> allows us to implement a correct PPC64 port.
>>>>>
>>>>> Best regards,
>>>>> Goetz.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Tuesday, November 26, 2013 1:11 PM
>>>>> To: Lindenmaier, Goetz
>>>>> Cc: 'Vladimir Kozlov'; 'Vitaly Davidovich'; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>>
>>>>> Hi Goetz,
>>>>>
>>>>> On 26/11/2013 9:22 PM, Lindenmaier, Goetz wrote:
>>>>>> Hi everybody,
>>>>>>
>>>>>> thanks a lot for the detailed reviews!
>>>>>> I'll try to answer to all in one mail.
>>>>>>
>>>>>>> Volatile fields written in constructor aren't guaranteed by JMM to occur before the reference is assigned;
>>>>>> We don't think it's correct if we omit the barrier after initializing
>>>>>> a volatile field. Previously, we discussed this with Aleksey Shipilev
>>>>>> and Doug Lea, and they agreed.
>>>>>> Also, concurrency torture tests
>>>>>> LongVolatileTest
>>>>>> AtomicIntegerInitialValueTest
>>>>>> will fail.
>>>>>> (In addition, observing 0 instead of the inital value of a volatile field would be
>>>>>> very counter-intuitive for Java programmers, especially in AtomicInteger.)
>>>>>
>>>>> The affects of unsafe publication are always surprising - volatiles do
>>>>> not add anything special here. AFAIK there is nothing in the JMM that
>>>>> requires the constructor barrier - discussions with Doug and Aleksey
>>>>> notwithstanding. AFAIK we have not seen those tests fail due to a
>>>>> missing constructor barrier.
>>>>>
>>>>>>> proposed for PPC64 is to make volatile reads extremely heavyweight
>>>>>> Yes, it costs measurable performance. But else it is wrong. We don't
>>>>>> see a way to implement this cheaper.
>>>>>>
>>>>>>> - these algorithms should be expressed using the correct OrderAccess operations
>>>>>> Basically, I agree on this. But you also have to take into account
>>>>>> that due to the different memory ordering instructions on different platforms
>>>>>> just implementing something empty is not sufficient.
>>>>>> An example:
>>>>>> MemBarRelease // means LoadStore, StoreStore barrier
>>>>>> MemBarVolatile // means StoreLoad barrier
>>>>>> If these are consecutively in the code, sparc code looks like this:
>>>>>> MemBarRelease --> membar(Assembler::LoadStore | Assembler::StoreStore)
>>>>>> MemBarVolatile --> membar(Assembler::StoreLoad)
>>>>>> Just doing what is required.
>>>>>> On Power, we get suboptimal code, as there are no comparable,
>>>>>> fine grained operations:
>>>>>> MemBarRelease --> lwsync // Doing LoadStore, StoreStore, LoadLoad
>>>>>> MemBarVolatile --> sync // // Doing LoadStore, StoreStore, LoadLoad, StoreLoad
>>>>>> obviously, the lwsync is superfluous. Thus, as PPC operations are more (too) powerful,
>>>>>> I need an additional optimization that removes the lwsync. I can not implement
>>>>>> MemBarRelease empty, as it is also used independently.
>>>>>>
>>>>>> Back to the IRIW problem. I think here we have a comparable issue.
>>>>>> Doing the MemBarVolatile or the OrderAccess::fence() before the read
>>>>>> is inefficient on platforms that have multiple-read-atomicity.
>>>>>>
>>>>>> I would propose to guard the code by
>>>>>> VM_Version::cpu_is_multiple_read_atomic() or even better
>>>>>> OrderAccess::cpu_is_multiple_read_atomic()
>>>>>> Else, David, how would you propose to implement this platform independent?
>>>>>> (Maybe we can also use above method in taskqueue.hpp.)
>>>>>
>>>>> I can not possibly answer to the necessary level of detail with a few
>>>>> moments thought. You are implying there is a problem here that will
>>>>> impact numerous platforms (unless you can tell me why ppc is so
>>>>> different?) and I can not take that on face value at the moment. The
>>>>> only reason I can see IRIW not being handled by the JMM requirements for
>>>>> volatile accesses is if there are global visibility issues that are not
>>>>> addressed - but even then I would expect heavy barriers at the store
>>>>> would deal with that, not at the load. (This situation reminds me of the
>>>>> need for read-barriers on Alpha architecture due to the use of software
>>>>> cache-coherency rather than hardware cache-coherency - but we don't have
>>>>> that on ppc!)
>>>>>
>>>>> Sorry - There is no quick resolution here and in a couple of days I will
>>>>> be heading out on vacation for two weeks.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
>>>>>> -- Other ports:
>>>>>> The IRIW issue requires at least 3 processors to be relevant, so it might
>>>>>> not happen on small machines. But I can use PPC_ONLY instead
>>>>>> of PPC64_ONLY if you request so (and if we don't get rid of them).
>>>>>>
>>>>>> -- MemBarStoreStore after initialization
>>>>>> I agree we should not change it in the ppc port. If you wish, I can
>>>>>> prepare an extra webrev for hotspot-comp.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Tuesday, November 26, 2013 2:49 AM
>>>>>> To: Vladimir Kozlov
>>>>>> Cc: Lindenmaier, Goetz; 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8029101: PPC64 (part 211): ordering of Independent Reads of Independent Writes
>>>>>>
>>>>>> Okay this is my second attempt at answering this in a reasonable way :)
>>>>>>
>>>>>> On 26/11/2013 10:51 AM, Vladimir Kozlov wrote:
>>>>>>> I have to ask David to do correctness evaluation.
>>>>>>
>>>>>> From what I understand what we see here is an attempt to fix an
>>>>>> existing issue with the implementation of volatiles so that the IRIW
>>>>>> problem is addressed. The solution proposed for PPC64 is to make
>>>>>> volatile reads extremely heavyweight by adding a fence() when doing the
>>>>>> load.
>>>>>>
>>>>>> Now if this was purely handled in ppc64 source code then I would be
>>>>>> happy to let them do whatever they like (surely this kills performance
>>>>>> though!). But I do not agree with the changes to the shared code that
>>>>>> allow this solution to be implemented - even with PPC64_ONLY this is
>>>>>> polluting the shared code. My concern is similar to what I said with the
>>>>>> taskQueue changes - these algorithms should be expressed using the
>>>>>> correct OrderAccess operations to guarantee the desired properties
>>>>>> independent of architecture. If such a "barrier" is not needed on a
>>>>>> given architecture then the implementation in OrderAccess should reduce
>>>>>> to a no-op.
>>>>>>
>>>>>> And as Vitaly points out the constructor barriers are not needed under
>>>>>> the JMM.
>>>>>>
>>>>>>> I am fine with suggested changes because you did not change our current
>>>>>>> code for our platforms (please, do not change do_exits() now).
>>>>>>> But may be it should be done using more general query which is set
>>>>>>> depending on platform:
>>>>>>>
>>>>>>> OrderAccess::needs_support_iriw_ordering()
>>>>>>>
>>>>>>> or similar to what we use now:
>>>>>>>
>>>>>>> VM_Version::needs_support_iriw_ordering()
>>>>>>
>>>>>> Every platform has to support IRIW this is simply part of the Java
>>>>>> Memory Model, there should not be any need to call this out explicitly
>>>>>> like this.
>>>>>>
>>>>>> Is there some subtlety of the hardware I am missing here? Are there
>>>>>> visibility issues beyond the ordering constraints that the JMM defines?
>>>>>>> From what I understand our ppc port is also affected. David?
>>>>>>
>>>>>> We can not discuss that on an OpenJDK mailing list - sorry.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> In library_call.cpp can you add {}? New comment should be inside else {}.
>>>>>>>
>>>>>>> I think you should make _wrote_volatile field not ppc64 specific which
>>>>>>> will be set to 'true' only on ppc64. Then you will not need PPC64_ONLY()
>>>>>>> except in do_put_xxx() where it is set to true. Too many #ifdefs.
>>>>>>>
>>>>>>> In do_put_xxx() can you combine your changes:
>>>>>>>
>>>>>>> if (is_vol) {
>>>>>>> // See comment in do_get_xxx().
>>>>>>> #ifndef PPC64
>>>>>>> insert_mem_bar(Op_MemBarVolatile); // Use fat membar
>>>>>>> #else
>>>>>>> if (is_field) {
>>>>>>> // Add MemBarRelease for constructors which write volatile field
>>>>>>> (PPC64).
>>>>>>> set_wrote_volatile(true);
>>>>>>> }
>>>>>>> #endif
>>>>>>> }
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 11/25/13 8:16 AM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I preprared a webrev with fixes for PPC for the VolatileIRIWTest of
>>>>>>>> the torture test suite:
>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8029101-0-raw/
>>>>>>>>
>>>>>>>> Example:
>>>>>>>> volatile x=0, y=0
>>>>>>>> __________ __________ __________ __________
>>>>>>>> | Thread 0 | | Thread 1 | | Thread 2 | | Thread 3 |
>>>>>>>>
>>>>>>>> write(x=1) read(x) write(y=1) read(y)
>>>>>>>> read(y) read(x)
>>>>>>>>
>>>>>>>> Disallowed: x=1, y=0 y=1, x=0
>>>>>>>>
>>>>>>>>
>>>>>>>> Solution: This example requires multiple-copy-atomicity. This is only
>>>>>>>> assured by the sync instruction and if it is executed in the threads
>>>>>>>> doing the loads. Thus we implement volatile read as sync-load-acquire
>>>>>>>> and omit the sync/MemBarVolatile after the volatile store.
>>>>>>>> MemBarVolatile happens to be implemented by sync.
>>>>>>>> We fix this in C2 and the cpp interpreter.
>>>>>>>>
>>>>>>>> This addresses a similar issue as fix "8012144: multiple SIGSEGVs
>>>>>>>> fails on staxf" for taskqueue.hpp.
>>>>>>>>
>>>>>>>> Further this change contains a fix that assures that volatile fields
>>>>>>>> written in constructors are visible before the reference gets
>>>>>>>> published.
>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at the code, we found a MemBarRelease that to us, seems too
>>>>>>>> strong.
>>>>>>>> We think in parse1.cpp do_exits() a MemBarStoreStore should suffice.
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> Please review and test this change.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz.
>>>>>>>>
More information about the hotspot-dev
mailing list