RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce MemBarAcquire/ReleaseWide.
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Mon Nov 25 03:26:05 PST 2013
Hi David,
I promised to comment on your concern about individual load/stores:
> As you state ld.acq and st.rel only
> relate to the target load/store _but_ there are no actions in the Java
> Memory Model that only provide ordering with respect to a single load or
> store!
We derive this from the well-known cookbook:
http://g.oswego.edu/dl/jmm/cookbook.html
Have a look at the table "required Barriers".
It talks about individual operations. E.g., imagine
Normal load 1
Volatile load 2
Normal load 3
Then load1 may float arbitrarily to any place, but load3 may not
float above load2.
If I issue a 'wide' Acquire barriers after load2 (lwsync), this hinders load1 to
float below this barrier, introducing a constraint not specified by this
table.
Does this satisfy your concerns?
Best regards,
Goetz and Martin
-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Donnerstag, 21. November 2013 15:30
To: Lindenmaier, Goetz
Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'; 'Vladimir Kozlov'
Subject: Re: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce MemBarAcquire/ReleaseWide.
Hi Goetz,
On 21/11/2013 8:23 PM, Lindenmaier, Goetz wrote:
> Hi David,
>
> here the new webrev, naming the nodes LoadFence/StoreFence:
> http://cr.openjdk.java.net/~goetz/webrevs/8028515-2-wide/
Okay. I think I now disagree (when seeing in context) that Membar should
have been dropped from the name but I'll let that lie.
Aside: I find the descriptions of all those "membar" nodes very unclear
- it would be better, for me, if there were expressed more clearly in
terms of storeStore, storeLoad etc.
Could you add to the comments in memnode.hpp that those nodes are only
used by the respective Unsafe intrinisics? Or at least use that as an
example usage?
> The thing with the mail was my fault, I had sent the first mail
> only to you accidentially. Sorry for that.
>
>> But Unsafe.loadFence and Unsafe.storeFence _are_ intended for use after
>> loads/stores - but with stronger semantics than just simple loadLoad or
>> storeStore barriers.
> Yes. I mean they do not intend to sort a single load/store with succeeding
> operations. This can not be implemented this way, as the operation
> is not known in the intrinsic. I mean cases where you can use ia64 ld.acq
> or ppc ld-twi-lwsync.
If you are saying that loadFence and storeFence can not be implemented
using ld.acq and st.rel then good I agree with you. But I would argue
that general ld;MembarAcquire and st;MembarRelease can also not be
implemented with those instructions! As you state ld.acq and st.rel only
relate to the target load/store _but_ there are no actions in the Java
Memory Model that only provide ordering with respect to a single load or
store! The Unsafe *Ordered() operations may come closer to this but I
would expect them to be handled via an intrinsic - and they are outside
the JMM. So I would think the applicability of using ld.acq and st.rel
would be very limited, so it concerns me that it seems to be associated
with the very general MembarAcquire and MembarRelease nodes (ditto for
the changes in 8024921).
But how you implement this is your concern. :) From the perspective of
this shared code my concern is that it is not clear exactly what the
practical distinction is between MembarAcquire and LoadFence, and
MembarRelease and StoreFence. I'm not a compiler person but I would find
it quite difficult to know which can/should be used when. And given for
most platforms they would be implemented the same, that would add to the
uncertainty (with a risk that someone will make an arbitrary choice).
Thanks,
David
-----
>> Also I said that the Unsafe ops require bi-directional fences but that
>> was imprecise - the _load and _store ops only require two of the four
>> possible barriers which may be why you see these as distinct from the
>> membars associated directly with variable accesses ?
> On ia64, ld.acq sorts _this_ _single_ load with all following ld/st operations.
> For the intrinsic I must use mf, which sorts _all_ previous loads with all
> following ld/st operations (and more). So my distinction is if I have a
> store on one processor, and a load on the other, and I want to make sure
> the other processor's load sees the stored value, I can use - on ia64 - st.rel/ld.acq and I'm fine.
> If I did some arbitrary stores, and I want assure an other processor doing
> some arbitrary load,s then I must use - on ia64- mf on both.
> On sparc I can do a
> membar( Assembler::Membar_mask_bits(Assembler::LoadStore | Assembler::LoadLoad) );
> membar( Assembler::Membar_mask_bits(Assembler::LoadStore | Assembler::StoreStore) );
> But, on sparc, there is no operation as ld.acq as far as I know.
>
> Change 8024921 enables to distinguish normal loads and such
> that should do ld.acq. This change enables the compiler to
> differentiate the MemBar nodes used better: such coming after
> a dedicated load, and such valid for any load before the barrier.
> This is already visible in the IR by some means: MemBars after
> dedicated loads have an input edge at position MemBar::precedence
> connecting them with the load, but that does not suffice to
> match it (in the matcher) correctly.
> If I use ld.acq, the following MemBarAcquire is superfluous,
> but the MemBarAcquire used in the intrinsic is not. So want to
> have different nodes.
> In our VM, we use #ifdef PPC and issue MemBarRelease instead of MemBarAcquire,
> and with #ifdef ia64 we use MemBarVolatile, as we know these operations will
> issue the proper assembly instructions. But this is bad - I want to do it better
> here.
>
>> Are you referring to generated code if these nodes are adjacent? If so I
>> assume the problem is that you can't logically coalesce these, even if
>> the actual generated code could be coalesced?
> Yes. On ia64 it is worst, as we had to implement MemBarRelease, Acquire
> and Volatile with mf() if we would not use ld.acq/st.rel. Then there
> would be even more redundant operations. But that's out of scope of this change.
>
>> I still have this concern regarding the fact you think these are not
>> related to load/store operations. But I'll wait for the webrev.
> As stated above, yes, they are related to store/load operations. But not
> to a dedicated ld/st -- I can not use ld.acq/st.rel as the ld/st are not
> part of the intrinsic.
>
> (I'm using the ia64 mnemonics in the implementation because I think
> they are nicely orthogonal. Also it's important to know that
> ld-tdi-lwsync on ppc is cheaper than ld-lwsync because it affects only
> the one load mentioned in the sequence. That's some hack in the
> processor.)
>
> Best regards,
> Goetz.
>
>
>
>
>
>
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Mittwoch, 20. November 2013 21:06
> To: Lindenmaier, Goetz
> Cc: 'hotspot-dev at openjdk.java.net'; 'ppc-aix-port-dev at openjdk.java.net'; 'Vladimir Kozlov'
> Subject: Re: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce MemBarAcquire/ReleaseWide.
>
> First apologies that the email you replied to here didn't go to the
> lists. Due to the way it was filtered I assumed it was a direct email
> only. :)
>
> On 21/11/2013 12:22 AM, Lindenmaier, Goetz wrote:
>> Hi David,
>>
>> thanks for the comprehensive explanation. That's in agreement with
>> what I understand. Except that I think, unidirectional barriers only
>> make sense if you have Acquire/Release operations dedicated to
>> a certain operation. Then other operations of the same kind can pass
>> the barrier in one direction.
>>
>> This also matches the extension I contributed to hotspot (8024921),
>> which now allows to issue such operations right along with the
>> loads/stores.
>>
>> This change here tries to separate the barriers used after load/stores from
>> the 'wide' or better 'fence' variants.
>
> But Unsafe.loadFence and Unsafe.storeFence _are_ intended for use after
> loads/stores - but with stronger semantics than just simple loadLoad or
> storeStore barriers.
>
>> Given the basic four barriers (L-L, L-S, S-L, S-S) there are 16 possible
>> operations. Unfortunately, Hotspot knows only four of these:
>> MemBarStoreStore: S-S
>
> Okay - nice and clear.
>
>> MemBarAcquire: L-S, L-L
>
> I'm not sure what makes this "acquire" but this seems to be exactly what
> Unsafe.loadFence requires.
>
>> MemBarRelease: L-S, S-S
>
> Again I'm not sure what makes this "release" - I also find it odd that
> it is L-S not S-L. If S-L this would be exactly what Unsafe.storeFence
> needs.
>
>> MemBarVolatile: L-L, L-S, S-L, S-S
>
> A full fence, needed in the worst case when accessing a volatile variable.
>
> Also note that in different parts of the VM we have different
> expressions of these, both at the JIT level and runtime eg see
> orderAccess.hpp (which is not in itself self-consistent or clear: see
> https://bugs.openjdk.java.net/browse/JDK-7143664).
>
> Also I said that the Unsafe ops require bi-directional fences but that
> was imprecise - the _load and _store ops only require two of the four
> possible barriers which may be why you see these as distinct from the
> membars associated directly with variable accesses ?
>
>> and these don't map, e.g., the operations available on PPC:
>> lwsync: L-S, L-L, S-S
>> sync: L-L, L-S, S-L, S-S
>> What is the reason why it's hard for us to generate optimal code
>> with the available optimizations:
>> MemBarAcquire;
>> MemBarRelease;
>> are mapped to
>> lwsync
>> lwsync
>> which is obviously redundant.
>
> Are you referring to generated code if these nodes are adjacent? If so I
> assume the problem is that you can't logically coalesce these, even if
> the actual generated code could be coalesced?
>
>> A generic solution would be to have a single operation, that can be
>> Tagged with the barriers (L-L, L-S, S-L, S-S) it should guarantee.
>> Two operations could be merged by an optimization simply by
>> or-ing the tags.
>> But this is not reality in HotSpot, and out of scope of this change.
>
> Ok
>
>>> I think perhaps that Vladimir's suggestion is best - if these are the
>>> Membar nodes for these unsafe ops (and only these) then name them to
>>> match the unsafe ops. At least then it is clearer where the
>>> specification for their behaviour comes from.
>> OK, so I'll rename them as Vladimir proposed, and post a new webrev.
>
> I still have this concern regarding the fact you think these are not
> related to load/store operations. But I'll wait for the webrev.
>
> Thanks for your patience on this. Hopefully we can sort it out in the
> next 24 hours before I takeoff.
>
> David
> -----
>
>> Best regards and a good flight home,
>> Goetz.
>>
>>
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Mittwoch, 20. November 2013 14:30
>> To: Lindenmaier, Goetz
>> Cc: David Holmes
>> Subject: Re: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce MemBarAcquire/ReleaseWide.
>>
>> Hi Goetz,
>>
>> On 20/11/2013 7:36 PM, Lindenmaier, Goetz wrote:
>>> Hi David,
>>> I don't know what you mean by uni/bi-directional.
>>
>> Pretty much as you describe below. A unidirectional barrier allows
>> things to cross in one direction only ie it blocks motion one way; a
>> bi-directional barrier ("full fence") blocks both ways and so prevents
>> any reordering.
>>
>> As I described in earlier emails these kind of unreferenced
>> acquire/release semantics ie not associated with any stores/loads is
>> used to describe the memory barrier properties of a synchronized block,
>> for example. Monitor acquisition has acquire semantics, while monitor
>> release has store semantics. The implied barriers mean that any accesses
>> can move into the synchronized region but none can move out - the "roach
>> motel" semantics.
>>
>> Now as you say this kind of barrier is not what is wanted with these
>> loadFence/storeFence operations, hence I think the generic
>> MembarAcquire/MembarRelease are inappropriate names (whether the
>> implementation is correct is a different matter). Similarly I find the
>> MembarAcquireFence and MembarReleaseFence to be somewhat inappropriate -
>> what do they mean ???
>>
>> The loadFence semantics imply a loadLoad|loadStore barrier.
>> The storeFence semantics imply a storeLoad|storeStore barrier.
>> The fullFence is of course all four.
>>
>> Again I'm unclear if the current or proposed MembarXXX actions actually
>> map to barriers that implement those semantics.
>>
>> So yes this comes down to naming initially but also verification that
>> the implementation does as required.
>>
>> I think perhaps that Vladimir's suggestion is best - if these are the
>> Membar nodes for these unsafe ops (and only these) then name them to
>> match the unsafe ops. At least then it is clearer where the
>> specification for their behaviour comes from.
>>
>> I'll post a follow up email to the list when I get a chance. I'm
>> currently in the US and attending lots of meetings then head back to
>> Australia tomorrow evening.
>>
>> Thanks,
>> David
>>
>>> I guess we agree that the documentation of the intrinsics, let's take
>>> loadFence
>>> for an example, talks about loads before the loadFence, and stores and
>>> loads after it:
>>> loads
>>> ------------loadFence()--------------
>>> loads, stores
>>> With this, you can achieve any ordering by moving nodes up OR down:
>>> Op1 Move Op1 Op2
>>> Op2 down -------------
>>> --------- =========> Op3
>>> Op3 Op1
>>> Alternatively, you can move op3 up, and get the same order:
>>> Op1 Reorder Op2 Move Op3 Op2
>>> Op2 Ops 1 and 2 Op1 up Op3
>>> --------- =========> -------------- =========> Op1
>>> Op3 Not restricted Op3 ----------
>>> by later loadFence
>>> So, if the operation restricts only moves in one direction, it's of no
>>> help because
>>> you can still get an execution order you did not expect. So
>>> unidirectional operations
>>> (e.g., only hindering operations from moving up) are pointless.
>>> By the way, are you questioning whether the implementation is correct, or
>>> whether the names express what is desired?
>>> If it's the first, it's of no concern to this change, which is only about
>>> matching these operations independently from those, that address
>>> a dedicated memory operation.
>>> If It's the second, please tell me what names to use, and I'll fix it.
>>> (I hope you can properly read the 'drawings')
>>> Best regards,
>>> Goetz.
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Dienstag, 19. November 2013 20:51
>>> To: Lindenmaier, Goetz
>>> Cc: 'Vladimir Kozlov'; 'ppc-aix-port-dev at openjdk.java.net';
>>> 'hotspot-dev at openjdk.java.net'
>>> Subject: Re: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce
>>> MemBarAcquire/ReleaseWide.
>>> On 20/11/2013 5:34 AM, Lindenmaier, Goetz wrote:
>>>> Hi David,
>>>>
>>>> this are the instrinsics for sun/misc/Unsafe, which are documented
>>>> as shown below. So I guess that's just what is desired.
>>> I don't think so! What is described for Unsafe are full bi-directional
>>> fences and "acquire" and "release" are only uni-directional fences!
>>> David
>>> -----
>>>> Best regards,
>>>> Goetz.
>>>>
>>>> /**
>>>> * Ensures lack of reordering of loads before the fence
>>>> * with loads or stores after the fence.
>>>> * @since 1.8
>>>> */
>>>> public native void loadFence();
>>>>
>>>> /**
>>>> * Ensures lack of reordering of stores before the fence
>>>> * with loads or stores after the fence.
>>>> * @since 1.8
>>>> */
>>>> public native void storeFence();
>>>>
>>>> /**
>>>> * Ensures lack of reordering of loads or stores before the fence
>>>> * with loads or stores after the fence.
>>>> * @since 1.8
>>>> */
>>>> public native void fullFence();
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Tuesday, November 19, 2013 8:08 PM
>>>> To: Lindenmaier, Goetz
>>>> Cc: Vladimir Kozlov; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce MemBarAcquire/ReleaseWide.
>>>>
>>>> So I like the rename but I am concerned there are semantic issues here:
>>>>
>>>> switch(id) {
>>>> case vmIntrinsics::_loadFence:
>>>> - insert_mem_bar(Op_MemBarAcquire);
>>>> + insert_mem_bar(Op_MemBarFenceAcquire);
>>>> return true;
>>>> case vmIntrinsics::_storeFence:
>>>> - insert_mem_bar(Op_MemBarRelease);
>>>> + insert_mem_bar(Op_MemBarFenceRelease);
>>>> return true;
>>>>
>>>> What is a _loadFence operation? Does it really map to
>>>> MembarFenceAcquire? Ditto for the _storeFence. These sound like
>>>> bi-directional fences - are they? Are they really only concerned with
>>>> loads or stores ?
>>>>
>>>> David
>>>>
>>>> On 19/11/2013 6:34 PM, Lindenmaier, Goetz wrote:
>>>>> Hi,
>>>>>
>>>>> I made a new webrev, using the new names:
>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8028515-1-wide/
>>>>>
>>>>> Thanks,
>>>>> Goetz.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>>> Sent: Montag, 18. November 2013 23:28
>>>>> To: Lindenmaier, Goetz; 'David Holmes'
>>>>> Cc: 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>>> Subject: Re: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce MemBarAcquire/ReleaseWide.
>>>>>
>>>>> On 11/18/13 2:19 PM, Lindenmaier, Goetz wrote:
>>>>>> Hi David, Vladimir,
>>>>>>
>>>>>> I also would prefer MemBarFenceAquire/Release, for two reasons
>>>>>> - the same prefix shows they are all similar nodes.
>>>>>> - I don't have to change MemBarVolatile to FullFence, which I didn't
>>>>>> change yet and which is used in more places.
>>>>>
>>>>> Okay.
>>>>>
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Best regards,
>>>>>> Goetz.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Monday, November 18, 2013 10:49 PM
>>>>>> To: Vladimir Kozlov
>>>>>> Cc: Lindenmaier, Goetz; 'ppc-aix-port-dev at openjdk.java.net'; 'hotspot-dev at openjdk.java.net'
>>>>>> Subject: Re: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce MemBarAcquire/ReleaseWide.
>>>>>>
>>>>>> Vladimir,
>>>>>>
>>>>>> On 19/11/2013 5:36 AM, Vladimir Kozlov wrote:
>>>>>>> I think David asked for different nodes not just one.
>>>>>>> We can have new membar nodes with names matching Unsafe methods names:
>>>>>>> LoadFence, StoreFence, FullFence. I am fine with it.
>>>>>>
>>>>>> But I don't think that actually describes them - they don't distinguish
>>>>>> between loads and stores only the direction in which the barrier
>>>>>> operates. "acquire" only allows accesses to move below it; "release"
>>>>>> only allows accesses to move above it ie:
>>>>>>
>>>>>> load a
>>>>>> store b, c
>>>>>> acquire();
>>>>>> load d
>>>>>> store e,f
>>>>>> release();
>>>>>> load g
>>>>>> store h, i
>>>>>>
>>>>>> allows the loads of 'a' and 'g' to move into the region between acquire
>>>>>> and release; similarly for the stores to b and h. But the load of d nor
>>>>>> the store to e can not move in relation to either acquire or release.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>> MemBar and Fence have similar meaning so lets don't mix them in node names.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 11/18/13 8:19 AM, Lindenmaier, Goetz wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> as reply to your comment on the bug:
>>>>>>>>
>>>>>>>> Well, I sitll would need 2 different nodes, as on PPC we do
>>>>>>>> MemBarAcquireWide --> lwsync
>>>>>>>> MemBarReleaseWide --> lwsync
>>>>>>>> MemBarVolatile --> sync.
>>>>>>>> On Sparc, you even do 3 different operations.
>>>>>>>>
>>>>>>>> Or should I name them MemBarFenceAcquire and MemBarFenceRelease?
>>>>>>>> This all depends a lot on the available instructions of the processors.
>>>>>>>> Therefore I think a really clean representation that, at the same
>>>>>>>> time, allows
>>>>>>>> to find the cheapest set of instructions to express it on all
>>>>>>>> processors, is impossible.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz
>>>>>>>>
>>>>>>>> PS: Should I respond to comments in the bug right in the bug
>>>>>>>> or on the mailing lists?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From:ppc-aix-port-dev-bounces at openjdk.java.net
>>> <mailto:ppc-aix-port-dev-bounces at openjdk.java.net>
>>>>>>>> [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of
>>>>>>>> Lindenmaier, Goetz
>>>>>>>> Sent: Montag, 18. November 2013 15:19
>>>>>>>> To: 'hotspot-dev at openjdk.java.net';
>>>>>>>> 'ppc-aix-port-dev at openjdk.java.net'; Vladimir Kozlov
>>>>>>>> Subject: RFR(M): 8028515: PPC64 (part 113.2): opto: Introduce
>>>>>>>> MemBarAcquire/ReleaseWide.
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The c2 compiler inserts MemBarAcquire/Release nodes to enforce memory
>>>>>>>> ordering in various places. Some order a certain load/store with other
>>>>>>>> operations. Inline_unsafe_fence() inserts MemBars that do not
>>>>>>>> correspont to a memory operation. So far, the same nodes were used.
>>>>>>>>
>>>>>>>> This change introduces MemBarAcquire/ReleaseWide to use where no
>>>>>>>> dedicated load/store is ordered. With this change, these nodes can be
>>>>>>>> matched differently, what is needed on PPC64.
>>>>>>>>
>>>>>>>> When reviewing 8024921 (part 113) we decided to avoid #defines in
>>>>>>>> inline_unsafe_fence() and to introduce new MemBar operations.
>>>>>>>>
>>>>>>>> Please review and test this change.
>>>>>>>> http://cr.openjdk.java.net/~goetz/webrevs/8028515-0-wide/
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Goetz.
>>>>>>>>
More information about the ppc-aix-port-dev
mailing list