[vector]: RFR (XS): Small bug fixes

Paul Sandoz paul.sandoz at oracle.com
Tue Feb 27 19:06:35 UTC 2018



> On Feb 27, 2018, at 10:48 AM, Kandu, Rahul <rahul.kandu at intel.com> wrote:
> 
> Would it still generate "mfence" or "lock addl" with the use of loadFence() and storeFence()?
> 

No, neither independently nor for a combined sequence of calls (I think your question refers to the latter?).

Paul.
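The combined sequence Rahul asks about can be sketched as a load fence followed by a store fence. This is a minimal sketch that assumes Java 9+. It substitutes the public `VarHandle.acquireFence()` and `VarHandle.releaseFence()` for `Unsafe.loadFence()` and `Unsafe.storeFence()`, since OpenJDK implements them on top of those `Unsafe` methods. The helper `orderingFence()` and the array workload are hypothetical stand-ins for the vector store and reload in the patch. At the Java level, these two fences do not order an earlier store against a later load. The compiler-barrier effect the thread relies on comes from C2 emitting MemBarCPUOrder around them, which is not visible in the source.

```java
import java.lang.invoke.VarHandle;
import java.util.Arrays;

public class OrderingFenceSketch {

    // Hypothetical helper: a load fence followed by a store fence.
    // acquireFence ~ Unsafe.loadFence():  LoadLoad + LoadStore.
    // releaseFence ~ Unsafe.storeFence(): LoadStore + StoreStore.
    // Per the thread, on x86 neither emits a fence instruction; C2 only
    // inserts MemBarCPUOrder nodes, which act as a scheduling barrier.
    static void orderingFence() {
        VarHandle.acquireFence();
        VarHandle.releaseFence();
    }

    // Stand-in for the reboxing pattern: a bulk write of an array followed
    // by element-wise reads of the same memory.
    static int storeThenSum(int[] a, int v) {
        Arrays.fill(a, v);   // plays the role of the wide vector store
        orderingFence();     // stop-gap barrier between the two phases
        int sum = 0;
        for (int x : a) {
            sum += x;        // narrow reads of the memory just written
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(storeThenSum(new int[8], 3)); // prints 24
    }
}
```

Run in a single thread, the result is the same with or without the fence. The fence only constrains how the JIT may schedule the accesses.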

> -Rahul
> 
> "And LoadFence and StoreFence translate to nothing on x86 due to its memory model."
> 
> -----Original Message-----
> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov
> Sent: Tuesday, February 27, 2018 7:55 AM
> To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; John Rose <john.r.rose at oracle.com>; Paul Sandoz <paul.sandoz at oracle.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: [vector]: RFR (XS): Small bug fixes
> 
> Paul, John, Razvan, thanks for reviews. Pushed both fixes.
> 
> Best regards,
> Vladimir Ivanov
> 
> On 2/27/18 6:37 PM, Lupusoru, Razvan A wrote:
>> The updated solution using Unsafe.loadFence() (or storeFence()) will work for me. And thanks for the other fix on NegVI.
>> 
>> --Razvan
>> 
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Tuesday, February 27, 2018 7:31 AM
>> To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; John Rose 
>> <john.r.rose at oracle.com>; Paul Sandoz <paul.sandoz at oracle.com>
>> Cc: panama-dev at openjdk.java.net
>> Subject: Re: [vector]: RFR (XS): Small bug fixes
>> 
>> Good point, Razvan.
>> 
>>> I recommend that we instead update it to Unsafe.loadFence() 
>>> followed by Unsafe.storeFence(). From my understanding, this should 
>>> be safe while having the desired performance characteristics. From 
>>> what I can tell, this will generate: MemBarCPUOrder LoadFence 
>>> MemBarCPUOrder StoreFence. MemBarCPUOrder seems to be what I want, 
>>> since it provides memory ordering guarantees in the compiler. And 
>>> LoadFence and StoreFence translate to nothing on x86 due to its memory model.
>> 
>> As John pointed out, bare MemBarCPUOrder is enough. There's no way right now to insert it, but Unsafe.loadFence() or Unsafe.storeFence() will fix the problem as well.
>>> Even better, we could update the C2 side to manually insert 
>>> MemBarCPUOrder without load and store fences. This would allow us to 
>>> move forward in absence of an actual solution for aliasing in 
>>> presence of wide memory accesses.
>> 
>> I proposed the fix as a stop-gap solution.
>> 
>> I agree that having it implemented in C2 is appealing (e.g., there are some optimization opportunities to avoid the barrier depending on whether reboxing actually happens or not) and is a viable solution in the longer term.
>> 
>> Best regards,
>> Vladimir Ivanov
>> 
>>> *From:*John Rose [mailto:john.r.rose at oracle.com]
>>> *Sent:* Monday, February 26, 2018 5:33 PM
>>> *To:* Paul Sandoz <paul.sandoz at oracle.com>
>>> *Cc:* Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; 
>>> panama-dev at openjdk.java.net
>>> *Subject:* Re: [vector]: RFR (XS): Small bug fixes
>>> 
>>> On Feb 26, 2018, at 5:18 PM, Paul Sandoz <paul.sandoz at oracle.com 
>>> <mailto:paul.sandoz at oracle.com>> wrote:
>>> 
>>>     On Feb 26, 2018, at 5:04 PM, Lupusoru, Razvan A
>>>     <razvan.a.lupusoru at intel.com <mailto:razvan.a.lupusoru at intel.com>>
>>>     wrote:
>>> 
>>> 
>>>         Hi Vladimir,
>>> 
>>>         I am not too familiar with what Unsafe.fullFence() ends up
>>>         generating in terms of code. However, if it generates something
>>>         like "mfence" instruction, it is undesirable. Ideally we want a
>>>         scheduling barrier so that C2 will not move memory operations
>>>         across the barrier but that has zero cost in terms of generated
>>>         code. If it has a zero cost indeed, then the patch looks fine to me.
>>> 
>>> 
>>>     It will generate something equivalent to mfence, ‘lock addl’ (see
>>>     matches for MemBarVolatileNode and also OrderAccess::fence).
>>> 
>>>     I thought this was just a temporary reprieve to avoid crashes until
>>>     something better is worked out.
>>> 
>>> Yes, the fullFence is stronger, associated with volatile reads and writes.
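The loadFence()/storeFence() pair discussed in this thread handles most cases. The situation that genuinely needs the stronger fence is different: a store must become visible before a later load (StoreLoad ordering), as in a Dekker-style handshake. This sketch assumes Java 9+ and uses `VarHandle.fullFence()` in place of `Unsafe.fullFence()`. The class, fields, and methods are hypothetical. They only show where the fence sits and are not a complete mutual-exclusion protocol. StoreLoad is the one reordering x86 itself permits, which is why this is the fence that costs a `lock addl` there (per Paul's note) while the load and store fences cost nothing.

```java
import java.lang.invoke.VarHandle;

public class FullFenceSketch {
    // Hypothetical flags: each side announces itself, then checks the other.
    static int flagA, flagB;

    static void reset() {
        flagA = 0;
        flagB = 0;
    }

    // Side A: store own flag, full fence, then load the other flag.
    // Without a StoreLoad barrier the load could be satisfied before the
    // store is visible to the other side, letting both sides "enter".
    static boolean aMayEnter() {
        flagA = 1;
        VarHandle.fullFence(); // ~ Unsafe.fullFence(); 'lock addl' on x86 per the thread
        return flagB == 0;
    }

    // Side B: the mirror image of side A.
    static boolean bMayEnter() {
        flagB = 1;
        VarHandle.fullFence();
        return flagA == 0;
    }

    public static void main(String[] args) {
        // Sequentially in one thread: A sees B's flag clear, then B sees A's flag set.
        System.out.println(aMayEnter()); // true
        System.out.println(bMayEnter()); // false
    }
}
```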
>>> 
>>> Most raw unsafe accesses use CPU order barriers which are what Razvan wants.
>>> 
>>> 
>>> But the Java Unsafe API doesn't directly provide such a fence.
>>> 
>>> Perhaps one could trick C2 into placing a CPU order barrier next to
>>> code which folds up to nothing, but such a trick would be fragile.
>>> Certainly CPU order fences must have been considered as a possible
>>> fence, but they aren't in today's kit.
>>> 
>>> We could try Reference.reachabilityFence(null), which (as it happens)
>>> is not reorderable. That's cheaper than fullFence.
>>> 
>>> — John
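John's suggestion can be sketched as a helper that calls `java.lang.ref.Reference.reachabilityFence(null)` (Java 9+) between a write phase and a read phase. A `null` argument keeps no object alive. The call is there only for the property John describes: C2 does not reorder memory operations across it. That is an observation about HotSpot's implementation at the time, not something the `Reference` specification promises. It may not hold on other JVMs or in later releases. The helper name and the workload are hypothetical.

```java
import java.lang.ref.Reference;
import java.util.Arrays;

public class ReachabilityFenceSketch {

    // Hypothetical helper. reachabilityFence(null) has no effect on
    // reachability; it is used only as a compiler-level ordering point,
    // relying on C2 not reordering memory operations across it.
    static void compilerOrderPoint() {
        Reference.reachabilityFence(null);
    }

    static long fillThenSum(long[] a, long v) {
        Arrays.fill(a, v);     // bulk write
        compilerOrderPoint();  // cheaper than a full fence, per the thread
        long sum = 0;
        for (long x : a) {
            sum += x;          // reads of the memory just written
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fillThenSum(new long[4], 5L)); // prints 20
    }
}
```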
>>> 