[vector]: RFR (XS): Small bug fixes
Lupusoru, Razvan A
razvan.a.lupusoru at intel.com
Tue Feb 27 15:37:48 UTC 2018
The updated solution via Unsafe.loadFence() (or storeFence()) will work for me. And thanks for the other fix on NegVI.
--Razvan
-----Original Message-----
From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
Sent: Tuesday, February 27, 2018 7:31 AM
To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; John Rose <john.r.rose at oracle.com>; Paul Sandoz <paul.sandoz at oracle.com>
Cc: panama-dev at openjdk.java.net
Subject: Re: [vector]: RFR (XS): Small bug fixes
Good point, Razvan.
> I recommend that we instead update it to Unsafe.loadFence() followed
> by Unsafe.storeFence(). From my understanding, this should be safe
> while having the desired performance characteristics. As far as I can
> tell, this will generate: MemBarCPUOrder LoadFence MemBarCPUOrder
> StoreFence. MemBarCPUOrder seems to be what I want, since it provides
> memory ordering guarantees in the compiler, and LoadFence and
> StoreFence translate to nothing on x86 due to its memory model.
As John pointed out, bare MemBarCPUOrder is enough. There's no way right now to insert it, but Unsafe.loadFence() or Unsafe.storeFence() will fix the problem as well.
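For illustration, a minimal sketch (not the actual patch) of what such a compiler-only barrier looks like from plain Java, assuming sun.misc.Unsafe is obtained reflectively; inside the JDK the vector code would use jdk.internal.misc.Unsafe directly, and the helper name below is hypothetical:

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    class ReboxBarrierSketch {
        private static final Unsafe U;
        static {
            try {
                // Reach the shared Unsafe instance reflectively for this sketch.
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                U = (Unsafe) f.get(null);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        // Hypothetical helper: keeps C2 from moving memory operations across
        // the point where a vector may be reboxed. Per the discussion above,
        // loadFence()/storeFence() generate MemBarCPUOrder plus a
        // LoadFence/StoreFence node, which fold to no instruction on x86,
        // unlike fullFence(), which becomes a 'lock addl'.
        static void compilerOrderBarrier() {
            U.loadFence();   // or U.storeFence()
        }
    }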
> Even better, we could update the C2 side to manually insert
> MemBarCPUOrder without load and store fences. This would allow us to
> move forward in absence of an actual solution for aliasing in presence
> of wide memory accesses.
I proposed the fix as a stop-gap solution.
I agree that having it implemented in C2 is appealing (e.g., there are some optimization opportunities to avoid the barrier depending on whether reboxing actually happens or not) and a viable solution in the longer term.
Best regards,
Vladimir Ivanov
> From: John Rose [mailto:john.r.rose at oracle.com]
> Sent: Monday, February 26, 2018 5:33 PM
> To: Paul Sandoz <paul.sandoz at oracle.com>
> Cc: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>;
> panama-dev at openjdk.java.net
> Subject: Re: [vector]: RFR (XS): Small bug fixes
>
> On Feb 26, 2018, at 5:18 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>
> On Feb 26, 2018, at 5:04 PM, Lupusoru, Razvan A
> <razvan.a.lupusoru at intel.com> wrote:
>
>
> Hi Vladimir,
>
> I am not too familiar with what Unsafe.fullFence() ends up generating
> in terms of code. However, if it generates something like an "mfence"
> instruction, that is undesirable. Ideally we want a scheduling barrier
> so that C2 will not move memory operations across it, but one that has
> zero cost in terms of generated code. If it indeed has zero cost, then
> the patch looks fine to me.
>
>
> It will generate something equivalent to mfence, a 'lock addl' (see the
> matches for MemBarVolatileNode and also OrderAccess::fence).
>
> I thought this was just a temporary reprieve to avoid crashes until
> something better is worked out.
>
> Yes, fullFence is stronger; it is associated with volatile reads and writes.
>
> Most raw unsafe accesses use CPU order barriers, which are what Razvan wants.
>
>
> But the Java Unsafe API doesn't directly provide such a fence.
>
> Perhaps one could trick C2 into placing a CPU order barrier next to
> code which folds up to nothing, but such a trick would be fragile.
> Certainly CPU order fences must have been considered as a possible
> fence, but they aren't in today's kit.
>
> We could try Reference.reachabilityFence(null), which (as it happens)
> is not reorderable. That's cheaper than fullFence.
>
> — John
>
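For completeness, a minimal sketch of the Reference.reachabilityFence(null) alternative John mentions, assuming all that is needed is an ordering point the JIT will not move code across; the class and method names are illustrative:

    import java.lang.ref.Reference;

    class ReachabilityBarrierSketch {
        // Per John's note, Reference.reachabilityFence is not reorderable and
        // is cheaper than Unsafe.fullFence(), since it does not require a
        // hardware fence instruction.
        static void compilerOrderBarrier() {
            Reference.reachabilityFence(null);
        }
    }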