RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot intrinsic

Igor Veresov igor.veresov at oracle.com
Wed Jan 27 06:30:39 UTC 2016


Or, to put it another way: the memory effect of the pause prevents ordinary loads from floating up. However, its control effect alone should be enough to prevent _volatile_ loads from floating up, since they are control-dependent. Hence my original thought that the memory effect of the pause might be unnecessarily restrictive when it’s used with volatile loads. But maybe I’m missing something.

igor

> On Jan 26, 2016, at 10:03 PM, Igor Veresov <igor.veresov at oracle.com> wrote:
> 
>> 
>> On Jan 26, 2016, at 9:35 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>> 
>> 
>> 
>> On Tuesday, January 26, 2016, Igor Veresov <igor.veresov at oracle.com> wrote:
>> 
>>> On Jan 26, 2016, at 8:08 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>> 
>>> You would, but a subsequent volatile load could move before the pause. If you unroll the loop, you could (theoretically) end up with all the loads moved before the pause while still appearing ordered with respect to each other, e.g.:
>>> 
>>> cmp addr, 0 // from iteration 1
>>> je label
>>> cmp addr, 0 // from iteration 2
>>> je label
>>> ...
>>> pause
>>> 
>>> What prevents that if pause is not a compiler membar?
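The listing above is at the assembly level. A minimal Java sketch of the kind of source loop it could plausibly be unrolled from is below: a volatile poll followed by a spin-wait hint on every failed test. It assumes Java 9 or later, where this API eventually shipped as java.lang.Thread.onSpinWait() rather than the draft java.lang.Runtime.onSpinWait() discussed in this thread; the class and field names are hypothetical. The Java source itself does not pin the hint relative to later iterations' loads; that is decided by the compiler's IR, which is what is being debated here.

```java
public class SpinPoll {
    // Hypothetical flag polled by the spin loop. Declaring it volatile forces a
    // fresh load on every iteration, so the JIT cannot hoist the read out of the loop.
    static volatile int flag;

    // Spins until flag becomes non-zero and returns the value observed.
    static int awaitNonZero() {
        int v;
        while ((v = flag) == 0) {   // volatile load + test, like "cmp addr, 0"
            Thread.onSpinWait();    // spin-wait hint; may be compiled to PAUSE on x86
        }
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread setter = new Thread(() -> {
            try {
                Thread.sleep(10);   // give the main thread time to start spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            flag = 42;              // volatile store releases the spinner
        });
        setter.start();
        System.out.println(awaitNonZero()); // prints 42
        setter.join();
    }
}
```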
>>> 
>> 
>> I think volatile loads explicitly depend on control. If the pause node consumes and produces control, they should all be in a rigid control chain.  
>> Regular loads (which don’t have control dependencies) would still be free to move around.
>> 
>> Is this to avoid out-of-thin-air values? That is, suppose you have:
>> 
>> if (some condition)
>>     read volatile (or regular)
> 
>> 
>> A regular load can be scheduled before the if, and its result used if control reaches there. A volatile load, though, cannot be scheduled above the if, since its value could be bogus at that point?
> 
> Right. Regular reads can move up as far as the preceding memory effect that modified their alias index.
> 
>> 
>> Is it safe for the compiler to assume that something else anchors loads around the pause?
>> 
>> That aside, given the intended usage, I'm not sure what other regular loads would be there. The usage is a tight spin loop waiting for an exit condition to be met. Although I suppose if the compiler sees regular loads after the loop exits successfully, scheduling them before the loop could be beneficial. Is that what you have in mind?
> 
> 
> No, just simple stuff like:
> 
> while(…) {
>   a = x.f;
>   pause();
>   b = x.f;
> }
> 
> If pause() is a wide memory kill, regular field loads around it obviously won’t fold, so in the example above both field loads are going to be there. I realize it’s probably not a big deal in practice for wait loops, but I was just wondering why make it a wide memory kill if the membar nodes for volatiles (which will have to be in the loop anyway) already have wide-kill semantics.
> 
> igor 
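A runnable Java rendering of Igor's loop, as a sketch: Thread.onSpinWait() (Java 9 or later) stands in for pause(), and the Holder class, the done flag, and the change counter are hypothetical additions so that the loop terminates and produces something observable. Whether the second plain load of x.f may be folded into the first depends on whether the JIT models the hint as a wide memory kill, which is the point under discussion; the printed result is the same either way because nothing writes x.f.

```java
public class PauseBetweenLoads {
    // Hypothetical stand-in for the object "x" in the loop above.
    static final class Holder {
        int f; // plain (non-volatile) field, like x.f
    }

    // Hypothetical volatile exit condition; a real wait loop will contain
    // volatile accesses (and their membars) anyway.
    static volatile boolean done;

    // Runs "a = x.f; pause(); b = x.f;" until done is set and returns how
    // many iterations saw a != b.
    static long countChanges(Holder x) {
        long changes = 0;
        while (!done) {
            int a = x.f;          // plain load
            Thread.onSpinWait();  // stands in for pause()
            int b = x.f;          // plain load: it can only be folded into 'a'
                                  // if the JIT does not treat the hint as a wide memory kill
            if (a != b) {
                changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) throws InterruptedException {
        Holder x = new Holder();
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(10); // let the loop run for a while
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            done = true;          // volatile store ends the loop
        });
        stopper.start();
        System.out.println(countChanges(x)); // prints 0: nothing writes x.f
        stopper.join();
    }
}
```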
> 
> 
>> 
>> 
>> igor
>> 
>>> On Tuesday, January 26, 2016, Igor Veresov <igor.veresov at oracle.com> wrote:
>>> Wouldn’t you use a volatile load for the memory location you’re polling?
>>> 
>>> igor
>>> 
>>>> On Jan 26, 2016, at 6:15 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>>> 
>>>> Subsequent loads at this point will likely be polls of the same memory location that just failed a test, after which the author inserted a pause. It's unlikely that the memory changed that quickly, and scheduling the next load before the pause is essentially equivalent to two back-to-back loads, which wouldn't make sense given the intended usage. There's also the risk that the compiler would move enough of those load+test pairs before the pause to fill up the speculative pipeline with them; that pipeline will need to be flushed once the spin exits, since those load instructions likely speculated incorrectly. And here we're basically describing the reason for putting the pause there in the first place :).
>>>> 
>>>> On Tuesday, January 26, 2016, Igor Veresov <igor.veresov at oracle.com> wrote:
>>>> So, why does the new node have a memory effect? That would seem to prevent any movement of the subsequent loads in your loop, right? If that’s intentional, I wonder why.
>>>> 
>>>> igor
>>>> 
>>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov <ivan at azulsystems.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> Some of you may have seen a few e-mails on the core-libs alias about a proposed “spin wait hint”. The JEP is shaping up nicely at https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a consensus on the API side. It is now in draft state, and I hope this JEP will get targeted to Java 9 shortly. The upcoming API changes can be seen in the webrev:
>>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/
>>>>> 
>>>>> At this time I would like to ask for a review of the hs-comp changes. The plan is to push the changes into the class libraries and HotSpot synchronously, but that may happen after the JEP gets targeted.
>>>>> 
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844
>>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/
>>>>> 
>>>>> The idea of the fix is pretty simple: HotSpot replaces a call to java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a 'pause' instruction on x86. This intrinsic is guarded by the -XX:±UseOnSpinWaitIntrinsic flag. On non-x86 platforms, verification code makes sure the flag is off, and the VM will just execute the empty method java.lang.Runtime.onSpinWait(), effectively a no-op. According to [1], the 'pause' instruction has been functional since SSE2, but even on pre-SSE2 CPUs it executes as a no-op and is hence harmless, so there seems to be no need to add guarding code for older generations of Intel CPUs.
>>>>> 
>>>>> The proposed patch includes a simple regression test that makes sure java.lang.Runtime.onSpinWait() gets intrinsified. There are several other producer-consumer-style performance tests ready that the authors of this JEP would be happy to make available under JEP 230, but I am uncertain about the process.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Ivan
>>>>> 
>>>>> [1] - https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops
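Ivan mentions producer-consumer-style performance tests but does not include them. As a purely hypothetical illustration of that kind of workload (not the authors' tests), here is a single-slot ping-pong between two threads that each spin on a volatile field with Thread.onSpinWait(), the name under which the API shipped in Java 9. The class, the slot encoding (0 means empty), and the method names are assumptions made for the sketch.

```java
public class SpinPingPong {
    // Hypothetical single-slot mailbox: 0 means empty, any other value is a message.
    // Volatile so each thread sees the other's writes without locks.
    static volatile long slot;

    // Producer side: wait until the consumer has drained the slot, then publish.
    static void produce(long value) {
        while (slot != 0) {
            Thread.onSpinWait();
        }
        slot = value;
    }

    // Consumer side: wait for a value, take it, and mark the slot empty.
    static long consume() {
        long v;
        while ((v = slot) == 0) {
            Thread.onSpinWait();
        }
        slot = 0;
        return v;
    }

    // Passes the values 1..n from a producer thread to the caller and returns their sum.
    // Safe only for exactly one producer and one consumer.
    static long run(int n) {
        slot = 0;
        Thread producer = new Thread(() -> {
            for (long i = 1; i <= n; i++) {
                produce(i);
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += consume();
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1000)); // prints 500500
    }
}
```

Timing the run() call with and without the hint would be the natural next step for a benchmark; the sketch only checks that the handoff is correct.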
>>>> 
>>>> 
>>>> -- 
>>>> Sent from my phone
>>> 
>> 



More information about the hotspot-compiler-dev mailing list