RFR(S): 8147844: new method j.l.Runtime.onSpinWait() and the corresponding x86 hotspot intrinsic
Vitaly Davidovich
vitalyd at gmail.com
Wed Jan 27 12:22:15 UTC 2016
On Wednesday, January 27, 2016, Igor Veresov <igor.veresov at oracle.com>
wrote:
>
> On Jan 26, 2016, at 9:35 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>
>
>
> On Tuesday, January 26, 2016, Igor Veresov <igor.veresov at oracle.com> wrote:
>
>>
>> On Jan 26, 2016, at 8:08 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>
>> You would, but a subsequent volatile load could move before the pause. If
>> you unroll the loop, you could (theoretically) end up with all loads moved
>> before the pause but all appearing ordered with respect to each other, eg:
>>
>> cmp addr, 0 // from iteration 1
>> je label
>> cmp addr, 0 // from iteration 2
>> je label
>> ...
>> pause
>>
>> What prevents that if pause is not a compiler membar?
>>
>>
>> I think volatile loads explicitly depend on control. If the pause node
>> consumes and produces control it all should be in a rigid control chain.
>>
>> Other regular loads (that don’t have control dependencies) would still be
>> free to move around.
>>
>
> Is this to avoid out of thin air values? That is, suppose you have:
>
> if (some condition)
> read volatile (or regular)
>
>
> Regular load can be scheduled before the if and result used if control
> reaches there. For volatile, load cannot be scheduled above the if since
> value can be bogus at that point?
>
>
> Right. Regular reads can move up anywhere, as far as the preceding memory
> effect that modified that alias index.
>
I wonder if that's required by JMM though. In my example above, if the
condition being read doesn't have volatile load semantics then it seems
there's no happens-before between the condition and the volatile load.
Your sentence above regarding modifying the alias index sort of makes it
sound like store-load forwarding by the compiler, allowing the read to be
skipped entirely (for regular loads), is that right or did I read too much
into it? If that's right, volatile loads cannot be eliminated so not quite
sure where that nets out.
I can see how volatile loads having control is a safe/conservative
implementation approach but I can also see how scheduling them
aggressively, when not prevented by other memory ordering, could be
beneficial.
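
To make the shape under discussion concrete, here is a minimal sketch of the
kind of spin loop I have in mind: a volatile flag polled in a tight loop with
the spin-wait hint between polls. The class, field, and method names are
hypothetical. The hint is invoked as Thread.onSpinWait(), which is where the
method lives in JDK 9 and later, rather than the Runtime.onSpinWait() name
proposed in this review, so treat that as an assumption of the sketch.

```java
public class SpinWaitSketch {
    // Hypothetical exit condition; volatile so each poll is a real load
    // that cannot be folded away or hoisted out of the loop.
    private volatile boolean ready;

    // Plain field, published by the volatile store to 'ready'.
    private int payload;

    void publish(int value) {
        payload = value; // regular store
        ready = true;    // volatile store: makes 'payload' visible to the spinner
    }

    int awaitPayload() {
        // Tight spin loop: poll, and if the test fails, hint that we are spinning.
        // Assumption: Thread.onSpinWait() (JDK 9+) stands in for the proposed
        // Runtime.onSpinWait(); on x86 it may be intrinsified to 'pause'.
        while (!ready) {
            Thread.onSpinWait();
        }
        return payload; // regular load after the loop exits
    }

    public static void main(String[] args) throws InterruptedException {
        SpinWaitSketch s = new SpinWaitSketch();
        Thread producer = new Thread(() -> s.publish(42));
        producer.start();
        System.out.println(s.awaitPayload()); // prints 42
        producer.join();
    }
}
```

The question in this thread is what the compiler may do with the load of
'ready' and the load of 'payload' relative to the hint; the sketch only shows
the source-level shape, not any particular scheduling.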
>
>
> Is it safe for compiler to assume that something else anchors loads around
> the pause?
>
> That aside, given the intended usage, I'm not sure what other regular
> loads would be there. The usage is a tight spin loop waiting for exit
> condition to be met. Although I suppose if compiler sees regular loads
> after the loop exits successfully, perhaps scheduling them before the loop
> can be beneficial. Is that what you have in mind?
>
>
>
> No just simple stuff like:
>
> while(…) {
> a = x.f;
> pause();
> b = x.f;
> }
>
> If pause() is a wide memory kill, regular field loads around it obviously
> won’t fold. So in the example above those field loads are both going to be
> there. I realize it’s probably not a big deal in reality for the wait
> loops, but I was just wondering why make it a wide mem kill if membar nodes
> for volatiles (that will have to be in the loop) already have wide kill
> semantics.
>
> igor
>
>
>
>
>> igor
>>
>> On Tuesday, January 26, 2016, Igor Veresov <igor.veresov at oracle.com>
>> wrote:
>>
>>> Wouldn’t you use a volatile load for the memory location you’re polling?
>>>
>>> igor
>>>
>>> On Jan 26, 2016, at 6:15 PM, Vitaly Davidovich <vitalyd at gmail.com>
>>> wrote:
>>>
>>> Subsequent loads at this point will likely be polls of same memory
>>> location that just failed a test, and the author inserted a pause. It's
>>> unlikely that the memory changed that quickly and scheduling the next load
>>> before the pause is equivalent to two loads back to back essentially, which
>>> wouldn't make sense given the intended usage. There's also the risk that
>>> the compiler would move enough of those load+test pairs before the pause
>>> and fill up the speculative pipeline with them; that pipeline will need to
>>> be flushed once the spin exits since those load instructions likely
>>> speculated incorrectly. And here we're basically describing the reason for
>>> putting pause there in the first place :).
>>>
>>> On Tuesday, January 26, 2016, Igor Veresov <igor.veresov at oracle.com>
>>> wrote:
>>>
>>>> So, why does the new node have a memory effect? That would seem to
>>>> prevent any movement of the subsequent loads in your loop, right? If that’s
>>>> intentional, what is the reason?
>>>>
>>>> igor
>>>>
>>>> On Jan 26, 2016, at 2:59 AM, Ivan Krylov <ivan at azulsystems.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Some of you may have seen a few e-mails on the core-libs alias about
>>>> a proposed “spin wait hint”. The JEP is forming up nicely at
>>>> https://bugs.openjdk.java.net/browse/JDK-8147832. There seems to be a
>>>> consensus on the API side. It is now in a draft state and I hope this JEP
>>>> will get targeted for java 9 shortly. The upcoming API changes can be seen
>>>> at the webrev:
>>>> http://cr.openjdk.java.net/~ikrylov/8147844.jdk.00/
>>>>
>>>> At this time I would like to ask for a review of the hs-comp changes.
>>>> The plan is to push changes into class libraries and hotspot synchronously but
>>>> that may happen after the JEP gets targeted.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8147844
>>>> Webrev: http://cr.openjdk.java.net/~ikrylov/8147844.hs.00/
>>>>
>>>> The idea of the fix is pretty simple: hotspot replaces a call to
>>>> java.lang.Runtime.onSpinWait() with an intrinsic that is effectively a
>>>> 'pause' instruction on x86. This intrinsic is guarded by the
>>>> -XX:±UseOnSpinWaitIntrinsic flag. For non-x86 platforms there is
>>>> verification code that makes sure the flag is off; the VM will just execute
>>>> the empty method java.lang.Runtime.onSpinWait(), effectively a no-op.
>>>> According to [1], the 'pause' instruction is functional since SSE2, but
>>>> even on CPUs prior to SSE2 the 'pause' instruction is a no-op and hence
>>>> harmless, so there seems to be no need to add guarding code for older
>>>> generations of Intel CPUs.
>>>>
>>>> The proposed patch includes a simple regression test that simply makes
>>>> sure that method java.lang.Runtime.onSpinWait() gets intrinsified. There
>>>> are several other producer-consumer-like performance tests ready that the
>>>> authors of this JEP would be happy to make available under JEP-230 but I am
>>>> uncertain about the process.
>>>>
>>>> Thanks,
>>>>
>>>> Ivan
>>>>
>>>> [1] -
>>>> https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops
>>>>
>>>>
>>>>
--
Sent from my phone