RFR: 8189596: AArch64: implementation for Thread-local handshakes

Fri Nov 24 13:36:22 UTC 2017

Hi Andrew,

On 2017-11-24 13:07, Andrew Dinn wrote:
> On 24/11/17 10:36, Erik Österlund wrote:
>> By placing loading the local poll value with ldar *only* in the native
>> wrapper wakeup function, you avoid these issues.
>> Another approach you may elect to use is to only in the native wrapper
>> load both the thread-local poll value and the
>> SafepointSynchronize::_state, when filtering slow paths to avoid this
>> unfortunate race.
> I can see why an ldar (actually ldarw) is needed when safepoint_poll is
> called from the nativewrapper. Can you explain why ldar is not needed
> for *all* calls to safepoint_poll?

That is a long story. :) But since you asked, here we go...

First we need to recognize that the following previous polls (with 
different contexts):

1) Poll in JIT-compiled code (poll against the global page)
2) Poll in the VM for thread transitions
3) Poll in the interpreter (safepoint table)
4) Poll in the native wrapper

...have been replaced with polls reading the thread-local state. But
they are concerned with different races. Previously, the global _state,
the safepoint table, and the polling page were all manipulated in
different places in SafepointSynchronize::begin/end, which causes
interesting interactions now that they are replaced with a single poll.

When we synchronize a safepoint, we have a fast-path poll on the local
poll, but if it is indeed armed, we *have* to use acquire before loading
the global state to check if we need to block or not. If the local poll
is disarmed, on the other hand, then racing with the synchronization of
a safepoint is harmless. The lack of acquire can cause a stale load of
the global state, but the only stale value you can observe when racing
with the synchronization is _not_synchronized instead of
_synchronizing, which is already racy and fine. So the race with
synchronization comes with the constraint that if the local poll is
armed, we have to acquire before reading the global state.
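
To make the constraint above concrete, here is a minimal sketch of a
poll that races only with the start of a safepoint. It is a model using
std::atomic, not the HotSpot code: global_state, local_poll_armed and
should_take_slow_path are hypothetical names, and it assumes the arming
thread publishes the global state before arming the poll with release
semantics.

```cpp
#include <atomic>

// Hypothetical stand-ins; the names are illustrative, not HotSpot's.
enum class SyncState { NotSynchronized, Synchronizing, Synchronized };

std::atomic<SyncState> global_state{SyncState::NotSynchronized}; // models _state
std::atomic<bool>      local_poll_armed{false};                  // models the local poll

// Poll that races with SafepointSynchronize::begin() only.
// Returns true if the caller should go to the slow path and maybe block.
bool should_take_slow_path() {
    // Fast path: a relaxed load, no ordering required.
    if (!local_poll_armed.load(std::memory_order_relaxed)) {
        return false;  // disarmed: a race with begin() is benign
    }
    // Armed: acquire before reading the global state, so that the load
    // below cannot return a value older than the arming store.
    std::atomic_thread_fence(std::memory_order_acquire);
    return global_state.load(std::memory_order_relaxed)
           != SyncState::NotSynchronized;
}
```

The acquire is paid only on the armed path, which is the rare case.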

When we unsynchronize the safepoint, on the other hand, we are in a much 
more dangerous situation. Unsynchronization races with transitions from 
dormant states (blocked and native) to active states (Java and VM). 
During the dormant to active transitions, reading a stale value of the
_state written by SafepointSynchronize::end() is a disaster, because now
a stale load may observe _synchronized when the state is in fact no
longer _synchronized. This must be prevented (to not get false positives
on the query is_at_safepoint()) by either a) making the load of the
local poll use acquire unconditionally (making sure subsequent loads of
the _state are not stale), or b) checking both the local poll and the
global _state, so that you simply do not wake up when the stale state is
observed to be _synchronized.
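
The two alternatives for the wakeup path can be sketched as follows.
Again this is a model with std::atomic rather than the real code: all
names are hypothetical, and option a) assumes that end() updates the
global state before it disarms the local poll with release semantics.

```cpp
#include <atomic>

// Hypothetical stand-ins; the names are illustrative, not HotSpot's.
enum class SyncState { NotSynchronized, Synchronizing, Synchronized };

std::atomic<SyncState> global_state{SyncState::NotSynchronized}; // models _state
std::atomic<bool>      local_poll_armed{false};                  // models the local poll

// Option a): unconditional load-acquire of the local poll. If the poll
// reads as disarmed, later loads of global_state are ordered after it
// and cannot observe a value older than the disarming store.
bool wakeup_slow_path_acquire() {
    return local_poll_armed.load(std::memory_order_acquire);
}

// Option b): no acquire, but filter on both words. A stale Synchronized
// value sends the thread to the slow path instead of letting it wake up
// while still observing a safepoint.
bool wakeup_slow_path_both() {
    bool armed = local_poll_armed.load(std::memory_order_relaxed);
    SyncState state = global_state.load(std::memory_order_relaxed);
    return armed || state != SyncState::NotSynchronized;
}
```

Option a) pays for the acquire on every wakeup; option b) pays for a
second load and an occasional unnecessary trip to the slow path.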

So looking back at the scenarios we have replaced in order:

1) The JIT-compiled code races with SafepointSynchronize::begin(), not
end(), and hence only has the requirement that acquire() is
conditionally used after reading an armed poll value and before checking
if the global _state is _synchronizing.
2) In the VM, the conservative standpoint is taken: the load uses
load_acquire unconditionally, because it might be used either for waking
up from a dormant state or for transitioning between active states.
3) The interpreter scenario is similar to the JIT-compiled code in that 
it races with SafepointSynchronize::begin(), and not end(). So similar 
reasoning applies.
4) The poll in the native wrapper wakes up from a dormant state, and 
hence races with SafepointSynchronize::end(). Therefore it comes with 
the additional requirement of either using a load_acquire, or polling 
both the local and global state to make sure stale loads do not cause us 
to observe we are still in a safepoint when we are not.
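
The four cases above can be summarised as a small lookup. This is only
an illustrative sketch: PollSite, Ordering and required_ordering are
hypothetical names, and AcquireAlways for the native wrapper stands for
"load_acquire, or alternatively poll both local and global state".

```cpp
// Hypothetical summary of the list above; not part of HotSpot.
enum class PollSite { JitCompiled, VmTransition, Interpreter, NativeWrapper };
enum class Ordering { AcquireIfArmed, AcquireAlways };

constexpr Ordering required_ordering(PollSite site) {
    switch (site) {
        case PollSite::JitCompiled:   // races with begin() only
        case PollSite::Interpreter:   // races with begin() only
            return Ordering::AcquireIfArmed;
        case PollSite::VmTransition:  // conservative: may wake from dormant
        case PollSite::NativeWrapper: // wakes from dormant, races with end()
            return Ordering::AcquireAlways;
    }
    return Ordering::AcquireAlways;   // conservative default
}

static_assert(required_ordering(PollSite::JitCompiled) == Ordering::AcquireIfArmed,
              "JIT polls only need a conditional acquire");
```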

The check for blocking in the VM currently uses acquire unconditionally,
but if you find that too expensive, it really only needs to use acquire
conditionally (when the local poll is armed, before reading the global
state), and unconditionally when waking up from a dormant state
(racing with SafepointSynchronize::end()). Those are the true restrictions.

I hope this sheds some light on the important races you need to be aware of.

Thanks,
/Erik

>> I have a bunch of other ideas as well exploiting dependent loads, but
>> thought maybe I should start the conversation and see what you think
>> before running off too much
> regards,
>
>
> Andrew Dinn