RFR: 8189596: AArch64: implementation for Thread-local handshakes
Erik Österlund
erik.osterlund at oracle.com
Fri Nov 24 13:36:22 UTC 2017
Hi Andrew,
On 2017-11-24 13:07, Andrew Dinn wrote:
> On 24/11/17 10:36, Erik Österlund wrote:
>> By loading the local poll value with ldar *only* in the native
>> wrapper wakeup function, you avoid these issues.
>> Another approach you may elect to use is to load both the
>> thread-local poll value and the SafepointSynchronize::_state, but
>> only in the native wrapper, when filtering slow paths, to avoid this
>> unfortunate race.
> I can see why an ldar (actually ldarw) is needed when safepoint_poll is
> called from the nativewrapper. Can you explain why ldar is not needed
> for *all* calls to safepoint_poll?
That is a long story. :) But since you asked, here we go...
First we need to recognize that the following previous polls (each in a
different context):
1) Poll in JIT-compiled code (poll against the global page)
2) Poll in the VM for thread transitions
3) Poll in the interpreter (safepoint table)
4) Poll in the native wrapper
...have been replaced with polls reading the thread-local state. But
each of them is concerned with different races. Previously, the global
_state, the safepoint table, and the polling page were all manipulated
in different places in SafepointSynchronize::begin/end, which causes
interesting interactions now that they have all been replaced with a
single poll.
When we synchronize a safepoint, we have a fast-path poll on the local
poll, but if it is indeed armed, we *have* to use acquire before loading
the global state to check if we need to block or not. If the local poll
is disarmed, on the other hand, then there is no harm in racing with
the synchronization of a safepoint. The lack of acquire can cause a
stale load of the global state, but the only stale value you can
observe when racing with synchronization is _not_synchronized instead
of _synchronizing, which is already racy and fine. So the race with
synchronization comes with the constraint that if the local poll is
armed, we have to acquire before reading the global state.
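In pseudo-C++, the conditional-acquire scheme for the begin() race could
be sketched roughly like this (hypothetical names, and std::atomic
standing in for HotSpot's OrderAccess; a sketch of the idea, not the
actual implementation):

```cpp
#include <atomic>

// Hypothetical stand-ins for the thread-local poll word and the global
// SafepointSynchronize::_state; names and types are illustrative only.
enum State { _not_synchronized, _synchronizing, _synchronized };

std::atomic<bool>  local_poll_armed{false};
std::atomic<State> global_state{_not_synchronized};

// Poll racing only with SafepointSynchronize::begin(): a plain load of
// the local poll suffices on the fast path; acquire is only needed once
// the poll is observed armed, before reading the global state.
bool poll_needs_slowpath() {
  if (!local_poll_armed.load(std::memory_order_relaxed)) {
    // Disarmed: a stale view of global_state could at worst show
    // _not_synchronized instead of _synchronizing, which is fine.
    return false;
  }
  // Armed: acquire before reading the global state, as described above.
  State s = global_state.load(std::memory_order_acquire);
  return s != _not_synchronized;
}
```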
When we unsynchronize the safepoint, on the other hand, we are in a much
more dangerous situation. Unsynchronization races with transitions from
dormant states (blocked and native) to active states (Java and VM).
During the dormant to active transitions, reading a stale value of the
_state from SafepointSynchronize::end() is a disaster, because now a
stale load may observe _synchronized instead of a !_synchronized value.
This must be prevented (so that the query is_at_safepoint() does not
return false positives) by either a) making the load of the local poll
use acquire unconditionally (making sure subsequent loads of the _state
are not stale), or b) checking both the local poll and the global
_state, to make sure you simply do not wake up when the stale state is
observed to be _synchronized.
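Option a) could be sketched as an unconditional acquire load of the
local poll (again with hypothetical names; on AArch64 the acquire load
would be the ldar/ldarw instruction):

```cpp
#include <atomic>

// Hypothetical name; a sketch of option a) above, not HotSpot's code.
// The thread-local poll word stands in for the thread's polling state.
std::atomic<bool> local_poll_armed{false};

// Wakeup from a dormant (blocked/native) state, racing with
// SafepointSynchronize::end(). Because this load is an acquire, later
// loads of the global _state cannot be reordered before it, so once a
// disarmed poll is observed, a subsequent load of the _state will not
// stalely report _synchronized.
bool wakeup_needs_slowpath() {
  return local_poll_armed.load(std::memory_order_acquire);
}
```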
So looking back at the scenarios we have replaced in order:
1) The JIT-compiled code races with SafepointSynchronize::begin(), not
end(), and hence only requires that acquire is used conditionally:
after observing an armed poll value, before checking whether the global
_state is _synchronizing.
2) In the VM, the conservative standpoint is taken: the load uses
load_acquire unconditionally, because it might be used both when waking
up from a dormant state and when transitioning between active states.
3) The interpreter scenario is similar to the JIT-compiled code in that
it races with SafepointSynchronize::begin(), and not end(). So similar
reasoning applies.
4) The poll in the native wrapper wakes up from a dormant state, and
hence races with SafepointSynchronize::end(). Therefore it comes with
the additional requirement of either using a load_acquire, or polling
both the local and global state to make sure stale loads do not cause us
to observe we are still in a safepoint when we are not.
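The alternative without acquire could be sketched like this (same
hypothetical names as the document uses; a sketch of the filtering
idea only, not the actual native wrapper code, and the exact order in
which end() disarms the polls and restores the _state matters for its
correctness):

```cpp
#include <atomic>

// Hypothetical names; illustrative only. at_safepoint stands in for
// SafepointSynchronize::_state != _not_synchronized.
std::atomic<bool> local_poll_armed{false};
std::atomic<bool> at_safepoint{false};

// Wakeup in the native wrapper without acquire: consult both the local
// poll and the global state, and only take the slow path when both
// indicate a safepoint. A stale _synchronized alone (or a stale armed
// poll alone) is filtered out; the slow path re-checks under proper
// synchronization anyway, so a spurious slow path would still be safe.
bool native_wakeup_needs_slowpath() {
  bool armed  = local_poll_armed.load(std::memory_order_relaxed);
  bool synced = at_safepoint.load(std::memory_order_relaxed);
  return armed && synced;
}
```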
The check for blocking in the VM currently uses acquire
unconditionally, but if you find that too expensive, it really only
needs acquire conditionally (when the local poll is armed, before
reading the global state) and unconditionally when waking up from a
dormant state (racing with SafepointSynchronize::end()). Those are the
true restrictions.
I hope this sheds some light on the important races you need to be aware of.
Thanks,
/Erik
>> I have a bunch of other ideas as well exploiting dependent loads, but
>> thought maybe I should start the conversation and see what you think
>> before running off too much
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander