RFR: 8189596: AArch64: implementation for Thread-local handshakes

Erik Österlund erik.osterlund at oracle.com
Fri Nov 24 10:36:59 UTC 2017


Hi Andrew,

I am glad to see you also support the move to use Thread-local 
handshakes. Thank you for that.
I think your approach looks good. However, I believe there is a race 
that your current code is vulnerable to and that could cause trouble for 
you, so I thought I'd point it out.

Specifically, when transitioning from native back to Java, there is a 
peculiar race condition on your platform unless more synchronization is 
applied. I would recommend loading the thread-local poll value with ldar 
when waking up from native. Otherwise, a subsequent is_at_safepoint() 
check on the thread waking up from native can return the wrong result, 
which could cause a MutexLockerEx that is taken only when we are not in 
a safepoint to take the wrong path and not be taken, because the thread 
still thinks it is in a safepoint.

The race is due to an interaction between the wake-up path of the native 
wrapper and the safepoint unsynchronizing path.

When SafepointSynchronize::end() starts, all JavaThreads are blocked 
except the ones that are in native or blocked. Specifically, there is a 
race between the native wrapper trying to wake up from native and the 
SafepointSynchronizer waking up from a safepoint.

SafepointSynchronize::end() conceptually performs the following 
important steps:

VM.1) SafepointSynchronize::_state = _not_synchronized
VM.2) For each thread: disarm local poll
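
To make the ordering concrete, here is a minimal C++ sketch of those two 
steps. The names, the std::atomic wrappers and the release ordering on 
the stores are assumptions for illustration only, not the actual HotSpot 
code:

    #include <atomic>

    // Hypothetical stand-ins for the real fields; the names and the use
    // of std::atomic with release stores are illustrative assumptions.
    constexpr int NOT_SYNCHRONIZED = 0;
    constexpr int SYNCHRONIZED     = 1;

    std::atomic<int>  g_safepoint_state{SYNCHRONIZED};  // SafepointSynchronize::_state
    std::atomic<bool> g_local_poll_armed{true};         // per-JavaThread poll word

    // Sketch of SafepointSynchronize::end(): VM.1 followed by VM.2.
    void safepoint_end_sketch() {
      g_safepoint_state.store(NOT_SYNCHRONIZED, std::memory_order_release);  // VM.1
      g_local_poll_armed.store(false, std::memory_order_release);            // VM.2
    }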

Your native wrapper wakeup path filters slow paths on the local poll 
(without acquire() or loadload()), but no longer checks the global 
_state value.

So a JavaThread waking up from native could do:

JT.1) load local poll (with plain ldr), check if it is armed (it is not, 
because it observes the store from VM.2)
JT.2) release_store the thread state to Java with stlr
JT.3) enter arbitrary Java code
JT.4) make leaf call to the runtime from Java without thread transition
JT.5) Check if in a safepoint by loading SafepointSynchronize::_state

At JT.5, it is not guaranteed that the load observes the updated 
_not_synchronized value written in VM.1 (because the loads in JT.1 and 
JT.5 may reorder). In this scenario, the check to see whether you are in 
a safepoint might, if you are very unlucky, spuriously return true. From 
then on, bad things can happen.
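
Extending the same sketch with the JavaThread side shows why the plain 
load in JT.1 is not enough. The relaxed loads stand in for plain ldr, 
and everything here is still a model with hypothetical names rather than 
the real native wrapper:

    #include <atomic>

    // Continues the sketch above: same hypothetical globals.
    extern std::atomic<int>  g_safepoint_state;   // SafepointSynchronize::_state
    extern std::atomic<bool> g_local_poll_armed;  // per-JavaThread poll word
    constexpr int SYNCHRONIZED = 1;

    // JT.5: plain load of SafepointSynchronize::_state.
    bool is_at_safepoint_sketch() {
      return g_safepoint_state.load(std::memory_order_relaxed) == SYNCHRONIZED;
    }

    // JT.1 .. JT.5: the native-wrapper wakeup path.
    void native_wrapper_wakeup_sketch() {
      // JT.1: plain load of the local poll.  Even if this observes the
      // disarm from VM.2, it establishes no ordering with the load in JT.5.
      if (g_local_poll_armed.load(std::memory_order_relaxed)) {
        return;  // slow path: call the safepoint handler (not modelled here)
      }
      // JT.2 .. JT.4: transition to Java, run arbitrary code, make a leaf
      // runtime call...
      if (!is_at_safepoint_sketch()) {
        // Conditionally take a lock (MutexLockerEx in HotSpot).  If JT.5
        // read a stale SYNCHRONIZED value, this branch is skipped spuriously.
      }
    }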

By loading the local poll value with ldar, *only* in the native wrapper 
wakeup function, you avoid these issues.
Another approach you may elect to use is to load both the thread-local 
poll value and SafepointSynchronize::_state when filtering slow paths, 
again only in the native wrapper, to avoid this unfortunate race.
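
In terms of the sketch above, the first option gives the JT.1 load 
acquire semantics (ldar on AArch64), pairing with the (assumed) release 
store of the disarm in VM.2, so that seeing the disarm also guarantees 
seeing the _not_synchronized value written in VM.1. A minimal 
illustration, again with the hypothetical names:

    #include <atomic>

    // Same hypothetical per-thread poll word as in the sketches above.
    extern std::atomic<bool> g_local_poll_armed;

    // JT.1 with acquire semantics (ldar on AArch64): if this load observes
    // the disarm from VM.2, later loads are also guaranteed to observe the
    // _not_synchronized value written in VM.1.
    bool local_poll_disarmed_acquire() {
      return !g_local_poll_armed.load(std::memory_order_acquire);
    }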

I have a bunch of other ideas as well, exploiting dependent loads, but I 
thought I should start the conversation and see what you think before 
running off too far.

Again, very happy to see this move to use thread-local handshakes. 
Thanks for doing this.

Thanks,
/Erik

On 2017-11-23 15:12, Andrew Haley wrote:
> On 23/11/17 11:30, Andrew Haley wrote:
>> It was quite tricky to get this one right.
>>
>> Some notes:
>>
>> I deleted LIR_Assembler::poll_for_safepoint because it's dead code.
>>
>> I set ThreadLocalHandshakes to true.  This follows x86-64 and is
>> probably the right thing to do, but it does have the disadvantage that
>> it considerably extends the time to safepoint for interpreted code
>> because we only poll at branches.  I think this is probably a bad
>> idea, but again it's the same as x86-64.  OTOH, checking the safepoint
>> bit at every bytecode would seriously bloat the interpreter.  Oh dear.
>>
> http://cr.openjdk.java.net/~aph/8189596-1/
>
