RFR: 8267079: Support async handshakes that can be executed by a remote thread [v2]

Thu May 20 02:47:57 UTC 2021

On 20/05/2021 7:35 am, Man Cao wrote:
> On Tue, 18 May 2021 19:08:08 GMT, Man Cao <manc at openjdk.org> wrote:
> 
>>> Hi all,
>>>
>>> Can I have reviews for this small refactoring change? It resolves a pending concern from [JDK-8238761](https://bugs.openjdk.java.net/browse/JDK-8238761), clarifies the code and allows more use case of async handshakes. See [JDK-8267079](https://bugs.openjdk.java.net/browse/JDK-8267079) for detailed description.
>>>
>>> -Man
>>
>> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
>>
>>    Added missing deallocation and renamed "remote" to "non-self".
> 
> Re David's comment:
> 
>> Though if there is a self-only async op in the queue ahead of that, then you won't be able to proceed anyway.
> 
> I don't quite understand this part. After this change, in `HandshakeState::try_process()`, the do...while loop calls `HandshakeState::pop()` inside, which will skip those self-only async ops, right?  Then it can still process those non-self executable async ops.

I thought we had to preserve the order of handshake operations? (In the 
same way the safepoint operations were previously well ordered.) If that 
is not the case ... there might be some subtle interactions there that 
might lead to very hard to diagnose bugs.

>> Not really getting this part. Why do you want to skip blocked threads?
>> ...
>> if a thread is blocked then it is handshake-safe and so the handshake should be fast.
>> Or are you concerned that your handshake won't be able to proceed
>> because of what else may be in a target thread's op queue, and you want
>> to skip that target in that case?
> 
> Perhaps "skipping" is a bad choice of word. In the latest implementation for epoch sync protocol, the requesting thread still issues the async handshake to blocked threads. The requesting thread then checks whether the target thread is blocked, if so, it updates the target thread's epoch counter on behalf of the target thread, then the epoch sync is done for this target thread. This is the same as the way that  `Handshake::execute()` executes a synchronous handshake op on behalf of a blocked target thread. If the target thread is not blocked, the requesting thread can only wait, and possibly time out and give up.

As I don't know anything about the epoch sync protocol I don't really 
understand the requirements here. If you are prepared to have some 
threads defer execution of the async handshake indefinitely (because 
they aren't blocked) then why do you need to ensure you update the 
counter for other threads, rather than have them do it themselves when 
they are able to execute the async handshake op?

> Currently just running `Handshake::execute()` for a target thread could run into a safepoint, and it does not support timeout or giving up. These behaviors do not work for G1's refinement. This is the reason I'm using enqueue-only async submission, and also need a way to check for threads that are blocked, in order to quickly finish the epoch sync with blocked threads.

I get the need for async submission. I don't get why you need to do 
something for the blocked threads rather than them doing it themselves 
later when they unblock. Though that is a distinct issue from whether 
the mechanism should exist.

Cheers,
David
-----

> With the arm-the-poll-only approach Robbin suggested, I think it still needs a way to check for blocked threads, and update the blocked thread's epoch counter on behalf of the blocked thread. I could do that in the change for JDK-8236485, or split it up in another RFE.
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/4005
>