RFR: 8267079: Support async handshakes that can be executed by a remote thread

Man Cao manc at openjdk.java.net
Thu May 20 05:44:44 UTC 2021


On Thu, 20 May 2021 02:50:07 GMT, David Holmes <david.holmes at oracle.com> wrote:

> I thought we had to preserve the order of handshake operations? (In the
> same way the safepoint operations were previously well ordered.) If that
> is not the case ... there might be some subtle interactions there that
> might lead to very hard to diagnose bugs.

I'm not sure about this. Even without this change, if the queue holds both async self-executed ops and synchronous non-self-executable ops, their execution order is not preserved.
I think non-self-executable ops and self-executed ops should not depend on each other; any such dependency would be a misuse of handshakes.

> As I don't know anything about the epoch sync protocol I don't really
> understand the requirements here. If you are prepared to have some
> threads defer execution of the async handshake indefinitely (because
> they aren't blocked) then why do you need to ensure you update the
> counter for other threads, rather than have them do it themselves when
> they are able to execute the async handshake op?

The epoch sync protocol only needs the target thread to execute a memory fence or to become blocked. The purpose is to flush out all potential stores to the Java heap, or to establish a release-acquire edge from the target thread to the requesting thread. It is essentially an asymmetric Dekker synchronization (see [this article](https://blogs.oracle.com/dave/qpi-quiescence)). The handshake plays the role of a "membarrier" Linux syscall aimed at the target thread. This is why the actual handshake op is a no-op, and why the arm-the-poll-only approach Robbin suggested is superior.
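To make the "release-acquire edge" concrete, here is a minimal sketch using standard C++ atomics. It is a simplified model, not HotSpot code: `HandshakeTarget`, `g_heap_slot`, `target_run` and `requester_run` are hypothetical names, and it does not reproduce the fence-free fast path that makes real asymmetric Dekker cheap (nor does it use `membarrier`). It only shows the edge the handshake establishes: the requester arms a poll flag, the target passes a fence and acknowledges, and once the requester observes the acknowledgement it is guaranteed to see the target's earlier plain store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical model; names are illustrative, not HotSpot code.
struct HandshakeTarget {
  std::atomic<bool> poll_armed{false};  // requester "arms the poll"
  std::atomic<bool> acked{false};       // target reports it passed the fence
};

// Stands in for a Java heap location written with a plain store.
int g_heap_slot = 0;

// Target (mutator) side: a plain heap store, then polls until armed.
void target_run(HandshakeTarget& t, int value) {
  g_heap_slot = value;  // store the requester must be able to observe
  while (!t.poll_armed.load(std::memory_order_acquire)) {
    std::this_thread::yield();  // stands in for running Java code between polls
  }
  // The handshake op itself is a no-op; what matters is the fence the
  // target executes on the way through. (In this simplified model the
  // release store below already orders the heap store on its own.)
  std::atomic_thread_fence(std::memory_order_seq_cst);
  t.acked.store(true, std::memory_order_release);
}

// Requester side: arm the poll, wait for the ack, then read the heap.
int requester_run(HandshakeTarget& t) {
  t.poll_armed.store(true, std::memory_order_release);
  while (!t.acked.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  // The release/acquire pair on `acked` orders the target's store to
  // g_heap_slot before this read, so there is no data race here.
  return g_heap_slot;
}
```

Because the op body is empty, the only thing the requester needs from the target is that it reaches its poll; this is why arming the poll alone is enough in this model.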

As for your question, there are two reasons:
- We want to minimize the number of deferred ops, because each deferred op means more work during the later GC pause. In fact, with a timeout of 2-3 milliseconds, deferred ops are already extremely rare.
- A blocked thread may not return to in_Java for a long time, so it will not execute any self-executed async op. If we don't handle blocked threads on their behalf, most ops will become deferred whenever there is a blocked thread. On a realistic large server it is common to have hundreds or thousands of Java threads, a large portion of which are blocked in Object.wait() and rarely run.
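The effect of these two points can be illustrated with a toy, single-threaded model of one sync round. This is a hypothetical sketch, not the PR's implementation: `MutatorThread`, `run_round`, `poll_latency_ms` and the `remote_exec_for_blocked` switch are invented for illustration. Following the protocol description above, a thread that has become blocked already satisfies the requirement, so the requester may complete the no-op op for it; an in_Java thread completes the op itself only if it reaches a poll within the timeout, otherwise the op is deferred to the next GC pause.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one epoch-sync round; names are illustrative only.
enum class State { InJava, Blocked };

struct MutatorThread {
  State state;
  int poll_latency_ms;  // time an in_Java thread needs to reach its next poll
};

struct RoundResult {
  std::size_t completed = 0;  // op done, by the target or by the requester
  std::size_t deferred = 0;   // left for the next GC pause
};

RoundResult run_round(const std::vector<MutatorThread>& threads,
                      int timeout_ms, bool remote_exec_for_blocked) {
  RoundResult r;
  for (const MutatorThread& t : threads) {
    if (t.state == State::Blocked) {
      if (remote_exec_for_blocked) {
        // Becoming blocked already satisfies the protocol, so the
        // requester completes the no-op op on the thread's behalf.
        ++r.completed;
      } else {
        // The thread may not return to in_Java for a long time, so it
        // never runs a self-executed op within the timeout.
        ++r.deferred;
      }
    } else if (t.poll_latency_ms <= timeout_ms) {
      ++r.completed;  // reached its poll and executed the op itself
    } else {
      ++r.deferred;   // did not reach a poll before the timeout
    }
  }
  return r;
}
```

With a made-up input of 900 threads blocked in Object.wait(), 100 running threads that poll within 1 ms and one that takes 10 ms, a 3 ms timeout gives 1 deferred op when blocked threads are handled remotely, versus 901 when every op must be self-executed.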

-------------

PR: https://git.openjdk.java.net/jdk/pull/4005


More information about the hotspot-dev mailing list