RFR: 8267079: Support async handshakes that can be executed by a remote thread
David Holmes
david.holmes at oracle.com
Thu May 20 06:49:50 UTC 2021
On 20/05/2021 3:44 pm, Man Cao wrote:
> On Thu, 20 May 2021 02:50:07 GMT, David Holmes <david.holmes at oracle.com> wrote:
>
>> I thought we had to preserve the order of handshake operations? (In the
>> same way the safepoint operations were previously well ordered.) If that
>> is not the case ... there might be some subtle interactions there that
>> might lead to very hard to diagnose bugs.
>
> I'm not sure about this. Currently without this change, if there are both async self-executed ops and synchronous non-self executable ops on the queue, it would not preserve the execution order.
> I think non-self executable ops and self-executed ops shouldn't depend on each other. It would be a misuse of handshakes if that happens.
Hmmm. I had a mental model where order of ops was preserved. Introducing
non-determinism in the order in which ops are executed seems potentially
fragile - and a new mode of operation compared to safepoint VM
operations. That said, taking suspension as an example, if the target
thread is off in native when suspended, and so could not process the
async handshake op for suspension yet, then we would still want a
synchronous handshake op to dump its stack to work.
But I'm not at all convinced that there will never be any ordering
dependencies. Maybe it is a misuse, but how hard will it be to
spot this misuse, or debug it? (rhetorical question)
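To make the reordering concrete, here is a minimal sketch of the suspension example. It is only a toy model: `Op`, `run_remote` and `run_self` are hypothetical names for illustration, not the actual HandshakeState code, and it assumes a remote thread simply skips over any op that only the target may execute.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one entry in a per-thread handshake queue.
struct Op {
  std::string name;
  bool self_only;  // true: only the target thread itself may execute it
};

// What a remote (requesting) thread can do while the target is blocked or
// in native: run every op that is not self-only, in queue order, and leave
// the self-only ops behind.
std::vector<std::string> run_remote(std::vector<Op>& queue) {
  std::vector<std::string> executed;
  std::vector<Op> remaining;
  for (const Op& op : queue) {
    if (op.self_only) {
      remaining.push_back(op);
    } else {
      executed.push_back(op.name);
    }
  }
  queue = std::move(remaining);
  return executed;
}

// What the target does when it comes back: run whatever is left, in order.
std::vector<std::string> run_self(std::vector<Op>& queue) {
  std::vector<std::string> executed;
  for (const Op& op : queue) {
    executed.push_back(op.name);
  }
  queue.clear();
  return executed;
}
```

With a queue of {"suspend" (self-only), "dump_stack"}, run_remote executes "dump_stack" first even though "suspend" was enqueued ahead of it. That is the desired outcome for this pair, but nothing in the model would flag a pair of ops where the skipped one actually mattered.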
>> As I don't know anything about the epoch sync protocol I don't really
>> understand the requirements here. If you are prepared to have some
>> threads defer execution of the async handshake indefinitely (because
>> they aren't blocked) then why do you need to ensure you update the
>> counter for other threads, rather than have them do it themselves when
>> they are able to execute the async handshake op?
>
> The epoch sync protocol only needs the target thread to execute a memory fence, or become blocked. The purpose is to flush out all potential stores to Java heap, or establish a release-acquire edge from the target thread to requesting thread. It is essentially an asymmetric Dekker synchronization (see [this article](https://blogs.oracle.com/dave/qpi-quiescence)). The role of handshake is like a "membarrier" Linux syscall on the target thread. This is why the actual handshake op is a no-op, and the arm-the-poll-only approach as Robbin suggested is superior.
>
> For this question, it is because:
> - We want to minimize the number of deferred ops, because deferring means more work during the later GC pause. In fact, with a timeout of 2-3 milliseconds, it is extremely rare to have deferred ops already.
If the deferred op is a no-op then I'm not sure how this creates work
for a later GC pause. Assume all the Java threads are executing in
native and stay there for a long time - why should that impact the GC's
work?
> - A blocked thread may not come back to in_Java for a long time, so it will not execute any self-executed async op. If we don't handle them, most ops will become deferred if there's a blocked thread. In a realistic large server, it is common to have hundreds or thousands of Java threads, and a large portion of them are blocked on Object.wait() and rarely become running.
So taking an extreme example, if a thread is blocked for a few minutes
(or equivalently, but less likely, is in native) then you are concerned
that many of these epoch-sync async ops will accumulate, and that could
cause memory pressure and slow down the thread's return to Java. I can
see that is a concern. But the first thought I had in relation to this
problem was that perhaps we need to introduce the notion of coalescable
operations: if all epoch-sync operations are equivalent then you only
need at most one to get enqueued. Of course then we have to scan the
queue for an existing occurrence. But that seems more general a solution
to unbounded deferred operations than introducing a way to "skip"
blocked threads.
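A rough sketch of what coalescing could look like, just to illustrate the idea. Everything here is hypothetical (`AsyncOp`, `coalesce_key`, `CoalescingQueue` are made-up names, not existing handshake code), it ignores locking entirely, and it assumes the requester does not need a per-op completion notification.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for an async handshake operation.
struct AsyncOp {
  virtual ~AsyncOp() = default;
  // Ops returning the same non-zero key are treated as equivalent.
  // A key of 0 means "never coalesce".
  virtual int coalesce_key() const { return 0; }
  virtual void run() = 0;
};

// An epoch-sync style op: executing it once is as good as executing it N times.
struct EpochSyncOp : AsyncOp {
  int coalesce_key() const override { return 1; }
  void run() override { /* no-op: reaching the handshake is the point */ }
};

class CoalescingQueue {
 public:
  // Returns true if the op was enqueued, false if an equivalent op was
  // already pending and the new one was dropped.
  bool push(std::unique_ptr<AsyncOp> op) {
    assert(op != nullptr);
    const int key = op->coalesce_key();
    if (key != 0) {
      // Linear scan for an existing occurrence, as noted above.
      for (const auto& pending : queue_) {
        if (pending->coalesce_key() == key) {
          return false;
        }
      }
    }
    queue_.push_back(std::move(op));
    return true;
  }

  // Runs and removes all pending ops in FIFO order; returns how many ran.
  std::size_t drain() {
    std::size_t ran = 0;
    while (!queue_.empty()) {
      queue_.front()->run();
      queue_.pop_front();
      ++ran;
    }
    return ran;
  }

  std::size_t size() const { return queue_.size(); }

 private:
  std::deque<std::unique_ptr<AsyncOp>> queue_;
};
```

With this, a thread that stays blocked for minutes holds at most one pending EpochSyncOp no matter how many are requested. The cost is the scan on every push, which a per-key "pending" flag could avoid if that turned out to matter.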
I think it is important to flesh out the requirements here to ensure
we're making strategic design decisions about the overall architecture
of the handshake mechanism, rather than just trying to tweak the
mechanism to support a specific use case. So sorry for the delay this
adds, but I think the discussions are important.
Thanks,
David
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/4005
>