RFR: 8267079: Support async handshakes that can be executed by a remote thread [v2]

Thu May 20 07:33:30 UTC 2021

On Tue, 18 May 2021 19:08:08 GMT, Man Cao <manc at openjdk.org> wrote:

>> Hi all,
>> 
>> Can I have reviews for this small refactoring change? It resolves a pending concern from [JDK-8238761](https://bugs.openjdk.java.net/browse/JDK-8238761), clarifies the code, and allows more use cases for async handshakes. See [JDK-8267079](https://bugs.openjdk.java.net/browse/JDK-8267079) for a detailed description.
>> 
>> -Man
>
> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Added missing deallocation and renamed "remote" to "non-self".

Thanks again. Yes, the discussion here is really helpful.

> But I'm not at all convinced that there may not be any ordering
> dependencies ever. Maybe it is a misuse, but how hard will it be to
> spot this misuse, or debug it? (rhetorical question)

Do we agree that for this change we don't need to preserve the order yet? Preserving the order probably needs a separate RFE and more discussion. It should be considered again if there are more async non-self executable ops in the future.

> If the deferred op is a no-op then I'm not sure how this creates work
> for a later GC pause. Assume all the Java threads are executing in
> native and stay there for a long time - why should that impact the GC's
> work?

The actual work is refining dirty cards. Refining a dirty card with G1 epoch sync takes three major steps:
1. Clean the card
2. Do the epoch sync protocol with all Java threads
3. If the epoch sync is successful, refine the cleaned card. If unsuccessful, defer the card to be checked and refined later, or to be processed in the next incremental GC pause.

If we defer too many cards, refinement makes little progress and the remaining work shifts into the GC pause.
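
To make the control flow concrete, here is a minimal sketch of the three steps in C++. It is only an illustration: `Card`, `Refiner`, and the `epoch_sync` callback are hypothetical names invented for this example and do not correspond to actual HotSpot code.

```cpp
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical stand-ins; none of these names come from HotSpot.
enum class CardState { Dirty, Clean };

struct Card {
    CardState state = CardState::Dirty;
    bool refined = false;
};

struct Refiner {
    std::deque<Card*> deferred;      // cards whose epoch sync did not succeed
    std::size_t refined_count = 0;

    // epoch_sync is a placeholder for step 2. It is assumed to return true
    // only if the sync with all Java threads succeeded.
    void refine(Card& card, const std::function<bool()>& epoch_sync) {
        card.state = CardState::Clean;       // step 1: clean the card
        if (epoch_sync()) {                  // step 2: epoch sync protocol
            card.refined = true;             // step 3: refine the cleaned card
            ++refined_count;
        } else {
            deferred.push_back(&card);       // step 3: defer for a later retry
        }                                    //         or the next GC pause
    }
};
```

In this model, every failed sync grows `deferred` without advancing `refined_count`, which is the "no progress" case described above.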

> So taking an extreme example, if a thread is blocked for a few minutes
> (or equivalently, but less likely, is in native) then you are concerned
> that many of these epoch-sync async ops will accumulate, and that could
> cause memory pressure and slow down the thread's return to Java. I can
> see that is a concern. But the first thought I had in relation to this
> problem was that perhaps we need to introduce the notion of coalescable
> operations: if all epoch-sync operations are equivalent then you only
> need at most one to get enqueued. Of course then we have to scan the
> queue for an existing occurrence. But that seems more general a solution
> to unbounded deferred operations than introducing a way to "skip"
> blocked threads.

Yes, if the epoch sync protocol continues to use handshakes, it would ideally need such a coalescable handshake op. However, the suggested arm-the-poll-only approach can replace handshakes for the epoch sync protocol, and it is inherently coalescable: if the target's poll is already armed, arming it again has no effect. This approach also avoids allocating any `HandshakeClosure` or `HandshakeOperation` altogether.
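
As a rough illustration of why arming the poll coalesces by itself, consider the following C++ sketch. `TargetThread`, `arm_poll`, and `poll` are hypothetical names for this example only; the real poll mechanism in HotSpot is more involved and is not modeled here.

```cpp
#include <atomic>

// Hypothetical model of a per-thread poll flag; not HotSpot's actual layout.
struct TargetThread {
    std::atomic<bool> poll_armed{false};
    int sync_points_seen = 0;

    // Called by the requesting thread. Returns true only if this call
    // changed the flag; arming an already-armed poll is a no-op.
    bool arm_poll() {
        return !poll_armed.exchange(true, std::memory_order_acq_rel);
    }

    // Called by the target at its next poll, for example when it returns
    // from native. Any number of earlier requests collapse into one action.
    void poll() {
        if (poll_armed.exchange(false, std::memory_order_acq_rel)) {
            ++sync_points_seen;
        }
    }
};
```

No matter how many times `arm_poll()` is called while the target is blocked, the state is a single flag, so nothing is allocated or queued per request.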

There is the theoretical problem of having unbounded deferred cards. It is not a concern in practice because deferred cards are extremely rare, and GC pauses happen frequently and free up those deferred cards. Also, I think `G1DirtyCardQueue` and `SATBMarkQueue` are already unbounded, and deferred cards are far fewer than the entries in those queues.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4005