[Question][ZGC] handshakeAllThreads vs. Per-Thread Handshakes During Mark Phase Termination

Thu Sep 11 11:08:43 UTC 2025

> If so, what is the difference between thread-local handshakes (i.e., handshaking with one thread at a time) and handshakeAllThreads?

I suspect you have misunderstood `handshakeAllThreads`: it performs a handshake with every thread, one thread at a time; it does NOT synchronize with all threads at once. The latter would be a safepoint, which is often more expensive than a handshake.

In `VM_HandshakeAllThreads`, you can see `bool evaluate_at_safepoint() const { return false; }`.
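
To illustrate the distinction, here is a toy C++ model. It is not HotSpot code: `ToyMutator`, `mutator_loop`, `toy_handshake_all` and `run_toy_flush` are made-up names, and the "flush a thread-local buffer" operation is only a stand-in for what a mark flush might do. In this sketch the requester arms every thread and then waits for each acknowledgement; each mutator runs the operation on itself at its next poll and resumes right away, without waiting for the other mutators. Whether this matches HotSpot's exact ordering is an assumption the sketch does not try to settle.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Toy model only; none of these names exist in HotSpot.
struct ToyMutator {
  std::atomic<bool> armed{false};           // "a handshake is pending for me"
  std::atomic<bool> exit_requested{false};  // lets the demo shut down
  int local_buffer = 0;                     // touched only by the owning thread
};

// Simulated mutator: polls its own flag between units of work.
void mutator_loop(ToyMutator& m, std::atomic<int>& flushed_total) {
  while (!m.exit_requested.load()) {
    if (m.armed.load()) {
      // The handshake operation: flush this thread's local buffer.
      flushed_total.fetch_add(m.local_buffer);
      m.local_buffer = 0;
      // Acknowledge and keep running. A safepoint would instead keep
      // this thread parked until every thread had stopped and the
      // operation had finished.
      m.armed.store(false);
    }
    std::this_thread::yield();  // stand-in for application work
  }
}

// Handshake with all threads: each thread is handled individually.
void toy_handshake_all(std::vector<ToyMutator>& mutators) {
  for (auto& m : mutators) m.armed.store(true);
  for (auto& m : mutators) {
    while (m.armed.load()) std::this_thread::yield();
  }
}

// Starts the mutators, performs one handshake-all, returns the flushed total.
int run_toy_flush(int num_threads, int items_per_thread) {
  std::vector<ToyMutator> mutators(static_cast<std::size_t>(num_threads));
  std::atomic<int> flushed_total{0};
  for (auto& m : mutators) m.local_buffer = items_per_thread;

  std::vector<std::thread> threads;
  for (auto& m : mutators) {
    threads.emplace_back(mutator_loop, std::ref(m), std::ref(flushed_total));
  }

  toy_handshake_all(mutators);
  int result = flushed_total.load();

  for (auto& m : mutators) m.exit_requested.store(true);
  for (auto& t : threads) t.join();
  return result;
}
```

Calling `run_toy_flush(4, 3)` starts four toy mutators with three buffered items each and returns the total observed once every thread has acknowledged the handshake. At no point are all mutators stopped together, which is the property that separates this from a safepoint.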

FYI, there is more on safepoints in this video from Markus Grönlund: https://www.youtube.com/watch?v=JkbWPPNc4SI

PS: the paper you mentioned is about non-generational ZGC, which has been removed from the code base via https://bugs.openjdk.org/browse/JDK-8335850.

/Albert

________________________________________
From: hotspot-gc-dev <hotspot-gc-dev-retn at openjdk.org> on behalf of O Sato <oh.sato at ntt.com>
Sent: Thursday, September 11, 2025 11:58
To: hotspot-gc-dev at openjdk.org
Subject: [Question][ZGC] handshakeAllThreads vs. Per-Thread Handshakes During Mark Phase Termination

Hello everyone,
I’m trying to better understand how ZGC performs thread coordination at the end of the marking phase.
My questions are:

  *   Is handshakeAllThreads used at the end of the mark phase in ZGC?
  *   If so, what is the difference between thread-local handshakes (i.e., handshaking with one thread at a time) and handshakeAllThreads?
  *   Are thread-local handshaking and handshakeAllThreads fundamentally different mechanisms, or just variations of the same mechanism?
  *   What are the trade-offs or reasons for choosing one over the other in this context?

Background and observations:
While reading the latest ZGC source code, I came across the following call chain near the end of the mark phase:
https://github.com/openjdk/jdk/blob/f4d73d2a3dbeccfd04d49c0cfd690086edd0544f/src/hotspot/share/gc/z/zRemembered.cpp#L561C1-L561C49
ZRemembered::scan_and_follow(ZMark* mark)
→ ZMark::try_terminate_flush()
→ ZMark::flush()
→ Handshake::execute()
This led me to notice that handshakeAllThreads appears to be used during this process.
This made me wonder whether a global handshake (with all threads) might contribute to observable latency in some cases, although I am not sure how significant this would be in practice.
In contrast, in the following paper:
Albert Mingkun Yang and Tobias Wrigstad,
“Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK”,
ACM Transactions on Programming Languages and Systems (TOPLAS), 2022.
https://dl.acm.org/doi/10.1145/3538532
It is stated in Section 3.4 (STW2: The End of the Marking Phase) that:
"... thread-local handshaking with each mutator (one mutator at a time) is performed to check for the presence of any to-be-marked objects before attempting an STW pause; this reduces the probability of entering STW2 prematurely."
This seems to suggest a more incremental approach (per-thread handshaking), which could be helpful in minimizing pauses.
Hence my questions above. I’d appreciate any clarification about the actual behavior and the design decisions behind the use of handshakeAllThreads vs. thread-local handshakes.
Thank you!

==================
NTT R&D
Oh Sato
oh.sato at ntt.com