Work-in-progress: 8236485: Epoch synchronization protocol for G1 concurrent refinement

Man Cao manc at google.com
Sat Apr 17 04:07:14 UTC 2021


Hi Erik,

Thank you for the feedback!

A thread that has not armed the poll value of remote JavaThreads, may
> not read said thread's state to draw any conclusions from it whatsoever.
> Even if you had, the way you can reason about remote thread states is
> typically only to prove threads are in blocked or native, not to prove
> they are in Java (excluding the vm state). And the way that the
> different transitions are fenced reflect that, so that only that can be
> determined. The reasoning is that the states are there mostly for
> safepoint/handshake synchronization. And they can wait for responsive
> threads, but not threads that won't respond (blocked and native).


Thanks for clarifying this design, and I understand it better now.
I found a concrete example: the transition from the in_Java state to the vm
state has no fence, so if the mutator's store to the Java heap is reordered
after its store to the thread state, a remote observer could see the vm
state while the heap store is not yet visible.

I don't get why you want this filtering though. Seems like a premature
> optimization to in practice not poke threads that are blocked or in
> native. But those threads are handled efficiently by the handshaking
> mechanism anyway, as the handshaking mechanism already filters out such
> threads (but in a correct way), and when they come back they will
> quickly disarm themselves and continue. So I think the solution is to
> just handshake all threads and be done with it, letting the handshaking
> mechanism perform the proper filtering in a context where it is valid. I
> don't think it will make any perf difference.


This filtering is inherited from the old prototype, specifically
"jthread->is_online_vm" from
<http://cr.openjdk.java.net/~eosterlund/g1_experiments/lazy_sync/webrev.07/src/share/vm/runtime/globalSynchronizer.cpp.html>.
I found that filtering by thread state is effective at reducing both:
(1) the number of epoch synchronizations that must perform a handshake; and
(2) the number of remote threads to handshake with.
This was important when the requesting thread needed to block and wait for
the handshake to finish, which was quite expensive.
Now that the handshakes are fully asynchronous, filtering by thread state
is hopefully less important. I will try the handshake-with-all-threads
approach, but I first need to resolve an issue with AsyncHandshakeClosure,
described below.

Is it really though? Threads are in VM for a short time by design (not
> to impact time to safepoint), and threads in native will have the
> handshake operation performed by the handshaking thread (a no-op), and
> then continue. The handshakee will just see it needs to flip a bit when
> coming back from native. I can't understand why it would make a
> difference. Do you have numbers showing this?


Similar to the above, this was partly due to the experience that
synchronously waiting for handshakes to finish was expensive. I haven't
collected performance numbers from the asynchronous implementation yet;
I will collect them for _thread_in_vm first.

For handshaking with a thread in native, and for the "handshake with all
threads" approach, I need to resolve the following issue: the epoch
synchronization uses a no-op AsyncHandshakeClosure, but
Handshake::execute() does not execute such a closure for a thread that is
in native or blocked. I agree with David's comment in JDK-8238761 that the
term "async handshake" is conflated, and I probably need to implement
something new in handshake.cpp.

-Man



More information about the hotspot-gc-dev mailing list