RFR: 8227407: ZGC: C2 loads and load barriers can get separated by safepoints
Erik Österlund
erik.osterlund at oracle.com
Thu Jul 18 13:46:10 UTC 2019
Hi,
The idea of "8224675: Late GC barrier insertion for ZGC" was to expand
load barriers much later, when loads have stopped floating around. The
reason we wanted to do that is so that when we expand load barriers, we
won't get safepoints between loads and their load barrier.
Turns out though that we still get safepoints between loads and their
corresponding load barriers. This happens when a load with an expanded
barrier state ends up in the same block as a safepoint.
Here are reasons why this can happen that I found:
1) We don't pin the loads to their block. By that I mean make sure that
it has the block as in(MemNode::Control), as well as have their
_control_dependency == Pinned. Because of that, e.g. final graph
reshaping of CastPP nodes, move their memory (load) uses up to the
CastPP control block. That causes the load to move in to a block that
may have safepoints in it, breaking the invariant. I now explicitly pin
the loads to their block during late barrier injection to prevent such
floating.
2) Despite pinning the loads after barrier injection, there is an IGVN
transformation that explicitly moves loads above safepoints, without
respecting if they are pinned. I changed the LoadNode::Ideal function to
only move loads past safepoints if they are not pinned, to prevent that.
This is the only change affecting non-ZGC code. But it seems to me that
floating a pinned load above a safepoint is illegal in any execution
mode and is not a ZGC-related problem. Moving for example the G1 loads
that checks if marking is active past a safepoint will crash the VM.
3) Late barrier expansion isn't quite late enough. Barriers are expanded
before macro expansion. Macro nodes such as locking macro nodes may
expand into a bunch of nodes, including safepoints. I moved late barrier
expansion to after macro expansion to solve that. It also should be
after macro expansion for yet another reason: macro nodes my expand to
loads that will require barriers. For example the "array copy" basic
clone (i.e. java.lang.Object.clone() intrinsic) node does that. Now we
have cloning disabled in ZGC right now, but should be able to enable it
after this. That will be a later fix though for 14.
4) The original block of the load where we inject the LoadBarrier and
splice in a successor control block (which will end up after the
barrier), may already have safepoints in it. I solved this issue by
moving the safepoint node to the successor control block that we
injected as part of barrier injection. This ensures that the load and
the safepoint don't end up in the same block.
I found these issues by injecting quite intrusive verification code into
C2. Unfortunately I think the verification code I wrote is too intrusive
to upstream, so I would like to upstream just the fix to 13, and rework
less intrusive verification code for 14 when there is more time to think
about how to do that, if that is okay. I've done what I believe is the
minimal fix to the observed problem, with the minimal risk.
The test that detected the issue originally,
"Compact_NonbranchyTree_ArrayOf", failed with root verification errors.
Probability of failure after Stefan Karlsson's improved verification
code seemed to be about 25% per run. With this patch, I have run 100
iterations without failure. I have also run this through gc-test-suite
with -Xcomp, and I am currently running through tier-1-6.
Bug:
https://bugs.openjdk.java.net/browse/JDK-8227407
Webrev:
http://cr.openjdk.java.net/~eosterlund/8227407/webrev.01/
Thanks,
/Erik
More information about the hotspot-compiler-dev
mailing list