RFR: 8227407: ZGC: C2 loads and load barriers can get separated by safepoints

Thu Jul 18 13:46:10 UTC 2019

Hi,

The idea of "8224675: Late GC barrier insertion for ZGC" was to expand 
load barriers much later, when loads have stopped floating around. The 
reason we wanted to do that is so that when we expand load barriers, we 
won't get safepoints between loads and their load barrier.

Turns out though that we still get safepoints between loads and their 
corresponding load barriers. This happens when a load with an expanded 
barrier state ends up in the same block as a safepoint.

Here are reasons why this can happen that I found:

1) We don't pin the loads to their block. By that I mean make sure that 
it has the block as in(MemNode::Control), as well as have their 
_control_dependency == Pinned. Because of that, e.g. final graph 
reshaping of CastPP nodes, move their memory (load) uses up to the 
CastPP control block. That causes the load to move in to a block that 
may have safepoints in it, breaking the invariant. I now explicitly pin 
the loads to their block during late barrier injection to prevent such 
floating.

2) Despite pinning the loads after barrier injection, there is an IGVN 
transformation that explicitly moves loads above safepoints, without 
respecting if they are pinned. I changed the LoadNode::Ideal function to 
only move loads past safepoints if they are not pinned, to prevent that. 
This is the only change affecting non-ZGC code. But it seems to me that 
floating a pinned load above a safepoint is illegal in any execution 
mode and is not a ZGC-related problem. Moving for example the G1 loads 
that checks if marking is active past a safepoint will crash the VM.

3) Late barrier expansion isn't quite late enough. Barriers are expanded 
before macro expansion. Macro nodes such as locking macro nodes may 
expand into a bunch of nodes, including safepoints. I moved late barrier 
expansion to after macro expansion to solve that. It also should be 
after macro expansion for yet another reason: macro nodes my expand to 
loads that will require barriers. For example the "array copy" basic 
clone (i.e. java.lang.Object.clone() intrinsic) node does that. Now we 
have cloning disabled in ZGC right now, but should be able to enable it 
after this. That will be a later fix though for 14.

4) The original block of the load where we inject the LoadBarrier and 
splice in a successor control block (which will end up after the 
barrier), may already have safepoints in it. I solved this issue by 
moving the safepoint node to the successor control block that we 
injected as part of barrier injection. This ensures that the load and 
the safepoint don't end up in the same block.

I found these issues by injecting quite intrusive verification code into 
C2. Unfortunately I think the verification code I wrote is too intrusive 
to upstream, so I would like to upstream just the fix to 13, and rework 
less intrusive verification code for 14 when there is more time to think 
about how to do that, if that is okay. I've done what I believe is the 
minimal fix to the observed problem, with the minimal risk.

The test that detected the issue originally, 
"Compact_NonbranchyTree_ArrayOf", failed with root verification errors. 
Probability of failure after Stefan Karlsson's improved verification 
code seemed to be about 25% per run. With this patch, I have run 100 
iterations without failure. I have also run this through gc-test-suite 
with -Xcomp, and I am currently running through tier-1-6.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8227407

Webrev:
http://cr.openjdk.java.net/~eosterlund/8227407/webrev.01/

Thanks,
/Erik