RFR: 8227407: ZGC: C2 loads and load barriers can get separated by safepoints

Erik Österlund erik.osterlund at oracle.com
Fri Jul 19 07:41:06 UTC 2019


Hi Stuart,

Thank you for taking this for a spin. And yes, I think the appropriate 
place to put in such verification code would be in

bs->verify_gc_barriers(this, BarrierSetC2::BeforeCodeGen);

which is called right after Compile::Optimize() and before lowering to 
machine nodes.
I'll give it a shot to add some verification code there.

Thanks,
/Erik

On 2019-07-18 17:34, Stuart Monteith wrote:
> Hello Erik,
>     I'll try this against Aarch64 and let you know what I get. I'm
> thinking the safepoints could be examined in the assembler to see if
> they are splitting the loads and their barriers.
> Is there a pass where we could enforce this as an invariant?
>
> BR,
>     Stuart
>
> On Thu, 18 Jul 2019 at 15:46, Erik Österlund <erik.osterlund at oracle.com> wrote:
>> Hi,
>>
>> The idea of "8224675: Late GC barrier insertion for ZGC" was to expand
>> load barriers much later, when loads have stopped floating around. The
>> reason we wanted to do that is so that when we expand load barriers, we
>> won't get safepoints between loads and their load barrier.
>>
>> Turns out though that we still get safepoints between loads and their
>> corresponding load barriers. This happens when a load with an expanded
>> barrier state ends up in the same block as a safepoint.
>>
>> Here are reasons why this can happen that I found:
>>
>> 1) We don't pin the loads to their block. By that I mean make sure that
>> it has the block as in(MemNode::Control), as well as have their
>> _control_dependency == Pinned. Because of that, e.g. final graph
>> reshaping of CastPP nodes, move their memory (load) uses up to the
>> CastPP control block. That causes the load to move in to a block that
>> may have safepoints in it, breaking the invariant. I now explicitly pin
>> the loads to their block during late barrier injection to prevent such
>> floating.
>>
>> 2) Despite pinning the loads after barrier injection, there is an IGVN
>> transformation that explicitly moves loads above safepoints, without
>> respecting if they are pinned. I changed the LoadNode::Ideal function to
>> only move loads past safepoints if they are not pinned, to prevent that.
>> This is the only change affecting non-ZGC code. But it seems to me that
>> floating a pinned load above a safepoint is illegal in any execution
>> mode and is not a ZGC-related problem. Moving for example the G1 loads
>> that checks if marking is active past a safepoint will crash the VM.
>>
>> 3) Late barrier expansion isn't quite late enough. Barriers are expanded
>> before macro expansion. Macro nodes such as locking macro nodes may
>> expand into a bunch of nodes, including safepoints. I moved late barrier
>> expansion to after macro expansion to solve that. It also should be
>> after macro expansion for yet another reason: macro nodes my expand to
>> loads that will require barriers. For example the "array copy" basic
>> clone (i.e. java.lang.Object.clone() intrinsic) node does that. Now we
>> have cloning disabled in ZGC right now, but should be able to enable it
>> after this. That will be a later fix though for 14.
>>
>> 4) The original block of the load where we inject the LoadBarrier and
>> splice in a successor control block (which will end up after the
>> barrier), may already have safepoints in it. I solved this issue by
>> moving the safepoint node to the successor control block that we
>> injected as part of barrier injection. This ensures that the load and
>> the safepoint don't end up in the same block.
>>
>> I found these issues by injecting quite intrusive verification code into
>> C2. Unfortunately I think the verification code I wrote is too intrusive
>> to upstream, so I would like to upstream just the fix to 13, and rework
>> less intrusive verification code for 14 when there is more time to think
>> about how to do that, if that is okay. I've done what I believe is the
>> minimal fix to the observed problem, with the minimal risk.
>>
>> The test that detected the issue originally,
>> "Compact_NonbranchyTree_ArrayOf", failed with root verification errors.
>> Probability of failure after Stefan Karlsson's improved verification
>> code seemed to be about 25% per run. With this patch, I have run 100
>> iterations without failure. I have also run this through gc-test-suite
>> with -Xcomp, and I am currently running through tier-1-6.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8227407
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8227407/webrev.01/
>>
>> Thanks,
>> /Erik



More information about the hotspot-compiler-dev mailing list