RFR: 8227407: ZGC: C2 loads and load barriers can get separated by safepoints

Erik Osterlund erik.osterlund at oracle.com
Fri Jul 19 16:16:01 UTC 2019


Hi Nils,

Thanks for the review!

/Erik

> On 19 Jul 2019, at 17:02, Nils Eliasson <nils.eliasson at oracle.com> wrote:
> 
> Hi Erik,
> 
> Looks good!
> 
> // Nils
> 
>> On 2019-07-19 15:21, Erik Österlund wrote:
>> Hi,
>> 
>> As promised, I wrote some verification code triggered right before code gen.
>> 
>> Updated webrev:
>> http://cr.openjdk.java.net/~eosterlund/8227407/webrev.02/
>> 
>> I verified that this verification code catches the issues I identified before my fix is applied, and doesn't trigger after my fix is applied.
>> This type of verification won't show you where you went wrong, but it will show you that there is a problem (somewhere) when it arises.
>> Managed to get through hs-tier1-6 with this patch (and verification enabled)
>> 
>> Thanks,
>> /Erik
>> 
>>> On 2019-07-19 09:41, Erik Österlund wrote:
>>> Hi Stuart,
>>> 
>>> Thank you for taking this for a spin. And yes, I think the appropriate place to put in such verification code would be in
>>> 
>>> bs->verify_gc_barriers(this, BarrierSetC2::BeforeCodeGen);
>>> 
>>> which is called right after Compile::Optimize() and before lowering to machine nodes.
>>> I'll give it a shot to add some verification code there.
>>> 
>>> Thanks,
>>> /Erik
>>> 
>>>> On 2019-07-18 17:34, Stuart Monteith wrote:
>>>> Hello Erik,
>>>>     I'll try this against Aarch64 and let you know what I get. I'm
>>>> thinking the safepoints could be examined in the assembler to see if
>>>> they are splitting the loads and their barriers.
>>>> Is there a pass where we could enforce this as an invariant?
>>>> 
>>>> BR,
>>>>     Stuart
>>>> 
>>>>> On Thu, 18 Jul 2019 at 15:46, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> The idea of "8224675: Late GC barrier insertion for ZGC" was to expand
>>>>> load barriers much later, when loads have stopped floating around. The
>>>>> reason we wanted to do that is so that when we expand load barriers, we
>>>>> won't get safepoints between loads and their load barrier.
>>>>> 
>>>>> Turns out though that we still get safepoints between loads and their
>>>>> corresponding load barriers. This happens when a load with an expanded
>>>>> barrier state ends up in the same block as a safepoint.
>>>>> 
>>>>> Here are reasons why this can happen that I found:
>>>>> 
>>>>> 1) We don't pin the loads to their block. By that I mean make sure that
>>>>> it has the block as in(MemNode::Control), as well as have their
>>>>> _control_dependency == Pinned. Because of that, e.g. final graph
>>>>> reshaping of CastPP nodes, move their memory (load) uses up to the
>>>>> CastPP control block. That causes the load to move in to a block that
>>>>> may have safepoints in it, breaking the invariant. I now explicitly pin
>>>>> the loads to their block during late barrier injection to prevent such
>>>>> floating.
>>>>> 
>>>>> 2) Despite pinning the loads after barrier injection, there is an IGVN
>>>>> transformation that explicitly moves loads above safepoints, without
>>>>> respecting if they are pinned. I changed the LoadNode::Ideal function to
>>>>> only move loads past safepoints if they are not pinned, to prevent that.
>>>>> This is the only change affecting non-ZGC code. But it seems to me that
>>>>> floating a pinned load above a safepoint is illegal in any execution
>>>>> mode and is not a ZGC-related problem. Moving for example the G1 loads
>>>>> that checks if marking is active past a safepoint will crash the VM.
>>>>> 
>>>>> 3) Late barrier expansion isn't quite late enough. Barriers are expanded
>>>>> before macro expansion. Macro nodes such as locking macro nodes may
>>>>> expand into a bunch of nodes, including safepoints. I moved late barrier
>>>>> expansion to after macro expansion to solve that. It also should be
>>>>> after macro expansion for yet another reason: macro nodes my expand to
>>>>> loads that will require barriers. For example the "array copy" basic
>>>>> clone (i.e. java.lang.Object.clone() intrinsic) node does that. Now we
>>>>> have cloning disabled in ZGC right now, but should be able to enable it
>>>>> after this. That will be a later fix though for 14.
>>>>> 
>>>>> 4) The original block of the load where we inject the LoadBarrier and
>>>>> splice in a successor control block (which will end up after the
>>>>> barrier), may already have safepoints in it. I solved this issue by
>>>>> moving the safepoint node to the successor control block that we
>>>>> injected as part of barrier injection. This ensures that the load and
>>>>> the safepoint don't end up in the same block.
>>>>> 
>>>>> I found these issues by injecting quite intrusive verification code into
>>>>> C2. Unfortunately I think the verification code I wrote is too intrusive
>>>>> to upstream, so I would like to upstream just the fix to 13, and rework
>>>>> less intrusive verification code for 14 when there is more time to think
>>>>> about how to do that, if that is okay. I've done what I believe is the
>>>>> minimal fix to the observed problem, with the minimal risk.
>>>>> 
>>>>> The test that detected the issue originally,
>>>>> "Compact_NonbranchyTree_ArrayOf", failed with root verification errors.
>>>>> Probability of failure after Stefan Karlsson's improved verification
>>>>> code seemed to be about 25% per run. With this patch, I have run 100
>>>>> iterations without failure. I have also run this through gc-test-suite
>>>>> with -Xcomp, and I am currently running through tier-1-6.
>>>>> 
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8227407
>>>>> 
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8227407/webrev.01/
>>>>> 
>>>>> Thanks,
>>>>> /Erik
>>> 
>> 



More information about the hotspot-compiler-dev mailing list