RFR: 8227407: ZGC: C2 loads and load barriers can get separated by safepoints
Erik Osterlund
erik.osterlund at oracle.com
Fri Jul 19 16:16:01 UTC 2019
Hi Nils,
Thanks for the review!
/Erik
> On 19 Jul 2019, at 17:02, Nils Eliasson <nils.eliasson at oracle.com> wrote:
>
> Hi Erik,
>
> Looks good!
>
> // Nils
>
>> On 2019-07-19 15:21, Erik Österlund wrote:
>> Hi,
>>
>> As promised, I wrote some verification code triggered right before code gen.
>>
>> Updated webrev:
>> http://cr.openjdk.java.net/~eosterlund/8227407/webrev.02/
>>
>> I verified that this verification code catches the issues I identified before my fix is applied, and doesn't trigger after my fix is applied.
>> This type of verification won't show you where you went wrong, but it will show you that there is a problem (somewhere) when it arises.
>> Managed to get through hs-tier1-6 with this patch (and verification enabled)
>>
>> Thanks,
>> /Erik
>>
>>> On 2019-07-19 09:41, Erik Österlund wrote:
>>> Hi Stuart,
>>>
>>> Thank you for taking this for a spin. And yes, I think the appropriate place to put in such verification code would be in
>>>
>>> bs->verify_gc_barriers(this, BarrierSetC2::BeforeCodeGen);
>>>
>>> which is called right after Compile::Optimize() and before lowering to machine nodes.
>>> I'll give it a shot to add some verification code there.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> On 2019-07-18 17:34, Stuart Monteith wrote:
>>>> Hello Erik,
>>>> I'll try this against Aarch64 and let you know what I get. I'm
>>>> thinking the safepoints could be examined in the assembler to see if
>>>> they are splitting the loads and their barriers.
>>>> Is there a pass where we could enforce this as an invariant?
>>>>
>>>> BR,
>>>> Stuart
>>>>
>>>>> On Thu, 18 Jul 2019 at 15:46, Erik Österlund <erik.osterlund at oracle.com> wrote:
>>>>> Hi,
>>>>>
>>>>> The idea of "8224675: Late GC barrier insertion for ZGC" was to expand
>>>>> load barriers much later, when loads have stopped floating around. The
>>>>> reason we wanted to do that is so that when we expand load barriers, we
>>>>> won't get safepoints between loads and their load barrier.
>>>>>
>>>>> Turns out though that we still get safepoints between loads and their
>>>>> corresponding load barriers. This happens when a load with an expanded
>>>>> barrier state ends up in the same block as a safepoint.
>>>>>
>>>>> Here are reasons why this can happen that I found:
>>>>>
>>>>> 1) We don't pin the loads to their block. By that I mean make sure that
>>>>> it has the block as in(MemNode::Control), as well as have their
>>>>> _control_dependency == Pinned. Because of that, e.g. final graph
>>>>> reshaping of CastPP nodes, move their memory (load) uses up to the
>>>>> CastPP control block. That causes the load to move in to a block that
>>>>> may have safepoints in it, breaking the invariant. I now explicitly pin
>>>>> the loads to their block during late barrier injection to prevent such
>>>>> floating.
>>>>>
>>>>> 2) Despite pinning the loads after barrier injection, there is an IGVN
>>>>> transformation that explicitly moves loads above safepoints, without
>>>>> respecting if they are pinned. I changed the LoadNode::Ideal function to
>>>>> only move loads past safepoints if they are not pinned, to prevent that.
>>>>> This is the only change affecting non-ZGC code. But it seems to me that
>>>>> floating a pinned load above a safepoint is illegal in any execution
>>>>> mode and is not a ZGC-related problem. Moving for example the G1 loads
>>>>> that checks if marking is active past a safepoint will crash the VM.
>>>>>
>>>>> 3) Late barrier expansion isn't quite late enough. Barriers are expanded
>>>>> before macro expansion. Macro nodes such as locking macro nodes may
>>>>> expand into a bunch of nodes, including safepoints. I moved late barrier
>>>>> expansion to after macro expansion to solve that. It also should be
>>>>> after macro expansion for yet another reason: macro nodes my expand to
>>>>> loads that will require barriers. For example the "array copy" basic
>>>>> clone (i.e. java.lang.Object.clone() intrinsic) node does that. Now we
>>>>> have cloning disabled in ZGC right now, but should be able to enable it
>>>>> after this. That will be a later fix though for 14.
>>>>>
>>>>> 4) The original block of the load where we inject the LoadBarrier and
>>>>> splice in a successor control block (which will end up after the
>>>>> barrier), may already have safepoints in it. I solved this issue by
>>>>> moving the safepoint node to the successor control block that we
>>>>> injected as part of barrier injection. This ensures that the load and
>>>>> the safepoint don't end up in the same block.
>>>>>
>>>>> I found these issues by injecting quite intrusive verification code into
>>>>> C2. Unfortunately I think the verification code I wrote is too intrusive
>>>>> to upstream, so I would like to upstream just the fix to 13, and rework
>>>>> less intrusive verification code for 14 when there is more time to think
>>>>> about how to do that, if that is okay. I've done what I believe is the
>>>>> minimal fix to the observed problem, with the minimal risk.
>>>>>
>>>>> The test that detected the issue originally,
>>>>> "Compact_NonbranchyTree_ArrayOf", failed with root verification errors.
>>>>> Probability of failure after Stefan Karlsson's improved verification
>>>>> code seemed to be about 25% per run. With this patch, I have run 100
>>>>> iterations without failure. I have also run this through gc-test-suite
>>>>> with -Xcomp, and I am currently running through tier-1-6.
>>>>>
>>>>> Bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8227407
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8227407/webrev.01/
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>
>>
More information about the hotspot-compiler-dev
mailing list