RFR: 8227407: ZGC: C2 loads and load barriers can get separated by safepoints

Nils Eliasson nils.eliasson at oracle.com
Fri Jul 19 15:02:05 UTC 2019


Hi Erik,

Looks good!

// Nils

On 2019-07-19 15:21, Erik Österlund wrote:
> Hi,
>
> As promised, I wrote some verification code triggered right before 
> code gen.
>
> Updated webrev:
> http://cr.openjdk.java.net/~eosterlund/8227407/webrev.02/
>
> I verified that this verification code catches the issues I identified 
> before my fix is applied, and doesn't trigger after my fix is applied.
> This type of verification won't show you where you went wrong, but it 
> will show you that there is a problem (somewhere) when it arises.
> Managed to get through hs-tier1-6 with this patch (and verification 
> enabled)
>
> Thanks,
> /Erik
>
> On 2019-07-19 09:41, Erik Österlund wrote:
>> Hi Stuart,
>>
>> Thank you for taking this for a spin. And yes, I think the 
>> appropriate place to put in such verification code would be in
>>
>> bs->verify_gc_barriers(this, BarrierSetC2::BeforeCodeGen);
>>
>> which is called right after Compile::Optimize() and before lowering 
>> to machine nodes.
>> I'll give it a shot to add some verification code there.
>>
>> Thanks,
>> /Erik
>>
>> On 2019-07-18 17:34, Stuart Monteith wrote:
>>> Hello Erik,
>>>     I'll try this against Aarch64 and let you know what I get. I'm
>>> thinking the safepoints could be examined in the assembler to see if
>>> they are splitting the loads and their barriers.
>>> Is there a pass where we could enforce this as an invariant?
>>>
>>> BR,
>>>     Stuart
>>>
>>> On Thu, 18 Jul 2019 at 15:46, Erik Österlund 
>>> <erik.osterlund at oracle.com> wrote:
>>>> Hi,
>>>>
>>>> The idea of "8224675: Late GC barrier insertion for ZGC" was to expand
>>>> load barriers much later, when loads have stopped floating around. The
>>>> reason we wanted to do that is so that when we expand load 
>>>> barriers, we
>>>> won't get safepoints between loads and their load barrier.
>>>>
>>>> Turns out though that we still get safepoints between loads and their
>>>> corresponding load barriers. This happens when a load with an expanded
>>>> barrier state ends up in the same block as a safepoint.
>>>>
>>>> Here are reasons why this can happen that I found:
>>>>
>>>> 1) We don't pin the loads to their block. By that I mean make sure 
>>>> that
>>>> it has the block as in(MemNode::Control), as well as have their
>>>> _control_dependency == Pinned. Because of that, e.g. final graph
>>>> reshaping of CastPP nodes, move their memory (load) uses up to the
>>>> CastPP control block. That causes the load to move in to a block that
>>>> may have safepoints in it, breaking the invariant. I now explicitly 
>>>> pin
>>>> the loads to their block during late barrier injection to prevent such
>>>> floating.
>>>>
>>>> 2) Despite pinning the loads after barrier injection, there is an IGVN
>>>> transformation that explicitly moves loads above safepoints, without
>>>> respecting if they are pinned. I changed the LoadNode::Ideal 
>>>> function to
>>>> only move loads past safepoints if they are not pinned, to prevent 
>>>> that.
>>>> This is the only change affecting non-ZGC code. But it seems to me 
>>>> that
>>>> floating a pinned load above a safepoint is illegal in any execution
>>>> mode and is not a ZGC-related problem. Moving for example the G1 loads
>>>> that checks if marking is active past a safepoint will crash the VM.
>>>>
>>>> 3) Late barrier expansion isn't quite late enough. Barriers are 
>>>> expanded
>>>> before macro expansion. Macro nodes such as locking macro nodes may
>>>> expand into a bunch of nodes, including safepoints. I moved late 
>>>> barrier
>>>> expansion to after macro expansion to solve that. It also should be
>>>> after macro expansion for yet another reason: macro nodes my expand to
>>>> loads that will require barriers. For example the "array copy" basic
>>>> clone (i.e. java.lang.Object.clone() intrinsic) node does that. Now we
>>>> have cloning disabled in ZGC right now, but should be able to 
>>>> enable it
>>>> after this. That will be a later fix though for 14.
>>>>
>>>> 4) The original block of the load where we inject the LoadBarrier and
>>>> splice in a successor control block (which will end up after the
>>>> barrier), may already have safepoints in it. I solved this issue by
>>>> moving the safepoint node to the successor control block that we
>>>> injected as part of barrier injection. This ensures that the load and
>>>> the safepoint don't end up in the same block.
>>>>
>>>> I found these issues by injecting quite intrusive verification code 
>>>> into
>>>> C2. Unfortunately I think the verification code I wrote is too 
>>>> intrusive
>>>> to upstream, so I would like to upstream just the fix to 13, and 
>>>> rework
>>>> less intrusive verification code for 14 when there is more time to 
>>>> think
>>>> about how to do that, if that is okay. I've done what I believe is the
>>>> minimal fix to the observed problem, with the minimal risk.
>>>>
>>>> The test that detected the issue originally,
>>>> "Compact_NonbranchyTree_ArrayOf", failed with root verification 
>>>> errors.
>>>> Probability of failure after Stefan Karlsson's improved verification
>>>> code seemed to be about 25% per run. With this patch, I have run 100
>>>> iterations without failure. I have also run this through gc-test-suite
>>>> with -Xcomp, and I am currently running through tier-1-6.
>>>>
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8227407
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~eosterlund/8227407/webrev.01/
>>>>
>>>> Thanks,
>>>> /Erik
>>
>


More information about the hotspot-compiler-dev mailing list