RFR: 8225642: ZGC: Crash due to bad oops being spilled to stack in load barriers

Erik Osterlund erik.osterlund at oracle.com
Thu Jun 20 17:08:53 UTC 2019


Hi Stuart,

That looks right. I will fold it in to my push.

Thanks,
/Erik

> On 20 Jun 2019, at 19:05, Stuart Monteith <stuart.monteith at linaro.org> wrote:
> 
> Thanks Erik.
> 
> Hopefully this should be all we need - I haven't spotted anything
> fundamentally different:
>   http://cr.openjdk.java.net/~smonteith/8225642/8225642.patch
> 
> I've done a little testing.
> 
> Thanks!
> 
> 
>> On Thu, 20 Jun 2019 at 17:57, Erik Osterlund <erik.osterlund at oracle.com> wrote:
>> 
>> Hi Stuart,
>> 
>> I can fold in some AArch64 juice if you got some. :)
>> 
>> Thanks,
>> /Erik
>> 
>>> On 20 Jun 2019, at 18:29, Stuart Monteith <stuart.monteith at linaro.org> wrote:
>>> 
>>> Hello,
>>> Sorry, I should check my ZGC filter more often. I notice that there
>>> are some changes that are required for Aarch64. Would you like me to
>>> send a patch to be incorporated into this patch, or shall I pursue a
>>> separate bug?
>>> 
>>> Thanks,
>>>  Stuart
>>> 
>>>> On Thu, 20 Jun 2019 at 12:33, Erik Osterlund <erik.osterlund at oracle.com> wrote:
>>>> 
>>>> Hi Per,
>>>> 
>>>> Thanks for the review.
>>>> 
>>>> /Erik
>>>> 
>>>>>> On 20 Jun 2019, at 13:04, Per Liden <per.liden at oracle.com> wrote:
>>>>>> 
>>>>>> On 6/20/19 11:38 AM, Erik Österlund wrote:
>>>>>> Hi,
>>>>>> With the new late barrier expansion mechanism, we thought we could finally disable some workarounds that fixed up bad oops in the stack, that should never be allowed to be bad really if the compiler did what we want it to do.
>>>>>> It turns out that it is still possible for bad oops to end up in the stack. This is attributed to spills of the loaded bad oop that never really got usedin the application. These spills could be observed in the disassembly. While the oop isn't used in the nmethod, it might be used by e.g. deoptimization, forcing it to stick around. The cause for the spill causing the trouble is that the medium slow path (LoadBarrierSlowRegNode) defines (DEF) a new value returned from the slow path reloads the oop into that register, instead of using the original oop as input. So the LoadBarrierSlowRegNode would sometimes get a different register to the original load, causing the loaded bad value to spill bad values for deopt sometimes, eventually causing crashes in heap iteration code and asserts.
>>>>>> To solve the problem, the original oop was added as an input edge to LoadBarrierSlowRegNode, so that we in the AD file could have the input and output of the node be the same, matching the original load. That attempt first produced a bad AD file because the ADLC parser doesn't like that the LoadBarrierSlowRegNode had an additional edge when it is modeled as a load, which is treated specially in places. It has been originally modeled as a load because it reloads the value and hence needs to quack like a load. However, with the new late barrier expansion approach we can re-model it as a TypeNode instead because the slowreg node can't really move around so much in the circumstances where they are now expanded (late and tucked into tight control flow blocks). Nevertheless, it is a bit iffy and a follow-up RFE will remove the current practice of returning reloaded values from the barrier slowpaths to make this increasingly sane. But I want this fix to be as minimal as possible as it's going in to 13, and the risk of making such changes for 13 is not worth it.
>>>>>> Thanks goes to StefanK, Per and Nils for helping diagnose the problem, finding good reproducers, helping out testing the fix, and good discussions along the way.
>>>>>> This patch has made it through mach5 hs-tier1-5, many runs of gc-testsuite, 100 kitchensink runs, and ~1 day of StefanK's quite reliable reproducer with custom StefanK verification code that normally crashes within an hour.
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~eosterlund/8225642/webrev.00/
>>>>> 
>>>>> Looks good to me!
>>>>> 
>>>>> /Per
>>>>> 
>>>>>> Bug:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8225642
>>>>>> Thanks,
>>>>>> /Erik
>>>> 
>> 



More information about the hotspot-compiler-dev mailing list