RFR: 8225642: ZGC: Crash due to bad oops being spilled to stack in load barriers

Per Liden per.liden at oracle.com
Thu Jun 20 11:04:52 UTC 2019


On 6/20/19 11:38 AM, Erik Österlund wrote:
> Hi,
> 
> With the new late barrier expansion mechanism, we thought we could 
> finally disable some workarounds that fixed up bad oops in the stack, 
> that should never be allowed to be bad really if the compiler did what 
> we want it to do.
> It turns out that it is still possible for bad oops to end up in the 
> stack. This is attributed to spills of the loaded bad oop that never 
> really got usedin the application. These spills could be observed in the 
> disassembly. While the oop isn't used in the nmethod, it might be used 
> by e.g. deoptimization, forcing it to stick around. The cause for the 
> spill causing the trouble is that the medium slow path 
> (LoadBarrierSlowRegNode) defines (DEF) a new value returned from the 
> slow path reloads the oop into that register, instead of using the 
> original oop as input. So the LoadBarrierSlowRegNode would sometimes get 
> a different register to the original load, causing the loaded bad value 
> to spill bad values for deopt sometimes, eventually causing crashes in 
> heap iteration code and asserts.
> 
> To solve the problem, the original oop was added as an input edge to 
> LoadBarrierSlowRegNode, so that we in the AD file could have the input 
> and output of the node be the same, matching the original load. That 
> attempt first produced a bad AD file because the ADLC parser doesn't 
> like that the LoadBarrierSlowRegNode had an additional edge when it is 
> modeled as a load, which is treated specially in places. It has been 
> originally modeled as a load because it reloads the value and hence 
> needs to quack like a load. However, with the new late barrier expansion 
> approach we can re-model it as a TypeNode instead because the slowreg 
> node can't really move around so much in the circumstances where they 
> are now expanded (late and tucked into tight control flow blocks). 
> Nevertheless, it is a bit iffy and a follow-up RFE will remove the 
> current practice of returning reloaded values from the barrier slowpaths 
> to make this increasingly sane. But I want this fix to be as minimal as 
> possible as it's going in to 13, and the risk of making such changes for 
> 13 is not worth it.
> 
> Thanks goes to StefanK, Per and Nils for helping diagnose the problem, 
> finding good reproducers, helping out testing the fix, and good 
> discussions along the way.
> 
> This patch has made it through mach5 hs-tier1-5, many runs of 
> gc-testsuite, 100 kitchensink runs, and ~1 day of StefanK's quite 
> reliable reproducer with custom StefanK verification code that normally 
> crashes within an hour.
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8225642/webrev.00/

Looks good to me!

/Per

> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8225642
> 
> Thanks,
> /Erik


More information about the hotspot-compiler-dev mailing list