RFR: 8225642: ZGC: Crash due to bad oops being spilled to stack in load barriers
Per Liden
per.liden at oracle.com
Thu Jun 20 11:04:52 UTC 2019
On 6/20/19 11:38 AM, Erik Österlund wrote:
> Hi,
>
> With the new late barrier expansion mechanism, we thought we could
> finally disable some workarounds that fixed up bad oops in the stack,
> that should never be allowed to be bad really if the compiler did what
> we want it to do.
> It turns out that it is still possible for bad oops to end up in the
> stack. This is attributed to spills of the loaded bad oop that never
> really got usedin the application. These spills could be observed in the
> disassembly. While the oop isn't used in the nmethod, it might be used
> by e.g. deoptimization, forcing it to stick around. The cause for the
> spill causing the trouble is that the medium slow path
> (LoadBarrierSlowRegNode) defines (DEF) a new value returned from the
> slow path reloads the oop into that register, instead of using the
> original oop as input. So the LoadBarrierSlowRegNode would sometimes get
> a different register to the original load, causing the loaded bad value
> to spill bad values for deopt sometimes, eventually causing crashes in
> heap iteration code and asserts.
>
> To solve the problem, the original oop was added as an input edge to
> LoadBarrierSlowRegNode, so that we in the AD file could have the input
> and output of the node be the same, matching the original load. That
> attempt first produced a bad AD file because the ADLC parser doesn't
> like that the LoadBarrierSlowRegNode had an additional edge when it is
> modeled as a load, which is treated specially in places. It has been
> originally modeled as a load because it reloads the value and hence
> needs to quack like a load. However, with the new late barrier expansion
> approach we can re-model it as a TypeNode instead because the slowreg
> node can't really move around so much in the circumstances where they
> are now expanded (late and tucked into tight control flow blocks).
> Nevertheless, it is a bit iffy and a follow-up RFE will remove the
> current practice of returning reloaded values from the barrier slowpaths
> to make this increasingly sane. But I want this fix to be as minimal as
> possible as it's going in to 13, and the risk of making such changes for
> 13 is not worth it.
>
> Thanks goes to StefanK, Per and Nils for helping diagnose the problem,
> finding good reproducers, helping out testing the fix, and good
> discussions along the way.
>
> This patch has made it through mach5 hs-tier1-5, many runs of
> gc-testsuite, 100 kitchensink runs, and ~1 day of StefanK's quite
> reliable reproducer with custom StefanK verification code that normally
> crashes within an hour.
>
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8225642/webrev.00/
Looks good to me!
/Per
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8225642
>
> Thanks,
> /Erik
More information about the hotspot-compiler-dev
mailing list