[14] RFR (XS): 8235305: Corrupted oops embedded in nmethods due to parallel modification during optional evacuation
Stefan Johansson
stefan.johansson at oracle.com
Fri Jan 17 09:06:50 UTC 2020
Hi Thomas,
On 2020-01-16 14:20, Thomas Schatzl wrote:
> Hi all,
>
> can I get reviews for this change that fixes a bug in the abortable
> mixed gc algorithm where G1 might corrupt oops embedded in nmethods due
> to parallel modification during an optional evacuation phase?
>
> G1 currently collects embedded oops in nmethods twice: once in the
> optional roots list, and once as nmethods in the strong code roots list
> for a particular region.
>
> Now it can happen that this oop embedded in in the code stream is
> unaligned, so if that oop is modified during relocation word tearing may
> occur, causing follow-up crashes.
>
> The fix is to not collect oops from nmethods in the optional code root
> list as the strong code root list for a particular region already always
> contains it anyway.
>
> Thanks go to stefank, eriko and sjohanss for helping with analyzing,
> testing and the discussion around it.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8235305
> Webrev:
> http://cr.openjdk.java.net/~tschatzl/8235305/webrev/
Fix looks good. Just some things around the naming of the template
parameter and enum after adding this. I don't have a much better idea
but I don't think "barrier" is exactly what this is. I do think it would
make sense to call the new value G1BarrierNMethod to be more inline with
the other names. I also think it would make sense to move the comment
about why this is needed to where we use it in g1OopClosures.inline.hpp.
Me and StefanK talked a bit about this and if we move the comment and do
the check for the barrier as a separate if-statement, it should be more
obvious when this is needed.
Thanks,
Stefan
> Testing:
> multiple runs of hs-tier1-5, multiple runs of the crashing application
> (24h kitchensink) with and without a VM modification and also with some
> G1 settings that caused crashes within 1-2 hours that reproduced the
> issue within 5 minutes.
> Currently starting perf test runs with and without this change: however
> since this change strictly reduces the work done at all times I am not
> expecting any regressions (and hence I am asking for review in advance).
>
> Thanks,
> Thomas
More information about the hotspot-gc-dev
mailing list