RFR: 8205577: parallel/TestPrintGCDetailsVerbose.java fails assertion
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Wed Jun 27 13:43:20 UTC 2018
Kim, This looks good. Thank you for the detailed analysis of how this
failed.
Coleen
On 6/27/18 1:03 AM, Kim Barrett wrote:
> Please review this fix of an assertion failure during ParallelGC young
> collections.
>
> We have a task (thread) walking the code cache, copying and forwarding
> oop references in those cached nmethods containing scavengable
> (e.g. young) oops. After the oops in an nmethod are processed, a
> second pass is made to determine whether any of the oops are still
> scavengable, removing the nmethod from the list containing scavengable
> oops. That speeds up future collection cycles.
>
> That second pass check logs the values of any scavengable oops
> referred to by the nmethod. This is where things go wrong. If the
> object being printed is a j.l.Class, deep in the guts of the printing
> there is an assert that the mirror of the klass of the object is that
> object. But if the klass mirror has not yet been forwarded, we can be
> comparing a forward object with the not yet forward version, and the
> assertion fails.
>
> There is another task that is running in parallel with the code cache
> walk that is processing the CLD graph, but it may not have reached the
> relevant klass yet, so the klass's mirror hasn't been forwarded yet.
>
> But there's another complication that makes this crash even more
> difficult to trigger. The ParallelGC young collection's code cache
> walk promotes nmethod-referenced oops, and that promotion makes the
> object no longer scavengable. So to trigger the crash, some third
> thread must have copied a young class to survivor space, then the code
> cache walk processes an nmethod referring to that class and prints the
> still young class, all before the CLD graph walk has forwarded the
> klass's mirror.
>
> This problem has been lurking all along, but JDK-8203837 changed the
> printing from only occurring when TraceScavenge was true to printing
> when gc+nmethod=trace logging is enabled. As there were no tests that
> enabled TraceScavenge, we didn't previously notice the problem. But
> there are tests that enable gc+nmethod=trace, enabling the problematic
> printing.
>
> The solution being taken here is to change that logging to no longer
> attempt to print the object. Printing an object while in the middle
> of a copying GC, especially one that is parallel, seems quite risky,
> so let's just not do that.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8205577
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8205577/open.00/
>
> Testing:
> mach5 tier1,2,3
>
> I haven't been able to reproduce the failure; as discussed above, it's
> very timing sensitive. But the scenario described above accounts for
> both the problem and the timing of its appearance.
>
>
More information about the hotspot-dev
mailing list