RFR: 8205577: parallel/TestPrintGCDetailsVerbose.java fails assertion
Kim Barrett
kim.barrett at oracle.com
Wed Jun 27 05:03:59 UTC 2018
Please review this fix of an assertion failure during ParallelGC young
collections.
We have a task (thread) walking the code cache, copying and forwarding
oop references in those cached nmethods containing scavengable
(e.g. young) oops. After the oops in an nmethod are processed, a
second pass is made to determine whether any of the oops are still
scavengable, removing the nmethod from the list containing scavengable
oops. That speeds up future collection cycles.
That second pass check logs the values of any scavengable oops
referred to by the nmethod. This is where things go wrong. If the
object being printed is a j.l.Class, deep in the guts of the printing
there is an assert that the mirror of the klass of the object is that
object. But if the klass mirror has not yet been forwarded, we can be
comparing a forward object with the not yet forward version, and the
assertion fails.
There is another task that is running in parallel with the code cache
walk that is processing the CLD graph, but it may not have reached the
relevant klass yet, so the klass's mirror hasn't been forwarded yet.
But there's another complication that makes this crash even more
difficult to trigger. The ParallelGC young collection's code cache
walk promotes nmethod-referenced oops, and that promotion makes the
object no longer scavengable. So to trigger the crash, some third
thread must have copied a young class to survivor space, then the code
cache walk processes an nmethod referring to that class and prints the
still young class, all before the CLD graph walk has forwarded the
klass's mirror.
This problem has been lurking all along, but JDK-8203837 changed the
printing from only occurring when TraceScavenge was true to printing
when gc+nmethod=trace logging is enabled. As there were no tests that
enabled TraceScavenge, we didn't previously notice the problem. But
there are tests that enable gc+nmethod=trace, enabling the problematic
printing.
The solution being taken here is to change that logging to no longer
attempt to print the object. Printing an object while in the middle
of a copying GC, especially one that is parallel, seems quite risky,
so let's just not do that.
CR:
https://bugs.openjdk.java.net/browse/JDK-8205577
Webrev:
http://cr.openjdk.java.net/~kbarrett/8205577/open.00/
Testing:
mach5 tier1,2,3
I haven't been able to reproduce the failure; as discussed above, it's
very timing sensitive. But the scenario described above accounts for
both the problem and the timing of its appearance.
More information about the hotspot-dev
mailing list