RFR: 8205577: parallel/TestPrintGCDetailsVerbose.java fails assertion

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Jun 27 13:15:27 UTC 2018


Wonderful analysis Kim!


 > http://cr.openjdk.java.net/~kbarrett/8205577/open.00/

src/hotspot/share/code/nmethod.cpp
     (old) L1694:       (*p)->print_value_on(&ls);
         I agree with not printing, but you might want to leave a
         comment behind. Something like:

         // Do not try "(*p)->print_value_on()" here because it
         // is racy with parallel operations.

Thumbs up in any case.

Dan


On 6/27/18 1:03 AM, Kim Barrett wrote:
> Please review this fix of an assertion failure during ParallelGC young
> collections.
>
> We have a task (thread) walking the code cache, copying and forwarding
> oop references in those cached nmethods containing scavengable
> (e.g. young) oops.  After the oops in an nmethod are processed, a
> second pass is made to determine whether any of the oops are still
> scavengable, removing the nmethod from the list containing scavengable
> oops.  That speeds up future collection cycles.
>
> That second pass check logs the values of any scavengable oops
> referred to by the nmethod.  This is where things go wrong.  If the
> object being printed is a j.l.Class, deep in the guts of the printing
> there is an assert that the mirror of the klass of the object is that
> object.  But if the klass mirror has not yet been forwarded, we can be
> comparing a forward object with the not yet forward version, and the
> assertion fails.
>
> There is another task that is running in parallel with the code cache
> walk that is processing the CLD graph, but it may not have reached the
> relevant klass yet, so the klass's mirror hasn't been forwarded yet.
>
> But there's another complication that makes this crash even more
> difficult to trigger.  The ParallelGC young collection's code cache
> walk promotes nmethod-referenced oops, and that promotion makes the
> object no longer scavengable.  So to trigger the crash, some third
> thread must have copied a young class to survivor space, then the code
> cache walk processes an nmethod referring to that class and prints the
> still young class, all before the CLD graph walk has forwarded the
> klass's mirror.
>
> This problem has been lurking all along, but JDK-8203837 changed the
> printing from only occurring when TraceScavenge was true to printing
> when gc+nmethod=trace logging is enabled.  As there were no tests that
> enabled TraceScavenge, we didn't previously notice the problem.  But
> there are tests that enable gc+nmethod=trace, enabling the problematic
> printing.
>
> The solution being taken here is to change that logging to no longer
> attempt to print the object.  Printing an object while in the middle
> of a copying GC, especially one that is parallel, seems quite risky,
> so let's just not do that.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8205577
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8205577/open.00/
>
> Testing:
> mach5 tier1,2,3
>
> I haven't been able to reproduce the failure; as discussed above, it's
> very timing sensitive.  But the scenario described above accounts for
> both the problem and the timing of its appearance.
>
>



More information about the hotspot-dev mailing list