RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10]

Thu Sep 12 03:23:07 UTC 2024

On Wed, 11 Sep 2024 12:14:45 GMT, David Holmes <dholmes at openjdk.org> wrote:

> I guess I am still missing a piece here. We have the initial check for k being alive (which doesn't ensure it stays alive it just allows an early bail out), and we end with creating a JNI reference for the mirror oop. I assume that once we have the JNI reference the mirror oop is again strongly reachable and safe (if not how are we allowed to create the JNI reference for it?). So somewhere inbetween k is no longer alive and the mirror oop is junk. Or is it that things go bad after we create the JNI reference?

That's a good point. It comes down to the fundamental GC algorithms and their implementation in HotSpot. During a concurrent GC cycle we check that the klass is alive and create a strong JNI reference in the meantime, yet the klass is later found to be dead, which is counterintuitive. The reason is that concurrent marking/tracing relies on the basic tri-color algorithm, which guarantees a safe traversal of the object graph while the graph is being modified. The problem is that we created an edge from a `BLACK` JNI handle (JNI handles are GC roots, scanned before the graph traversal) to a `WHITE`, dying klass oop, and a `BLACK` object pointing to a `WHITE` one violates the tri-color invariant.

In the earlier concurrent collector, CMS, we didn't need resurrection, because CMS uses the insertion-protecting variant of tri-color marking, AKA `incremental-update`: the JNI handles are rescanned after the graph traversal, which marks the dying class oop alive. G1 and ZGC, however, use the deletion-protecting variant (the family SATB belongs to), which has the advantage of avoiding the rescan but cannot mark the object alive in the scenario you described.

The solution is to keep the weak referent, which in this case is the CLD holder oop, alive whenever it is accessed during a GC cycle. Technically we could do something like CMS incremental-update and revive the oop during JNI handle creation (we tried that and it fixed this crash), but I don't think that is the general, consistent approach in HotSpot, and it would make things more confusing and harder to understand. I hope this helps.
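To make the tri-color argument concrete, here is a toy Python model. It is not HotSpot code; `ToyMarker`, `holder_survives`, `jni_handles` and `cld_holder` are hypothetical names. It replays the scenario: the JNI handle root is scanned and turned `BLACK`, then the mutator loads the weakly reachable holder and stores it into a handle. Following the description above, incremental update is modeled as a rescan of the roots at remark. The keep-alive is modeled simply as shading the referent when the weak slot is read during marking. Real barriers are considerably more involved.

```python
from enum import Enum


class Color(Enum):
    WHITE = "white"  # not reached; reclaimed if still WHITE when marking ends
    GREY = "grey"    # reached, outgoing references not yet scanned
    BLACK = "black"  # reached and fully scanned


class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}  # slot name -> strongly referenced Obj
        self.color = Color.WHITE


class ToyMarker:
    """Hypothetical toy concurrent tri-color marker; not HotSpot code.

    mode="satb": deletion barrier, roots are not rescanned (G1/ZGC style).
    mode="incremental_update": roots are rescanned at remark (CMS style).
    keep_alive_on_weak_load: reading a weak slot while marking shades the referent.
    """

    def __init__(self, mode, keep_alive_on_weak_load=False):
        self.mode = mode
        self.keep_alive_on_weak_load = keep_alive_on_weak_load
        self.roots = []  # strong roots, e.g. a block of JNI handles
        self.weak = {}   # weak slots; marking never traces through them
        self.grey = []
        self.marking = False

    def shade(self, obj):
        if obj.color is Color.WHITE:
            obj.color = Color.GREY
            self.grey.append(obj)

    def drain(self):
        while self.grey:
            obj = self.grey.pop()
            for child in obj.fields.values():
                self.shade(child)
            obj.color = Color.BLACK

    def start_marking(self):
        self.marking = True
        for root in self.roots:
            self.shade(root)
        self.drain()  # roots are BLACK before the mutator runs again

    def load_weak(self, key):
        obj = self.weak[key]
        if self.marking and self.keep_alive_on_weak_load:
            self.shade(obj)  # keep the weak referent alive for this cycle
        return obj

    def store(self, owner, slot, new_value):
        old = owner.fields.get(slot)
        if self.marking and self.mode == "satb" and old is not None:
            self.shade(old)  # SATB preserves the overwritten value, not new_value
        owner.fields[slot] = new_value

    def finish_marking(self):
        if self.mode == "incremental_update":
            for root in self.roots:  # remark: rescan the roots
                for child in root.fields.values():
                    self.shade(child)
        self.drain()
        self.marking = False  # anything still WHITE would now be reclaimed


def holder_survives(mode, keep_alive_on_weak_load=False):
    jni_handles = Obj("jni_handles")  # strong root, scanned when marking starts
    cld_holder = Obj("cld_holder")    # only weakly reachable at that point
    gc = ToyMarker(mode, keep_alive_on_weak_load)
    gc.roots.append(jni_handles)
    gc.weak["holder"] = cld_holder

    gc.start_marking()  # jni_handles is BLACK, cld_holder is still WHITE
    # Mutator: the klass looked alive, so create a JNI handle to its holder.
    gc.store(jni_handles, "handle0", gc.load_weak("holder"))  # BLACK -> WHITE edge
    gc.finish_marking()
    return cld_holder.color is not Color.WHITE


for mode, keep_alive in [("satb", False), ("incremental_update", False), ("satb", True)]:
    print(mode, keep_alive, holder_survives(mode, keep_alive))
# satb False False                -> handle points at a reclaimed object
# incremental_update False True   -> remark rescan rescues the holder
# satb True True                  -> keep-alive on the weak load rescues it
```

In this model the SATB store barrier never fires, because creating the handle overwrites nothing. That is why the keep-alive has to happen where the weak referent is read rather than where the strong edge is written.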

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2345187437