RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10]

Thu Sep 12 05:31:08 UTC 2024

On Wed, 11 Sep 2024 16:09:09 GMT, Erik Österlund <eosterlund at openjdk.org> wrote:

>> FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more than what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway.
> 
> We cannot detect that the oop is dying. That is precisely what the GC is trying to figure out by going to the trouble of traversing the object graph. If what you are proposing were possible (detecting unreachable oops by just looking at some cheap local property), then we would rewrite our GCs to exploit that magic. ;-) We would also rewrite Reference.get() to not keep the referent alive, because we could just magically tell whether it will be cleared in the future.
> 
> If you are imagining, for example, looking at the not-yet-finalized marking bitmaps from the GC and reporting errors when encountering a not-yet-marked object, then we would randomly report errors for perfectly valid uses of the API: the GC just hasn't reached that object yet. In other words, by looking at an object we have no way of telling whether it *will* be found unreachable once marking terminates. But by keeping it alive, we control the answer: the oop will be found to be live.
> 
> This is not a new problem; we have encountered it many times before. The standard way of dealing with this situation (wanting to publish edges to "peeked" oops in the object graph) is to keep the oop alive. I'm not sure why we would treat it differently here, unless of course we say this is not supported and crash, but that seems a bit unfortunate IMO.
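The argument that a not-yet-marked object cannot be reported as dying can be illustrated with a toy incremental marker. This is a hypothetical sketch, not HotSpot code. `ToyHeap`, `step()`, `looks_dead()` and `keep_alive()` are invented names. The single-threaded loop stands in for concurrent marking, which real collectors implement with SATB or load barriers. The sketch shows that a mark-bit check gives a wrong "dead" answer for a reachable object while marking is still in progress. It also shows that a keep-alive (the effect Reference.get() has while marking is active) makes the collector find the object live.

```cpp
// Toy model (not HotSpot code): objects are indices, edges are references.
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyHeap {
  std::vector<std::vector<std::size_t>> edges;  // edges[i] = objects referenced by object i
  std::vector<bool> marked;                      // mark bitmap, filled in during marking
  std::vector<std::size_t> mark_stack;           // grey objects still to be scanned

  explicit ToyHeap(std::size_t n) : edges(n), marked(n, false) {}

  void add_edge(std::size_t from, std::size_t to) { edges[from].push_back(to); }

  // Grey an object: mark it and schedule it for scanning (no-op if already marked).
  void mark(std::size_t obj) {
    assert(obj < marked.size());
    if (!marked[obj]) {
      marked[obj] = true;
      mark_stack.push_back(obj);
    }
  }

  // One increment of marking: scan a single grey object.
  // Returns false once marking has terminated (no grey objects left).
  bool step() {
    if (mark_stack.empty()) return false;
    std::size_t obj = mark_stack.back();
    mark_stack.pop_back();
    for (std::size_t child : edges[obj]) mark(child);
    return true;
  }

  // The proposed cheap "is it dying?" check. Only trustworthy after marking terminates.
  bool looks_dead(std::size_t obj) const { return !marked[obj]; }

  // Keep-alive: publishing a "peeked" object forces the collector to treat it as live.
  void keep_alive(std::size_t obj) { mark(obj); }
};
```

With roots `{0}` and edges `0 -> 1 -> 2`, `looks_dead(2)` is true right after the roots are marked, even though object 2 is reachable. It becomes false only once `step()` has run to completion. An unreachable object passed to `keep_alive()` before marking finishes is likewise found live. That is the "we control the answer" property described above.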

@fisk Do you think HotSpot abuses the weak handles' `peek`? IMHO, `peek` should be restricted to GC code, because only very few places need it. In other components of the VM, we could always keep the referent alive when an is-alive check returns true, or access the weak referent through a keep-alive load, just as Java code does with Reference.get(). Does that make sense?
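The restriction proposed here can be sketched with the C++ passkey idiom. Runtime code gets only a keep-alive `resolve()`, while `peek()` requires a token that only collector code can construct. All names (`GCOnlyKey`, `ToyMarking`, `ToyWeakHandle`, `ToyCollector`) are hypothetical. Objects are modeled as plain `int` ids, and the "marking" state is a simple set rather than a real mark bitmap. The `resolve()`/`peek()` naming only echoes the distinction discussed in this thread; it does not reproduce HotSpot's actual weak-handle API.

```cpp
// Toy model (not HotSpot code) of restricting peek() to GC code.
#include <cassert>
#include <unordered_set>

class ToyCollector;

// Passkey: the constructor is private and user-provided, so only ToyCollector can create one.
class GCOnlyKey {
  friend class ToyCollector;
  GCOnlyKey() {}
};

// Toy marking state: whether marking is running, and which objects are known live.
struct ToyMarking {
  bool active = false;
  std::unordered_set<int> live;
};

class ToyWeakHandle {
 public:
  ToyWeakHandle(int obj, ToyMarking& marking) : obj_(obj), marking_(marking) {
    assert(obj >= 0 && "toy object ids are non-negative");
  }

  // For runtime code: returns the referent and, while marking is active,
  // records it as live so the collector cannot later decide it is dead.
  int resolve() const {
    if (marking_.active) marking_.live.insert(obj_);
    return obj_;
  }

  // For GC code only: returns the referent without any keep-alive side effect.
  int peek(GCOnlyKey) const { return obj_; }

 private:
  int obj_;
  ToyMarking& marking_;
};

class ToyCollector {
 public:
  // The collector may inspect a weak referent without resurrecting it.
  static bool is_marked(const ToyWeakHandle& h, const ToyMarking& m) {
    return m.live.count(h.peek(GCOnlyKey{})) != 0;
  }
};
```

A call such as `h.peek(GCOnlyKey{})` outside `ToyCollector` fails to compile, so non-GC code is pushed toward `resolve()`. The trade-off is the one debated above: every runtime access during marking may keep an otherwise-dead object alive until the next cycle.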

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2345305346