RFR: 8347564: ZGC: Crash in DependencyContext::clean_unloading_dependents

Vladimir Ivanov vlivanov at openjdk.org
Tue Jan 28 04:06:45 UTC 2025


On Mon, 20 Jan 2025 07:56:49 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> The proposed change here is the following:
>  1. Move the `vmdependencies` list head from the `Context` to the `CallSite` object
>  2. Remove the Context object and its corresponding cleaner
> 
> (1.) fixes the crash. (2.) is because without `vmdependencies` the Context and its cleaner serves no purpose.
> 
> First what goes wrong without this patch.
> 
> Currently when the GC unlinks a nmethod it clean/flush out all the dependency contexts. These are either attached to an InstanceKlass, or a Context object reachable through a CallSite oop. A nmethod is unlinked when either one of its oops have died or it is heuristically determined to be cold and should be unloaded. So when the GC clean's the context through a CallSite oop, it may be that that CallSite object is dead. 
> 
> For ZGC, which is a concurrent generational GC (the different generations are collected concurrently, but in a coordinated manner), it is important that the unlinking is coordinated with the reclamation of this dead object. In generational ZGC all nmethod oops are considered as strong roots if they reside in the young generation and thusly can only become unreachable / dead after promotion to the old generation. This means that the CallSite object at the time of unlinking is either reachable / live, or unreachable / dead and is reclaimed by the old generation collection (the same generation that does the unlinking). So we can make reading from this object safe by not reclaiming the object before unlinking is finished. 
> 
> The issue is that we do not have this guarantee for the Context object. As this is a distinct object it may be that it has not been promoted and resides in the young generation at the time of its CallSite object becoming unreachable and collected by the old generation collection.
> 
> If this is the case and a young generation collection runs after old marking has finished, we have two bad scenarios. If it the young generation collection starts after reference processing and the cleaner has run, the Context object would be unreachable and the young generation collection  would reclaim the memory. If it started before the reference processing it would still be reachable, but may be relocated. 
> 
> For reachable old CallSite objects the Context oop field would have been tracked and remapped if a young collection relocates the Context object, however this is not true then the CallSite is unreachable. The Context object may have moved or been reclaimed, and the load barrier on the field will produce ...

Good work, Axel. `Cleaner`-based solution to manage nmethod dependencies on `java.lang.invoke.CallSite`s does look redundant. If that's the case, then I'd expect that by the time cleanup action is invoked corresponding dependency list is empty. Is it the case in practice? 

But the original problem makes me wonder whether `Cleaner`-based approach to manage native resources is broken or not.
Dependent instance is used to avoid referent to be artificially kept alive by cleanup action. If it is not safe to access it in order to perform cleanup, how the whole approach is intended to work? Or does anything make 'CallSite` use case special?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23194#issuecomment-2617796038


More information about the core-libs-dev mailing list