RFR: 8347564: ZGC: Crash in DependencyContext::clean_unloading_dependents

Vladimir Ivanov vlivanov at openjdk.org
Tue Jan 28 21:10:46 UTC 2025


On Mon, 20 Jan 2025 07:56:49 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> The proposed change here is the following:
>  1. Move the `vmdependencies` list head from the `Context` to the `CallSite` object
>  2. Remove the Context object and its corresponding cleaner
> 
> (1.) fixes the crash. (2.) follows because, without `vmdependencies`, the Context and its cleaner serve no purpose.
> 
> First, what goes wrong without this patch.
> 
> Currently, when the GC unlinks an nmethod, it cleans/flushes out all the dependency contexts. These are attached either to an InstanceKlass or to a Context object reachable through a CallSite oop. An nmethod is unlinked when either one of its oops has died or it is heuristically determined to be cold and should be unloaded. So when the GC cleans the context through a CallSite oop, that CallSite object may already be dead.
> 
> For ZGC, which is a concurrent generational GC (the different generations are collected concurrently, but in a coordinated manner), it is important that unlinking is coordinated with the reclamation of this dead object. In generational ZGC, all nmethod oops are treated as strong roots if they reside in the young generation, and can thus only become unreachable / dead after promotion to the old generation. This means that at the time of unlinking, the CallSite object is either reachable / live, or unreachable / dead and reclaimed by the old generation collection (the same generation that does the unlinking). So we can make reading from this object safe by not reclaiming it before unlinking has finished.
> 
> The issue is that we do not have this guarantee for the Context object. As it is a distinct object, it may not have been promoted and may still reside in the young generation when its CallSite object becomes unreachable and is collected by the old generation collection.
> 
> If this is the case and a young generation collection runs after old marking has finished, there are two bad scenarios. If the young generation collection starts after reference processing and the cleaner has run, the Context object would be unreachable and the young generation collection would reclaim its memory. If it started before reference processing, the Context object would still be reachable, but may be relocated.
> 
> For reachable old CallSite objects, the Context oop field would have been tracked and remapped if a young collection relocated the Context object; however, this is not true when the CallSite is unreachable. The Context object may have moved or been reclaimed, and the load barrier on the field will produce ...
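The unlink-and-clean step described in the quoted message can be pictured with a small toy model. Everything here (`NMethod`, `DependencyContext`, `is_unloading`, the `is_cold` flag) is a hypothetical simplification loosely named after the HotSpot concepts, not HotSpot code; it only shows that an nmethod is dropped from a dependency list when one of its oops has died or when it is judged cold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NMethod:
    """Toy stand-in for compiled code; not HotSpot's nmethod."""
    name: str
    oops_alive: List[bool]   # liveness of each oop embedded in the code
    is_cold: bool = False    # result of a (hypothetical) coldness heuristic

    def is_unloading(self) -> bool:
        # Unlinked if any embedded oop has died, or if judged cold.
        return self.is_cold or not all(self.oops_alive)


@dataclass
class DependencyContext:
    """Toy dependency list; in HotSpot the list head hangs off an
    InstanceKlass or (before this patch) a CallSite's Context object."""
    dependents: List[NMethod] = field(default_factory=list)

    def clean_unloading_dependents(self) -> List[str]:
        # Drop every dependent that is being unlinked; return their names.
        removed = [nm.name for nm in self.dependents if nm.is_unloading()]
        self.dependents = [nm for nm in self.dependents if not nm.is_unloading()]
        return removed


ctx = DependencyContext([
    NMethod("hot_and_live", [True, True]),
    NMethod("dead_oop", [True, False]),
    NMethod("cold", [True], is_cold=True),
])
print(ctx.clean_unloading_dependents())    # ['dead_oop', 'cold']
print([nm.name for nm in ctx.dependents])  # ['hot_and_live']
```

The crash in question is not in this filtering logic itself but in reaching the list head: when the context is found through a dead CallSite, the read of the list head must still be safe.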
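The two bad scenarios, and why moving the list head into the CallSite avoids them, can be replayed in a toy model. The heap, the address scheme, the `young_collection` rule (relocate young survivors, remap oop fields only in live old objects, roughly the role the quoted text calls "tracked and remapped"), and every name below are hypothetical simplifications, not ZGC's implementation. Load barriers and colored pointers are not modeled; a read through a stale address is represented as `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Obj:
    """Toy heap object; not a real ZGC object layout."""
    name: str
    gen: str                                            # "young" or "old"
    addr: int = 0
    refs: Dict[str, int] = field(default_factory=dict)  # oop fields: name -> address
    data: List[str] = field(default_factory=list)       # non-oop payload


class ToyHeap:
    def __init__(self) -> None:
        self.objects: Dict[int, Obj] = {}
        self.next_addr = 0x1000

    def alloc(self, name: str, gen: str) -> Obj:
        obj = Obj(name, gen, self.next_addr)
        self.objects[obj.addr] = obj
        self.next_addr += 0x10
        return obj

    def load(self, holder: Obj, field_name: str) -> Optional[Obj]:
        # None models a dangling pointer: nothing lives at the stored address.
        return self.objects.get(holder.refs[field_name])

    def young_collection(self, young_roots: Set[int], live_old: List[Obj]) -> None:
        # Young objects reachable from roots or from *live* old objects are
        # relocated; the rest are reclaimed. Old objects are never reclaimed
        # here. Only the fields of live old objects are remapped.
        reachable = set(young_roots)
        for old in live_old:
            reachable.update(old.refs.values())
        forwarding: Dict[int, int] = {}
        for obj in [o for o in self.objects.values() if o.gen == "young"]:
            del self.objects[obj.addr]
            if obj.addr in reachable:
                forwarding[obj.addr] = self.next_addr
                obj.addr = self.next_addr
                self.objects[obj.addr] = obj
                self.next_addr += 0x10
        for old in live_old:
            for name, addr in list(old.refs.items()):
                old.refs[name] = forwarding.get(addr, addr)


def unpatched(cleaner_has_run: bool) -> Optional[Obj]:
    heap = ToyHeap()
    call_site = heap.alloc("CallSite", "old")   # found dead by old marking
    context = heap.alloc("Context", "young")    # never promoted
    call_site.refs["context"] = context.addr
    # Before reference processing, the cleaner still keeps the Context reachable.
    roots = set() if cleaner_has_run else {context.addr}
    heap.young_collection(young_roots=roots, live_old=[])  # dead CallSite is not tracked
    return heap.load(call_site, "context")      # unlinking reads through the dead CallSite


def patched() -> List[str]:
    heap = ToyHeap()
    call_site = heap.alloc("CallSite", "old")
    call_site.data.append("dependent nmethod")  # list head stored in the CallSite itself
    heap.young_collection(young_roots=set(), live_old=[])
    # The dead old CallSite has not been reclaimed, so this read is safe.
    return heap.objects[call_site.addr].data


print(unpatched(cleaner_has_run=True))   # None: Context reclaimed
print(unpatched(cleaner_has_run=False))  # None: Context relocated, field is stale
print(patched())                         # ['dependent nmethod']
```

If the CallSite is passed in `live_old`, the toy remaps its `context` field and the load succeeds; only the unreachable CallSite is left holding a stale address, which is the gap the patch closes by removing the separate Context object.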

Thanks for the clarifications, Axel. The fix looks good. 

> But I'll take it for a spin through our CI as well.

Thanks!

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23194#pullrequestreview-2579417360


More information about the hotspot-dev mailing list