RFR: 8326820: Metadata artificially kept alive [v3]
Coleen Phillimore
coleenp at openjdk.org
Sat Jun 22 00:21:10 UTC 2024
On Wed, 19 Jun 2024 15:06:25 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:
>> ClassLoaderDataGraph provides APIs for walking different metadata. All the iterators which are not designed to be used by the GC also keep the holder of the CLDs alive and by extensions keeps all metadata alive. This is problematic for concurrent GC as it keeps otherwise unreachable classes from being unloaded and the respective metadata freed.
>>
>> This patch changes the default iteration behaviour to not keep the holder alive, with the exception of `loaded_classes_do` (renamed `loaded_classes_do_keepalive`) and `modules_do` (renamed `modules_do_keepalive`) which is used by jvmti APIs that requires that the holder is kept alive.
>>
>> All other uses consumes all the metadata it queries during its safepoint or before releasing the `ClassLoaderDataGraph_lock`.
>>
>> Before this change some jcmd, new jfr chunks and some jfr events, all of which consumed these APIs, could cause class unloading to not occur.
>>
>> Been running our internal stress test in an even more stressful mode which without this patch reproduces the metaspace OOME [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently within a few hours. And after this patch it does not.
>>
>> Currently running tier1-tier8 testing.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>
> Rename and comment SystemDictionary::methods_do
So the problem with the no-keepalive names is that for these functions it's very unclear what exactly you're keeping alive, unless you're thinking about concurrent collection GC, and even then it's a puzzler. I don't see how it's going to help prevent more bugs. I don't yet understand the current bug. The 'resolve' for the CLDG iterator was to temporarily keep that CLD from being unloaded, in the short time that we're iterating on that particular CLD.
The CLDG_lock was added later and it may be that that's what prevents the CLD from being *deleted* while we are holding the lock while iterating. So that keeps the CLD metadata alive. The oops in the CLD are not kept alive unless we have resolved the holder which the GC sees as the root. So if we are accessing oops from the CLD, we need to use the keepalive version. It's not really keeping the CLD alive. It's making the oops alive, or accessible until they are handled somewhere by the caller.
So the no-keepalive is not a great name for this, even though it's used when accessing oops in other places. I think the versions of these functions that access oops that need to be kept alive should have a different name, like modules_do_keeping_oops_alive(). But that's a sentence name. Maybe modules_do_keepalive() can be read as keeping oops alive. But classes_do_no_keepalive() is a terrible name because we want the class metadata to be kept alive and the name is disconcerting.
Usually I prefer good names to good comments, but here I think good comments are better.
-------------
PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2133534047
More information about the hotspot-dev
mailing list