RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v6]

Tue Dec 5 18:49:52 UTC 2023

On Tue, 5 Dec 2023 15:47:57 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:

>> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   ayang review: move class unloading outside of weak_refs_work
>
> src/hotspot/share/gc/shared/classUnloadingContext.cpp line 91:
> 
>> 89:     cld->classes_do(f);
>> 90:   }
>> 91: }
> 
> I don't understand why CLDG specific methods were moved here.  They should be unaware of nmethod purging. and these 4 methods don't have any nmethod purging in them either and are specific to the CLDG implementation.

The idea is to have the GC take control of how the unloading CLDs are stored and which data structure it uses to manage them, ultimately giving it more control over class unloading so that the work can be parallelized.

On the one hand this makes pauses shorter (for STW collectors), and on the other hand it decreases the time the CLDG_lock is held (I am not sure it is nice that the concurrent collectors currently may hold that lock for ~100ms in my test...).

I believe having the linked list of unloading CLDs embedded in the CLDs for use by the GC not only seems wrong (i.e. it's a GC data structure located in runtime code) but is also very limiting (there has to be one list for all CLDs, and it is fixed to be a singly linked list).

This change moves knowledge of how unloading CLDs are managed to the GC area - runtime code just tells the GC that a particular CLD is unloading.
(Currently the `ClassUnloadingContext` also calls the `unload` method during registration to keep current functionality, but the plan is to separate registration from the actual unloading to allow custom handling of the second part; registration, although it still walks a singly linked list, is comparatively fast).

These four methods provide a thin abstraction over the CLDs that are unloading (that runtime doesn't need and should not worry about imo).
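
To illustrate the shape of that abstraction, here is a minimal standalone sketch. It is not the HotSpot code: `Cld`, `UnloadingContext` and all member names are hypothetical stand-ins, and it shows the planned split between registration and unloading rather than the current behavior (where `unload` is still called during registration).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a class loader data; not HotSpot's ClassLoaderData.
struct Cld {
  std::string name;
  bool unloaded = false;

  void unload() {
    assert(!unloaded);  // each CLD is expected to be unloaded exactly once
    unloaded = true;
  }
};

// Hypothetical GC-owned context. The runtime only calls register_unloading();
// the GC picks the container and decides when unload_all() runs.
class UnloadingContext {
  // Container choice belongs to the GC; a vector is just one option
  // (an assumption for this sketch, e.g. per-worker arrays would also fit).
  std::vector<Cld*> _unloading;

public:
  // Registration only records the CLD; no unloading work happens here.
  void register_unloading(Cld* cld) { _unloading.push_back(cld); }

  std::size_t num_unloading() const { return _unloading.size(); }

  // Thin iteration helper over the unloading CLDs.
  template <typename F>
  void unloading_do(F f) const {
    for (Cld* cld : _unloading) {
      f(cld);
    }
  }

  // The actual unloading is a separate step the GC can schedule.
  void unload_all() {
    unloading_do([](Cld* cld) { cld->unload(); });
  }
};
```

With this shape the runtime side never sees how the unloading CLDs are linked together, so the GC is free to swap the container or split `unload_all()` into several steps.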

With that in place it is possible to slice the actual unloading work into phases according to dependencies (depending on the GC, if desired). These phases could overlap with existing phases in collectors that already allow that (e.g. the parallel code unloading, but that is only an implementation detail to reduce the overall number of parallel phases), or some of that work could even be moved to a different time (the `CLD::unload()` method unfortunately currently may do some memory freeing too).

However, most of the time is spent notifying various components, which can be parallelized (at least the different types of notifications can run in parallel with each other).
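
As a rough sketch of that last point, assuming each kind of notification is independent of the others, they could be run side by side. The helper name `run_notifications_in_parallel` is hypothetical, and plain `std::thread` is used only to keep the example standalone; in HotSpot this would go through the GC's own worker threads.

```cpp
#include <cassert>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: each entry stands for one type of notification.
// Every notification type runs on its own thread; the call returns once
// all of them have finished.
inline void run_notifications_in_parallel(
    const std::vector<std::function<void()>>& notifications) {
  std::vector<std::thread> workers;
  workers.reserve(notifications.size());

  for (const auto& notify : notifications) {
    workers.emplace_back(notify);  // the thread gets its own copy of the callable
  }
  for (auto& worker : workers) {
    assert(worker.joinable());
    worker.join();
  }
}
```

This only parallelizes across notification types; whether a single type can be split further depends on the component being notified.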

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1416139587