RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v6]

Thomas Schatzl tschatzl at openjdk.org
Mon Dec 4 13:55:41 UTC 2023


On Mon, 4 Dec 2023 12:39:59 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Insert code blobs in a sorted fashion to exploit the finger optimization when adding, making this procedure O(n) instead of O(n^2).
>> 
>> Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge).
>> 
>> The steps typically are registering a new to-be-unlinked CLD/nmethod and then purging its memory later. STW collectors perform this work in one big chunk, taking the CodeCache_lock for the entire duration, while concurrent collectors lock/unlock for every insertion to allow concurrent users of the lock to make progress.
>> 
>> Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however, the existing CLD handling API in particular (still) mixes unlinking and purging in its CLD::unload() call. This is left as is to simplify this change, which is mostly geared towards separating nmethod unlinking from purging in order to make code blob freeing O(n) instead of O(n^2).
>> 
>> Upcoming changes will
>> * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors.
>> * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`)
>> * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism
>> * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging)
>> * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging.
>> 
>> Please also first look into the (small) PR this depends on.
>> 
>> The crash on linux-x86 is fixed by PR#16766 which I split out for quicker reviews.
>> 
>> Testing: tier1-7
>> 
>> Thanks,
>>   Thomas
>
> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision:
> 
>   ayang review: move class unloading outside of weak_refs_work
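
The two locking strategies mentioned in the quoted description (one long critical section for STW collectors, lock/unlock per insertion for concurrent collectors) can be illustrated with a small sketch. Everything here is hypothetical: `CountingLock` merely stands in for `CodeCache_lock`, and the `int` blobs and function names are made up for illustration; this is not the HotSpot code.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical lock that counts how often it is acquired.
// It only stands in for CodeCache_lock; it is not a HotSpot type.
class CountingLock {
 public:
  void lock() {
    mutex_.lock();
    ++acquisitions_;  // updated while holding the mutex
  }
  void unlock() { mutex_.unlock(); }
  std::size_t acquisitions() const { return acquisitions_; }

 private:
  std::mutex mutex_;
  std::size_t acquisitions_ = 0;
};

// STW style: take the lock once and free every blob under it.
void purge_all_at_once(CountingLock& lock, const std::vector<int>& blobs,
                       std::vector<int>& free_list) {
  std::lock_guard<CountingLock> guard(lock);
  for (int blob : blobs) {
    free_list.push_back(blob);
  }
}

// Concurrent style: lock and unlock around every single insertion so that
// other users of the lock can get in between two insertions.
void purge_one_by_one(CountingLock& lock, const std::vector<int>& blobs,
                      std::vector<int>& free_list) {
  for (int blob : blobs) {
    std::lock_guard<CountingLock> guard(lock);
    free_list.push_back(blob);
  }
}
```

For n blobs the first variant acquires the lock once, the second n times; the second trades that overhead for shorter hold times.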

Fwiw, to put this change into a bit more context: it is part of a series of changes to improve class unloading performance back to pre-jdk21 levels (and better).

The basic plan:

* this change, [JDK-8317809](https://bugs.openjdk.org/browse/JDK-8317809), that improves nmethod sorting/free list handling (and introduces the ClassUnloadingContext)

* [JDK-8317007](https://bugs.openjdk.org/browse/JDK-8317007) that allows bulk unregistering of nmethods instead of (slow) per-nmethod unregistering (also out for review)

With the above two changes, Remark pause time should be <= what it was before removal of the code root sweeper (lots of changes that improved the time taken for various parts of class/code unloading have already gone in).
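
To illustrate why sorted insertion matters for the free list handling, here is a minimal sketch of an address-ordered singly linked list with a "finger" that remembers the last insertion point. It is an assumption-laden toy, not the actual code heap implementation: `FreeNode`, `SortedFreeList` and `count_search_steps` are hypothetical names, and the step counter exists only to make the cost visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified free-list node; not HotSpot's free block type.
struct FreeNode {
  std::uintptr_t addr = 0;   // start address of the freed block
  FreeNode* next = nullptr;
};

// Address-ordered singly linked list with a finger pointing at the node
// that was inserted last.
class SortedFreeList {
 public:
  // Inserts node at its sorted position and returns how many existing
  // nodes had to be walked over to find that position.
  std::size_t insert(FreeNode* node) {
    assert(node != nullptr);
    std::size_t steps = 0;
    FreeNode* prev = nullptr;
    FreeNode* cur = head_;
    // Finger optimization: if the new node belongs after the previous
    // insertion point, start searching there instead of at the head.
    if (finger_ != nullptr && finger_->addr < node->addr) {
      prev = finger_;
      cur = finger_->next;
    }
    while (cur != nullptr && cur->addr < node->addr) {
      prev = cur;
      cur = cur->next;
      ++steps;
    }
    node->next = cur;
    if (prev == nullptr) {
      head_ = node;
    } else {
      prev->next = node;
    }
    finger_ = node;
    return steps;
  }

 private:
  FreeNode* head_ = nullptr;
  FreeNode* finger_ = nullptr;
};

// Builds a list from `existing`, then inserts `incoming` in the given order
// and returns the total number of search steps spent on `incoming`.
std::size_t count_search_steps(const std::vector<std::uintptr_t>& existing,
                               const std::vector<std::uintptr_t>& incoming) {
  std::vector<FreeNode> storage(existing.size() + incoming.size());
  SortedFreeList list;
  std::size_t i = 0;
  for (std::uintptr_t addr : existing) {
    storage[i].addr = addr;
    list.insert(&storage[i++]);
  }
  std::size_t steps = 0;
  for (std::uintptr_t addr : incoming) {
    storage[i].addr = addr;
    steps += list.insert(&storage[i++]);
  }
  return steps;
}
```

With existing blocks at addresses 0, 2, 4, 6, inserting 1, 3, 5, 7 in ascending order costs one step per insertion (4 in total), because each search resumes at the finger. Inserting the same addresses in descending order misses the finger almost every time and walks from the head (6 steps in total here); that cost grows quadratically with the list length.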

I am planning the following follow-ups in the next few months (after FC, time will be spent on bugfixing, and holidays are coming up):

* (for G1) move out several parts of class unloading into the concurrent phase, at least this will include
    - bulk nmethod unregistering ([JDK-8317007](https://bugs.openjdk.org/browse/JDK-8317007))
    - nmethod code blob freeing (this change)
    - metaspace unloading

Not necessarily in a single change; in my testing this basically halves G1 remark pause times again.

* split up and parallelize ClassLoaderData unloading; currently, with this change, `CLD->unload()` is called immediately when registering CLDs, as before. However, this is wasteful, as most of that method can either be "obviously" parallelized or restructured so that other tasks can run in parallel.
So the plan is that class unloading (`SystemDictionary::do_unloading`) will be split into a part that only iterates over the CLD list to determine dead ones, and a parallel part.
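
A rough sketch of that split, under heavy assumptions: `Cld`, `collect_dead` and `unload_in_parallel` are hypothetical names, plain `std::thread` workers stand in for GC worker threads, and setting a flag stands in for the real per-CLD unloading work. It only shows the shape (a cheap serial scan followed by parallel processing of the dead entries), not the planned HotSpot change.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a class loader data entry.
struct Cld {
  int id;
  bool alive;
  bool unloaded;
};

// Phase 1 (serial): walk the list once and only record the dead entries.
std::vector<Cld*> collect_dead(std::vector<Cld>& all) {
  std::vector<Cld*> dead;
  for (Cld& cld : all) {
    if (!cld.alive) {
      dead.push_back(&cld);
    }
  }
  return dead;
}

// Phase 2 (parallel): workers claim dead entries through a shared atomic
// index, so every entry is processed by exactly one worker.
void unload_in_parallel(const std::vector<Cld*>& dead, unsigned num_workers) {
  assert(num_workers > 0);
  std::atomic<std::size_t> next{0};
  auto worker = [&dead, &next] {
    for (std::size_t i = next.fetch_add(1); i < dead.size();
         i = next.fetch_add(1)) {
      dead[i]->unloaded = true;  // placeholder for the real unloading work
    }
  };
  std::vector<std::thread> threads;
  for (unsigned w = 0; w < num_workers; ++w) {
    threads.emplace_back(worker);
  }
  for (std::thread& t : threads) {
    t.join();
  }
}
```

Claiming work through an atomic counter keeps the parallel part free of locks; the serial part stays short because it does nothing but decide liveness.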

There are no CRs/PRs out for these latter two items, but hopefully this will, short of making everything concurrent, keep class/code unloading times low enough for some time.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16759#issuecomment-1838687260


More information about the hotspot-gc-dev mailing list