RFR: 8291237: Encapsulate nmethod Deoptimization logic [v2]

Axel Boldt-Christmas duke at openjdk.org
Wed Aug 3 13:39:43 UTC 2022


On Mon, 1 Aug 2022 04:58:49 GMT, Axel Boldt-Christmas <duke at openjdk.org> wrote:

>> The proposal is to encapsulate the nmethod mark-for-deoptimization logic in one place and only allow access to `mark_for_deoptimization` from a closure object:
>> ```C++
>> class DeoptimizationMarkerClosure : StackObj {
>> public:
>>   virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0;
>> };
>> ```
>> 
>> This closure takes a `MarkFn`, which it uses to mark which nmethods should be deoptimized. The marking can only be done through the `MarkFn`, and a `MarkFn` can only be created by the following code, which runs the closure.
>> ```C++
>> {
>>   NoSafepointVerifier nsv;
>>   assert_locked_or_safepoint(Compile_lock);
>>   marker_closure.marker_do(MarkFn());
>>   anything_deoptimized = deoptimize_all_marked();
>> }
>> if (anything_deoptimized) {
>>   run_deoptimize_closure();
>> }
>> ```
>> 
>> This ensures that the logic is encapsulated, and the `NoSafepointVerifier` and `assert_locked_or_safepoint(Compile_lock)` make it sound for `deoptimize_all_marked` to skip scanning the whole code cache.
>> 
>> The exception to this pattern, in `InstanceKlass::unload_class`, is discussed in the JBS issue, which gives the reasons why not marking for deoptimization there is ok.
>> 
>> An effect of this encapsulation is that the deoptimization logic was moved from the `CodeCache` class to the `Deoptimization` class and the class redefinition logic was moved from the `CodeCache` class to the `VM_RedefineClasses` class/operation. 
>> 
>> Testing: Tier 1-5
>> 
>> _Update_
>> ---
>> Switched to using a RAII object to track the context instead of putting code in a closure. All the encapsulation is still the same.
>> 
>> Testing: Tier 1-7
>> 
>> _Update_
>> ---
>>> @stefank suggested splitting out unloading klass logic change into a separate issue [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718).
>>> 
>>> Will probably also limit this PR to the encapsulation only (skipping the linked list optimisation) and create a separate issue for that as well.
>>> 
>>> But this creates a chain of three dependent issues: [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237) depends on [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718), and the linked list optimisation will depend on [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237).
>>> 
>>> Will mark this as a draft for now and create a PR for [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) first.
>
> Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Add assertions
>  - Fix marked logic
>  - Erik refactorings

As mentioned in my previous comment, I will probably back the linked list out of this PR and create a separate issue, depending on this encapsulation, that only deals with swapping to a linked list instead of walking the whole code cache.

But as it will be based on this code, your question is still relevant.

I am pretty new to the codebase, so please correct me if I am wrong, especially with regard to the different locks, their scope and usage. Here is my understanding of why the linked list is valid as is.

While working with the list, the following holds:
* `assert_locked_or_safepoint(Compile_lock);` applies anywhere where we modify the list.
* Once something is linked it cannot be linked again, as it is only linked if it is `not_marked` and the status can only go `not_marked -> deoptimize | deoptimize_noupdate -> deoptimize_done`.
    * `assert(extract_compiled_method(_mark_link) == nullptr, ...)` adds some extra sanity around this.
    * While the `MarkForDeoptimizationStatus` value makes linking something already linked impossible, the assert above is not enough to catch the case where the tail of the list is linked in, creating a cycle. We can either add an assert that walks the list and checks that `tail != this`, or change the `next_marked()` code to `take_next_marked()`, which sets the field to `nullptr` and thus breaks any cycle while iterating, avoiding infinite loops (see the first sketch after this list). The second option would be fine, because creating a loop from the tail to the root never drops parts of the list, even if more elements are added after the cycle is created, as the `assert(extract_compiled_method(_mark_link) == nullptr, ...)` guarantees that the list will not break.
* From the creation of a `DeoptimizationContext` until the linked list has been processed (`make_not_entrant` and `make_deoptimized`) we have a `NoSafepointVerifier` (this covers the time-to-safepoint issue this change is trying to address).
* The active part of one `DeoptimizationContext` cannot overlap with the active part of another `DeoptimizationContext`; this is checked via a bool flag. Maybe it is not correct to have a non-volatile static bool here, but I thought `assert_locked_or_safepoint(Compile_lock);` would be enough to guarantee synchronisation. It should probably change to load-acquire/store-release atomics (see the second sketch after this list).
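
For the cycle concern above, here is a minimal, self-contained sketch of the `take_next_marked()` option. It is not the actual patch: `SketchNMethod`, `link_marked()` and `deoptimize_all_marked_sketch()` are hypothetical placeholder names, and a plain standard `assert` stands in for HotSpot's two-argument `assert`.

```C++
#include <cassert>

class SketchNMethod {
  SketchNMethod* _mark_link = nullptr; // next nmethod marked for deoptimization

public:
  // Mirrors the "only link if not_marked" rule: a node may only be linked once.
  // (The real code would use MarkForDeoptimizationStatus to decide this.)
  void link_marked(SketchNMethod* next) {
    assert(_mark_link == nullptr && "must not already be linked");
    _mark_link = next;
  }

  // Return the next marked nmethod and clear the link, so no node can be
  // visited twice even if a cycle was accidentally created.
  SketchNMethod* take_next_marked() {
    SketchNMethod* next = _mark_link;
    _mark_link = nullptr;
    return next;
  }
};

// Walk the marked list; in the real code each node would be made
// not entrant / deoptimized inside the loop body.
static void deoptimize_all_marked_sketch(SketchNMethod* root) {
  for (SketchNMethod* nm = root; nm != nullptr; nm = nm->take_next_marked()) {
    // nm->make_not_entrant(); or nm->make_deoptimized();
  }
}
```

The point is that clearing the link while walking makes a cyclic tail harmless: iteration can never revisit a node, so the worst case is a shortened walk rather than an infinite loop.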
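
And here is a minimal sketch of the non-overlap check from the last item, assuming standard C++ atomics as a stand-in for HotSpot's `Atomic` class and a hypothetical `DeoptimizationContextSketch` RAII guard; it shows how the suggested load-acquire/store-release variant could look, not what the patch currently does.

```C++
#include <atomic>
#include <cassert>

class DeoptimizationContextSketch {
  // True while some context's active part is running.
  static std::atomic<bool> _active;

public:
  DeoptimizationContextSketch() {
    // Atomically claim the active part; the assert fires if another
    // context's active part overlaps with this one.
    bool was_active = _active.exchange(true, std::memory_order_acq_rel);
    assert(!was_active && "active parts of two DeoptimizationContexts overlap");
    (void)was_active; // silence unused-variable warning in release builds
  }

  ~DeoptimizationContextSketch() {
    _active.store(false, std::memory_order_release);
  }
};

std::atomic<bool> DeoptimizationContextSketch::_active{false};
```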

I think most of the terminology will change to make it more deoptimization specific. Having it called `not_marked`, `_root_mark_link`, `_mark_link`, `next_mark()` and `take_root()` is too general and can lead to confusion. That is another reason to split this out into a separate issue. Some terminology other than `mark` could also be used.

-------------

PR: https://git.openjdk.org/jdk/pull/9655

