RFR: 8291237: Encapsulate nmethod Deoptimization logic [v6]
Axel Boldt-Christmas
aboldtch at openjdk.org
Thu Aug 18 08:55:22 UTC 2022
On Thu, 18 Aug 2022 08:18:27 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:
>> The proposal is to encapsulate the nmethod mark for deoptimization logic in one place and only allow access to the `mark_for_deoptimization` from a closure object:
>> ```C++
>> class DeoptimizationMarkerClosure : StackObj {
>> public:
>> virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0;
>> };
>>
>> This closure takes a `MarkFn` which it uses to mark which nmethods should be deoptimized. This marking can only be done through the `MarkFn` and a `MarkFn` can only be created in the following code which runs the closure.
>> ```C++
>> {
>> NoSafepointVerifier nsv;
>> assert_locked_or_safepoint(Compile_lock);
>> marker_closure.marker_do(MarkFn());
>> anything_deoptimized = deoptimize_all_marked();
>> }
>> if (anything_deoptimized) {
>> run_deoptimize_closure();
>> }
>>
>> This ensures that this logic is encapsulated and the `NoSafepointVerifier` and `assert_locked_or_safepoint(Compile_lock)` makes `deoptimize_all_marked` not having to scan the whole code cache sound.
>>
>> The exception to this pattern, from `InstanceKlass::unload_class`, is discussed in the JBS issue, and gives reasons why not marking for deoptimization there is ok.
>>
>> An effect of this encapsulation is that the deoptimization logic was moved from the `CodeCache` class to the `Deoptimization` class and the class redefinition logic was moved from the `CodeCache` class to the `VM_RedefineClasses` class/operation.
>>
>> Testing: Tier 1-5
>>
>> _Update_
>> ---
>> Switched too using a RAII object to track the context instead of putting code in a closure. But all the encapsulation is still the same.
>>
>> Testing: Tier 1-7
>>
>> _Update_
>> ---
>>> @stefank suggested splitting out unloading klass logic change into a separate issue [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718).
>>>
>>> Will probably also limit this PR to only encapsulation. (Skipping the linked list optimisation) And create a separate issue for that as well.
>>>
>>> But this creates a chain of three dependent issues. [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237) depends on [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718). And the link list optimisation depend will depend on [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237).
>>>
>>> Will mark this as a draft for now and create a PR for [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) first.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>
> Cleanup comment
Opening this PR for review.
Terminology for deoptimizing compiled methods has been updated and the process encapsulated.
The terminology was chosen to better reflect what is happening and not clash with other terminology used when working with compiled methods.
The encapsulation was done with a RAII object which enables more rigorous assertion of the usage and behaviour of the code.
The fix for the deviation mentioned in the issue and the comments above was address in [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) and has been merged.
I made some comments about moving out the linked list optimisation into a separate issue, however there are two tests that currently are sensitive to (I suspect) asserts inside the critical sections of the deoptimization. The tests are:
* `vmTestbase/vm/mlvm/indy/stress/java/volatileCallSiteDekker/TestDescription.java`
* `vmTestbase/vm/mlvm/indy/stress/java/mutableCallSiteDekker/TestDescription.java`
They will (most of the time) timeout on linux debug builds, but succeed with a larger `timeoutFactor`. I do not yet have access to a linux machine to debug and investigate more thoroughly, but I suspect that, as this test just keep rebinding CallSites in a loop, they trigger a lot of deoptimizations which creates a lot of contention on the `Compile_lock` and `CodeCache_lock`, and the longer critical sections (due to more asserts) increase the probability of contention slowdowns.
With the bea67de the timeout problem goes away as we are not walking the CodeCache when we do not need to. The unnecessary CodeCache walking time to safepoint issue is what motivated the encapsulation in the first place.
__Testing: Oracle Platforms Tier 1-5__
-------------
PR: https://git.openjdk.org/jdk/pull/9655
More information about the hotspot-dev
mailing list