RFR(L): 8226705: [REDO] Deoptimize with handshakes

Thu Sep 12 08:36:51 UTC 2019

Hi Dan, thanks for having a look.

On 9/11/19 12:19 AM, Daniel D. Daugherty wrote:
> src/hotspot/share/runtime/biasedLocking.cpp
>      L729: void BiasedLocking::revoke_own_locks(Handle obj, TRAPS) {
>          Perhaps add:
>            assert(THREAD->is_Java_thread(), "must be called by a JavaThread");
>            JavaThread* thread = (JavaThread*)THREAD;
> 
>          and then switch uses of 'THREAD' for 'thread'. This way we
>          are not casting away a bad caller...

Fixed

> 
> src/hotspot/share/runtime/biasedLocking.hpp
>      L193:   // This should be called by a JavaThread to revoke the bias of an 
> owned object
>          Perhaps:
>              // This must only be called by a JavaThread to revoke the bias of 
> an owned object.
> 

Fixed

> src/hotspot/share/runtime/deoptimization.cpp
>      Moving the top part of Deoptimization::fetch_unroll_info_helper()
>      made this hard to review, but I suspect leaving it in place would
>      have been just as hard because of the refactor of code into
>      eliminate_allocations() and eliminate_locks(). Did lots of
>      scrolling of the frames windows and I think it looks correct...

There was one issue with ifdef which Patricio found.

> 
> src/hotspot/share/runtime/mutexLocker.cpp
>      L236:   def(CompiledMethod_lock          , PaddedMutex  , special-1,   
> true,  Monitor::_safepoint_check_never);
>          So CompiledMethod_lock is 'tty' + 1. Is that so that it ranks
>          lower than the Patching_lock? Any lock ranking issues expected
>          for the places you replaced Patching_lock with CompiledMethod_lock?
> 
>          I don't think you replaced all uses of Patching_lock, right?
>          (Sorry if I asked that before...)

Patching_lock is only used to mt-safe patch code.

> 
>      old L251:   def(OsrList_lock                 , PaddedMutex  , leaf,        
> true,  Monitor::_safepoint_check_never);
>          So OsrList_lock was a 'leaf' and you've replaced its uses with
>          CompiledMethod_lock which is (special - 1). I'm assuming that
>          no lock ranking hiccups have been seen...

No problems.

> 
> src/hotspot/share/runtime/mutexLocker.hpp
>      L35: extern Mutex*   CompiledMethod_lock;             // a lock used to 
> guard a compiled method
>          Should you also mention the OSR queues since CompiledMethod_lock
>          also replaced this:
> 
>          old L93: extern Mutex*   OsrList_lock; // a lock used to serialize 
> access to OSR queues

Fixed

> 
> src/hotspot/share/runtime/sharedRuntime.cpp
>      No comments.
> 
> src/hotspot/share/runtime/thread.cpp
>      No comments.
> 
> src/hotspot/share/runtime/thread.hpp
>      No comments.
> 
> src/hotspot/share/runtime/tieredThresholdPolicy.cpp
>      No comments.
> 
> src/hotspot/share/runtime/vmOperations.cpp
>      No comments.
> 
> src/hotspot/share/runtime/vmOperations.hpp
>      No comments (another VM_Op bites the dust...)
> 
> src/hotspot/share/services/dtraceAttacher.cpp
>      No comments other than: Such a strange place to find the
>      VM_DeoptimizeTheWorld VM_Op...
> 
> test/hotspot/jtreg/compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java
>      Shouldn't there be an @bug 8226705 with the new test?
> 
>      What tier does the new test execute in and have you verified
>      that it passes on all Oracle platforms?

So this test was done for the original changeset.
This test don't reproduce that issue.
(it can in theory, but I have never seen it)
We had a WB test method which where not used by any test,
so I added a test which used, since this WB method would stress deopt.

> 
> You'll definitely want a review from a Compiler team member
> for this one. :-) Since there are Biased Locking changes,
> Patricio should also chime in here.
> 
> For testing, I'm assuming that the tests that failed due to
> 
>      JDK-8221734 Deoptimize with handshakes
> 
> have been run on this REDO and that they pass.

The main issue is safepointing between revoking and expanding the stack, since a 
safepoint can rebais the lock (VM_BulkRevokeBias).

> 
> 
> Thumbs up! I don't need to see another webrev if you decide to
> make any changes due to my few comments above.

Thanks Dan!

/Robbin

> 
> Dan
> 
> 
>>
>> The problem is a nmethods state must never go backwards.
>> When the methods code is set to the compiled code it can be made not_entrant.
>> (it can be set to not_entrant before not having Compiled_lock)
>> When unlinking the methods code must point to the nmethod.
>>
>> If we set code before changing state, it can go backwards.
>> If we set state before setting code, we can't unlink it.
>>
>> This solution uses the new CompiledMethod_lock to synchronize code and state 
>> changes. (unloaded don't need the lock since it require a safepoint to happen)
>>
>> But this resulted in a circular lock issue with OsrList_lock.
>> Therefore OsrList_lock is removed, instead we use CompiledMethod_lock.
>>
>> After this patch it should be possible to remove Compile_lock from some paths. 
>> (Note that JVMCI is missing Compile_lock from several pathes, but fewer with 
>> this changeset)
>>
>> Also InterpreterRuntime::frequency_counter_overflow_inner seem to have the 
>> same bug as the reason for this redo: BiasedLocking::revoke(objects_to_revoke, 
>> thread);
>> The objects could have been rebaised by the time we return from it.
>> I think we should call the new 
>> BiasedLocking::revoke_own_locks(objects_to_revoke->at(i), thread); here also.
>>
>> Passes t1-5 twice and KS 60 min.
>>
>> @Erik I stayed with current is_in_use (not_installed + in_use), there were to 
>> many places and many of them needed an equivalent method if is_in_use was to 
>> be changed (to only in_use). I didn't see any issue with using the current 
>> definition.
>>
>> Thanks, Robbin
>>
>> On 8/29/19 4:35 PM, Erik Österlund wrote:
>>> Hi Robbin,
>>>
>>> On 2019-08-29 14:25, Robbin Ehn wrote:
>>>> Hi Erik, thanks for having a look.
>>>>
>>>> Just some short answers and questions.
>>>>
>>>> On 8/29/19 12:01 PM, Erik Österlund wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> Glad to see this work making it back again. Last time this patch was 
>>>>> proposed, there were no guards against non-monotonic nmethod state 
>>>>> transitions. Today there are (since 
>>>>> https://bugs.openjdk.java.net/browse/JDK-8224674). In other words, if you 
>>>>> call make_not_entrant() on an nmethod that has racingly become zombie, it 
>>>>> is today impossible to resurrect that nmethod. Instead, make_not_entrant() 
>>>>> will return false.
>>>>
>>>> Great, I'll have a look.
>>>>
>>>>>
>>>>> In particular, I think that is what the following code in the patch is 
>>>>> guarding against (in codeCache.cpp):
>>>>>
>>>>> 1150 // Mark methods for deopt (if safe or possible).
>>>>> 1151 void CodeCache::mark_all_nmethods_for_deoptimization() {
>>>>> 1152   MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>>>>> 1153   CompiledMethodIterator 
>>>>> iter(CompiledMethodIterator::only_alive_and_not_unloading);
>>>>> 1154   while(iter.next()) {
>>>>> 1155     CompiledMethod* nm = iter.method();
>>>>> 1156     if (!nm->method()->is_method_handle_intrinsic() &&
>>>>> 1157         !nm->is_not_installed() &&
>>>>> 1158         nm->is_in_use() &&
>>>>> 1159         !nm->is_native_method()) {
>>>>> 1160       // Intrinsics and native methods are never deopted. A method 
>>>>> that is
>>>>> 1161       // not installed yet or is not in use is not safe to deopt; the
>>>>> 1162       // is_in_use() check covers the not_entrant and not zombie cases.
>>>>> 1163       // Note: A not_entrant method can become a zombie at anytime if 
>>>>> it was
>>>>> 1164       // made not_entrant before the previous safepoint/handshake.
>>>>> 1165       nm->mark_for_deoptimization();
>>>>> 1166     }
>>>>> 1167   }
>>>>> 1168 }
>>>>>
>>>>> In fact, this code is also guarding against is_not_installed nmethods 
>>>>> (which was introduced as a new function). If we find that an nmethod is bad 
>>>>> and say it needs deoptimization, and that nmethod is just about to get 
>>>>> installed, it seems to me that skipping that nmethod and not making it not 
>>>>> entrant is really dangerous and will eventually lead to a crash. Because in 
>>>>> ciEnv.cpp we set_code to the about to be installed nmethod, and it will get 
>>>>> called without anything stopping the invalid code from being executed by 
>>>>> mutators, and it was being invalidated for a reason. So basically we really 
>>>>> do have to make these not entrant, or we will blow up.
>>>>>
>>>>> I guess the reason you needed this check is because make_in_use() doesn't 
>>>>> check what the current state is, causing monotonicity problems, making a 
>>>>> potentially already not entrant nmethod become in_use, eventually blowing 
>>>>> up asserts and probably the VM. Regrettably, make_in_use() still does that, 
>>>>> because I had no need for changing it to use nmethod::try_transition in 
>>>>> 8224674, nobody previously could find these nmethods. In retrospect, not 
>>>>> changing that was a bit silly. So if we just change make_in_use() to use 
>>>>> nmethod::try_transition instead, then you can remove the !is_not_installed 
>>>>> check... which I argue we have to do. The reason we never ran in to this 
>>>>> before is because we made the nmethods not entrant in a safepoint, and 
>>>>> between making an nmethod and making it in_use, there is no safepoint 
>>>>> check, so they were never observed in this state.
>>>>>
>>>>
>>>> Yes, thread A is creating a new method X, it have state not installed.
>>>> If we make method X not entrant, Thread A will flip it back to in_use, which 
>>>> bad
>>>> and we crash. mark_all_nmethods_for_deoptimization is a test method used in the
>>>> new jtreg test, which is the only use (I see dtrace have some hack to use 
>>>> it, it
>>>> can crash the vm).
>>>
>>> Exactly. However, what you are missing here is that this does not happen only 
>>> for the whitebox test, now that we do normal deopt with handshakes.
>>>
>>> Consider the following race.
>>>
>>> A compiler thread compiles a new nmethod. In this nmethod, it's assumed that 
>>> a certain invoke_virtual callsite is completely monomorphic, and the compiler 
>>> chooses to implement the callsite with a direct call, which is only valid 
>>> given its monomorphism. The compilation finishes, and at the very end of the 
>>> process, the code cache lock is grabbed, under which we allocate the new 
>>> nmethod, verify the assumptions made (which still hold), inject entries in 
>>> DependencyContexts so that if these assumptions change, we will deopt, and 
>>> then unlock the code cache lock. The nmethod is now still in the state 
>>> not_installed.
>>>
>>> What can happen now is that another thread loads a class that makes the 
>>> callsite potentially megamorphic. Deopt is triggered, walking the dependency 
>>> contexts under the CodeCache_lock, marking things that are now bad. In this 
>>> list we will find the newly compiled nmethod that has not been made in_use 
>>> yet. It is marked for deoptimization. Then the 
>>> Deoptimization::deoptimize_all_marked() function is called to make sure that 
>>> we 1) make all the marked nmethods not_entrant, and 2) arm the bad activation 
>>> records in the stack. Now the first step of the two grabs the code cache 
>>> lock, and walks the code cache. It finds our about to be installed nmethod, 
>>> and shoots make_not_entrant() at it, racing with the compiler thread calling 
>>> make_in_use() on it.
>>>
>>> This is why I advocate an approach where we simply make it safe to call 
>>> make_in_use() racing with make_not_entrant(), instead of trying to chase down 
>>> all possible scenarios in which this can happen. I can imagine a number of 
>>> others, now that deoptimization is being done with handshakes.
>>>
>>>>> ... and similarly here:
>>>>>
>>>>> 1192     if (nm->is_marked_for_deoptimization() && nm->is_in_use()) {
>>>>> 1193       // only_alive_and_not_unloading() can return not_entrant nmethods.
>>>>> 1194       // A not_entrant method can become a zombie at anytime if it was
>>>>> 1195       // made not_entrant before the previous safepoint/handshake. The
>>>>> 1196       // is_in_use() check covers the not_entrant and not zombie cases
>>>>> 1197       // that have become true after the method was marked for deopt.
>>>>> 1198       nm->make_not_entrant();
>>>>> 1199     }
>>>>>
>>>>> ... we now don't need the guard against is_in_use() here.
>>>>
>>>> Here we don't need that, but we don't need !nm->is_not_entrant() either, since
>>>> it would return false as you say!?
>>>
>>> Indeed. All state checks to see if it is even safe to call the state 
>>> transition function should be removed, as it should always be safe.
>>>
>>>> If we should have an early filter it think nm->is_in_use() is clearer than:
>>>> not not entrant ;) (since the not_installed is normally not seen here)
>>>
>>> The transitioning functions themselves already have early checks, so there 
>>> really is nothing to gain from checking if we should transition first and 
>>> then doing it.
>>>
>>>>>
>>>>> For what I propose to work out, we need to change nmethod::make_in_use to 
>>>>> use nmethod::try_transition, and AOTCompiledMethod::make_entrant() needs to 
>>>>> also return false when encountering a not_entrant nmethod, enforcing 
>>>>> monotonicity, instead of asserting about that. AOT methods may turn 
>>>>> not_in_use AOT methods to in_use, but not not_entrant to in_use. Once 
>>>>> not_entrant, an AOT methods should remain not entrant forever.
>>>>
>>>> The only time we see methods which are not installed should be methods that 
>>>> will
>>>> never be deoptimize (except when calling the test method
>>>> mark_all_nmethods_for_deoptimization), intrinsic and native.
>>>
>>> That's not right. It's visible to a bunch of deoptimizations.
>>>
>>>> And if I remember correctly make_in_use is called in a constructor which
>>>> requires extra error-handling. Construction would fail if try_transition 
>>>> failed.
>>>> The assumption about these 'never deopted' methods seem to be just that :)
>>>
>>> That is also not right. We grab the code cache lock, create the nmethod, 
>>> verify dependencies, install dependencies, drop the code cache lock, all 
>>> under not_installed. It's only after linking the Method to the code, that we 
>>> eventually set it to in_use. So it's visible for longer than you thought. And 
>>> it's visible to calls, a small instant before we change the state to 
>>> in_use(). That's what I claim isn't right. We don't want to be able to call 
>>> into these guys.
>>>
>>>>>
>>>>> So basically, these guards are for something that used to be racy and 
>>>>> dangerous due to the lack of guards inside of the state transitions, and 
>>>>> today that is completely harmless, as the proper guards are in the attempts 
>>>>> to change state. I would prefer to remove these guards, as it is confusing 
>>>>> to the reader that we are guarding against a problem that can't happen. So 
>>>>> I think you can remove the various checks for what state the nmethod is in, 
>>>>> the comments describing races, and just keep the checks if it's a method 
>>>>> handle intrinsic or native method.
>>>>
>>>> I'll revisit this. Was a few months since I did this.
>>>
>>> Okay!
>>>
>>>>>
>>>>> In deoptimize.cpp, the fetch_unroll_info_helper function has moved down, 
>>>>> making it difficult to review as shows the code delta as if everything 
>>>>> changed. So I can't see what actually changed there. :( Would you mind 
>>>>> moving that code back so the webrev shows what changed there?
>>>>
>>>> I have inserted two new static functions before fetch_unroll_info_helper,
>>>> since code from fetch_unroll_info_helper are in these functions diff do this.
>>>> They are static function they are best off before, and putting after them I
>>>> don't think would help the diff tool + I need fwd decl.
>>>> Suggestion?
>>>
>>> Okay, I will try to work out what the actual diff is manually.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> Thanks, Robbin
>>>>
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>>
>>>>> On 2019-08-28 11:57, Robbin Ehn wrote:
>>>>>> Hi, hotspot-dev was the intended list.
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>> On 2019-08-28 09:30, Robbin Ehn wrote:
>>>>>>> Hi all, please consider,
>>>>>>>
>>>>>>> To get away from the issues of us revoking in the handshake, but before 
>>>>>>> deopt
>>>>>>> the locks get re-biased before we hit deopt handler, we instead revoke in 
>>>>>>> deopt
>>>>>>> handler.
>>>>>>> The deopt handler have a JRT_BLOCK/_END which can safepoint so we revoke 
>>>>>>> after
>>>>>>> that but before we start looking at the monitors, with a 
>>>>>>> NoSafepointVerifier.
>>>>>>>
>>>>>>> Here is the previous changeset on top of jdk/jdk tip but due to merge 
>>>>>>> conflicts
>>>>>>> some pieces did not apply:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-8221734-baseline/webrev/
>>>>>>> So this is what was reviewed.
>>>>>>>
>>>>>>> The rebase (merge conflict resolutions) and 8224795 bug fix :
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/
>>>>>>>
>>>>>>> Bug 8224795 fix is here:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/src/hotspot/share/code/nmethod.cpp.sdiff.html 
>>>>>>>
>>>>>>>
>>>>>>> After this we have the same functionally as the reverted change-set.
>>>>>>>
>>>>>>> Here is the changes for doing the revoke in deopt handler instead:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-revoke-on-deopt/webrev/
>>>>>>>
>>>>>>> This code was messy adding the deopt revocation in 3 places made it worse.
>>>>>>> Here is a refactoring of that code. (also removed a dead method in 
>>>>>>> biasedlocking):
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-refactor/webrev/
>>>>>>>
>>>>>>> And here is full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/webrev/
>>>>>>>
>>>>>>> Also a full squashed patch file:
>>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/full.patch
>>>>>>>
>>>>>>> Latest test run I did t1-t6, Linux/Windows x64 have passed, MacOSX still
>>>>>>> running.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>>
>