RFR(L): 8226705: [REDO] Deoptimize with handshakes

Tue Sep 10 22:19:54 UTC 2019

On 9/9/19 8:45 AM, Robbin Ehn wrote:
> Hi all,
>
> Me and Erik had some offline discussions, which ended up in this 
> incremental change:
> http://cr.openjdk.java.net/~rehn/8226705/v2/inc/webrev/
>
> Full:
> http://cr.openjdk.java.net/~rehn/8226705/v2/full/webrev/

Thanks for coming back to this change! I know it has been a
difficult one...

Going with the full webrev since it has been so long since I looked
at the original changes:

src/hotspot/share/aot/aotCodeHeap.cpp
     No comments.

src/hotspot/share/aot/aotCompiledMethod.cpp
     No comments.

src/hotspot/share/aot/aotCompiledMethod.hpp
     Hmmm... No changes to this file.

src/hotspot/share/ci/ciEnv.cpp
     No comments.

src/hotspot/share/code/codeCache.cpp
     No comments.

src/hotspot/share/code/compiledMethod.cpp
     No comments.

src/hotspot/share/code/compiledMethod.hpp
     No comments.

src/hotspot/share/code/nmethod.cpp
     No comments.

src/hotspot/share/code/nmethod.hpp
     No comments.

src/hotspot/share/gc/z/zBarrierSetNMethod.cpp
     No comments.

src/hotspot/share/gc/z/zNMethod.cpp
     No comments.

src/hotspot/share/jvmci/jvmciEnv.cpp
     No comments.

src/hotspot/share/jvmci/jvmciRuntime.cpp
     No comments.

src/hotspot/share/oops/instanceKlass.cpp
     No comments.

src/hotspot/share/oops/method.cpp
     No comments.

src/hotspot/share/oops/method.hpp
     No comments.

src/hotspot/share/prims/jvmtiEventController.cpp
     No comments.

src/hotspot/share/prims/methodHandles.cpp
     No comments.

src/hotspot/share/prims/whitebox.cpp
     No comments.

src/hotspot/share/runtime/biasedLocking.cpp
     L729: void BiasedLocking::revoke_own_locks(Handle obj, TRAPS) {
         Perhaps add:
           assert(THREAD->is_Java_thread(), "must be called by a 
JavaThread");
           JavaThread* thread = (JavaThread*)THREAD;

         and then switch uses of 'THREAD' for 'thread'. This way we
         are not casting away a bad caller...

src/hotspot/share/runtime/biasedLocking.hpp
     L193:   // This should be called by a JavaThread to revoke the bias 
of an owned object
         Perhaps:
             // This must only be called by a JavaThread to revoke the 
bias of an owned object.

src/hotspot/share/runtime/deoptimization.cpp
     Moving the top part of Deoptimization::fetch_unroll_info_helper()
     made this hard to review, but I suspect leaving it in place would
     have been just as hard because of the refactor of code into
     eliminate_allocations() and eliminate_locks(). Did lots of
     scrolling of the frames windows and I think it looks correct...

src/hotspot/share/runtime/deoptimization.hpp
     L152:   static void deoptimize(JavaThread* thread, frame fr, 
RegisterMap *reg_map, DeoptReason reason = Reason_constraint);
         I'm not fond of default parameters, but I'll live... :-)

src/hotspot/share/runtime/mutex.hpp
     old L67:        special        = tty            +   1,
     new L67:        special        = tty            +   2,
         I'll be _so glad_ when the lock ranking stuff is cleaned up!

     No (non-whiny) comments. :-)

src/hotspot/share/runtime/mutexLocker.cpp
     L236:   def(CompiledMethod_lock          , PaddedMutex  , 
special-1,   true,  Monitor::_safepoint_check_never);
         So CompiledMethod_lock is 'tty' + 1. Is that so that it ranks
         lower than the Patching_lock? Any lock ranking issues expected
         for the places you replaced Patching_lock with CompiledMethod_lock?

         I don't think you replaced all uses of Patching_lock, right?
         (Sorry if I asked that before...)

     old L251:   def(OsrList_lock                 , PaddedMutex  , 
leaf,        true,  Monitor::_safepoint_check_never);
         So OsrList_lock was a 'leaf' and you've replaced its uses with
         CompiledMethod_lock which is (special - 1). I'm assuming that
         no lock ranking hiccups have been seen...

src/hotspot/share/runtime/mutexLocker.hpp
     L35: extern Mutex*   CompiledMethod_lock;             // a lock 
used to guard a compiled method
         Should you also mention the OSR queues since CompiledMethod_lock
         also replaced this:

         old L93: extern Mutex*   OsrList_lock; // a lock used to 
serialize access to OSR queues

src/hotspot/share/runtime/sharedRuntime.cpp
     No comments.

src/hotspot/share/runtime/thread.cpp
     No comments.

src/hotspot/share/runtime/thread.hpp
     No comments.

src/hotspot/share/runtime/tieredThresholdPolicy.cpp
     No comments.

src/hotspot/share/runtime/vmOperations.cpp
     No comments.

src/hotspot/share/runtime/vmOperations.hpp
     No comments (another VM_Op bites the dust...)

src/hotspot/share/services/dtraceAttacher.cpp
     No comments other than: Such a strange place to find the
     VM_DeoptimizeTheWorld VM_Op...

test/hotspot/jtreg/compiler/codecache/stress/UnexpectedDeoptimizationAllTest.java
     Shouldn't there be an @bug 8226705 with the new test?

     What tier does the new test execute in and have you verified
     that it passes on all Oracle platforms?

You'll definitely want a review from a Compiler team member
for this one. :-) Since there are Biased Locking changes,
Patricio should also chime in here.

For testing, I'm assuming that the tests that failed due to

     JDK-8221734 Deoptimize with handshakes

have been run on this REDO and that they pass.

Thumbs up! I don't need to see another webrev if you decide to
make any changes due to my few comments above.

Dan

>
> The problem is a nmethods state must never go backwards.
> When the methods code is set to the compiled code it can be made 
> not_entrant.
> (it can be set to not_entrant before not having Compiled_lock)
> When unlinking the methods code must point to the nmethod.
>
> If we set code before changing state, it can go backwards.
> If we set state before setting code, we can't unlink it.
>
> This solution uses the new CompiledMethod_lock to synchronize code and 
> state changes. (unloaded don't need the lock since it require a 
> safepoint to happen)
>
> But this resulted in a circular lock issue with OsrList_lock.
> Therefore OsrList_lock is removed, instead we use CompiledMethod_lock.
>
> After this patch it should be possible to remove Compile_lock from 
> some paths. (Note that JVMCI is missing Compile_lock from several 
> pathes, but fewer with this changeset)
>
> Also InterpreterRuntime::frequency_counter_overflow_inner seem to have 
> the same bug as the reason for this redo: 
> BiasedLocking::revoke(objects_to_revoke, thread);
> The objects could have been rebaised by the time we return from it.
> I think we should call the new 
> BiasedLocking::revoke_own_locks(objects_to_revoke->at(i), thread); 
> here also.
>
> Passes t1-5 twice and KS 60 min.
>
> @Erik I stayed with current is_in_use (not_installed + in_use), there 
> were to many places and many of them needed an equivalent method if 
> is_in_use was to be changed (to only in_use). I didn't see any issue 
> with using the current definition.
>
> Thanks, Robbin
>
> On 8/29/19 4:35 PM, Erik Österlund wrote:
>> Hi Robbin,
>>
>> On 2019-08-29 14:25, Robbin Ehn wrote:
>>> Hi Erik, thanks for having a look.
>>>
>>> Just some short answers and questions.
>>>
>>> On 8/29/19 12:01 PM, Erik Österlund wrote:
>>>> Hi Robbin,
>>>>
>>>> Glad to see this work making it back again. Last time this patch 
>>>> was proposed, there were no guards against non-monotonic nmethod 
>>>> state transitions. Today there are (since 
>>>> https://bugs.openjdk.java.net/browse/JDK-8224674). In other words, 
>>>> if you call make_not_entrant() on an nmethod that has racingly 
>>>> become zombie, it is today impossible to resurrect that nmethod. 
>>>> Instead, make_not_entrant() will return false.
>>>
>>> Great, I'll have a look.
>>>
>>>>
>>>> In particular, I think that is what the following code in the patch 
>>>> is guarding against (in codeCache.cpp):
>>>>
>>>> 1150 // Mark methods for deopt (if safe or possible).
>>>> 1151 void CodeCache::mark_all_nmethods_for_deoptimization() {
>>>> 1152   MutexLocker mu(CodeCache_lock, 
>>>> Mutex::_no_safepoint_check_flag);
>>>> 1153   CompiledMethodIterator 
>>>> iter(CompiledMethodIterator::only_alive_and_not_unloading);
>>>> 1154   while(iter.next()) {
>>>> 1155     CompiledMethod* nm = iter.method();
>>>> 1156     if (!nm->method()->is_method_handle_intrinsic() &&
>>>> 1157         !nm->is_not_installed() &&
>>>> 1158         nm->is_in_use() &&
>>>> 1159         !nm->is_native_method()) {
>>>> 1160       // Intrinsics and native methods are never deopted. A 
>>>> method that is
>>>> 1161       // not installed yet or is not in use is not safe to 
>>>> deopt; the
>>>> 1162       // is_in_use() check covers the not_entrant and not 
>>>> zombie cases.
>>>> 1163       // Note: A not_entrant method can become a zombie at 
>>>> anytime if it was
>>>> 1164       // made not_entrant before the previous 
>>>> safepoint/handshake.
>>>> 1165       nm->mark_for_deoptimization();
>>>> 1166     }
>>>> 1167   }
>>>> 1168 }
>>>>
>>>> In fact, this code is also guarding against is_not_installed 
>>>> nmethods (which was introduced as a new function). If we find that 
>>>> an nmethod is bad and say it needs deoptimization, and that nmethod 
>>>> is just about to get installed, it seems to me that skipping that 
>>>> nmethod and not making it not entrant is really dangerous and will 
>>>> eventually lead to a crash. Because in ciEnv.cpp we set_code to the 
>>>> about to be installed nmethod, and it will get called without 
>>>> anything stopping the invalid code from being executed by mutators, 
>>>> and it was being invalidated for a reason. So basically we really 
>>>> do have to make these not entrant, or we will blow up.
>>>>
>>>> I guess the reason you needed this check is because make_in_use() 
>>>> doesn't check what the current state is, causing monotonicity 
>>>> problems, making a potentially already not entrant nmethod become 
>>>> in_use, eventually blowing up asserts and probably the VM. 
>>>> Regrettably, make_in_use() still does that, because I had no need 
>>>> for changing it to use nmethod::try_transition in 8224674, nobody 
>>>> previously could find these nmethods. In retrospect, not changing 
>>>> that was a bit silly. So if we just change make_in_use() to use 
>>>> nmethod::try_transition instead, then you can remove the 
>>>> !is_not_installed check... which I argue we have to do. The reason 
>>>> we never ran in to this before is because we made the nmethods not 
>>>> entrant in a safepoint, and between making an nmethod and making it 
>>>> in_use, there is no safepoint check, so they were never observed in 
>>>> this state.
>>>>
>>>
>>> Yes, thread A is creating a new method X, it have state not installed.
>>> If we make method X not entrant, Thread A will flip it back to 
>>> in_use, which bad
>>> and we crash. mark_all_nmethods_for_deoptimization is a test method 
>>> used in the
>>> new jtreg test, which is the only use (I see dtrace have some hack 
>>> to use it, it
>>> can crash the vm).
>>
>> Exactly. However, what you are missing here is that this does not 
>> happen only for the whitebox test, now that we do normal deopt with 
>> handshakes.
>>
>> Consider the following race.
>>
>> A compiler thread compiles a new nmethod. In this nmethod, it's 
>> assumed that a certain invoke_virtual callsite is completely 
>> monomorphic, and the compiler chooses to implement the callsite with 
>> a direct call, which is only valid given its monomorphism. The 
>> compilation finishes, and at the very end of the process, the code 
>> cache lock is grabbed, under which we allocate the new nmethod, 
>> verify the assumptions made (which still hold), inject entries in 
>> DependencyContexts so that if these assumptions change, we will 
>> deopt, and then unlock the code cache lock. The nmethod is now still 
>> in the state not_installed.
>>
>> What can happen now is that another thread loads a class that makes 
>> the callsite potentially megamorphic. Deopt is triggered, walking the 
>> dependency contexts under the CodeCache_lock, marking things that are 
>> now bad. In this list we will find the newly compiled nmethod that 
>> has not been made in_use yet. It is marked for deoptimization. Then 
>> the Deoptimization::deoptimize_all_marked() function is called to 
>> make sure that we 1) make all the marked nmethods not_entrant, and 2) 
>> arm the bad activation records in the stack. Now the first step of 
>> the two grabs the code cache lock, and walks the code cache. It finds 
>> our about to be installed nmethod, and shoots make_not_entrant() at 
>> it, racing with the compiler thread calling make_in_use() on it.
>>
>> This is why I advocate an approach where we simply make it safe to 
>> call make_in_use() racing with make_not_entrant(), instead of trying 
>> to chase down all possible scenarios in which this can happen. I can 
>> imagine a number of others, now that deoptimization is being done 
>> with handshakes.
>>
>>>> ... and similarly here:
>>>>
>>>> 1192     if (nm->is_marked_for_deoptimization() && nm->is_in_use()) {
>>>> 1193       // only_alive_and_not_unloading() can return not_entrant 
>>>> nmethods.
>>>> 1194       // A not_entrant method can become a zombie at anytime 
>>>> if it was
>>>> 1195       // made not_entrant before the previous 
>>>> safepoint/handshake. The
>>>> 1196       // is_in_use() check covers the not_entrant and not 
>>>> zombie cases
>>>> 1197       // that have become true after the method was marked for 
>>>> deopt.
>>>> 1198       nm->make_not_entrant();
>>>> 1199     }
>>>>
>>>> ... we now don't need the guard against is_in_use() here.
>>>
>>> Here we don't need that, but we don't need !nm->is_not_entrant() 
>>> either, since
>>> it would return false as you say!?
>>
>> Indeed. All state checks to see if it is even safe to call the state 
>> transition function should be removed, as it should always be safe.
>>
>>> If we should have an early filter it think nm->is_in_use() is 
>>> clearer than:
>>> not not entrant ;) (since the not_installed is normally not seen here)
>>
>> The transitioning functions themselves already have early checks, so 
>> there really is nothing to gain from checking if we should transition 
>> first and then doing it.
>>
>>>>
>>>> For what I propose to work out, we need to change 
>>>> nmethod::make_in_use to use nmethod::try_transition, and 
>>>> AOTCompiledMethod::make_entrant() needs to also return false when 
>>>> encountering a not_entrant nmethod, enforcing monotonicity, instead 
>>>> of asserting about that. AOT methods may turn not_in_use AOT 
>>>> methods to in_use, but not not_entrant to in_use. Once not_entrant, 
>>>> an AOT methods should remain not entrant forever.
>>>
>>> The only time we see methods which are not installed should be 
>>> methods that will
>>> never be deoptimize (except when calling the test method
>>> mark_all_nmethods_for_deoptimization), intrinsic and native.
>>
>> That's not right. It's visible to a bunch of deoptimizations.
>>
>>> And if I remember correctly make_in_use is called in a constructor 
>>> which
>>> requires extra error-handling. Construction would fail if 
>>> try_transition failed.
>>> The assumption about these 'never deopted' methods seem to be just 
>>> that :)
>>
>> That is also not right. We grab the code cache lock, create the 
>> nmethod, verify dependencies, install dependencies, drop the code 
>> cache lock, all under not_installed. It's only after linking the 
>> Method to the code, that we eventually set it to in_use. So it's 
>> visible for longer than you thought. And it's visible to calls, a 
>> small instant before we change the state to in_use(). That's what I 
>> claim isn't right. We don't want to be able to call into these guys.
>>
>>>>
>>>> So basically, these guards are for something that used to be racy 
>>>> and dangerous due to the lack of guards inside of the state 
>>>> transitions, and today that is completely harmless, as the proper 
>>>> guards are in the attempts to change state. I would prefer to 
>>>> remove these guards, as it is confusing to the reader that we are 
>>>> guarding against a problem that can't happen. So I think you can 
>>>> remove the various checks for what state the nmethod is in, the 
>>>> comments describing races, and just keep the checks if it's a 
>>>> method handle intrinsic or native method.
>>>
>>> I'll revisit this. Was a few months since I did this.
>>
>> Okay!
>>
>>>>
>>>> In deoptimize.cpp, the fetch_unroll_info_helper function has moved 
>>>> down, making it difficult to review as shows the code delta as if 
>>>> everything changed. So I can't see what actually changed there. :( 
>>>> Would you mind moving that code back so the webrev shows what 
>>>> changed there?
>>>
>>> I have inserted two new static functions before 
>>> fetch_unroll_info_helper,
>>> since code from fetch_unroll_info_helper are in these functions diff 
>>> do this.
>>> They are static function they are best off before, and putting after 
>>> them I
>>> don't think would help the diff tool + I need fwd decl.
>>> Suggestion?
>>
>> Okay, I will try to work out what the actual diff is manually.
>>
>> Thanks,
>> /Erik
>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 2019-08-28 11:57, Robbin Ehn wrote:
>>>>> Hi, hotspot-dev was the intended list.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>> On 2019-08-28 09:30, Robbin Ehn wrote:
>>>>>> Hi all, please consider,
>>>>>>
>>>>>> To get away from the issues of us revoking in the handshake, but 
>>>>>> before deopt
>>>>>> the locks get re-biased before we hit deopt handler, we instead 
>>>>>> revoke in deopt
>>>>>> handler.
>>>>>> The deopt handler have a JRT_BLOCK/_END which can safepoint so we 
>>>>>> revoke after
>>>>>> that but before we start looking at the monitors, with a 
>>>>>> NoSafepointVerifier.
>>>>>>
>>>>>> Here is the previous changeset on top of jdk/jdk tip but due to 
>>>>>> merge conflicts
>>>>>> some pieces did not apply:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-8221734-baseline/webrev/ 
>>>>>>
>>>>>> So this is what was reviewed.
>>>>>>
>>>>>> The rebase (merge conflict resolutions) and 8224795 bug fix :
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/
>>>>>>
>>>>>> Bug 8224795 fix is here:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/src/hotspot/share/code/nmethod.cpp.sdiff.html 
>>>>>>
>>>>>>
>>>>>> After this we have the same functionally as the reverted change-set.
>>>>>>
>>>>>> Here is the changes for doing the revoke in deopt handler instead:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-revoke-on-deopt/webrev/ 
>>>>>>
>>>>>>
>>>>>> This code was messy adding the deopt revocation in 3 places made 
>>>>>> it worse.
>>>>>> Here is a refactoring of that code. (also removed a dead method 
>>>>>> in biasedlocking):
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-refactor/webrev/
>>>>>>
>>>>>> And here is full:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/webrev/
>>>>>>
>>>>>> Also a full squashed patch file:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/full.patch
>>>>>>
>>>>>> Latest test run I did t1-t6, Linux/Windows x64 have passed, 
>>>>>> MacOSX still
>>>>>> running.
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>