RFR(L): 8226705: [REDO] Deoptimize with handshakes
Patricio Chilano
patricio.chilano.mateo at oracle.com
Wed Sep 11 06:07:50 UTC 2019
Hi Robbin,
Thanks for looking into this and for making fetch_unroll_info_helper()
more clear to read!
I looked again at the biased locking part only since I don't really know
the rest of code. Those look good to me, just some comments:
src/hotspot/share/runtime/deoptimization.cpp:
1) What was the problem of calling revoke_from_deopt_handler() before
eliminate_allocations()?
2) If COMPILER2_OR_JVMCI is 0, shouldn't we still call
revoke_from_deopt_handler()?
src/hotspot/share/runtime/biasedLocking.cpp:
L737: markWord prototype_header = k->prototype_header();
Although that will probably get optimized away in product code, I would
just place that load inside the assert.
src/hotspot/share/runtime/biasedLocking.xpp:
Maybe s/revoke_own_locks()/revoke_own_lock() ?
Regarding InterpreterRuntime::frequency_counter_overflow_inner() having
possibly the same bug, not sure I see the trace for that.
Thanks!
Patricio
On 9/9/19 9:45 AM, Robbin Ehn wrote:
> Hi all,
>
> Me and Erik had some offline discussions, which ended up in this
> incremental change:
> http://cr.openjdk.java.net/~rehn/8226705/v2/inc/webrev/
>
> Full:
> http://cr.openjdk.java.net/~rehn/8226705/v2/full/webrev/
>
> The problem is a nmethods state must never go backwards.
> When the methods code is set to the compiled code it can be made
> not_entrant.
> (it can be set to not_entrant before not having Compiled_lock)
> When unlinking the methods code must point to the nmethod.
>
> If we set code before changing state, it can go backwards.
> If we set state before setting code, we can't unlink it.
>
> This solution uses the new CompiledMethod_lock to synchronize code and
> state changes. (unloaded don't need the lock since it require a
> safepoint to happen)
>
> But this resulted in a circular lock issue with OsrList_lock.
> Therefore OsrList_lock is removed, instead we use CompiledMethod_lock.
>
> After this patch it should be possible to remove Compile_lock from
> some paths. (Note that JVMCI is missing Compile_lock from several
> pathes, but fewer with this changeset)
>
> Also InterpreterRuntime::frequency_counter_overflow_inner seem to have
> the same bug as the reason for this redo:
> BiasedLocking::revoke(objects_to_revoke, thread);
> The objects could have been rebaised by the time we return from it.
> I think we should call the new
> BiasedLocking::revoke_own_locks(objects_to_revoke->at(i), thread);
> here also.
>
> Passes t1-5 twice and KS 60 min.
>
> @Erik I stayed with current is_in_use (not_installed + in_use), there
> were to many places and many of them needed an equivalent method if
> is_in_use was to be changed (to only in_use). I didn't see any issue
> with using the current definition.
>
> Thanks, Robbin
>
> On 8/29/19 4:35 PM, Erik Österlund wrote:
>> Hi Robbin,
>>
>> On 2019-08-29 14:25, Robbin Ehn wrote:
>>> Hi Erik, thanks for having a look.
>>>
>>> Just some short answers and questions.
>>>
>>> On 8/29/19 12:01 PM, Erik Österlund wrote:
>>>> Hi Robbin,
>>>>
>>>> Glad to see this work making it back again. Last time this patch
>>>> was proposed, there were no guards against non-monotonic nmethod
>>>> state transitions. Today there are (since
>>>> https://bugs.openjdk.java.net/browse/JDK-8224674). In other words,
>>>> if you call make_not_entrant() on an nmethod that has racingly
>>>> become zombie, it is today impossible to resurrect that nmethod.
>>>> Instead, make_not_entrant() will return false.
>>>
>>> Great, I'll have a look.
>>>
>>>>
>>>> In particular, I think that is what the following code in the patch
>>>> is guarding against (in codeCache.cpp):
>>>>
>>>> 1150 // Mark methods for deopt (if safe or possible).
>>>> 1151 void CodeCache::mark_all_nmethods_for_deoptimization() {
>>>> 1152 MutexLocker mu(CodeCache_lock,
>>>> Mutex::_no_safepoint_check_flag);
>>>> 1153 CompiledMethodIterator
>>>> iter(CompiledMethodIterator::only_alive_and_not_unloading);
>>>> 1154 while(iter.next()) {
>>>> 1155 CompiledMethod* nm = iter.method();
>>>> 1156 if (!nm->method()->is_method_handle_intrinsic() &&
>>>> 1157 !nm->is_not_installed() &&
>>>> 1158 nm->is_in_use() &&
>>>> 1159 !nm->is_native_method()) {
>>>> 1160 // Intrinsics and native methods are never deopted. A
>>>> method that is
>>>> 1161 // not installed yet or is not in use is not safe to
>>>> deopt; the
>>>> 1162 // is_in_use() check covers the not_entrant and not
>>>> zombie cases.
>>>> 1163 // Note: A not_entrant method can become a zombie at
>>>> anytime if it was
>>>> 1164 // made not_entrant before the previous
>>>> safepoint/handshake.
>>>> 1165 nm->mark_for_deoptimization();
>>>> 1166 }
>>>> 1167 }
>>>> 1168 }
>>>>
>>>> In fact, this code is also guarding against is_not_installed
>>>> nmethods (which was introduced as a new function). If we find that
>>>> an nmethod is bad and say it needs deoptimization, and that nmethod
>>>> is just about to get installed, it seems to me that skipping that
>>>> nmethod and not making it not entrant is really dangerous and will
>>>> eventually lead to a crash. Because in ciEnv.cpp we set_code to the
>>>> about to be installed nmethod, and it will get called without
>>>> anything stopping the invalid code from being executed by mutators,
>>>> and it was being invalidated for a reason. So basically we really
>>>> do have to make these not entrant, or we will blow up.
>>>>
>>>> I guess the reason you needed this check is because make_in_use()
>>>> doesn't check what the current state is, causing monotonicity
>>>> problems, making a potentially already not entrant nmethod become
>>>> in_use, eventually blowing up asserts and probably the VM.
>>>> Regrettably, make_in_use() still does that, because I had no need
>>>> for changing it to use nmethod::try_transition in 8224674, nobody
>>>> previously could find these nmethods. In retrospect, not changing
>>>> that was a bit silly. So if we just change make_in_use() to use
>>>> nmethod::try_transition instead, then you can remove the
>>>> !is_not_installed check... which I argue we have to do. The reason
>>>> we never ran in to this before is because we made the nmethods not
>>>> entrant in a safepoint, and between making an nmethod and making it
>>>> in_use, there is no safepoint check, so they were never observed in
>>>> this state.
>>>>
>>>
>>> Yes, thread A is creating a new method X, it have state not installed.
>>> If we make method X not entrant, Thread A will flip it back to
>>> in_use, which bad
>>> and we crash. mark_all_nmethods_for_deoptimization is a test method
>>> used in the
>>> new jtreg test, which is the only use (I see dtrace have some hack
>>> to use it, it
>>> can crash the vm).
>>
>> Exactly. However, what you are missing here is that this does not
>> happen only for the whitebox test, now that we do normal deopt with
>> handshakes.
>>
>> Consider the following race.
>>
>> A compiler thread compiles a new nmethod. In this nmethod, it's
>> assumed that a certain invoke_virtual callsite is completely
>> monomorphic, and the compiler chooses to implement the callsite with
>> a direct call, which is only valid given its monomorphism. The
>> compilation finishes, and at the very end of the process, the code
>> cache lock is grabbed, under which we allocate the new nmethod,
>> verify the assumptions made (which still hold), inject entries in
>> DependencyContexts so that if these assumptions change, we will
>> deopt, and then unlock the code cache lock. The nmethod is now still
>> in the state not_installed.
>>
>> What can happen now is that another thread loads a class that makes
>> the callsite potentially megamorphic. Deopt is triggered, walking the
>> dependency contexts under the CodeCache_lock, marking things that are
>> now bad. In this list we will find the newly compiled nmethod that
>> has not been made in_use yet. It is marked for deoptimization. Then
>> the Deoptimization::deoptimize_all_marked() function is called to
>> make sure that we 1) make all the marked nmethods not_entrant, and 2)
>> arm the bad activation records in the stack. Now the first step of
>> the two grabs the code cache lock, and walks the code cache. It finds
>> our about to be installed nmethod, and shoots make_not_entrant() at
>> it, racing with the compiler thread calling make_in_use() on it.
>>
>> This is why I advocate an approach where we simply make it safe to
>> call make_in_use() racing with make_not_entrant(), instead of trying
>> to chase down all possible scenarios in which this can happen. I can
>> imagine a number of others, now that deoptimization is being done
>> with handshakes.
>>
>>>> ... and similarly here:
>>>>
>>>> 1192 if (nm->is_marked_for_deoptimization() && nm->is_in_use()) {
>>>> 1193 // only_alive_and_not_unloading() can return not_entrant
>>>> nmethods.
>>>> 1194 // A not_entrant method can become a zombie at anytime
>>>> if it was
>>>> 1195 // made not_entrant before the previous
>>>> safepoint/handshake. The
>>>> 1196 // is_in_use() check covers the not_entrant and not
>>>> zombie cases
>>>> 1197 // that have become true after the method was marked for
>>>> deopt.
>>>> 1198 nm->make_not_entrant();
>>>> 1199 }
>>>>
>>>> ... we now don't need the guard against is_in_use() here.
>>>
>>> Here we don't need that, but we don't need !nm->is_not_entrant()
>>> either, since
>>> it would return false as you say!?
>>
>> Indeed. All state checks to see if it is even safe to call the state
>> transition function should be removed, as it should always be safe.
>>
>>> If we should have an early filter it think nm->is_in_use() is
>>> clearer than:
>>> not not entrant ;) (since the not_installed is normally not seen here)
>>
>> The transitioning functions themselves already have early checks, so
>> there really is nothing to gain from checking if we should transition
>> first and then doing it.
>>
>>>>
>>>> For what I propose to work out, we need to change
>>>> nmethod::make_in_use to use nmethod::try_transition, and
>>>> AOTCompiledMethod::make_entrant() needs to also return false when
>>>> encountering a not_entrant nmethod, enforcing monotonicity, instead
>>>> of asserting about that. AOT methods may turn not_in_use AOT
>>>> methods to in_use, but not not_entrant to in_use. Once not_entrant,
>>>> an AOT methods should remain not entrant forever.
>>>
>>> The only time we see methods which are not installed should be
>>> methods that will
>>> never be deoptimize (except when calling the test method
>>> mark_all_nmethods_for_deoptimization), intrinsic and native.
>>
>> That's not right. It's visible to a bunch of deoptimizations.
>>
>>> And if I remember correctly make_in_use is called in a constructor
>>> which
>>> requires extra error-handling. Construction would fail if
>>> try_transition failed.
>>> The assumption about these 'never deopted' methods seem to be just
>>> that :)
>>
>> That is also not right. We grab the code cache lock, create the
>> nmethod, verify dependencies, install dependencies, drop the code
>> cache lock, all under not_installed. It's only after linking the
>> Method to the code, that we eventually set it to in_use. So it's
>> visible for longer than you thought. And it's visible to calls, a
>> small instant before we change the state to in_use(). That's what I
>> claim isn't right. We don't want to be able to call into these guys.
>>
>>>>
>>>> So basically, these guards are for something that used to be racy
>>>> and dangerous due to the lack of guards inside of the state
>>>> transitions, and today that is completely harmless, as the proper
>>>> guards are in the attempts to change state. I would prefer to
>>>> remove these guards, as it is confusing to the reader that we are
>>>> guarding against a problem that can't happen. So I think you can
>>>> remove the various checks for what state the nmethod is in, the
>>>> comments describing races, and just keep the checks if it's a
>>>> method handle intrinsic or native method.
>>>
>>> I'll revisit this. Was a few months since I did this.
>>
>> Okay!
>>
>>>>
>>>> In deoptimize.cpp, the fetch_unroll_info_helper function has moved
>>>> down, making it difficult to review as shows the code delta as if
>>>> everything changed. So I can't see what actually changed there. :(
>>>> Would you mind moving that code back so the webrev shows what
>>>> changed there?
>>>
>>> I have inserted two new static functions before
>>> fetch_unroll_info_helper,
>>> since code from fetch_unroll_info_helper are in these functions diff
>>> do this.
>>> They are static function they are best off before, and putting after
>>> them I
>>> don't think would help the diff tool + I need fwd decl.
>>> Suggestion?
>>
>> Okay, I will try to work out what the actual diff is manually.
>>
>> Thanks,
>> /Erik
>>
>>> Thanks, Robbin
>>>
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 2019-08-28 11:57, Robbin Ehn wrote:
>>>>> Hi, hotspot-dev was the intended list.
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>> On 2019-08-28 09:30, Robbin Ehn wrote:
>>>>>> Hi all, please consider,
>>>>>>
>>>>>> To get away from the issues of us revoking in the handshake, but
>>>>>> before deopt
>>>>>> the locks get re-biased before we hit deopt handler, we instead
>>>>>> revoke in deopt
>>>>>> handler.
>>>>>> The deopt handler have a JRT_BLOCK/_END which can safepoint so we
>>>>>> revoke after
>>>>>> that but before we start looking at the monitors, with a
>>>>>> NoSafepointVerifier.
>>>>>>
>>>>>> Here is the previous changeset on top of jdk/jdk tip but due to
>>>>>> merge conflicts
>>>>>> some pieces did not apply:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-8221734-baseline/webrev/
>>>>>>
>>>>>> So this is what was reviewed.
>>>>>>
>>>>>> The rebase (merge conflict resolutions) and 8224795 bug fix :
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/
>>>>>>
>>>>>> Bug 8224795 fix is here:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-rebase/webrev/src/hotspot/share/code/nmethod.cpp.sdiff.html
>>>>>>
>>>>>>
>>>>>> After this we have the same functionally as the reverted change-set.
>>>>>>
>>>>>> Here is the changes for doing the revoke in deopt handler instead:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-revoke-on-deopt/webrev/
>>>>>>
>>>>>>
>>>>>> This code was messy adding the deopt revocation in 3 places made
>>>>>> it worse.
>>>>>> Here is a refactoring of that code. (also removed a dead method
>>>>>> in biasedlocking):
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/8226705-refactor/webrev/
>>>>>>
>>>>>> And here is full:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/webrev/
>>>>>>
>>>>>> Also a full squashed patch file:
>>>>>> http://cr.openjdk.java.net/~rehn/8226705/v1/full/full.patch
>>>>>>
>>>>>> Latest test run I did t1-t6, Linux/Windows x64 have passed,
>>>>>> MacOSX still
>>>>>> running.
>>>>>>
>>>>>> Thanks, Robbin
>>>>>>
>>>>>>
More information about the hotspot-dev
mailing list