RFR[13]: 8224674: NMethod state machine is not monotonic
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Thu Jul 11 19:39:24 UTC 2019
On 7/11/19 3:02 PM, Erik Österlund wrote:
> Hi Coleen,
>
> On 2019-07-11 14:23, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 7/11/19 2:18 PM, Erik Osterlund wrote:
>>> Hi Coleen,
>>>
>>>> On 11 Jul 2019, at 16:46, coleen.phillimore at oracle.com wrote:
>>>>
>>>>
>>>> Hi Erik,
>>>> I had a look at this also and it seems reasonable and I like that
>>>> 'unloaded' is less dead than 'zombie' now.
>>> Glad you like it!
>>>
>>>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/src/hotspot/share/code/nmethod.cpp.frames.html
>>>>
>>>>
>>>> 1230 guarantee(try_transition(unloaded), "Invalid nmethod
>>>> transition to unloaded");
>>>>
>>>>
>>>> This line worries me. Can you explain why another thread could not
>>>> have made this nmethod zombie already, before the handshake in zgc
>>>> and the call afterward to make_unloaded() for this nmethod? And
>>>> add a comment here?
>>> Sure, will add a comment. Like this:
>>> ”It is an important invariant that there exists no race between the
>>> sweeper and GC thread competing for making the same nmethod zombie
>>> and unloaded respectively. This is ensured by
>>> can_convert_to_zombie() returning false for any is_unloading()
>>> nmethod, informing the sweeper not to step on any GC toes”
>>
>> Yes, I like the comment. Should it be an assert instead though?
>
> Sure, why not! The comment addition and assert instead of guarantee:
> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.02/
Looks good!
Coleen
>
> Thanks,
> /Erik
>
>> Thanks,
>> Coleen
>>>
>>> Does that sound comprehensible?
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>>> On 7/11/19 9:53 AM, Erik Österlund wrote:
>>>>> Hi Dean,
>>>>>
>>>>>> On 2019-07-11 00:42, dean.long at oracle.com wrote:
>>>>>>> On 7/10/19 1:28 AM, Erik Österlund wrote:
>>>>>>> Hi Dean,
>>>>>>>
>>>>>>>> On 2019-07-09 23:31, dean.long at oracle.com wrote:
>>>>>>>>> On 7/1/19 6:12 AM, Erik Österlund wrote:
>>>>>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking
>>>>>>>>> (where unlinking code belongs), instead of after the handshake
>>>>>>>>> (intended for deleting things safely unlinked).
>>>>>>>>> Strictly speaking, moving the OSR nmethod unlinking removes
>>>>>>>>> the racing between make_not_entrant and make_unloaded, but I
>>>>>>>>> still want the monotonicity guards to make this code more robust.
>>>>>>>> I see where you added OSR nmethod unlinking, but not where you
>>>>>>>> removed it, so it's not obvious it was a "move".
>>>>>>> Sorry, bad wording on my part. I added OSR nmethod unlinking
>>>>>>> before the global handshake is run. After the handshake, we call
>>>>>>> make_unloaded() on the same is_unloading() nmethods. That
>>>>>>> function "tries" to unlink the OSR nmethod, but will just not do
>>>>>>> it as it's already unlinked at that point. So in a way, I didn't
>>>>>>> remove the call to unlink the OSR nmethod there, it just won't
>>>>>>> do anything. I preferred structuring it that way instead of
>>>>>>> trying to optimize away the call to unlink the OSR nmethod when
>>>>>>> making it unloaded, but only for the concurrent case. It seemed
>>>>>>> to introduce more conditional magic than it was worth.
>>>>>>> So in practice, the unlinking of OSR nmethods has moved for
>>>>>>> concurrent unloading to before the handshake.
>>>>>>>
>>>>>> OK, in that case, could you add a little information to the
>>>>>> "Invalidate the osr nmethod only once" comment so that in the
>>>>>> future someone isn't tempted to remove the code as redundant?
>>>>> Sure.
>>>>>
>>>>>>>> Would it make sense for nmethod::unlink_from_method() to do the
>>>>>>>> OSR unlinking, or to assert that it has already been done?
>>>>>>> An earlier version of this patch tried to do that. It is indeed
>>>>>>> possible. But it requires changing lock ranks of the OSR nmethod
>>>>>>> lock to special - 1 and moving around a bunch of code as this
>>>>>>> function is also called both when making nmethods not_entrant,
>>>>>>> zombie, and unlinking them in that case. For the first two, we
>>>>>>> conditionally unlink the nmethod based on the current state
>>>>>>> (which is the old state), whereas when I move it, the current
>>>>>>> state is the new state. So I had to change things around a bit
>>>>>>> more to figure out the right condition when to unlink it that
>>>>>>> works for all 3 callers. In the end, since this is going to 13,
>>>>>>> I thought it's more important to minimize the risk as much as I
>>>>>>> can, and leave such refactorings to 14.
>>>>>>>
>>>>>> OK.
>>>>>>
>>>>>>>> The new bailout in the middle of
>>>>>>>> nmethod::make_not_entrant_or_zombie() worries me a little,
>>>>>>>> because the code up to that point has side-effects, and we
>>>>>>>> could be bailing out in an unexpected state.
>>>>>>> Correct. In an earlier version of this patch, I moved the
>>>>>>> transition to before the side effects. But a bunch of code is
>>>>>>> using the current nmethod state to determine what to do, and
>>>>>>> that current state changed from the old to the new state. In
>>>>>>> particular, we conditionally patch in the jump based on the
>>>>>>> current (old) state, and we conditionally increment decompile
>>>>>>> count based on the current (old) state. So I ended up having to
>>>>>>> rewrite more code than I wanted to for a patch going into 13,
>>>>>>> and convince myself that I had not implicitly messed something
>>>>>>> up. It felt safer to reason about the 3 side effects up until
>>>>>>> the transitioning point:
>>>>>>>
>>>>>>> 1) Patching in the jump into VEP. Any state more dead than the
>>>>>>> current transition, would still want that jump to be there.
>>>>>>> 2) Incrementing decompile count when making it not_entrant.
>>>>>>> Seems in order to do regardless, as we had an actual request to
>>>>>>> make the nmethod not entrant because it was bad somehow.
>>>>>>> 3) Marking it as seen on stack when making it not_entrant. This
>>>>>>> will only make can_convert_to_zombie start returning false,
>>>>>>> which is harmless in general. Also, as both transitions to
>>>>>>> zombie and not_entrant are performed under the Patching_lock,
>>>>>>> the only possible race is with make_unloaded. And those nmethods
>>>>>>> are is_unloading(), which also makes can_convert_to_zombie
>>>>>>> return false (in a not racy fashion). So it would essentially
>>>>>>> make no observable difference to any single call to
>>>>>>> can_convert_to_zombie().
>>>>>>>
>>>>>>> In summary, #1 and #3 don't really observably change the state
>>>>>>> of the system, and #2 is completely harmless and probably
>>>>>>> wanted. Therefore I found that moving these things around and
>>>>>>> finding out where we use the current state(), as well as
>>>>>>> rewriting it, seemed like a slightly scarier change for 13 to me.
>>>>>>>
>>>>>>> So in general, there is some refactoring that could be done (and
>>>>>>> I have tried it) to make this nicer. But I want to minimize the
>>>>>>> risk for 13 as much as possible, and perform any risky
>>>>>>> refactorings in 14 instead.
>>>>>>> If your risk assessment is different and you would prefer moving
>>>>>>> the transition higher up (and flipping some conditions) instead,
>>>>>>> I am totally up for that too though, and I do see where you are
>>>>>>> coming from.
>>>>>>>
>>>>>> So if we fail, it means that we lost a race to a "deader" state,
>>>>>> and assuming this is the only path to the deader state, wouldn't
>>>>>> that also mean that #1, #2, and #3 would have already been done
>>>>>> by the winning thread? If so, that makes me feel better about
>>>>>> bailing out in the middle, but I'm still not 100% convinced,
>>>>>> unless we can assert that 1-3 already happened. Do you have a
>>>>>> prototype of what moving the transition higher up would look like?
>>>>> As a matter of fact I do. Here is a webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/
>>>>>
>>>>> I kind of like it. What do you think?
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>>
>>>>>> dl
>>>>>>
>>>>>>> BTW, I have tested this change through hs-tier1-7, and it looks
>>>>>>> good.
>>>>>>>
>>>>>>> Thanks a lot Dean for reviewing this code.
>>>>>>>
>>>>>>> /Erik
>>>>>>>
>>>>>>>> dl
>>>>>>>>
>>
More information about the hotspot-dev
mailing list