RFR[13]: 8224674: NMethod state machine is not monotonic
Erik Österlund
erik.osterlund at oracle.com
Wed Jul 10 08:28:25 UTC 2019
Hi Dean,
On 2019-07-09 23:31, dean.long at oracle.com wrote:
> On 7/1/19 6:12 AM, Erik Österlund wrote:
>> For ZGC I moved OSR nmethod unlinking to before the unlinking (where
>> unlinking code belongs), instead of after the handshake (intended for
>> deleting things safely unlinked).
>> Strictly speaking, moving the OSR nmethod unlinking removes the
>> racing between make_not_entrant and make_unloaded, but I still want
>> the monotonicity guards to make this code more robust.
>
> I see where you added OSR nmethod unlinking, but not where you removed
> it, so it's not obvious it was a "move".
Sorry, bad wording on my part. I added OSR nmethod unlinking before the
global handshake is run. After the handshake, we call make_unloaded() on
the same is_unloading() nmethods. That function "tries" to unlink the
OSR nmethod, but will just not do it as it's already unlinked at that
point. So in a way, I didn't remove the call to unlink the OSR nmethod
there, it just won't do anything. I preferred structuring it that way
instead of trying to optimize away the call to unlink the OSR nmethod
when making it unloaded, but only for the concurrent case. It seemed to
introduce more conditional magic than it was worth.
So in practice, the unlinking of OSR nmethods has moved for concurrent
unloading to before the handshake.
> Would it make sense for nmethod::unlink_from_method() to do the OSR
> unlinking, or to assert that it has already been done?
An earlier version of this patch tried to do that. It is indeed
possible. But it requires changing lock ranks of the OSR nmethod lock to
special - 1 and moving around a bunch of code as this function is also
called both when making nmethods not_entrant, zombie, and unlinking them
in that case. For the first two, we conditionally unlink the nmethod
based on the current state (which is the old state), whereas when I move
it, the current state is the new state. So I had to change things around
a bit more to figure out the right condition when to unlink it that
works for all 3 callers. In the end, since this is going to 13, I
thought it's more important to minimize the risk as much as I can, and
leave such refactorings to 14.
> The new bailout in the middle of nmethod::make_not_entrant_or_zombie()
> worries me a little, because the code up to that point has
> side-effects, and we could be bailing out in an unexpected state.
Correct. In an earlier version of this patch, I moved the transition to
before the side effects. But a bunch of code is using the current
nmethod state to determine what to do, and that current state changed
from the old to the new state. In particular, we conditionally patch in
the jump based on the current (old) state, and we conditionally
increment decompile count based on the current (old) state. So I ended
up having to rewrite more code than I wanted to for a patch going into
13, and convince myself that I had not implicitly messed something up.
It felt safer to reason about the 3 side effects up until the
transitioning point:
1) Patching in the jump into VEP. Any state more dead than the current
transition, would still want that jump to be there.
2) Incrementing decompile count when making it not_entrant. Seems in
order to do regardless, as we had an actual request to make the nmethod
not entrant because it was bad somehow.
3) Marking it as seen on stack when making it not_entrant. This will
only make can_convert_to_zombie start returning false, which is harmless
in general. Also, as both transitions to zombie and not_entrant are
performed under the Patching_lock, the only possible race is with
make_unloaded. And those nmethods are is_unloading(), which also makes
can_convert_to_zombie return false (in a not racy fashion). So it would
essentially make no observable difference to any single call to
can_convert_to_zombie().
In summary, #1 and #3 don't really observably change the state of the
system, and #2 is completely harmless and probably wanted. Therefore I
found that moving these things around and finding out where we use the
current state(), as well as rewriting it, seemed like a slightly scarier
change for 13 to me.
So in general, there is some refactoring that could be done (and I have
tried it) to make this nicer. But I want to minimize the risk for 13 as
much as possible, and perform any risky refactorings in 14 instead.
If your risk assessment is different and you would prefer moving the
transition higher up (and flipping some conditions) instead, I am
totally up for that too though, and I do see where you are coming from.
BTW, I have tested this change through hs-tier1-7, and it looks good.
Thanks a lot Dean for reviewing this code.
/Erik
> dl
>
More information about the hotspot-dev
mailing list