RFR: 8244416: Remove incorrect assert during inline cache cleaning
Erik Österlund
erik.osterlund at oracle.com
Thu May 14 13:07:45 UTC 2020
Hi,
Months of stress testing triggered the following assert in inline cache
cleaning:
assert(!from->is_zombie(), "should not clean inline caches on zombies");
It used to be assert(!from->is_alive()), but after finding a race that
resulted in
false positives for unloaded nmethods, the assert was relaxed to check
only that
we shouldn't have zombies here, with a fat comment explaining why.
Unfortunately, the relaxed assert also gets false positives, so I have
chosen to
remove it, and further increase the size of that fat comment, explaning
why we can't
assert that it isn't a zombie either.
What it boils down to is a race where inline cache cleaning is performed
from
a concurrent GC thread, concurrent to the sweeper making an nmethod zombie.
The order of the race ingredients making the nmethod zombie from the
sweeper is
the following:
1. Transition state to zombie
2. Unregister nmethod form GC (importantly triggers a wait for
concurrent code cache unloading to finish)
3. clear_ic_callsites() importantly releasing all IC stubs
The race happens when the GC finds an is_alive && !is_unloading nmethod and
triggers inline cache cleaning on it before 1, but then when racingly
checking
again in the assert, we are past 1.
Since the assert will spuriously fail even though there is no actual
problem, and
we are at a point where there isn't much else it could assert, I chose
to remove it.
Other than triggering the assert, it is worth noting that this racingly
zombiefied
nmethod will have its inline caches cleaned *with* IC stubs from the
concurrent
GC thread. Subsequent IC finalization in safepoint cleanup requires live
IC stubs
to belong to is_alive() nmethods, so the IC stubs must be released
before the next
safepoint for this race to not be dangerous. That is ensured, because
the racing
sweeper will get stuck waiting for concurrent code cache unloading to
finish,
and when it wakes up, it will clear_ic_callsites(), releasing the
spurious ICStubs,
before finishing the make_zombie() function. Then it will eventually hit
a safepoint
poll, when the nmethod has been purged from IC stubs.
Bug:
https://bugs.openjdk.java.net/browse/JDK-8244416
Webrev:
http://cr.openjdk.java.net/~eosterlund/8244416/webrev.00/
Thanks,
/Erik
More information about the hotspot-dev
mailing list