RFR: 8244416: Remove incorrect assert during inline cache cleaning

Erik Österlund erik.osterlund at oracle.com
Thu May 14 13:07:45 UTC 2020


Hi,

Months of stress testing triggered the following assert in inline cache 
cleaning:

assert(!from->is_zombie(), "should not clean inline caches on zombies");

It used to be assert(!from->is_alive()), but after finding a race that 
resulted in
false positives for unloaded nmethods, the assert was relaxed to check 
only that
we shouldn't have zombies here, with a fat comment explaining why.

Unfortunately, the relaxed assert also gets false positives, so I have 
chosen to
remove it, and further increase the size of that fat comment, explaning 
why we can't
assert that it isn't a zombie either.

What it boils down to is a race where inline cache cleaning is performed 
from
a concurrent GC thread, concurrent to the sweeper making an nmethod zombie.

The order of the race ingredients making the nmethod zombie from the 
sweeper is
the following:

1. Transition state to zombie
2. Unregister nmethod form GC (importantly triggers a wait for 
concurrent code cache unloading to finish)
3. clear_ic_callsites() importantly releasing all IC stubs

The race happens when the GC finds an is_alive && !is_unloading nmethod and
triggers inline cache cleaning on it before 1, but then when racingly 
checking
again in the assert, we are past 1.

Since the assert will spuriously fail even though there is no actual 
problem, and
we are at a point where there isn't much else it could assert, I chose 
to remove it.

Other than triggering the assert, it is worth noting that this racingly 
zombiefied
nmethod will have its inline caches cleaned *with* IC stubs from the 
concurrent
GC thread. Subsequent IC finalization in safepoint cleanup requires live 
IC stubs
to belong to is_alive() nmethods, so the IC stubs must be released 
before the next
safepoint for this race to not be dangerous. That is ensured, because 
the racing
sweeper will get stuck waiting for concurrent code cache unloading to 
finish,
and when it wakes up, it will clear_ic_callsites(), releasing the 
spurious ICStubs,
before finishing the make_zombie() function. Then it will eventually hit 
a safepoint
poll, when the nmethod has been purged from IC stubs.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8244416

Webrev:
http://cr.openjdk.java.net/~eosterlund/8244416/webrev.00/

Thanks,
/Erik


More information about the hotspot-dev mailing list