RFR: 8244416: Remove incorrect assert during inline cache cleaning

Wed May 20 11:04:02 UTC 2020

Looks good!

/Per

On 5/14/20 3:07 PM, Erik Österlund wrote:
> Hi,
> 
> Months of stress testing triggered the following assert in inline cache 
> cleaning:
> 
> assert(!from->is_zombie(), "should not clean inline caches on zombies");
> 
> It used to be assert(!from->is_alive()), but after finding a race that 
> resulted in
> false positives for unloaded nmethods, the assert was relaxed to check 
> only that
> we shouldn't have zombies here, with a fat comment explaining why.
> 
> Unfortunately, the relaxed assert also gets false positives, so I have 
> chosen to
> remove it, and further increase the size of that fat comment, explaning 
> why we can't
> assert that it isn't a zombie either.
> 
> What it boils down to is a race where inline cache cleaning is performed 
> from
> a concurrent GC thread, concurrent to the sweeper making an nmethod zombie.
> 
> The order of the race ingredients making the nmethod zombie from the 
> sweeper is
> the following:
> 
> 1. Transition state to zombie
> 2. Unregister nmethod form GC (importantly triggers a wait for 
> concurrent code cache unloading to finish)
> 3. clear_ic_callsites() importantly releasing all IC stubs
> 
> The race happens when the GC finds an is_alive && !is_unloading nmethod and
> triggers inline cache cleaning on it before 1, but then when racingly 
> checking
> again in the assert, we are past 1.
> 
> Since the assert will spuriously fail even though there is no actual 
> problem, and
> we are at a point where there isn't much else it could assert, I chose 
> to remove it.
> 
> Other than triggering the assert, it is worth noting that this racingly 
> zombiefied
> nmethod will have its inline caches cleaned *with* IC stubs from the 
> concurrent
> GC thread. Subsequent IC finalization in safepoint cleanup requires live 
> IC stubs
> to belong to is_alive() nmethods, so the IC stubs must be released 
> before the next
> safepoint for this race to not be dangerous. That is ensured, because 
> the racing
> sweeper will get stuck waiting for concurrent code cache unloading to 
> finish,
> and when it wakes up, it will clear_ic_callsites(), releasing the 
> spurious ICStubs,
> before finishing the make_zombie() function. Then it will eventually hit 
> a safepoint
> poll, when the nmethod has been purged from IC stubs.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8244416
> 
> Webrev:
> http://cr.openjdk.java.net/~eosterlund/8244416/webrev.00/
> 
> Thanks,
> /Erik