[9] RFR(S): 8074553: Crash with assert(!is_unloaded()) failed: should not call follow on unloaded nmethod

Thu Mar 31 16:13:01 UTC 2016

Hi,

On 2016-03-31 17:49, Vladimir Kozlov wrote:
> On 3/31/16 6:22 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8074553
>> http://cr.openjdk.java.net/~thartmann/8074553/webrev.00/
>>
>> While the code cache sweeper processes a nmethod in
>> NMethodSweeper::process_nmethod(), safepoints may happen and the GC
>> may unload the currently processed nmethod. To prevent this, the
>> sweeper uses a NMethodMarker which saves the nmethod in
>> CodeCacheSweeperThread::_scanned_nmethod. The nmethod is then passed
>> to the GC through a CodeBlobClosure in
>> CodeCacheSweeperThread::oops_do() to keep it alive when the GC
>> iterates over all threads.
>>
>> The problem is that G1 calls nmethods_do() on all threads in the
>> remark phase (see G1RemarkThreadsClosure::do_thread()) which is not
>> overwritten by the sweeper thread. Since the currently processed
>> nmethod is not passed through nmethods_do() by any thread, it is
>> unloaded and we later hit the assert when encountering the nmethod
>> through oops_do().
>>
>> Mikael Gerdin and Stefan Karlsson (thanks again!) suggested to
>> overwrite nmethods_do() as well in CodeCacheSweeperThread and pass
>> _scanned_nmethod to the closure. I also modified
>> Threads::nmethods_do() to ignore the sweeper thread because we want to
>> avoid marking the _scanned_nmethod as seen on the stack when scanning
>> stacks from the sweeper (the nmethod may already be zombie and only
>> referenced by the sweeper).
>
> I did not get this. If you exclude CodeCacheSweeperThread in
> Threads::nmethods_do() then CodeCacheSweeperThread::nmethods_do() will
> not be called. What is the point?

The GC code in question iterates over the threads and calls nmethods_do 
on each JavaThread and the VMThread after claiming them with atomic 
operations to achieve parallelism.

There is still something a bit fishy here though.
Thread::nmethods_do is not virtual, so one must be careful to downcast 
one's Thread* to JavaThread before calling it. And since Tobias' change 
does not make that or JavaThread::nmethods_do (which actually shadows 
and does not override the methods) virtual we can't reach the new code 
unless the GC code is changed to downcast to CodeCacheSweeperThread 
before calling nmethods_do.

I still believe that what Tobias is attempting to do is a necessary fix 
but this only shows how hard this is to reproduce.

Perhaps Thread::nmethods_do should simply be removed (along with any 
calls to it) and JavaThread::nmethods_do should then be made virtual.

/Mikael

>
> Thanks,
> Vladimir
>
>>
>> Unfortunately, this bug is extremely hard to reproduce (it showed up
>> 18 times since early 2015). I was able to reproduce it only once after
>> thousands of runs and therefore not often enough to verify the fix.
>> However, I'm very confident that this solves the problem.
>>
>> Tested with JPRT and RBT (running).
>>
>> Thanks,
>> Tobias
>>