[9] RFR(S): 8074553: Crash with assert(!is_unloaded()) failed: should not call follow on unloaded nmethod

Fri Apr 1 07:28:06 UTC 2016

Hi Mikael,

On 31.03.2016 18:13, Mikael Gerdin wrote:
> Hi,
> 
> On 2016-03-31 17:49, Vladimir Kozlov wrote:
>> On 3/31/16 6:22 AM, Tobias Hartmann wrote:
>>> Hi,
>>>
>>> please review the following patch:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8074553
>>> http://cr.openjdk.java.net/~thartmann/8074553/webrev.00/
>>>
>>> While the code cache sweeper processes a nmethod in
>>> NMethodSweeper::process_nmethod(), safepoints may happen and the GC
>>> may unload the currently processed nmethod. To prevent this, the
>>> sweeper uses a NMethodMarker which saves the nmethod in
>>> CodeCacheSweeperThread::_scanned_nmethod. The nmethod is then passed
>>> to the GC through a CodeBlobClosure in
>>> CodeCacheSweeperThread::oops_do() to keep it alive when the GC
>>> iterates over all threads.
>>>
>>> The problem is that G1 calls nmethods_do() on all threads in the
>>> remark phase (see G1RemarkThreadsClosure::do_thread()) which is not
>>> overwritten by the sweeper thread. Since the currently processed
>>> nmethod is not passed through nmethods_do() by any thread, it is
>>> unloaded and we later hit the assert when encountering the nmethod
>>> through oops_do().
>>>
>>> Mikael Gerdin and Stefan Karlsson (thanks again!) suggested to
>>> overwrite nmethods_do() as well in CodeCacheSweeperThread and pass
>>> _scanned_nmethod to the closure. I also modified
>>> Threads::nmethods_do() to ignore the sweeper thread because we want to
>>> avoid marking the _scanned_nmethod as seen on the stack when scanning
>>> stacks from the sweeper (the nmethod may already be zombie and only
>>> referenced by the sweeper).
>>
>> I did not get this. If you exclude CodeCacheSweeperThread in
>> Threads::nmethods_do() then CodeCacheSweeperThread::nmethods_do() will
>> not be called. What is the point?
> 
> The GC code in question iterates over the threads and calls nmethods_do on each JavaThread and the VMThread after claiming them with atomic operations to achieve parallelism.
> 
> There is still something a bit fishy here though.
> Thread::nmethods_do is not virtual, so one must be careful to downcast one's Thread* to JavaThread before calling it. And since Tobias' change does not make that or JavaThread::nmethods_do (which actually shadows and does not override the methods) virtual we can't reach the new code unless the GC code is changed to downcast to CodeCacheSweeperThread before calling nmethods_do.

Thanks for taking a look. Of course, you are right. I just assumed that Thread::nmethods_do() is virtual and missed that it's not.

> I still believe that what Tobias is attempting to do is a necessary fix but this only shows how hard this is to reproduce.
> 
> Perhaps Thread::nmethods_do should simply be removed (along with any calls to it) and JavaThread::nmethods_do should then be made virtual.

Yes, I agree. Here is the new webrev:
http://cr.openjdk.java.net/~thartmann/8074553/webrev.01/

I removed the empty Thread::nmethods_do() and all the (static) calls to it.

Thanks,
Tobias

> 
> /Mikael
> 
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Unfortunately, this bug is extremely hard to reproduce (it showed up
>>> 18 times since early 2015). I was able to reproduce it only once after
>>> thousands of runs and therefore not often enough to verify the fix.
>>> However, I'm very confident that this solves the problem.
>>>
>>> Tested with JPRT and RBT (running).
>>>
>>> Thanks,
>>> Tobias
>>>