[9] RFR(S): 8074553: Crash with assert(!is_unloaded()) failed: should not call follow on unloaded nmethod

Mikael Gerdin mikael.gerdin at oracle.com
Fri Apr 1 08:24:16 UTC 2016


Hi Tobias,

On 2016-04-01 09:28, Tobias Hartmann wrote:
> Hi Mikael,
>
> On 31.03.2016 18:13, Mikael Gerdin wrote:
>> Hi,
>>
>> On 2016-03-31 17:49, Vladimir Kozlov wrote:
>>> On 3/31/16 6:22 AM, Tobias Hartmann wrote:
>>>> Hi,
>>>>
>>>> please review the following patch:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8074553
>>>> http://cr.openjdk.java.net/~thartmann/8074553/webrev.00/
>>>>
>>>> While the code cache sweeper processes a nmethod in
>>>> NMethodSweeper::process_nmethod(), safepoints may happen and the GC
>>>> may unload the currently processed nmethod. To prevent this, the
>>>> sweeper uses a NMethodMarker which saves the nmethod in
>>>> CodeCacheSweeperThread::_scanned_nmethod. The nmethod is then passed
>>>> to the GC through a CodeBlobClosure in
>>>> CodeCacheSweeperThread::oops_do() to keep it alive when the GC
>>>> iterates over all threads.
>>>>
>>>> The problem is that G1 calls nmethods_do() on all threads in the
>>>> remark phase (see G1RemarkThreadsClosure::do_thread()) which is not
>>>> overwritten by the sweeper thread. Since the currently processed
>>>> nmethod is not passed through nmethods_do() by any thread, it is
>>>> unloaded and we later hit the assert when encountering the nmethod
>>>> through oops_do().
>>>>
>>>> Mikael Gerdin and Stefan Karlsson (thanks again!) suggested to
>>>> overwrite nmethods_do() as well in CodeCacheSweeperThread and pass
>>>> _scanned_nmethod to the closure. I also modified
>>>> Threads::nmethods_do() to ignore the sweeper thread because we want to
>>>> avoid marking the _scanned_nmethod as seen on the stack when scanning
>>>> stacks from the sweeper (the nmethod may already be zombie and only
>>>> referenced by the sweeper).
>>>
>>> I did not get this. If you exclude CodeCacheSweeperThread in
>>> Threads::nmethods_do() then CodeCacheSweeperThread::nmethods_do() will
>>> not be called. What is the point?
>>
>> The GC code in question iterates over the threads and calls nmethods_do on each JavaThread and the VMThread after claiming them with atomic operations to achieve parallelism.
>>
>> There is still something a bit fishy here though.
>> Thread::nmethods_do is not virtual, so one must be careful to downcast one's Thread* to JavaThread before calling it. And since Tobias' change does not make that or JavaThread::nmethods_do (which actually shadows and does not override the methods) virtual we can't reach the new code unless the GC code is changed to downcast to CodeCacheSweeperThread before calling nmethods_do.
>
> Thanks for taking a look. Of course, you are right. I just assumed that Thread::nmethods_do() is virtual and missed that it's not.
>
>> I still believe that what Tobias is attempting to do is a necessary fix but this only shows how hard this is to reproduce.
>>
>> Perhaps Thread::nmethods_do should simply be removed (along with any calls to it) and JavaThread::nmethods_do should then be made virtual.
>
> Yes, I agree. Here is the new webrev:
> http://cr.openjdk.java.net/~thartmann/8074553/webrev.01/
>
> I removed the empty Thread::nmethods_do() and all the (static) calls to it.

I kind of agree with Vladimir that it's unfortunate to have to check for 
the sweeper thread in Threads::nmethods_do() but I don't have a good 
suggestion for an easy fix.

 From my point of view this fix is ready to go in but perhaps it would 
be a good idea to also think about if this could be cleaned up somehow.

For me - as a GC person - it's hard to see why the sweeper thread should 
be a JavaThread at all, it seems like more of a VM-only thread to me.

/Mikael

>
> Thanks,
> Tobias
>
>>
>> /Mikael
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Unfortunately, this bug is extremely hard to reproduce (it showed up
>>>> 18 times since early 2015). I was able to reproduce it only once after
>>>> thousands of runs and therefore not often enough to verify the fix.
>>>> However, I'm very confident that this solves the problem.
>>>>
>>>> Tested with JPRT and RBT (running).
>>>>
>>>> Thanks,
>>>> Tobias
>>>>


More information about the hotspot-dev mailing list