[9] RFR(S): 8074553: Crash with assert(!is_unloaded()) failed: should not call follow on unloaded nmethod

Tobias Hartmann tobias.hartmann at oracle.com
Fri Apr 1 13:23:31 UTC 2016


Hi Mikael,

On 01.04.2016 12:59, Mikael Gerdin wrote:
> Tobias,
> 
> On 2016-04-01 11:35, Tobias Hartmann wrote:
>> Hi Mikael,
>>
>> On 01.04.2016 10:24, Mikael Gerdin wrote:
>>> Hi Tobias,
>>>
>>> On 2016-04-01 09:28, Tobias Hartmann wrote:
>>>> Hi Mikael,
>>>>
>>>> On 31.03.2016 18:13, Mikael Gerdin wrote:
>>>>> Hi,
>>>>>
>>>>> On 2016-03-31 17:49, Vladimir Kozlov wrote:
>>>>>> On 3/31/16 6:22 AM, Tobias Hartmann wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> please review the following patch:
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8074553
>>>>>>> http://cr.openjdk.java.net/~thartmann/8074553/webrev.00/
>>>>>>>
>>>>>>> While the code cache sweeper processes a nmethod in
>>>>>>> NMethodSweeper::process_nmethod(), safepoints may happen
>>>>>>> and the GC may unload the currently processed nmethod. To
>>>>>>> prevent this, the sweeper uses a NMethodMarker which saves
>>>>>>> the nmethod in CodeCacheSweeperThread::_scanned_nmethod.
>>>>>>> The nmethod is then passed to the GC through a
>>>>>>> CodeBlobClosure in CodeCacheSweeperThread::oops_do() to
>>>>>>> keep it alive when the GC iterates over all threads.
>>>>>>>
>>>>>>> The problem is that G1 calls nmethods_do() on all threads
>>>>>>> in the remark phase (see
>>>>>>> G1RemarkThreadsClosure::do_thread()) which is not
>>>>>>> overwritten by the sweeper thread. Since the currently
>>>>>>> processed nmethod is not passed through nmethods_do() by
>>>>>>> any thread, it is unloaded and we later hit the assert when
>>>>>>> encountering the nmethod through oops_do().
>>>>>>>
>>>>>>> Mikael Gerdin and Stefan Karlsson (thanks again!) suggested
>>>>>>> to overwrite nmethods_do() as well in
>>>>>>> CodeCacheSweeperThread and pass _scanned_nmethod to the
>>>>>>> closure. I also modified Threads::nmethods_do() to ignore
>>>>>>> the sweeper thread because we want to avoid marking the
>>>>>>> _scanned_nmethod as seen on the stack when scanning stacks
>>>>>>> from the sweeper (the nmethod may already be zombie and
>>>>>>> only referenced by the sweeper).
>>>>>>
>>>>>> I did not get this. If you exclude CodeCacheSweeperThread in
>>>>>> Threads::nmethods_do() then
>>>>>> CodeCacheSweeperThread::nmethods_do() will not be called.
>>>>>> What is the point?
>>>>>
>>>>> The GC code in question iterates over the threads and calls
>>>>> nmethods_do on each JavaThread and the VMThread after claiming
>>>>> them with atomic operations to achieve parallelism.
>>>>>
>>>>> There is still something a bit fishy here though.
>>>>> Thread::nmethods_do is not virtual, so one must be careful to
>>>>> downcast one's Thread* to JavaThread before calling it. And
>>>>> since Tobias' change does not make that or
>>>>> JavaThread::nmethods_do (which actually shadows and does not
>>>>> override the methods) virtual we can't reach the new code
>>>>> unless the GC code is changed to downcast to
>>>>> CodeCacheSweeperThread before calling nmethods_do.
>>>>
>>>> Thanks for taking a look. Of course, you are right. I just
>>>> assumed that Thread::nmethods_do() is virtual and missed that
>>>> it's not.
>>>>
>>>>> I still believe that what Tobias is attempting to do is a
>>>>> necessary fix but this only shows how hard this is to
>>>>> reproduce.
>>>>>
>>>>> Perhaps Thread::nmethods_do should simply be removed (along
>>>>> with any calls to it) and JavaThread::nmethods_do should then
>>>>> be made virtual.
>>>>
>>>> Yes, I agree. Here is the new webrev:
>>>> http://cr.openjdk.java.net/~thartmann/8074553/webrev.01/
>>>>
>>>> I removed the empty Thread::nmethods_do() and all the (static)
>>>> calls to it.
>>>
>>> I kind of agree with Vladimir that it's unfortunate to have to
>>> check for the sweeper thread in Threads::nmethods_do() but I don't
>>> have a good suggestion for an easy fix.
>>>
>>> From my point of view this fix is ready to go in but perhaps it
>>> would be a good idea to also think about if this could be cleaned
>>> up somehow.
>>>
>>> For me - as a GC person - it's hard to see why the sweeper thread
>>> should be a JavaThread at all, it seems like more of a VM-only
>>> thread to me.
>>
>> Yes, I agree that it is kind of counterintuitive that the sweeper
>> thread is a JavaThread. I think this is because before JDK-8046809
>> [1] sweeping was done by the CompilerThreads which are JavaThreads as
>> well.
>>
>> We could make the CodeCacheSweeperThread a subclass of NamedThread
>> similar to ConcurrentGCThread but the problem is that the GC calls
>> oops_do() only on the JavaThreads and the VMThread (see
>> Threads::oops_do() or ThreadRootsMarkingTask::do_it()). That code
>> would have to be modified to include the sweeper thread as well which
>> seems a bit hacky to me.
> 
> In a way I consider it less hacky to make it explicit that the sweeper thread is a special kind of thread and that due to its interaction with the code cache the GC may have to interact with it.
> 
> The more I look at this code the more strange stuff I seem to find.
> It appears, for example, that the VM thread and JavaThreads are the only threads who have their handle areas examined by the GC, even though other threads do have handle areas (and JNI handle areas) these are not seen by GC since it's only interested in JavaThreads and the VMThread.

Yes, this is indeed strange. Unfortunately, I'm not too familiar with this code.

I quickly tried to change the CodeCacheSweeperThread to be a NamedThread but unfortunately this affects many other components that rely on the sweeper thread being a JavaThread (for example, SA, GC and CompilerBroker). My prototype also triggers asserts like "GC active during NoGCVerifier" in the sweeper code. I would therefore prefer to not change the sweeper thread type with this fix. I filed JDK-8153271 to keep track of this.

>> What do you think?
> 
> Thinking out loud a bit it sounds like there is in fact some concept of threads which need to interact with GC, have handle areas, and so on and perhaps these threads should be based on a subclass of Thread which provides this interface, support for nmethod iteration and the "claim parity" hack which the GC uses to parallelize scanning of the threads.
> 
> Anyway, as I said I think your fix is good to go as-is but in general I think this code needs to be cleaned up :)

Thanks! I added a reference to this discussion to JDK-8153271.

Best regards,
Tobias

> 
> /Mikael
> 
>>
>> Thanks, Tobias
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8046809
>>
>>>
>>> /Mikael
>>>
>>>>
>>>> Thanks, Tobias
>>>>
>>>>>
>>>>> /Mikael
>>>>>
>>>>>>
>>>>>> Thanks, Vladimir
>>>>>>
>>>>>>>
>>>>>>> Unfortunately, this bug is extremely hard to reproduce (it
>>>>>>> showed up 18 times since early 2015). I was able to
>>>>>>> reproduce it only once after thousands of runs and
>>>>>>> therefore not often enough to verify the fix. However, I'm
>>>>>>> very confident that this solves the problem.
>>>>>>>
>>>>>>> Tested with JPRT and RBT (running).
>>>>>>>
>>>>>>> Thanks, Tobias
>>>>>>>


More information about the hotspot-dev mailing list