[9] RFR(S): 8074553: Crash with assert(!is_unloaded()) failed: should not call follow on unloaded nmethod

Mikael Gerdin mikael.gerdin at oracle.com
Fri Apr 1 13:49:15 UTC 2016


Hi,

On 2016-04-01 15:23, Tobias Hartmann wrote:
> Hi Mikael,
>
> On 01.04.2016 12:59, Mikael Gerdin wrote:
>> Tobias,
>>
>> On 2016-04-01 11:35, Tobias Hartmann wrote:
>>> Hi Mikael,
>>>
>>> On 01.04.2016 10:24, Mikael Gerdin wrote:
>>>> Hi Tobias,
>>>>
>>>> On 2016-04-01 09:28, Tobias Hartmann wrote:
>>>>> Hi Mikael,
>>>>>
>>>>> On 31.03.2016 18:13, Mikael Gerdin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2016-03-31 17:49, Vladimir Kozlov wrote:
>>>>>>> On 3/31/16 6:22 AM, Tobias Hartmann wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> please review the following patch:
>>>>>>>>
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8074553
>>>>>>>> http://cr.openjdk.java.net/~thartmann/8074553/webrev.00/
>>>>>>>>
>>>>>>>>
>>>>>>>>
While the code cache sweeper processes a nmethod in
>>>>>>>> NMethodSweeper::process_nmethod(), safepoints may
>>>>>>>> happen and the GC may unload the currently processed
>>>>>>>> nmethod. To prevent this, the sweeper uses a
>>>>>>>> NMethodMarker which saves the nmethod in
>>>>>>>> CodeCacheSweeperThread::_scanned_nmethod. The nmethod
>>>>>>>> is then passed to the GC through a CodeBlobClosure in
>>>>>>>> CodeCacheSweeperThread::oops_do() to keep it alive when
>>>>>>>> the GC iterates over all threads.
>>>>>>>>
>>>>>>>> The problem is that G1 calls nmethods_do() on all
>>>>>>>> threads in the remark phase (see
>>>>>>>> G1RemarkThreadsClosure::do_thread()) which is not
>>>>>>>> overwritten by the sweeper thread. Since the currently
>>>>>>>> processed nmethod is not passed through nmethods_do()
>>>>>>>> by any thread, it is unloaded and we later hit the
>>>>>>>> assert when encountering the nmethod through
>>>>>>>> oops_do().
>>>>>>>>
>>>>>>>> Mikael Gerdin and Stefan Karlsson (thanks again!)
>>>>>>>> suggested to overwrite nmethods_do() as well in
>>>>>>>> CodeCacheSweeperThread and pass _scanned_nmethod to
>>>>>>>> the closure. I also modified Threads::nmethods_do() to
>>>>>>>> ignore the sweeper thread because we want to avoid
>>>>>>>> marking the _scanned_nmethod as seen on the stack when
>>>>>>>> scanning stacks from the sweeper (the nmethod may
>>>>>>>> already be zombie and only referenced by the sweeper).
>>>>>>>
>>>>>>> I did not get this. If you exclude CodeCacheSweeperThread
>>>>>>> in Threads::nmethods_do() then
>>>>>>> CodeCacheSweeperThread::nmethods_do() will not be
>>>>>>> called. What is the point?
>>>>>>
>>>>>> The GC code in question iterates over the threads and
>>>>>> calls nmethods_do on each JavaThread and the VMThread after
>>>>>> claiming them with atomic operations to achieve
>>>>>> parallelism.
>>>>>>
>>>>>> There is still something a bit fishy here though.
>>>>>> Thread::nmethods_do is not virtual, so one must be careful
>>>>>> to downcast one's Thread* to JavaThread before calling it.
>>>>>> And since Tobias' change does not make that or
>>>>>> JavaThread::nmethods_do (which actually shadows and does
>>>>>> not override the methods) virtual we can't reach the new
>>>>>> code unless the GC code is changed to downcast to
>>>>>> CodeCacheSweeperThread before calling nmethods_do.
>>>>>
>>>>> Thanks for taking a look. Of course, you are right. I just
>>>>> assumed that Thread::nmethods_do() is virtual and missed
>>>>> that it's not.
>>>>>
>>>>>> I still believe that what Tobias is attempting to do is a
>>>>>> necessary fix but this only shows how hard this is to
>>>>>> reproduce.
>>>>>>
>>>>>> Perhaps Thread::nmethods_do should simply be removed
>>>>>> (along with any calls to it) and JavaThread::nmethods_do
>>>>>> should then be made virtual.
>>>>>
>>>>> Yes, I agree. Here is the new webrev:
>>>>> http://cr.openjdk.java.net/~thartmann/8074553/webrev.01/
>>>>>
>>>>> I removed the empty Thread::nmethods_do() and all the
>>>>> (static) calls to it.
>>>>
>>>> I kind of agree with Vladimir that it's unfortunate to have to
>>>> check for the sweeper thread in Threads::nmethods_do() but I
>>>> don't have a good suggestion for an easy fix.
>>>>
>>>> From my point of view this fix is ready to go in but perhaps
>>>> it would be a good idea to also think about if this could be
>>>> cleaned up somehow.
>>>>
>>>> For me - as a GC person - it's hard to see why the sweeper
>>>> thread should be a JavaThread at all, it seems like more of a
>>>> VM-only thread to me.
>>>
>>> Yes, I agree that it is kind of counterintuitive that the
>>> sweeper thread is a JavaThread. I think this is because before
>>> JDK-8046809 [1] sweeping was done by the CompilerThreads which
>>> are JavaThreads as well.
>>>
>>> We could make the CodeCacheSweeperThread a subclass of
>>> NamedThread similar to ConcurrentGCThread but the problem is that
>>> the GC calls oops_do() only on the JavaThreads and the VMThread
>>> (see Threads::oops_do() or ThreadRootsMarkingTask::do_it()). That
>>> code would have to be modified to include the sweeper thread as
>>> well which seems a bit hacky to me.
>>
>> In a way I consider it less hacky to make it explicit that the
>> sweeper thread is a special kind of thread and that due to its
>> interaction with the code cache the GC may have to interact with
>> it.
>>
>> The more I look at this code the more strange stuff I seem to
>> find. It appears, for example, that the VM thread and JavaThreads
>> are the only threads who have their handle areas examined by the
>> GC, even though other threads do have handle areas (and JNI handle
>> areas) these are not seen by GC since it's only interested in
>> JavaThreads and the VMThread.
>
> Yes, this is indeed strange. Unfortunately, I'm not too familiar with
> this code.
>
> I quickly tried to change the CodeCacheSweeperThread to be a
> NamedThread but unfortunately this affects many other components that
> rely on the sweeper thread being a JavaThread (for example, SA, GC
> and CompilerBroker). My prototype also triggers asserts like "GC
> active during NoGCVerifier" in the sweeper code. I would therefore
> prefer to not change the sweeper thread type with this fix. I filed
> JDK-8153271 to keep track of this.

Right, I didn't expect it to be an easy change. :)
Anyway, thanks for trying it out to see if it was a simple change.

>
>>> What do you think?
>>
>> Thinking out loud a bit it sounds like there is in fact some
>> concept of threads which need to interact with GC, have handle
>> areas, and so on and perhaps these threads should be based on a
>> subclass of Thread which provides this interface, support for
>> nmethod iteration and the "claim parity" hack which the GC uses to
>> parallelize scanning of the threads.
>>
>> Anyway, as I said I think your fix is good to go as-is but in
>> general I think this code needs to be cleaned up :)
>
> Thanks! I added a reference to this discussion to JDK-8153271.

Excellent.

/Mikael

>
> Best regards, Tobias
>
>>
>> /Mikael
>>
>>>
>>> Thanks, Tobias
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8046809
>>>
>>>>
>>>> /Mikael
>>>>
>>>>>
>>>>> Thanks, Tobias
>>>>>
>>>>>>
>>>>>> /Mikael
>>>>>>
>>>>>>>
>>>>>>> Thanks, Vladimir
>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately, this bug is extremely hard to reproduce
>>>>>>>> (it showed up 18 times since early 2015). I was able
>>>>>>>> to reproduce it only once after thousands of runs and
>>>>>>>> therefore not often enough to verify the fix. However,
>>>>>>>> I'm very confident that this solves the problem.
>>>>>>>>
>>>>>>>> Tested with JPRT and RBT (running).
>>>>>>>>
>>>>>>>> Thanks, Tobias
>>>>>>>>


More information about the hotspot-dev mailing list