RFR: 8213486: SIGSEGV in CompiledMethod::cleanup_inline_caches_impl with AOT
Erik Österlund
erik.osterlund at oracle.com
Wed Nov 21 20:16:14 UTC 2018
Hi Vladimir,
Thanks for the review.
/Erik
On 2018-11-21 19:33, Vladimir Kozlov wrote:
> Sounds good.
>
> Vladimir
>
> On 11/21/18 10:21 AM, Erik Osterlund wrote:
>> Hi Vladimir,
>>
>> Thank you for the background. It sounds like we agree about the solution.
>>
>> Note though as I replied to Dean that we do need to call do_unloading
>> on all AOT methods seen in iterators to clean their IC caches
>> referring to unloading nmethods. Seems like the runtime stub AOT
>> methods are hidden from iterators (according to an assert I added),
>> and hence go under the radar due to not having IC caches.
>>
>> But still the solution is to just return false for is_unloading() on
>> AOT methods and move the state fiddling to nmethod.
>>
>> /Erik
>>
>>> On 21 Nov 2018, at 18:49, Vladimir Kozlov
>>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>> From the start AOT methods were never scanned for unloading because
>>> they don't have embedded oops.
>>> We don't need to explicitly "unload" aot methods to avoid GC to scan
>>> them - they don't have oops. Oops and metadata referenced by aot
>>> methods are kept in separate arrays. Scanning aot oops and metadata
>>> arrays are done by separate code.
>>> At least this is what I remember. If you look on original aot
>>> unloading code - it did nothing:
>>> http://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/hotspot/share/aot/aotCompiledMethod.cpp#l78
>>>
>>>
>>> Based on that I think it is correct to move unloading logic to
>>> nmethod as you suggested.
>>>
>>> I still can't remember how we proved that should not be any
>>> "activations"/call to such aot methods if they are not marked as
>>> 'unloaded'.
>>>
>>> Vladimir
>>>
>>>> On 11/21/18 8:32 AM, dean.long at oracle.com wrote:
>>>> I think the problem is CompiledMethodIterator vs NMethodIterator.
>>>> CompiledMethodIterator will visit AOT methods, but NMethodIterator
>>>> will not. As long as AOT methods can't be unloaded, it makes sense
>>>> to continue to use NMethodIterator and move unloading logic from
>>>> CompiledMethod to nmethod.
>>>> dl
>>>>> On 11/21/18 6:26 AM, Erik Österlund wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Thank you for the explanation.
>>>>>
>>>>> I inserted some debugging code to be able to draw better conclusions.
>>>>> I put in a bool that tags CompiledMethods that are visited during
>>>>> code cache unloading.
>>>>>
>>>>> When I crash, I find the AOTCompiledMethod in the following state:
>>>>>
>>>>> * The CodeCache::_unloading_cycle is 2, meaning there has been at
>>>>> least one unloading that occurred since the VM started.
>>>>> * My new boolean saying if it was found during code cache walking
>>>>> said false; it has never been observed by GC code cache unloading.
>>>>> * The _is_unloading_state of the AOT method is 2, meaning it was
>>>>> from CodeCache::_unloading_cycle number 1. So there has been at
>>>>> least one code cache unloading cycle since the AOT method was created.
>>>>> * It is in the CompiledMethod state in_use.
>>>>> * The corresponding CodeToAMethod entry has _state equal to in_use,
>>>>> and the _aot pointer points right back to the AOT method as expected.
>>>>>
>>>>> So the conclusions I can draw from this is that we are looking at
>>>>> an AOTCompiledMethod that is in_use, alive, and published, but was
>>>>> created at least one unloading cycle earlier, yet was *not* found
>>>>> by the evident code cache unloading that has occurred between
>>>>> creating the AOT method and the crash. In fact it has not been
>>>>> found in any code cache unloading event at all.
>>>>>
>>>>> So I am certain about the problem being that the AOT method is not
>>>>> found during our code cache walk, and that the solution to that
>>>>> problem is to move the epoch business to nmethod instead. However,
>>>>> it would be desirable to know exactly why the AOT method isn't
>>>>> showing up in the code cache walk, but I'm afraid I can't see right
>>>>> now exactly why it has been missed out, only that it definitely has
>>>>> been missed out. :/
>>>>>
>>>>> /Erik
>>>>>
>>>>>> On 2018-11-20 19:08, Vladimir Kozlov wrote:
>>>>>> To clarify.
>>>>>>
>>>>>> 'invalid' state is set only for AOT methods which never was
>>>>>> registers (their state is 'not_set' and corresponding class was
>>>>>> not initialized yet). Such methods are never visible to iterators
>>>>>> and IC caches.
>>>>>>
>>>>>> If aot method is registered its state become 'in_use' and in such
>>>>>> case it is treated and processed as normal nmethod. And
>>>>>> AOTCompiledMethod structure is created only for such case.
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>>> On 11/20/18 3:12 AM, Erik Österlund wrote:
>>>>>>> Hi Dean,
>>>>>>>
>>>>>>> Yeah I think I misunderstood what I observed. So what I know is
>>>>>>> that in the code cache unloading, I'm not getting all is_alive()
>>>>>>> AOT methods into my iterator, which messes up the assumptions
>>>>>>> made by the epoch based scheme for AOT methods.
>>>>>>>
>>>>>>> I noticed that in AOTCodeHeap::sweep_dependent_methods(int*
>>>>>>> indexes, int methods_cnt) we make AOTCompiledMethods "invalid" in
>>>>>>> the AOT heap, making it no longer observable from the iterators.
>>>>>>> Then it calls the VM_Deoptimize vm operation after. Throughout
>>>>>>> all this, the AOTCompiledMethod is alive(), yet when the
>>>>>>> iterators ask for all is_alive() AOTCompiledMethods, it won't be
>>>>>>> visible. But I suppose IC caches may still reference these
>>>>>>> methods and check if it is_unloading, and then we blow up. There
>>>>>>> may possibly be multiple ways for is_alive() AOTCompiledMethods
>>>>>>> to not be visible from iterators yet be visible through IC caches
>>>>>>> using the "invalid" state in the .
>>>>>>>
>>>>>>> Anyway, the fix is the same: stop doing the epoch state thingey
>>>>>>> for is_unloading() on AOTCompiledMethod where it isn't needed,
>>>>>>> and doesn't seem to play well with the rather different life
>>>>>>> cycle it has, and just return false instead.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Erik
>>>>>>>
>>>>>>>> On 2018-11-20 00:00, dean.long at oracle.com wrote:
>>>>>>>> Hi Erik,
>>>>>>>>
>>>>>>>>> On 11/19/18 12:42 PM, Erik Österlund wrote:
>>>>>>>>> ...except it looks like for AOTCompiledMethods when running
>>>>>>>>> with tiered compilation, may first be is_alive(), then become
>>>>>>>>> !is_alive() for a while, and then get resurrected to is_alive()
>>>>>>>>> using make_entrant().
>>>>>>>>
>>>>>>>> this doesn't sounds quite right. AOTCompiledMethods aren't
>>>>>>>> allowed to transition to zombie (!alive), only not_used or
>>>>>>>> not_entrant, which should still have is_alive() returning true.
>>>>>>>> Maybe some code is using is_not_entrant() instead of !is_alive()?
>>>>>>>>
>>>>>>>> dl
>>>>>>>
>>>>>
>>
More information about the hotspot-dev
mailing list