RFR: 8213486: SIGSEGV in CompiledMethod::cleanup_inline_caches_impl with AOT

Wed Nov 21 20:16:14 UTC 2018

Hi Vladimir,

Thanks for the review.

/Erik

On 2018-11-21 19:33, Vladimir Kozlov wrote:
> Sounds good.
> 
> Vladimir
> 
> On 11/21/18 10:21 AM, Erik Osterlund wrote:
>> Hi Vladimir,
>>
>> Thank you for the background. It sounds like we agree about the solution.
>>
>> Note though as I replied to Dean that we do need to call do_unloading 
>> on all AOT methods seen in iterators to clean their IC caches 
>> referring to unloading nmethods. Seems like the runtime stub AOT 
>> methods are hidden from iterators (according to an assert I added), 
>> and hence go under the radar due to not having IC caches.
>>
>> But still the solution is to just return false for is_unloading() on 
>> AOT methods and move the state fiddling to nmethod.
>>
>> /Erik
>>
>>> On 21 Nov 2018, at 18:49, Vladimir Kozlov 
>>> <vladimir.kozlov at oracle.com> wrote:
>>>
>>>  From the start AOT methods were never scanned for unloading because 
>>> they don't have embedded oops.
>>> We don't need to explicitly "unload" aot methods to avoid GC to scan 
>>> them - they don't have oops. Oops and metadata referenced by aot 
>>> methods are kept in separate arrays. Scanning aot oops and metadata 
>>> arrays are done by separate code.
>>> At least this is what I remember. If you look on original aot 
>>> unloading code - it did nothing:
>>> http://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/hotspot/share/aot/aotCompiledMethod.cpp#l78 
>>>
>>>
>>> Based on that I think it is correct to move unloading logic to 
>>> nmethod as you suggested.
>>>
>>> I still can't remember how we proved that should not be any 
>>> "activations"/call to such aot methods if they are not marked as 
>>> 'unloaded'.
>>>
>>> Vladimir
>>>
>>>> On 11/21/18 8:32 AM, dean.long at oracle.com wrote:
>>>> I think the problem is CompiledMethodIterator vs NMethodIterator. 
>>>> CompiledMethodIterator will visit AOT methods, but NMethodIterator 
>>>> will not.  As long as AOT methods can't be unloaded, it makes sense 
>>>> to continue to use NMethodIterator and move unloading logic from 
>>>> CompiledMethod to nmethod.
>>>> dl
>>>>> On 11/21/18 6:26 AM, Erik Österlund wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> Thank you for the explanation.
>>>>>
>>>>> I inserted some debugging code to be able to draw better conclusions.
>>>>> I put in a bool that tags CompiledMethods that are visited during 
>>>>> code cache unloading.
>>>>>
>>>>> When I crash, I find the AOTCompiledMethod in the following state:
>>>>>
>>>>> * The CodeCache::_unloading_cycle is 2, meaning there has been at 
>>>>> least one unloading that occurred since the VM started.
>>>>> * My new boolean saying if it was found during code cache walking 
>>>>> said false; it has never been observed by GC code cache unloading.
>>>>> * The _is_unloading_state of the AOT method is 2, meaning it was 
>>>>> from CodeCache::_unloading_cycle number 1. So there has been at 
>>>>> least one code cache unloading cycle since the AOT method was created.
>>>>> * It is in the CompiledMethod state in_use.
>>>>> * The corresponding CodeToAMethod entry has _state equal to in_use, 
>>>>> and the _aot pointer points right back to the AOT method as expected.
>>>>>
>>>>> So the conclusions I can draw from this is that we are looking at 
>>>>> an AOTCompiledMethod that is in_use, alive, and published, but was 
>>>>> created at least one unloading cycle earlier, yet was *not* found 
>>>>> by the evident code cache unloading that has occurred between 
>>>>> creating the AOT method and the crash. In fact it has not been 
>>>>> found in any code cache unloading event at all.
>>>>>
>>>>> So I am certain about the problem being that the AOT method is not 
>>>>> found during our code cache walk, and that the solution to that 
>>>>> problem is to move the epoch business to nmethod instead. However, 
>>>>> it would be desirable to know exactly why the AOT method isn't 
>>>>> showing up in the code cache walk, but I'm afraid I can't see right 
>>>>> now exactly why it has been missed out, only that it definitely has 
>>>>> been missed out. :/
>>>>>
>>>>> /Erik
>>>>>
>>>>>> On 2018-11-20 19:08, Vladimir Kozlov wrote:
>>>>>> To clarify.
>>>>>>
>>>>>> 'invalid' state is set only for AOT methods which never was 
>>>>>> registers (their state is 'not_set' and corresponding class was 
>>>>>> not initialized yet). Such methods are never visible to iterators 
>>>>>> and IC caches.
>>>>>>
>>>>>> If aot method is registered its state become 'in_use' and in such 
>>>>>> case it is treated and processed as normal nmethod. And 
>>>>>> AOTCompiledMethod structure is created only for such case.
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>>> On 11/20/18 3:12 AM, Erik Österlund wrote:
>>>>>>> Hi Dean,
>>>>>>>
>>>>>>> Yeah I think I misunderstood what I observed. So what I know is 
>>>>>>> that in the code cache unloading, I'm not getting all is_alive() 
>>>>>>> AOT methods into my iterator, which messes up the assumptions 
>>>>>>> made by the epoch based scheme for AOT methods.
>>>>>>>
>>>>>>> I noticed that in AOTCodeHeap::sweep_dependent_methods(int* 
>>>>>>> indexes, int methods_cnt) we make AOTCompiledMethods "invalid" in 
>>>>>>> the AOT heap, making it no longer observable from the iterators. 
>>>>>>> Then it calls the VM_Deoptimize vm operation after. Throughout 
>>>>>>> all this, the AOTCompiledMethod is alive(), yet when the 
>>>>>>> iterators ask for all is_alive() AOTCompiledMethods, it won't be 
>>>>>>> visible. But I suppose IC caches may still reference these 
>>>>>>> methods and check if it is_unloading, and then we blow up. There 
>>>>>>> may possibly be multiple ways for is_alive() AOTCompiledMethods 
>>>>>>> to not be visible from iterators yet be visible through IC caches 
>>>>>>> using the "invalid" state in the .
>>>>>>>
>>>>>>> Anyway, the fix is the same: stop doing the epoch state thingey 
>>>>>>> for is_unloading() on AOTCompiledMethod where it isn't needed, 
>>>>>>> and doesn't seem to play well with the rather different life 
>>>>>>> cycle it has, and just return false instead.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Erik
>>>>>>>
>>>>>>>> On 2018-11-20 00:00, dean.long at oracle.com wrote:
>>>>>>>> Hi Erik,
>>>>>>>>
>>>>>>>>> On 11/19/18 12:42 PM, Erik Österlund wrote:
>>>>>>>>> ...except it looks like for AOTCompiledMethods when running 
>>>>>>>>> with tiered compilation, may first be is_alive(), then become 
>>>>>>>>> !is_alive() for a while, and then get resurrected to is_alive() 
>>>>>>>>> using make_entrant().
>>>>>>>>
>>>>>>>> this doesn't sounds quite right.  AOTCompiledMethods aren't 
>>>>>>>> allowed to transition to zombie (!alive), only not_used or 
>>>>>>>> not_entrant, which should still have is_alive() returning true. 
>>>>>>>> Maybe some code is using is_not_entrant() instead of !is_alive()?
>>>>>>>>
>>>>>>>> dl
>>>>>>>
>>>>>
>>