RFR 8028497: SIGSEGV at ClassLoaderData::oops_do(OopClosure*, KlassClosure*, bool)

Erik Helin erik.helin at oracle.com
Thu Mar 27 17:11:49 UTC 2014


Hi Coleen,

thanks for taking on this tricky bug! However, I think there is an issue 
with your fix :(

This case is for "normal" (non-CDS) class loading:
1. We create the mirror
2. We set up the link from the mirror to the Klass
3. We fail somewhere before setting the link from the Klass to the
    mirror, for example when allocating the Java object for the init
    lock.
4. The destructor for ClassFileParser is run and we add the Klass to the
    deallocate list.
5. A CMS concurrent cycle is started, it will treat all objects in eden
    + the survivor area as roots (including the mirror).
6. The above concurrent cycle enters the "final remark" phase and does
    class unloading which will call free_deallocate_list() which will
    return the memory for the Klass added to the deallocate list in step
    4 to Metaspace.
7. The concurrent CMS cycle ends (the mirror is still in eden or young)
8. A new concurrent CMS cycle is started, the mirror will once again be
    treated as a root since it is still in eden or young (no YC has been
    done).
9. During the initial marking, CMS will now follow the pointer to the
    Klass that the mirror is mirroring, but we returned this memory to
    Metaspace in step 6.
10. We now crash :(

Unfortunately, I don't have any suggestion on how to solve the above 
scenario at the moment.

Thanks,
Erik

On 2014-03-27 16:04, Coleen Phillimore wrote:
>
> On 3/27/14 8:39 AM, Stefan Karlsson wrote:
>>
>> On 2014-03-27 03:00, Coleen Phillimore wrote:
>>>
>>> I have a new version of this change based on code that Ioi noticed
>>> that I was missing for making restoring the method's adapters
>>> restartable if something fails during restore_unshareable_info, and
>>> the constant pool resolved references array.
>>> Please review this new code (only method.* and instanceKlass.cpp and
>>> constantPool.cpp have differences).  Reran tests with CDS and CMS.
>>>
>>> http://cr.openjdk.java.net/~coleenp/8028497_2/
>>
>> This seems good. I'd prefer if someone with more experience with CDS
>> also Reviews the change.
>
> Thanks, Ioi said he'd look at it.
>
>>
>> Small comment:
>>
>> http://cr.openjdk.java.net/~coleenp/8028497_2/src/share/vm/oops/instanceKlass.cpp.udiff.html
>>
>>
>> -    methodHandle m(THREAD, methods->at(index2));
>> -    m()->link_method(m, CHECK);
>> -    // restore method's vtable by calling a virtual function
>> -    m->restore_vtable();
>> +    Method *m = methods->at(index2);
>> +    m->restore_unshareable_info(CHECK);
>>
>>
>> Do we really want to use a raw Method * here. I know that we wrap the
>> method in a handle inside  m->restore_unshareable_info, but it's not
>> clear from the this context. I'm thinking about this issue:
>> https://bugs.openjdk.java.net/browse/JDK-8004815
>>
>> On the other hand, I guess we can't redefine a class before it has
>> been defined. But maybe it would be good to not promote the usage of
>> Method* over calls that might hit a Safepoint.
>>
>
> I think you're right, we don't use 'm' after the potential safepoint and
> we can't really redefine this class yet anyway but I'll restore the code
> because we don't want to encourage uses of unhandled Methods unless
> proven to be harmless.
>
> Thanks!
> Coleen
>> thanks,
>> StefanK
>>
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 3/26/14 5:15 PM, Coleen Phillimore wrote:
>>>>
>>>> Hi Jon,
>>>> Thank you for looking at the code changes.
>>>>
>>>> On 3/26/14 12:24 PM, Jon Masamitsu wrote:
>>>>> Coleen,
>>>>>
>>>>> Did you remove Class_obj_allocate() because it was only
>>>>> called in that one spot and  use obj_allocate() did most
>>>>> of the work?
>>>>
>>>> I deleted it because GC doesn't need to know how these javaClasses
>>>> are set up and I spent way to long looking for this code and didn't
>>>> find it where it belongs, so I removed this. This Class_obj_allocate
>>>> is an artifact of the permgen code.
>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~coleenp/8028497/src/share/vm/oops/instanceMirrorKlass.cpp.frames.html
>>>>>
>>>>>
>>>>>  373   // the size in the mirror size itself.
>>>>>
>>>>> Drop the second "size".  You know how we GC guys are
>>>>> getting with our comments :-).
>>>>
>>>> Yes, I noticed that.  Thanks - fixed.
>>>>>
>>>>> http://cr.openjdk.java.net/~coleenp/8028497/src/share/vm/oops/klass.cpp.frames.html
>>>>>
>>>>>
>>>>>  498 void Klass::restore_unshareable_info(TRAPS) {
>>>>>  499   // If an exception happened during CDS restore, some of
>>>>> these fields may already be
>>>>>  500   // set.  We leave the class on the CLD list, even if
>>>>> incomplete so that we don't
>>>>>  501   // modify the CLD list outside a safepoint.
>>>>>  502   if (class_loader_data() == NULL) {
>>>>>  503     ClassLoaderData* loader_data =
>>>>> ClassLoaderData::the_null_class_loader_data();
>>>>>
>>>>> I can see that this works with the current CDS (sharing only
>>>>> classes loader
>>>>> with the NULL class loader).  Will it work if the shared classes
>>>>> are loaded
>>>>> with an application class loader?
>>>>
>>>> I hope Ioi can answer that.  I assume so because the classes don't
>>>> change class loaders if restoring shareable info fails.
>>>>
>>>> Ioi pointed out to me (not on open list) that I missed making
>>>> link_method() and creating the constant pool resolved references
>>>> array restartable, so I'm testing a fix for that also and I'll post
>>>> an update later.   But you've seen most of it.
>>>>
>>>> Thanks!!
>>>> Coleen
>>>>>
>>>>> Rest looks good.
>>>>>
>>>>> Jon
>>>>>
>>>>> On 3/25/2014 1:42 PM, Coleen Phillimore wrote:
>>>>>> Summary: Keep class in CLD::_klasses list and mirror created for
>>>>>> CDS classes if OOM during restore_shareable_info(). This keeps
>>>>>> pointers consistent for CMS.
>>>>>>
>>>>>> This change makes restoring the mirror for shared classes
>>>>>> restartable if there is an OOM in the middle of
>>>>>> restore_unshareable_info.   The mirror field in the classes and
>>>>>> CLD link is not cleaned up anymore.  The class isn't marked loaded
>>>>>> though so must be loaded again.
>>>>>>
>>>>>> Tested with Stefan's test case with hard coded OOM in jvm, so
>>>>>> can't add the test.   Also ran all the tests against -client and
>>>>>> server with -Xshare:auto and with -XX:+UseConcMarkSweepGC.
>>>>>>
>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8028497/
>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8028497
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>
>>>>
>>>
>>
>


More information about the hotspot-dev mailing list