RFR 8028497: SIGSEGV at ClassLoaderData::oops_do(OopClosure*, KlassClosure*, bool)
Coleen Phillimore
coleen.phillimore at oracle.com
Tue Apr 8 17:58:10 UTC 2014
Thank you, Erik!
Coleen
On 4/8/14, 11:43 AM, Erik Helin wrote:
> Coleen,
>
> the change http://cr.openjdk.java.net/~coleenp/8028497_4/ looks good.
>
> Thanks for taking care of this patch!
> Erik
>
> On 2014-04-04 17:28, Coleen Phillimore wrote:
>>
>> Erik,
>> Thank you for the discussion and thorough code reviews. See below.
>> (other reviewers: there's a new version of webrev, read down a little).
>>
>> On 4/1/14 4:32 AM, Erik Helin wrote:
>>> On 2014-04-01 05:30, Coleen Phillimore wrote:
>>>>
>>>> Erik,
>>>>
>>>> You are right. The old code would use the Klass->mirror link to clear
>>>> out the mirror->Klass link so that we didn't have a pointer to a
>>>> deallocated Klass in a mirror that might still be live for CMS
>>>> marking.
>>>> The new code doesn't set the Klass->mirror link unless the mirror
>>>> fields
>>>> are completely allocated, so we could crash if we deallocated a Klass
>>>> before the mirror because the mirror would still be pointing to the
>>>> class.
>>>>
>>>> I've caught failed allocation in mirror fields so that the link is
>>>> reset
>>>> in this change. The only change is to the function create_mirror
>>>> calls
>>>> initialize_mirror_fields.
>>>>
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8028497_3/
>>>
>>> In javaClasses.cpp:
>>> void java_lang_Class::create_mirror(KlassHandle k, Handle, TRAPS) {
>>> ...
>>> initialize_mirror_fields(k, mirror, protection_domain, THREAD);
>>> if (HAS_PENDING_EXCEPTION) {
>>> // If any of the fields throws an exception like OOM remove the
>>> // klass field from the mirror so GC doesn't follow it after the
>>> // klass has been deallocated.
>>> java_lang_Class::set_klass(mirror(), k());
>>> return; ^
>>> } |
>>> |
>>> +------------------------------------------+
>>> |
>>> +-- Do you really want to use k() here? I thought the idea was to do:
>>> java_lang_Class::set_klass(mirror(), NULL);
>>>
>>
>> I had meant to change this to NULL.
>>> Setting the Klass that the mirror is mirroring will hit the following
>>> check in instanceMirrorKlass.cpp, oop_oop_iterate macro:
>>>
>>> int InstanceMirrorKlass::oop_oop_iterate##nv_suffix(...)
>>> { \
>>> ... \
>>> if_do_metadata_checked(closure, nv_suffix)
>>> { \
>>> Klass* klass =
>>> java_lang_Class::as_Klass(obj); \
>>> /* We'll get NULL for primitive mirrors.
>>> */ \
>>> if (klass != NULL)
>>> { \
>>> closure->do_klass##nv_suffix(klass); \
>>> } \
>>> }
>>>
>>> Setting the Klass that the mirror is mirroring to NULL will make the
>>> mirror look like a primitive mirror (this is the way we encode
>>> primitive mirrors). Will that be a problem?
>>>
>>
>> No, this is fine. I added a comment that this mirror isn't a primitive
>> type but it's also not representative of a class. I had internal code
>> to generate lots of these mirrors and CMS and G1 work fine because they
>> don't follow the klass. We don't want them to. We also clean up these
>> orphaned mirrors, which I also verified manually.
>>
>> I also tested heap walking with temporary code to create lots of
>> orphaned mirrors. I tried to create an internal test or a white box
>> test but wasn't able to generate the right set of circumstances to
>> exercise this code. If necessary, we can open another bug to add a
>> test but I found it noreg-hard.
>>
>> This is also tested with Stefan's TreeMapTest.java test case that also
>> required temporary code in the jvm.
>>
>> So yes, I think we want to be able to set the Klass field in the mirror
>> to null. If other bugs fall out because of this in the future, I think
>> we should fix the other bugs separately.
>>
>> New webrev with only this classfile/javaClasses.cpp change is here.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8028497_4/
>>
>> Thanks!
>> Coleen
>>> Thanks,
>>> Erik
>>>
>>>> Tested with vm.parallel_class_loading.testlist which do hit this case.
>>>>
>>>> Thanks!
>>>> Coleen
>>>>
>>>>
>>>> On 3/27/14 1:11 PM, Erik Helin wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> thanks for taking on this tricky bug! However, I think there is an
>>>>> issue with your fix :(
>>>>>
>>>>> This case is for "normal" (non-CDS) class loading:
>>>>> 1. We create the mirror
>>>>> 2. We set up the link from the mirror to the Klass
>>>>> 3. We fail somewhere before setting the link from the Klass to the
>>>>> mirror, for example when allocating the Java object for the init
>>>>> lock.
>>>>> 4. The destructor for ClassFileParser is run and we add the Klass to
>>>>> the
>>>>> deallocate list.
>>>>> 5. A CMS concurrent cycle is started, it will treat all objects in
>>>>> eden
>>>>> + the survivor area as roots (including the mirror).
>>>>> 6. The above concurrent cycle enters the "final remark" phase and
>>>>> does
>>>>> class unloading which will call free_deallocate_list() which will
>>>>> return the memory for the Klass added to the deallocate list in
>>>>> step
>>>>> 4 to Metaspace.
>>>>> 7. The concurrent CMS cycle ends (the mirror is still in eden or
>>>>> young)
>>>>> 8. A new concurrent CMS cycle is started, the mirror will once
>>>>> again be
>>>>> treated as a root since it is still in eden or young (no YC has
>>>>> been
>>>>> done).
>>>>> 9. During the initial marking, CMS will now follow the pointer to the
>>>>> Klass that the mirror is mirroring, but we returned this memory to
>>>>> Metaspace in step 6.
>>>>> 10. We now crash :(
>>>>>
>>>>> Unfortunately, I don't have any suggestion on how to solve the above
>>>>> scenario at the moment.
>>>>>
>>>>> Thanks,
>>>>> Erik
>>>>>
>>>>> On 2014-03-27 16:04, Coleen Phillimore wrote:
>>>>>>
>>>>>> On 3/27/14 8:39 AM, Stefan Karlsson wrote:
>>>>>>>
>>>>>>> On 2014-03-27 03:00, Coleen Phillimore wrote:
>>>>>>>>
>>>>>>>> I have a new version of this change based on code that Ioi noticed
>>>>>>>> that I was missing for making restoring the method's adapters
>>>>>>>> restartable if something fails during restore_unshareable_info,
>>>>>>>> and
>>>>>>>> the constant pool resolved references array.
>>>>>>>> Please review this new code (only method.* and
>>>>>>>> instanceKlass.cpp and
>>>>>>>> constantPool.cpp have differences). Reran tests with CDS and CMS.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~coleenp/8028497_2/
>>>>>>>
>>>>>>> This seems good. I'd prefer if someone with more experience with
>>>>>>> CDS
>>>>>>> also Reviews the change.
>>>>>>
>>>>>> Thanks, Ioi said he'd look at it.
>>>>>>
>>>>>>>
>>>>>>> Small comment:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~coleenp/8028497_2/src/share/vm/oops/instanceKlass.cpp.udiff.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> - methodHandle m(THREAD, methods->at(index2));
>>>>>>> - m()->link_method(m, CHECK);
>>>>>>> - // restore method's vtable by calling a virtual function
>>>>>>> - m->restore_vtable();
>>>>>>> + Method *m = methods->at(index2);
>>>>>>> + m->restore_unshareable_info(CHECK);
>>>>>>>
>>>>>>>
>>>>>>> Do we really want to use a raw Method * here. I know that we
>>>>>>> wrap the
>>>>>>> method in a handle inside m->restore_unshareable_info, but it's not
>>>>>>> clear from the this context. I'm thinking about this issue:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8004815
>>>>>>>
>>>>>>> On the other hand, I guess we can't redefine a class before it has
>>>>>>> been defined. But maybe it would be good to not promote the
>>>>>>> usage of
>>>>>>> Method* over calls that might hit a Safepoint.
>>>>>>>
>>>>>>
>>>>>> I think you're right, we don't use 'm' after the potential
>>>>>> safepoint and
>>>>>> we can't really redefine this class yet anyway but I'll restore the
>>>>>> code
>>>>>> because we don't want to encourage uses of unhandled Methods unless
>>>>>> proven to be harmless.
>>>>>>
>>>>>> Thanks!
>>>>>> Coleen
>>>>>>> thanks,
>>>>>>> StefanK
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> On 3/26/14 5:15 PM, Coleen Phillimore wrote:
>>>>>>>>>
>>>>>>>>> Hi Jon,
>>>>>>>>> Thank you for looking at the code changes.
>>>>>>>>>
>>>>>>>>> On 3/26/14 12:24 PM, Jon Masamitsu wrote:
>>>>>>>>>> Coleen,
>>>>>>>>>>
>>>>>>>>>> Did you remove Class_obj_allocate() because it was only
>>>>>>>>>> called in that one spot and use obj_allocate() did most
>>>>>>>>>> of the work?
>>>>>>>>>
>>>>>>>>> I deleted it because GC doesn't need to know how these
>>>>>>>>> javaClasses
>>>>>>>>> are set up and I spent way to long looking for this code and
>>>>>>>>> didn't
>>>>>>>>> find it where it belongs, so I removed this. This
>>>>>>>>> Class_obj_allocate
>>>>>>>>> is an artifact of the permgen code.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8028497/src/share/vm/oops/instanceMirrorKlass.cpp.frames.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 373 // the size in the mirror size itself.
>>>>>>>>>>
>>>>>>>>>> Drop the second "size". You know how we GC guys are
>>>>>>>>>> getting with our comments :-).
>>>>>>>>>
>>>>>>>>> Yes, I noticed that. Thanks - fixed.
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8028497/src/share/vm/oops/klass.cpp.frames.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 498 void Klass::restore_unshareable_info(TRAPS) {
>>>>>>>>>> 499 // If an exception happened during CDS restore, some of
>>>>>>>>>> these fields may already be
>>>>>>>>>> 500 // set. We leave the class on the CLD list, even if
>>>>>>>>>> incomplete so that we don't
>>>>>>>>>> 501 // modify the CLD list outside a safepoint.
>>>>>>>>>> 502 if (class_loader_data() == NULL) {
>>>>>>>>>> 503 ClassLoaderData* loader_data =
>>>>>>>>>> ClassLoaderData::the_null_class_loader_data();
>>>>>>>>>>
>>>>>>>>>> I can see that this works with the current CDS (sharing only
>>>>>>>>>> classes loader
>>>>>>>>>> with the NULL class loader). Will it work if the shared classes
>>>>>>>>>> are loaded
>>>>>>>>>> with an application class loader?
>>>>>>>>>
>>>>>>>>> I hope Ioi can answer that. I assume so because the classes
>>>>>>>>> don't
>>>>>>>>> change class loaders if restoring shareable info fails.
>>>>>>>>>
>>>>>>>>> Ioi pointed out to me (not on open list) that I missed making
>>>>>>>>> link_method() and creating the constant pool resolved references
>>>>>>>>> array restartable, so I'm testing a fix for that also and I'll
>>>>>>>>> post
>>>>>>>>> an update later. But you've seen most of it.
>>>>>>>>>
>>>>>>>>> Thanks!!
>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>> Rest looks good.
>>>>>>>>>>
>>>>>>>>>> Jon
>>>>>>>>>>
>>>>>>>>>> On 3/25/2014 1:42 PM, Coleen Phillimore wrote:
>>>>>>>>>>> Summary: Keep class in CLD::_klasses list and mirror created
>>>>>>>>>>> for
>>>>>>>>>>> CDS classes if OOM during restore_shareable_info(). This keeps
>>>>>>>>>>> pointers consistent for CMS.
>>>>>>>>>>>
>>>>>>>>>>> This change makes restoring the mirror for shared classes
>>>>>>>>>>> restartable if there is an OOM in the middle of
>>>>>>>>>>> restore_unshareable_info. The mirror field in the classes and
>>>>>>>>>>> CLD link is not cleaned up anymore. The class isn't marked
>>>>>>>>>>> loaded
>>>>>>>>>>> though so must be loaded again.
>>>>>>>>>>>
>>>>>>>>>>> Tested with Stefan's test case with hard coded OOM in jvm, so
>>>>>>>>>>> can't add the test. Also ran all the tests against -client
>>>>>>>>>>> and
>>>>>>>>>>> server with -Xshare:auto and with -XX:+UseConcMarkSweepGC.
>>>>>>>>>>>
>>>>>>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8028497/
>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8028497
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
More information about the hotspot-dev
mailing list