8076110: VM crash when class is redefined with Instrumentation.redefineClasses
Andreas Eriksson
andreas.eriksson at oracle.com
Fri Jun 12 18:04:25 UTC 2015
Thanks for your help Coleen.
On 2015-06-12 17:18, Coleen Phillimore wrote:
>
> Looks good to me.
> Coleen
>
> On 6/12/15 11:13 AM, Andreas Eriksson wrote:
>> So then webrev.01 is good to go?
>> Serguei, any thoughts / objections?
>>
>> http://cr.openjdk.java.net/~aeriksso/8076110/webrev.01/
>>
>> Regards,
>> Andreas
>>
>> On 2015-06-12 14:21, Coleen Phillimore wrote:
>>>
>>> Thanks, Andreas. That's the one I didn't find. The InstanceKlass
>>> is something that can't be deallocated unless at a safepoint, so
>>> your change is safe if added to InstanceKlass::deallocate_contents()
>>> but not in ConstantPool::deallocate_contents().
>>> Also because of the reason that Serguei described below, we don't
>>> call ConstantPool::deallocate_contents() if the instanceKlass is_shared.
>>>
>>> Thanks for taking on this unfortunately complicated bug.
>>> Coleen
>>>
>>> On 6/12/15 7:49 AM, Andreas Eriksson wrote:
>>>>
>>>>
>>>> On 2015-06-12 00:36, serguei.spitsyn at oracle.com wrote:
>>>>> On 6/11/15 7:32 AM, Coleen Phillimore wrote:
>>>>>>
>>>>>>
>>>>>> On 6/11/15 7:20 AM, Andreas Eriksson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Deleting the resolution errors from InstanceKlass::deallocate
>>>>>>> contents works, but I'm unsure if the deletion code should be
>>>>>>> placed inside the is_shared() if-block or not.
>>>>>>> I don't think that CDS should affect our decision to delete the
>>>>>>> cached resolution errors, but I might be missing something.
>>>>>>> Coleen, do you know if it affects anything?
>>>>>>
>>>>>> The resolution errors aren't shared in the CDS archive so it's
>>>>>> fine to delete them outside the is_shared block. The other
>>>>>> things are in the archive so have to be excluded.
>>>>>>
>>>>>> I'm still trying to figure out why deleting them inside
>>>>>> ConstantPool::deallocate_contents() deleted them outside a
>>>>>> safepoint. I didn't find that code path.
>>>>>
>>>>> The ConstantPool::deallocate_contents() is called only from the
>>>>> free_metadata(ClassLoaderData* loader_data, T md).
>>>>>
>>>>> There are just two free_metadata() calls that are related to the
>>>>> ConstantPool::deallocate_contents():
>>>>> classfile/classLoaderData.cpp:
>>>>> MetadataFactory::free_metadata(this, (ConstantPool*)m);
>>>>> oops/instanceKlass.cpp:
>>>>> MetadataFactory::free_metadata(loader_data, constants());
>>>>>
>>>>
>>>> There is at least one more place, in the ClassFileParser
>>>> destructor, at classfile/classFileParser.cpp:4271.
>>>> // Destructor to clean up if there's an error
>>>> ClassFileParser::~ClassFileParser() {
>>>> MetadataFactory::free_metadata(_loader_data, _cp);
>>>> It seems to be called when the parsing has gone wrong, which can
>>>> happen when not at safepoint.
>>>> Attached a hs_err where this has happened.
>>>>
>>>> - Andreas
>>>>
>>>>> The first one is in the ClassLoaderData::free_deallocate_list()
>>>>> which has an assert:
>>>>> assert(SafepointSynchronize::is_at_safepoint(), "only called at
>>>>> safepoint");
>>>>>
>>>>> The second call is in the InstanceKlass::deallocate_contents()
>>>>> that is also called only
>>>>> from the ClassLoaderData::free_deallocate_list().
>>>>>
>>>>> So that even if there is such a code path then it is by a mistake
>>>>> that has to be fixed.
>>>>>
>>>>> The issue with having the call to delete_resolution_error() in the
>>>>> ConstantPool::deallocate_contents() is that
>>>>> it's call in the InstanceKlass::deallocate_contents( ) is conditional:
>>>>> 379 if (constants() != NULL) {
>>>>> 380 assert (!constants()->on_stack(), "shouldn't be called if anything is onstack");
>>>>> 381 if (!constants()->is_shared()) {
>>>>> 382 MetadataFactory::free_metadata(loader_data, constants());
>>>>> 383 }
>>>>> so that the delete_resolution_error() will not be called for
>>>>> constants()->is_shared() == true.
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>>> You wouldn't happen to still have an hs_err file from that
>>>>>> experiment?
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>>
>>>>>>> New webrev:
>>>>>>> http://cr.openjdk.java.net/~aeriksso/8076110/webrev.01/
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andreas
>>>>>>>
>>>>>>> On 2015-06-09 01:49, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Andreas,
>>>>>>>>
>>>>>>>> I agree, it is a good idea to delete the resolution errors from
>>>>>>>> ConstantPool::deallocate_contents.
>>>>>>>> So, I'm waiting for next webrev.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>> On 6/8/15 9:53 AM, Andreas Eriksson wrote:
>>>>>>>>> Hi Coleen, thanks for looking at this.
>>>>>>>>>
>>>>>>>>> That might work, I'll make a build to test it.
>>>>>>>>>
>>>>>>>>> Just FYI:
>>>>>>>>> My first attempt to fix this was actually to delete the
>>>>>>>>> resolution errors from ConstantPool::deallocate_contents, but
>>>>>>>>> that was sometimes called when not at safepoint, which made me
>>>>>>>>> do this change instead.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Andreas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2015-06-06 00:07, Coleen Phillimore wrote:
>>>>>>>>>>
>>>>>>>>>> Thank you so much for fixing this, Andreas!
>>>>>>>>>>
>>>>>>>>>> I was wondering if you could delete the resolution errors in
>>>>>>>>>> InstanceKlass::deallocate_contents instead at the point where
>>>>>>>>>> you free the constant pool? This code is at a safepoint at
>>>>>>>>>> this time, even with the CMS collector.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>> On 6/4/15 5:23 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>>> Added the SVC list as it impacts the Serviceability as well.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Serguei
>>>>>>>>>>>
>>>>>>>>>>> On 6/4/15 4:56 AM, Andreas Eriksson wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Please review this fix of a crash because of a class
>>>>>>>>>>>> redefinition race.
>>>>>>>>>>>>
>>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8076110
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~aeriksso/8076110/webrev.00/
>>>>>>>>>>>>
>>>>>>>>>>>> The crash happens when the redefine occurs while the
>>>>>>>>>>>> redefined method is still running.
>>>>>>>>>>>> This is because VM_RedefineClasses::redefine_single_class
>>>>>>>>>>>> deletes the resolution errors at redefine time, but the
>>>>>>>>>>>> running method might still need them.
>>>>>>>>>>>>
>>>>>>>>>>>> This is fixed by moving the deletion of the cached
>>>>>>>>>>>> resolution errors to when the {add,purge}_previous_version
>>>>>>>>>>>> functions have come to the conclusion that no running
>>>>>>>>>>>> methods need the old class any more (and therefore no one
>>>>>>>>>>>> needs the constant pool and its associated resolution
>>>>>>>>>>>> errors either).
>>>>>>>>>>>>
>>>>>>>>>>>> A jtreg regression test that reproduces the problem using
>>>>>>>>>>>> redefineclasses and objectweb.asm is included in the change.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Andreas
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list