8076110: VM crash when class is redefined with Instrumentation.redefineClasses

serguei.spitsyn at oracle.com serguei.spitsyn at oracle.com
Fri Jun 12 15:47:24 UTC 2015


It looks good.
Thank you for taking care about this issue!

Thanks,
-Serguei

On 6/12/15 8:13 AM, Andreas Eriksson wrote:
> So then webrev.01 is good to go?
> Serguei, any thoughts / objections?
>
> http://cr.openjdk.java.net/~aeriksso/8076110/webrev.01/
>
> Regards,
> Andreas
>
> On 2015-06-12 14:21, Coleen Phillimore wrote:
>>
>> Thanks, Andreas.  That's the one I didn't find.   The InstanceKlass 
>> is something that can't be deallocated unless at a safepoint, so your 
>> change is safe if added to InstanceKlass::deallocate_contents() but 
>> not in ConstantPool::deallocate_contents().
>> Also because of the reason that Serguei described below, we don't 
>> call ConstantPool::deallocate_contents() if the instanceKlass is_shared.
>>
>> Thanks for taking on this unfortunately complicated bug.
>> Coleen
>>
>> On 6/12/15 7:49 AM, Andreas Eriksson wrote:
>>>
>>>
>>> On 2015-06-12 00:36, serguei.spitsyn at oracle.com wrote:
>>>> On 6/11/15 7:32 AM, Coleen Phillimore wrote:
>>>>>
>>>>>
>>>>> On 6/11/15 7:20 AM, Andreas Eriksson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Deleting the resolution errors from InstanceKlass::deallocate 
>>>>>> contents works, but I'm unsure if the deletion code should be 
>>>>>> placed inside the is_shared() if-block or not.
>>>>>> I don't think that CDS should affect our decision to delete the 
>>>>>> cached resolution errors, but I might be missing something.
>>>>>> Coleen, do you know if it affects anything?
>>>>>
>>>>> The resolution errors aren't shared in the CDS archive so it's 
>>>>> fine to delete them outside the is_shared block.  The other things 
>>>>> are in the archive so have to be excluded.
>>>>>
>>>>> I'm still trying to figure out why deleting them inside 
>>>>> ConstantPool::deallocate_contents() deleted them outside a 
>>>>> safepoint.  I didn't find that code path.
>>>>
>>>> The ConstantPool::deallocate_contents() is called only from the 
>>>> free_metadata(ClassLoaderData* loader_data, T md).
>>>>
>>>> There are just two free_metadata() calls that are related to the 
>>>> ConstantPool::deallocate_contents():
>>>>     classfile/classLoaderData.cpp: 
>>>> MetadataFactory::free_metadata(this, (ConstantPool*)m);
>>>>     oops/instanceKlass.cpp: 
>>>> MetadataFactory::free_metadata(loader_data, constants());
>>>>
>>>
>>> There is at least one more place, in the ClassFileParser destructor, 
>>> at classfile/classFileParser.cpp:4271.
>>> // Destructor to clean up if there's an error
>>> ClassFileParser::~ClassFileParser() {
>>>    MetadataFactory::free_metadata(_loader_data, _cp);
>>> It seems to be called when the parsing has gone wrong, which can 
>>> happen when not at safepoint.
>>> Attached a hs_err where this has happened.
>>>
>>> - Andreas
>>>
>>>> The first one is in the ClassLoaderData::free_deallocate_list() 
>>>> which has an assert:
>>>>   assert(SafepointSynchronize::is_at_safepoint(), "only called at 
>>>> safepoint");
>>>>
>>>> The second call is in the InstanceKlass::deallocate_contents() that 
>>>> is also called only
>>>> from the ClassLoaderData::free_deallocate_list().
>>>>
>>>> So that even if there is such a code path then it is by a mistake 
>>>> that has to be fixed.
>>>>
>>>> The issue with having the call to delete_resolution_error() in the 
>>>> ConstantPool::deallocate_contents() is that
>>>> it's call in the InstanceKlass::deallocate_contents( ) is conditional:
>>>>   379   if (constants() != NULL) {
>>>>   380     assert (!constants()->on_stack(), "shouldn't be called if anything is onstack");
>>>>   381     if (!constants()->is_shared()) {
>>>>   382       MetadataFactory::free_metadata(loader_data, constants());
>>>>   383     }
>>>> so that the delete_resolution_error() will not be called for 
>>>> constants()->is_shared() == true.
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>>>   You wouldn't happen to still have an hs_err file from that 
>>>>> experiment?
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>>
>>>>>> New webrev:
>>>>>> http://cr.openjdk.java.net/~aeriksso/8076110/webrev.01/
>>>>>>
>>>>>> Regards,
>>>>>> Andreas
>>>>>>
>>>>>> On 2015-06-09 01:49, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Andreas,
>>>>>>>
>>>>>>> I agree, it is a good idea to delete the resolution errors from 
>>>>>>> ConstantPool::deallocate_contents.
>>>>>>> So, I'm waiting for next webrev.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>> On 6/8/15 9:53 AM, Andreas Eriksson wrote:
>>>>>>>> Hi Coleen, thanks for looking at this.
>>>>>>>>
>>>>>>>> That might work, I'll make a build to test it.
>>>>>>>>
>>>>>>>> Just FYI:
>>>>>>>> My first attempt to fix this was actually to delete the 
>>>>>>>> resolution errors from ConstantPool::deallocate_contents, but 
>>>>>>>> that was sometimes called when not at safepoint, which made me 
>>>>>>>> do this change instead.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Andreas
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2015-06-06 00:07, Coleen Phillimore wrote:
>>>>>>>>>
>>>>>>>>> Thank you so much for fixing this, Andreas!
>>>>>>>>>
>>>>>>>>> I was wondering if you could delete the resolution errors in 
>>>>>>>>> InstanceKlass::deallocate_contents instead at the point where 
>>>>>>>>> you free the constant pool?   This code is at a safepoint at 
>>>>>>>>> this time, even with the CMS collector.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>> On 6/4/15 5:23 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>>> Added the SVC list as it impacts the Serviceability as well.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Serguei
>>>>>>>>>>
>>>>>>>>>> On 6/4/15 4:56 AM, Andreas Eriksson wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Please review this fix of a crash because of a class 
>>>>>>>>>>> redefinition race.
>>>>>>>>>>>
>>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8076110
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~aeriksso/8076110/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> The crash happens when the redefine occurs while the 
>>>>>>>>>>> redefined method is still running.
>>>>>>>>>>> This is because VM_RedefineClasses::redefine_single_class 
>>>>>>>>>>> deletes the resolution errors at redefine time, but the 
>>>>>>>>>>> running method might still need them.
>>>>>>>>>>>
>>>>>>>>>>> This is fixed by moving the deletion of the cached 
>>>>>>>>>>> resolution errors to when the {add,purge}_previous_version 
>>>>>>>>>>> functions have come to the conclusion that no running 
>>>>>>>>>>> methods need the old class any more (and therefore no one 
>>>>>>>>>>> needs the constant pool and its associated resolution errors 
>>>>>>>>>>> either).
>>>>>>>>>>>
>>>>>>>>>>> A jtreg regression test that reproduces the problem using 
>>>>>>>>>>> redefineclasses and objectweb.asm is included in the change.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Andreas
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list