8076110: VM crash when class is redefined with Instrumentation.redefineClasses

Andreas Eriksson andreas.eriksson at oracle.com
Fri Jun 12 15:13:19 UTC 2015


So then webrev.01 is good to go?
Serguei, any thoughts / objections?

http://cr.openjdk.java.net/~aeriksso/8076110/webrev.01/

Regards,
Andreas

On 2015-06-12 14:21, Coleen Phillimore wrote:
>
> Thanks, Andreas.  That's the one I didn't find.   The InstanceKlass is 
> something that can't be deallocated unless at a safepoint, so your 
> change is safe if added to InstanceKlass::deallocate_contents() but 
> not in ConstantPool::deallocate_contents().
> Also because of the reason that Serguei described below, we don't call 
> ConstantPool::deallocate_contents() if the instanceKlass is_shared.
>
> Thanks for taking on this unfortunately complicated bug.
> Coleen
>
> On 6/12/15 7:49 AM, Andreas Eriksson wrote:
>>
>>
>> On 2015-06-12 00:36, serguei.spitsyn at oracle.com wrote:
>>> On 6/11/15 7:32 AM, Coleen Phillimore wrote:
>>>>
>>>>
>>>> On 6/11/15 7:20 AM, Andreas Eriksson wrote:
>>>>> Hi,
>>>>>
>>>>> Deleting the resolution errors from InstanceKlass::deallocate 
>>>>> contents works, but I'm unsure if the deletion code should be 
>>>>> placed inside the is_shared() if-block or not.
>>>>> I don't think that CDS should affect our decision to delete the 
>>>>> cached resolution errors, but I might be missing something.
>>>>> Coleen, do you know if it affects anything?
>>>>
>>>> The resolution errors aren't shared in the CDS archive so it's fine 
>>>> to delete them outside the is_shared block.  The other things are 
>>>> in the archive so have to be excluded.
>>>>
>>>> I'm still trying to figure out why deleting them inside 
>>>> ConstantPool::deallocate_contents() deleted them outside a 
>>>> safepoint.  I didn't find that code path.
>>>
>>> The ConstantPool::deallocate_contents() is called only from the 
>>> free_metadata(ClassLoaderData* loader_data, T md).
>>>
>>> There are just two free_metadata() calls that are related to the 
>>> ConstantPool::deallocate_contents():
>>>     classfile/classLoaderData.cpp: 
>>> MetadataFactory::free_metadata(this, (ConstantPool*)m);
>>>     oops/instanceKlass.cpp: 
>>> MetadataFactory::free_metadata(loader_data, constants());
>>>
>>
>> There is at least one more place, in the ClassFileParser destructor, 
>> at classfile/classFileParser.cpp:4271.
>> // Destructor to clean up if there's an error
>> ClassFileParser::~ClassFileParser() {
>>    MetadataFactory::free_metadata(_loader_data, _cp);
>> It seems to be called when the parsing has gone wrong, which can 
>> happen when not at safepoint.
>> Attached a hs_err where this has happened.
>>
>> - Andreas
>>
>>> The first one is in the ClassLoaderData::free_deallocate_list() 
>>> which has an assert:
>>>   assert(SafepointSynchronize::is_at_safepoint(), "only called at 
>>> safepoint");
>>>
>>> The second call is in the InstanceKlass::deallocate_contents() that 
>>> is also called only
>>> from the ClassLoaderData::free_deallocate_list().
>>>
>>> So that even if there is such a code path then it is by a mistake 
>>> that has to be fixed.
>>>
>>> The issue with having the call to delete_resolution_error() in the 
>>> ConstantPool::deallocate_contents() is that
>>> it's call in the InstanceKlass::deallocate_contents( ) is conditional:
>>>   379   if (constants() != NULL) {
>>>   380     assert (!constants()->on_stack(), "shouldn't be called if anything is onstack");
>>>   381     if (!constants()->is_shared()) {
>>>   382       MetadataFactory::free_metadata(loader_data, constants());
>>>   383     }
>>> so that the delete_resolution_error() will not be called for 
>>> constants()->is_shared() == true.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>>> You wouldn't happen to still have an hs_err file from that experiment?
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>
>>>>> New webrev:
>>>>> http://cr.openjdk.java.net/~aeriksso/8076110/webrev.01/
>>>>>
>>>>> Regards,
>>>>> Andreas
>>>>>
>>>>> On 2015-06-09 01:49, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Andreas,
>>>>>>
>>>>>> I agree, it is a good idea to delete the resolution errors from 
>>>>>> ConstantPool::deallocate_contents.
>>>>>> So, I'm waiting for next webrev.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>> On 6/8/15 9:53 AM, Andreas Eriksson wrote:
>>>>>>> Hi Coleen, thanks for looking at this.
>>>>>>>
>>>>>>> That might work, I'll make a build to test it.
>>>>>>>
>>>>>>> Just FYI:
>>>>>>> My first attempt to fix this was actually to delete the 
>>>>>>> resolution errors from ConstantPool::deallocate_contents, but 
>>>>>>> that was sometimes called when not at safepoint, which made me 
>>>>>>> do this change instead.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andreas
>>>>>>>
>>>>>>>
>>>>>>> On 2015-06-06 00:07, Coleen Phillimore wrote:
>>>>>>>>
>>>>>>>> Thank you so much for fixing this, Andreas!
>>>>>>>>
>>>>>>>> I was wondering if you could delete the resolution errors in 
>>>>>>>> InstanceKlass::deallocate_contents instead at the point where 
>>>>>>>> you free the constant pool? This code is at a safepoint at this 
>>>>>>>> time, even with the CMS collector.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> On 6/4/15 5:23 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Added the SVC list as it impacts the Serviceability as well.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>> On 6/4/15 4:56 AM, Andreas Eriksson wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Please review this fix of a crash because of a class 
>>>>>>>>>> redefinition race.
>>>>>>>>>>
>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8076110
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~aeriksso/8076110/webrev.00/
>>>>>>>>>>
>>>>>>>>>> The crash happens when the redefine occurs while the 
>>>>>>>>>> redefined method is still running.
>>>>>>>>>> This is because VM_RedefineClasses::redefine_single_class 
>>>>>>>>>> deletes the resolution errors at redefine time, but the 
>>>>>>>>>> running method might still need them.
>>>>>>>>>>
>>>>>>>>>> This is fixed by moving the deletion of the cached resolution 
>>>>>>>>>> errors to when the {add,purge}_previous_version functions 
>>>>>>>>>> have come to the conclusion that no running methods need the 
>>>>>>>>>> old class any more (and therefore no one needs the constant 
>>>>>>>>>> pool and its associated resolution errors either).
>>>>>>>>>>
>>>>>>>>>> A jtreg regression test that reproduces the problem using 
>>>>>>>>>> redefineclasses and objectweb.asm is included in the change.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Andreas
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-runtime-dev mailing list