8076110: VM crash when class is redefined with Instrumentation.redefineClasses
Andreas Eriksson
andreas.eriksson at oracle.com
Fri Jun 12 18:03:59 UTC 2015
On 2015-06-12 19:29, serguei.spitsyn at oracle.com wrote:
> On 6/12/15 4:49 AM, Andreas Eriksson wrote:
>>
>>
>> On 2015-06-12 00:36, serguei.spitsyn at oracle.com wrote:
>>> On 6/11/15 7:32 AM, Coleen Phillimore wrote:
>>>>
>>>>
>>>> On 6/11/15 7:20 AM, Andreas Eriksson wrote:
>>>>> Hi,
>>>>>
>>>>> Deleting the resolution errors from InstanceKlass::deallocate
>>>>> contents works, but I'm unsure if the deletion code should be
>>>>> placed inside the is_shared() if-block or not.
>>>>> I don't think that CDS should affect our decision to delete the
>>>>> cached resolution errors, but I might be missing something.
>>>>> Coleen, do you know if it affects anything?
>>>>
>>>> The resolution errors aren't shared in the CDS archive so it's fine
>>>> to delete them outside the is_shared block. The other things are
>>>> in the archive so have to be excluded.
>>>>
>>>> I'm still trying to figure out why deleting them inside
>>>> ConstantPool::deallocate_contents() deleted them outside a
>>>> safepoint. I didn't find that code path.
>>>
>>> The ConstantPool::deallocate_contents() is called only from the
>>> free_metadata(ClassLoaderData* loader_data, T md).
>>>
>>> There are just two free_metadata() calls that are related to the
>>> ConstantPool::deallocate_contents():
>>> classfile/classLoaderData.cpp:
>>> MetadataFactory::free_metadata(this, (ConstantPool*)m);
>>> oops/instanceKlass.cpp:
>>> MetadataFactory::free_metadata(loader_data, constants());
>>>
>>
>> There is at least one more place, in the ClassFileParser destructor,
>> at classfile/classFileParser.cpp:4271.
>> // Destructor to clean up if there's an error
>> ClassFileParser::~ClassFileParser() {
>> MetadataFactory::free_metadata(_loader_data, _cp);
>> It seems to be called when the parsing has gone wrong, which can
>> happen when not at safepoint.
>> Attached a hs_err where this has happened.
>
> Good finding!
>
> The ConstantPool::deallocate_contents() does not have a call to the
> SystemDictionary::delete_resolution_error() anymore.
> So, it looks like the crash in the attached hs_err has been already fixed.
>
That was a crash that was from one of my experimantal builds while
trying to fix this bug.
I attached the hs_err file just to show what code path was taken to
ConstantPool::deallocate_contents() when not at a safepoint.
Sorry if I was unclear on this.
Thanks,
Andreas
> Thanks,
> Serguei
>
>>
>> - Andreas
>>
>>> The first one is in the ClassLoaderData::free_deallocate_list()
>>> which has an assert:
>>> assert(SafepointSynchronize::is_at_safepoint(), "only called at
>>> safepoint");
>>>
>>> The second call is in the InstanceKlass::deallocate_contents() that
>>> is also called only
>>> from the ClassLoaderData::free_deallocate_list().
>>>
>>> So that even if there is such a code path then it is by a mistake
>>> that has to be fixed.
>>>
>>> The issue with having the call to delete_resolution_error() in the
>>> ConstantPool::deallocate_contents() is that
>>> it's call in the InstanceKlass::deallocate_contents( ) is conditional:
>>> 379 if (constants() != NULL) {
>>> 380 assert (!constants()->on_stack(), "shouldn't be called if anything is onstack");
>>> 381 if (!constants()->is_shared()) {
>>> 382 MetadataFactory::free_metadata(loader_data, constants());
>>> 383 }
>>> so that the delete_resolution_error() will not be called for
>>> constants()->is_shared() == true.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>>> You wouldn't happen to still have an hs_err file from that experiment?
>>>>
>>>> Thanks,
>>>> Coleen
>>>>>
>>>>> New webrev:
>>>>> http://cr.openjdk.java.net/~aeriksso/8076110/webrev.01/
>>>>>
>>>>> Regards,
>>>>> Andreas
>>>>>
>>>>> On 2015-06-09 01:49, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Andreas,
>>>>>>
>>>>>> I agree, it is a good idea to delete the resolution errors from
>>>>>> ConstantPool::deallocate_contents.
>>>>>> So, I'm waiting for next webrev.
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>> On 6/8/15 9:53 AM, Andreas Eriksson wrote:
>>>>>>> Hi Coleen, thanks for looking at this.
>>>>>>>
>>>>>>> That might work, I'll make a build to test it.
>>>>>>>
>>>>>>> Just FYI:
>>>>>>> My first attempt to fix this was actually to delete the
>>>>>>> resolution errors from ConstantPool::deallocate_contents, but
>>>>>>> that was sometimes called when not at safepoint, which made me
>>>>>>> do this change instead.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andreas
>>>>>>>
>>>>>>>
>>>>>>> On 2015-06-06 00:07, Coleen Phillimore wrote:
>>>>>>>>
>>>>>>>> Thank you so much for fixing this, Andreas!
>>>>>>>>
>>>>>>>> I was wondering if you could delete the resolution errors in
>>>>>>>> InstanceKlass::deallocate_contents instead at the point where
>>>>>>>> you free the constant pool? This code is at a safepoint at this
>>>>>>>> time, even with the CMS collector.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>> On 6/4/15 5:23 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>>> Added the SVC list as it impacts the Serviceability as well.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Serguei
>>>>>>>>>
>>>>>>>>> On 6/4/15 4:56 AM, Andreas Eriksson wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Please review this fix of a crash because of a class
>>>>>>>>>> redefinition race.
>>>>>>>>>>
>>>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8076110
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~aeriksso/8076110/webrev.00/
>>>>>>>>>>
>>>>>>>>>> The crash happens when the redefine occurs while the
>>>>>>>>>> redefined method is still running.
>>>>>>>>>> This is because VM_RedefineClasses::redefine_single_class
>>>>>>>>>> deletes the resolution errors at redefine time, but the
>>>>>>>>>> running method might still need them.
>>>>>>>>>>
>>>>>>>>>> This is fixed by moving the deletion of the cached resolution
>>>>>>>>>> errors to when the {add,purge}_previous_version functions
>>>>>>>>>> have come to the conclusion that no running methods need the
>>>>>>>>>> old class any more (and therefore no one needs the constant
>>>>>>>>>> pool and its associated resolution errors either).
>>>>>>>>>>
>>>>>>>>>> A jtreg regression test that reproduces the problem using
>>>>>>>>>> redefineclasses and objectweb.asm is included in the change.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Andreas
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list