RFR(XS) 8213250 CDS archive creation aborts due to metaspace object allocation failure

Jiangli Zhou jiangli.zhou at oracle.com
Sun Nov 4 20:12:31 UTC 2018


On 11/3/18 3:07 PM, Ioi Lam wrote:

>
>
> On 11/2/18 10:42 PM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>>
>> On 11/2/18 5:27 PM, Ioi Lam wrote:
>>> Hi Jiangli,
>>>
>>> Thanks for the review
>>>
>>>
>>> On 11/2/18 1:24 PM, Jiangli Zhou wrote:
>>>> Hi Ioi,
>>>>
>>>> - src/hotspot/share/classfile/verifier.cpp
>>>>
>>>> You moved 
>>>> SystemDictionaryShared::finalize_verification_constraints(_klass) 
>>>> to ClassVerifier::verify_class(). We should be very careful in this 
>>>> area. Have you tested with cases where classes fail verification?
>>>
>>> void ClassVerifier::verify_class(TRAPS) {
>>>   log_info(verification)("Verifying class %s with new format", 
>>> _klass->external_name());
>>>
>>>   Array<Method*>* methods = _klass->methods();
>>>   int num_methods = methods->length();
>>>
>>>   for (int index = 0; index < num_methods; index++) {
>>>     // Check for recursive re-verification before each method.
>>>     if (was_recursively_verified())  return;
>>>
>>>     Method* m = methods->at(index);
>>>     if (m->is_native() || m->is_abstract() || m->is_overpass()) {
>>>       // If m is native or abstract, skip it.  It is checked in 
>>> class file
>>>       // parser that methods do not override a final method. 
>>> Overpass methods
>>>       // are trusted since the VM generates them.
>>>       continue;
>>>     }
>>>     verify_method(methodHandle(THREAD, m), CHECK_VERIFY(this));
>>>   }
>>>
>>>   if (DumpSharedSpaces) {
>>> SystemDictionaryShared::finalize_verification_constraints(_klass);
>>>   }
>>>
>>> If verification failed, the CHECK_VERIFY macro will force the 
>>> function to return. So we will call 
>>> finalize_verification_constraints only when verification has succeeded.
>>
>> Ok, thanks for checking.
>>
>> An alternative is to allocate from the RO space directly after 
>> relocation. That way you can avoid adding another CDS specific in the 
>> verifier code. There is also a FIXME for using the symbol offset in 
>> SharedDictionaryEntry::finalize_verification_constraints, which you 
>> could handle together.
>
> I'll leave this patch simple so it can easily be backported to JDK 11.
>
> I'll do what you suggested and deal with the FIXME in a separate RFE 
> (https://bugs.openjdk.java.net/browse/JDK-8213346).

Follow up with a separate RFE sounds good to me. For backporting to JDK 
11, the risk of the fix seems to be higher than the benefit (considering 
the nature of the issue). Maybe we should have the proper fix backported 
instead (if backporting is required)?

>
>>>
>>>
>>>> I haven't ran your code yet, we should avoid copying the 
>>>> constraints for classes that fails verification at dump time.
>>>
>>> If a class has failed verification, the class will not be archived. 
>>> So whether we have called finalize_verification_constraints on it 
>>> really doesn't matter.
>>
>> Since it doesn't do the copying when verification fails, it is okay 
>> in this case. With the work for dynamic archiving, we want to avoid 
>> unnecessary work at dump time, since performance/footprint will 
>> become higher priority with archiving at runtime.
>>>
>>>> I also need to think through any potential impact on dynamic 
>>>> archiving (possibly none).
>>>>
>>>> - src/hotspot/share/memory/metaspace.cpp
>>>>
>>>> -    if (is_init_completed() && !(DumpSharedSpaces && 
>>>> THREAD->is_VM_thread())) {
>>>> +    if (is_init_completed()) {
>>>>
>>>> Please also add an assert to make sure we don't allocate metaspace 
>>>> objects from the VM_thread at dump time now.
>>>>
>>>
>>> I think it's better to add a blanket 
>>> assert(!THREAD->is_VM_thread()), because no one should be allocating 
>>> metaspace objects inside VM thread.
>>>
>>> Adding this has a wider impact, so I will do it in a separate RFE 
>>> (https://bugs.openjdk.java.net/browse/JDK-8213331), in case we need 
>>> to back it out.
>>
>> I'd suggest including the assert for dump time with the fix so we are 
>> sure it is safe. Then you can follow up with the broader assert that 
>> you are planning for JDK-8213331.
>>
>
> I see. I've added
>
> MetaWord* Metaspace::allocate(ClassLoaderData* loader_data, size_t 
> word_size,
>                               MetaspaceObj::Type type, TRAPS) {
>   assert(!_frozen, "sanity");
> + assert(!(DumpSharedSpaces && THREAD->is_VM_thread()), "sanity");
>
> I'll just remove the (DumpSharedSpaces &&) in JDK-8213331.

Thumbs up.

Thanks,
Jiangli

>
> Thanks
> - Ioi
>
>> Thanks,
>> Jiangli
>>>
>>> Thanks
>>> - Ioi
>>>> Thanks,
>>>>
>>>> Jiangli
>>>>
>>>>
>>>> On 11/1/18 9:47 PM, Ioi Lam wrote:
>>>>> Hi,
>>>>>
>>>>> I've revised the webrev to get to the the root of the problem -- 
>>>>> we shouldn't allocate metaspace objects from within the VM Thread!
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v02/ 
>>>>>
>>>>>
>>>>> The original bug that caused the "!(DumpSharedSpaces && 
>>>>> THREAD->is_VM_thread())" check to be added in 
>>>>> Metaspace::allocate() was
>>>>>
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8196626
>>>>>     UseAppCDS.java crashes with "VM thread using lock Heap_lock 
>>>>> (not allowed to block on)"
>>>>>
>>>>> The fix in JDK-8196626 is just a band-aid solution -- avoiding the 
>>>>> GC. The proper fix is not to do any allocation inside the VM 
>>>>> thread. So instead of calling finalize_verification_constraints, 
>>>>> which allocates MetaspaceObjs, very late in the CDS dumping 
>>>>> process, now we call be as soon as a class has been verified.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 11/1/18 11:07 AM, Ioi Lam wrote:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213250
>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v01/ 
>>>>>>
>>>>>>
>>>>>> When Metaspace::allocate() fails, usually we first try to GC to 
>>>>>> see if
>>>>>> some MetaspaceObjs can be freed. This is an optimization to avoid 
>>>>>> prematurely
>>>>>> expanding the metaspace.
>>>>>>
>>>>>> However, GC cannot run in the last phase of CDS which runs inside 
>>>>>> the VM thread.
>>>>>> The old code just aborts the CDS dump. The fix is to 
>>>>>> unconditionally expand the
>>>>>> metaspace without doing a GC.
>>>>>>
>>>>>> (During GC dump time, no classes are unloaded, so even if we 
>>>>>> force a GC, it would
>>>>>> not free up any metaspace anyway).
>>>>>>
>>>>>> Testing:
>>>>>>
>>>>>> Before the fix, the test case (jruby) would fail to dump once a 
>>>>>> couple of times. Now
>>>>>> it has dumped hundreds of times without failure.
>>>>>>
>>>>>> I am also running hs-tier 1/2
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-dev mailing list