RFR(XS) 8213250 CDS archive creation aborts due to metaspace object allocation failure

Ioi Lam ioi.lam at oracle.com
Sun Nov 4 22:29:11 UTC 2018



On 11/4/18 12:12 PM, Jiangli Zhou wrote:
> On 11/3/18 3:07 PM, Ioi Lam wrote:
>
>>
>>
>> On 11/2/18 10:42 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>>
>>> On 11/2/18 5:27 PM, Ioi Lam wrote:
>>>> Hi Jiangli,
>>>>
>>>> Thanks for the review
>>>>
>>>>
>>>> On 11/2/18 1:24 PM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>> - src/hotspot/share/classfile/verifier.cpp
>>>>>
>>>>> You moved 
>>>>> SystemDictionaryShared::finalize_verification_constraints(_klass) 
>>>>> to ClassVerifier::verify_class(). We should be very careful in 
>>>>> this area. Have you tested with cases where classes fail 
>>>>> verification?
>>>>
>>>> void ClassVerifier::verify_class(TRAPS) {
>>>>   log_info(verification)("Verifying class %s with new format", 
>>>> _klass->external_name());
>>>>
>>>>   Array<Method*>* methods = _klass->methods();
>>>>   int num_methods = methods->length();
>>>>
>>>>   for (int index = 0; index < num_methods; index++) {
>>>>     // Check for recursive re-verification before each method.
>>>>     if (was_recursively_verified())  return;
>>>>
>>>>     Method* m = methods->at(index);
>>>>     if (m->is_native() || m->is_abstract() || m->is_overpass()) {
>>>>       // If m is native or abstract, skip it.  It is checked in 
>>>> class file
>>>>       // parser that methods do not override a final method. 
>>>> Overpass methods
>>>>       // are trusted since the VM generates them.
>>>>       continue;
>>>>     }
>>>>     verify_method(methodHandle(THREAD, m), CHECK_VERIFY(this));
>>>>   }
>>>>
>>>>   if (DumpSharedSpaces) {
>>>> SystemDictionaryShared::finalize_verification_constraints(_klass);
>>>>   }
>>>>
>>>> If verification failed, the CHECK_VERIFY macro will force the 
>>>> function to return. So we will call 
>>>> finalize_verification_constraints only when verification has 
>>>> succeeded.
>>>
>>> Ok, thanks for checking.
>>>
>>> An alternative is to allocate from the RO space directly after 
>>> relocation. That way you can avoid adding another CDS specific in 
>>> the verifier code. There is also a FIXME for using the symbol offset 
>>> in SharedDictionaryEntry::finalize_verification_constraints, which 
>>> you could handle together.
>>
>> I'll leave this patch simple so it can easily be backported to JDK 11.
>>
>> I'll do what you suggested and deal with the FIXME in a separate RFE 
>> (https://bugs.openjdk.java.net/browse/JDK-8213346).
>
> Follow up with a separate RFE sounds good to me. For backporting to 
> JDK 11, the risk of the fix seems to be higher than the benefit 
> (considering the nature of the issue). Maybe we should have the proper 
> fix backported instead (if backporting is required)?
>

With JDK 11, if you have more than a few thousand classes, -Xshare:dump 
will randomly fail. The workaround is to explicitly set 
-XX:MetaspaceSize=128m.

I think we should fix it in JDK 11. The fix in the current RFR is pretty 
harmless. It just moves the allocation of the _verifier_constraints earlier.

Thanks
- Ioi


>>
>>>>
>>>>
>>>>> I haven't ran your code yet, we should avoid copying the 
>>>>> constraints for classes that fails verification at dump time.
>>>>
>>>> If a class has failed verification, the class will not be archived. 
>>>> So whether we have called finalize_verification_constraints on it 
>>>> really doesn't matter.
>>>
>>> Since it doesn't do the copying when verification fails, it is okay 
>>> in this case. With the work for dynamic archiving, we want to avoid 
>>> unnecessary work at dump time, since performance/footprint will 
>>> become higher priority with archiving at runtime.
>>>>
>>>>> I also need to think through any potential impact on dynamic 
>>>>> archiving (possibly none).
>>>>>
>>>>> - src/hotspot/share/memory/metaspace.cpp
>>>>>
>>>>> -    if (is_init_completed() && !(DumpSharedSpaces && 
>>>>> THREAD->is_VM_thread())) {
>>>>> +    if (is_init_completed()) {
>>>>>
>>>>> Please also add an assert to make sure we don't allocate metaspace 
>>>>> objects from the VM_thread at dump time now.
>>>>>
>>>>
>>>> I think it's better to add a blanket 
>>>> assert(!THREAD->is_VM_thread()), because no one should be 
>>>> allocating metaspace objects inside VM thread.
>>>>
>>>> Adding this has a wider impact, so I will do it in a separate RFE 
>>>> (https://bugs.openjdk.java.net/browse/JDK-8213331), in case we need 
>>>> to back it out.
>>>
>>> I'd suggest including the assert for dump time with the fix so we 
>>> are sure it is safe. Then you can follow up with the broader assert 
>>> that you are planning for JDK-8213331.
>>>
>>
>> I see. I've added
>>
>> MetaWord* Metaspace::allocate(ClassLoaderData* loader_data, size_t 
>> word_size,
>>                               MetaspaceObj::Type type, TRAPS) {
>>   assert(!_frozen, "sanity");
>> + assert(!(DumpSharedSpaces && THREAD->is_VM_thread()), "sanity");
>>
>> I'll just remove the (DumpSharedSpaces &&) in JDK-8213331.
>
> Thumbs up.
>
> Thanks,
> Jiangli
>
>>
>> Thanks
>> - Ioi
>>
>>> Thanks,
>>> Jiangli
>>>>
>>>> Thanks
>>>> - Ioi
>>>>> Thanks,
>>>>>
>>>>> Jiangli
>>>>>
>>>>>
>>>>> On 11/1/18 9:47 PM, Ioi Lam wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've revised the webrev to get to the the root of the problem -- 
>>>>>> we shouldn't allocate metaspace objects from within the VM Thread!
>>>>>>
>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v02/ 
>>>>>>
>>>>>>
>>>>>> The original bug that caused the "!(DumpSharedSpaces && 
>>>>>> THREAD->is_VM_thread())" check to be added in 
>>>>>> Metaspace::allocate() was
>>>>>>
>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8196626
>>>>>>     UseAppCDS.java crashes with "VM thread using lock Heap_lock 
>>>>>> (not allowed to block on)"
>>>>>>
>>>>>> The fix in JDK-8196626 is just a band-aid solution -- avoiding 
>>>>>> the GC. The proper fix is not to do any allocation inside the VM 
>>>>>> thread. So instead of calling finalize_verification_constraints, 
>>>>>> which allocates MetaspaceObjs, very late in the CDS dumping 
>>>>>> process, now we call be as soon as a class has been verified.
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/1/18 11:07 AM, Ioi Lam wrote:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213250
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v01/ 
>>>>>>>
>>>>>>>
>>>>>>> When Metaspace::allocate() fails, usually we first try to GC to 
>>>>>>> see if
>>>>>>> some MetaspaceObjs can be freed. This is an optimization to 
>>>>>>> avoid prematurely
>>>>>>> expanding the metaspace.
>>>>>>>
>>>>>>> However, GC cannot run in the last phase of CDS which runs 
>>>>>>> inside the VM thread.
>>>>>>> The old code just aborts the CDS dump. The fix is to 
>>>>>>> unconditionally expand the
>>>>>>> metaspace without doing a GC.
>>>>>>>
>>>>>>> (During GC dump time, no classes are unloaded, so even if we 
>>>>>>> force a GC, it would
>>>>>>> not free up any metaspace anyway).
>>>>>>>
>>>>>>> Testing:
>>>>>>>
>>>>>>> Before the fix, the test case (jruby) would fail to dump once a 
>>>>>>> couple of times. Now
>>>>>>> it has dumped hundreds of times without failure.
>>>>>>>
>>>>>>> I am also running hs-tier 1/2
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-dev mailing list