RFR(XS) 8213250 CDS archive creation aborts due to metaspace object allocation failure

Ioi Lam ioi.lam at oracle.com
Wed Nov 7 22:34:34 UTC 2018


Hi Jiangli,

To address your concern about changing the verifier, I have updated the 
patch to remove the change in verifier.cpp. Instead, I just moved the 
finalize_verification_constraints() call slightly ahead (to immediately 
before we enter the VM thread).

http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v03/

Note that I have to use a different way of iterating over all classes in 
SystemDictionaryShared::finalize_verification_constraints_for, because 
at this point we haven't combined all the shared classes into a single 
directory yet.

Thanks

-Ioi



On 11/4/2018 2:29 PM, Ioi Lam wrote:
>
>
> On 11/4/18 12:12 PM, Jiangli Zhou wrote:
>> On 11/3/18 3:07 PM, Ioi Lam wrote:
>>
>>>
>>>
>>> On 11/2/18 10:42 PM, Jiangli Zhou wrote:
>>>> Hi Ioi,
>>>>
>>>>
>>>> On 11/2/18 5:27 PM, Ioi Lam wrote:
>>>>> Hi Jiangli,
>>>>>
>>>>> Thanks for the review
>>>>>
>>>>>
>>>>> On 11/2/18 1:24 PM, Jiangli Zhou wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>> - src/hotspot/share/classfile/verifier.cpp
>>>>>>
>>>>>> You moved 
>>>>>> SystemDictionaryShared::finalize_verification_constraints(_klass) 
>>>>>> to ClassVerifier::verify_class(). We should be very careful in 
>>>>>> this area. Have you tested with cases where classes fail 
>>>>>> verification?
>>>>>
>>>>> void ClassVerifier::verify_class(TRAPS) {
>>>>>   log_info(verification)("Verifying class %s with new format", 
>>>>> _klass->external_name());
>>>>>
>>>>>   Array<Method*>* methods = _klass->methods();
>>>>>   int num_methods = methods->length();
>>>>>
>>>>>   for (int index = 0; index < num_methods; index++) {
>>>>>     // Check for recursive re-verification before each method.
>>>>>     if (was_recursively_verified())  return;
>>>>>
>>>>>     Method* m = methods->at(index);
>>>>>     if (m->is_native() || m->is_abstract() || m->is_overpass()) {
>>>>>       // If m is native or abstract, skip it.  It is checked in 
>>>>> class file
>>>>>       // parser that methods do not override a final method. 
>>>>> Overpass methods
>>>>>       // are trusted since the VM generates them.
>>>>>       continue;
>>>>>     }
>>>>>     verify_method(methodHandle(THREAD, m), CHECK_VERIFY(this));
>>>>>   }
>>>>>
>>>>>   if (DumpSharedSpaces) {
>>>>> SystemDictionaryShared::finalize_verification_constraints(_klass);
>>>>>   }
>>>>>
>>>>> If verification failed, the CHECK_VERIFY macro will force the 
>>>>> function to return. So we will call 
>>>>> finalize_verification_constraints only when verification has 
>>>>> succeeded.
>>>>
>>>> Ok, thanks for checking.
>>>>
>>>> An alternative is to allocate from the RO space directly after 
>>>> relocation. That way you can avoid adding another CDS specific in 
>>>> the verifier code. There is also a FIXME for using the symbol 
>>>> offset in SharedDictionaryEntry::finalize_verification_constraints, 
>>>> which you could handle together.
>>>
>>> I'll leave this patch simple so it can easily be backported to JDK 11.
>>>
>>> I'll do what you suggested and deal with the FIXME in a separate RFE 
>>> (https://bugs.openjdk.java.net/browse/JDK-8213346).
>>
>> Follow up with a separate RFE sounds good to me. For backporting to 
>> JDK 11, the risk of the fix seems to be higher than the benefit 
>> (considering the nature of the issue). Maybe we should have the 
>> proper fix backported instead (if backporting is required)?
>>
>
> With JDK 11, if you have more than a few thousand classes, 
> -Xshare:dump will randomly fail. The workaround is to explicitly set 
> -XX:MetaspaceSize=128m.
>
> I think we should fix it in JDK 11. The fix in the current RFR is 
> pretty harmless. It just moves the allocation of the 
> _verifier_constraints earlier.
>
> Thanks
> - Ioi
>
>
>>>
>>>>>
>>>>>
>>>>>> I haven't ran your code yet, we should avoid copying the 
>>>>>> constraints for classes that fails verification at dump time.
>>>>>
>>>>> If a class has failed verification, the class will not be 
>>>>> archived. So whether we have called 
>>>>> finalize_verification_constraints on it really doesn't matter.
>>>>
>>>> Since it doesn't do the copying when verification fails, it is okay 
>>>> in this case. With the work for dynamic archiving, we want to avoid 
>>>> unnecessary work at dump time, since performance/footprint will 
>>>> become higher priority with archiving at runtime.
>>>>>
>>>>>> I also need to think through any potential impact on dynamic 
>>>>>> archiving (possibly none).
>>>>>>
>>>>>> - src/hotspot/share/memory/metaspace.cpp
>>>>>>
>>>>>> -    if (is_init_completed() && !(DumpSharedSpaces && 
>>>>>> THREAD->is_VM_thread())) {
>>>>>> +    if (is_init_completed()) {
>>>>>>
>>>>>> Please also add an assert to make sure we don't allocate 
>>>>>> metaspace objects from the VM_thread at dump time now.
>>>>>>
>>>>>
>>>>> I think it's better to add a blanket 
>>>>> assert(!THREAD->is_VM_thread()), because no one should be 
>>>>> allocating metaspace objects inside VM thread.
>>>>>
>>>>> Adding this has a wider impact, so I will do it in a separate RFE 
>>>>> (https://bugs.openjdk.java.net/browse/JDK-8213331), in case we 
>>>>> need to back it out.
>>>>
>>>> I'd suggest including the assert for dump time with the fix so we 
>>>> are sure it is safe. Then you can follow up with the broader assert 
>>>> that you are planning for JDK-8213331.
>>>>
>>>
>>> I see. I've added
>>>
>>> MetaWord* Metaspace::allocate(ClassLoaderData* loader_data, size_t 
>>> word_size,
>>>                               MetaspaceObj::Type type, TRAPS) {
>>>   assert(!_frozen, "sanity");
>>> + assert(!(DumpSharedSpaces && THREAD->is_VM_thread()), "sanity");
>>>
>>> I'll just remove the (DumpSharedSpaces &&) in JDK-8213331.
>>
>> Thumbs up.
>>
>> Thanks,
>> Jiangli
>>
>>>
>>> Thanks
>>> - Ioi
>>>
>>>> Thanks,
>>>> Jiangli
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>> Thanks,
>>>>>>
>>>>>> Jiangli
>>>>>>
>>>>>>
>>>>>> On 11/1/18 9:47 PM, Ioi Lam wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've revised the webrev to get to the the root of the problem -- 
>>>>>>> we shouldn't allocate metaspace objects from within the VM Thread!
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v02/ 
>>>>>>>
>>>>>>>
>>>>>>> The original bug that caused the "!(DumpSharedSpaces && 
>>>>>>> THREAD->is_VM_thread())" check to be added in 
>>>>>>> Metaspace::allocate() was
>>>>>>>
>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8196626
>>>>>>>     UseAppCDS.java crashes with "VM thread using lock Heap_lock 
>>>>>>> (not allowed to block on)"
>>>>>>>
>>>>>>> The fix in JDK-8196626 is just a band-aid solution -- avoiding 
>>>>>>> the GC. The proper fix is not to do any allocation inside the VM 
>>>>>>> thread. So instead of calling finalize_verification_constraints, 
>>>>>>> which allocates MetaspaceObjs, very late in the CDS dumping 
>>>>>>> process, now we call be as soon as a class has been verified.
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 11/1/18 11:07 AM, Ioi Lam wrote:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213250
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v01/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> When Metaspace::allocate() fails, usually we first try to GC to 
>>>>>>>> see if
>>>>>>>> some MetaspaceObjs can be freed. This is an optimization to 
>>>>>>>> avoid prematurely
>>>>>>>> expanding the metaspace.
>>>>>>>>
>>>>>>>> However, GC cannot run in the last phase of CDS which runs 
>>>>>>>> inside the VM thread.
>>>>>>>> The old code just aborts the CDS dump. The fix is to 
>>>>>>>> unconditionally expand the
>>>>>>>> metaspace without doing a GC.
>>>>>>>>
>>>>>>>> (During GC dump time, no classes are unloaded, so even if we 
>>>>>>>> force a GC, it would
>>>>>>>> not free up any metaspace anyway).
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>>
>>>>>>>> Before the fix, the test case (jruby) would fail to dump once a 
>>>>>>>> couple of times. Now
>>>>>>>> it has dumped hundreds of times without failure.
>>>>>>>>
>>>>>>>> I am also running hs-tier 1/2
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-dev mailing list