RFR(XS) 8213250 CDS archive creation aborts due to metaspace object allocation failure

Jiangli Zhou jiangli.zhou at oracle.com
Thu Nov 8 01:46:39 UTC 2018


Hi Ioi,

This approach looks cleaner to me.

Thanks!

Jiangli


On 11/7/18 2:34 PM, Ioi Lam wrote:
> Hi Jiangli,
>
> To address your concern about changing the verifier, I have updated 
> the patch to remove the change in verifier.cpp. Instead, I just moved 
> the finalize_verification_constraints() call slightly ahead (to 
> immediately before we enter the VM thread).
>
> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v03/ 
>
>
> Note that I have to use a different way of iterating over all classes 
> in SystemDictionaryShared::finalize_verification_constraints_for, 
> because at this point we haven't combined all the shared classes into 
> a single directory yet.
>
> Thanks
>
> -Ioi
>
>
>
> On 11/4/2018 2:29 PM, Ioi Lam wrote:
>>
>>
>> On 11/4/18 12:12 PM, Jiangli Zhou wrote:
>>> On 11/3/18 3:07 PM, Ioi Lam wrote:
>>>
>>>>
>>>>
>>>> On 11/2/18 10:42 PM, Jiangli Zhou wrote:
>>>>> Hi Ioi,
>>>>>
>>>>>
>>>>> On 11/2/18 5:27 PM, Ioi Lam wrote:
>>>>>> Hi Jiangli,
>>>>>>
>>>>>> Thanks for the review
>>>>>>
>>>>>>
>>>>>> On 11/2/18 1:24 PM, Jiangli Zhou wrote:
>>>>>>> Hi Ioi,
>>>>>>>
>>>>>>> - src/hotspot/share/classfile/verifier.cpp
>>>>>>>
>>>>>>> You moved 
>>>>>>> SystemDictionaryShared::finalize_verification_constraints(_klass) 
>>>>>>> to ClassVerifier::verify_class(). We should be very careful in 
>>>>>>> this area. Have you tested with cases where classes fail 
>>>>>>> verification?
>>>>>>
>>>>>> void ClassVerifier::verify_class(TRAPS) {
>>>>>>   log_info(verification)("Verifying class %s with new format", 
>>>>>> _klass->external_name());
>>>>>>
>>>>>>   Array<Method*>* methods = _klass->methods();
>>>>>>   int num_methods = methods->length();
>>>>>>
>>>>>>   for (int index = 0; index < num_methods; index++) {
>>>>>>     // Check for recursive re-verification before each method.
>>>>>>     if (was_recursively_verified())  return;
>>>>>>
>>>>>>     Method* m = methods->at(index);
>>>>>>     if (m->is_native() || m->is_abstract() || m->is_overpass()) {
>>>>>>       // If m is native or abstract, skip it.  It is checked in 
>>>>>> class file
>>>>>>       // parser that methods do not override a final method. 
>>>>>> Overpass methods
>>>>>>       // are trusted since the VM generates them.
>>>>>>       continue;
>>>>>>     }
>>>>>>     verify_method(methodHandle(THREAD, m), CHECK_VERIFY(this));
>>>>>>   }
>>>>>>
>>>>>>   if (DumpSharedSpaces) {
>>>>>> SystemDictionaryShared::finalize_verification_constraints(_klass);
>>>>>>   }
>>>>>>
>>>>>> If verification failed, the CHECK_VERIFY macro will force the 
>>>>>> function to return. So we will call 
>>>>>> finalize_verification_constraints only when verification has 
>>>>>> succeeded.
>>>>>
>>>>> Ok, thanks for checking.
>>>>>
>>>>> An alternative is to allocate from the RO space directly after 
>>>>> relocation. That way you can avoid adding another CDS specific in 
>>>>> the verifier code. There is also a FIXME for using the symbol 
>>>>> offset in 
>>>>> SharedDictionaryEntry::finalize_verification_constraints, which 
>>>>> you could handle together.
>>>>
>>>> I'll leave this patch simple so it can easily be backported to JDK 11.
>>>>
>>>> I'll do what you suggested and deal with the FIXME in a separate 
>>>> RFE (https://bugs.openjdk.java.net/browse/JDK-8213346).
>>>
>>> Follow up with a separate RFE sounds good to me. For backporting to 
>>> JDK 11, the risk of the fix seems to be higher than the benefit 
>>> (considering the nature of the issue). Maybe we should have the 
>>> proper fix backported instead (if backporting is required)?
>>>
>>
>> With JDK 11, if you have more than a few thousand classes, 
>> -Xshare:dump will randomly fail. The workaround is to explicitly set 
>> -XX:MetaspaceSize=128m.
>>
>> I think we should fix it in JDK 11. The fix in the current RFR is 
>> pretty harmless. It just moves the allocation of the 
>> _verifier_constraints earlier.
>>
>> Thanks
>> - Ioi
>>
>>
>>>>
>>>>>>
>>>>>>
>>>>>>> I haven't ran your code yet, we should avoid copying the 
>>>>>>> constraints for classes that fails verification at dump time.
>>>>>>
>>>>>> If a class has failed verification, the class will not be 
>>>>>> archived. So whether we have called 
>>>>>> finalize_verification_constraints on it really doesn't matter.
>>>>>
>>>>> Since it doesn't do the copying when verification fails, it is 
>>>>> okay in this case. With the work for dynamic archiving, we want to 
>>>>> avoid unnecessary work at dump time, since performance/footprint 
>>>>> will become higher priority with archiving at runtime.
>>>>>>
>>>>>>> I also need to think through any potential impact on dynamic 
>>>>>>> archiving (possibly none).
>>>>>>>
>>>>>>> - src/hotspot/share/memory/metaspace.cpp
>>>>>>>
>>>>>>> -    if (is_init_completed() && !(DumpSharedSpaces && 
>>>>>>> THREAD->is_VM_thread())) {
>>>>>>> +    if (is_init_completed()) {
>>>>>>>
>>>>>>> Please also add an assert to make sure we don't allocate 
>>>>>>> metaspace objects from the VM_thread at dump time now.
>>>>>>>
>>>>>>
>>>>>> I think it's better to add a blanket 
>>>>>> assert(!THREAD->is_VM_thread()), because no one should be 
>>>>>> allocating metaspace objects inside VM thread.
>>>>>>
>>>>>> Adding this has a wider impact, so I will do it in a separate RFE 
>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8213331), in case we 
>>>>>> need to back it out.
>>>>>
>>>>> I'd suggest including the assert for dump time with the fix so we 
>>>>> are sure it is safe. Then you can follow up with the broader 
>>>>> assert that you are planning for JDK-8213331.
>>>>>
>>>>
>>>> I see. I've added
>>>>
>>>> MetaWord* Metaspace::allocate(ClassLoaderData* loader_data, size_t 
>>>> word_size,
>>>>                               MetaspaceObj::Type type, TRAPS) {
>>>>   assert(!_frozen, "sanity");
>>>> + assert(!(DumpSharedSpaces && THREAD->is_VM_thread()), "sanity");
>>>>
>>>> I'll just remove the (DumpSharedSpaces &&) in JDK-8213331.
>>>
>>> Thumbs up.
>>>
>>> Thanks,
>>> Jiangli
>>>
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>>
>>>>>> Thanks
>>>>>> - Ioi
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangli
>>>>>>>
>>>>>>>
>>>>>>> On 11/1/18 9:47 PM, Ioi Lam wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've revised the webrev to get to the the root of the problem 
>>>>>>>> -- we shouldn't allocate metaspace objects from within the VM 
>>>>>>>> Thread!
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v02/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> The original bug that caused the "!(DumpSharedSpaces && 
>>>>>>>> THREAD->is_VM_thread())" check to be added in 
>>>>>>>> Metaspace::allocate() was
>>>>>>>>
>>>>>>>>     https://bugs.openjdk.java.net/browse/JDK-8196626
>>>>>>>>     UseAppCDS.java crashes with "VM thread using lock Heap_lock 
>>>>>>>> (not allowed to block on)"
>>>>>>>>
>>>>>>>> The fix in JDK-8196626 is just a band-aid solution -- avoiding 
>>>>>>>> the GC. The proper fix is not to do any allocation inside the 
>>>>>>>> VM thread. So instead of calling 
>>>>>>>> finalize_verification_constraints, which allocates 
>>>>>>>> MetaspaceObjs, very late in the CDS dumping process, now we 
>>>>>>>> call be as soon as a class has been verified.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/1/18 11:07 AM, Ioi Lam wrote:
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213250
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v01/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> When Metaspace::allocate() fails, usually we first try to GC 
>>>>>>>>> to see if
>>>>>>>>> some MetaspaceObjs can be freed. This is an optimization to 
>>>>>>>>> avoid prematurely
>>>>>>>>> expanding the metaspace.
>>>>>>>>>
>>>>>>>>> However, GC cannot run in the last phase of CDS which runs 
>>>>>>>>> inside the VM thread.
>>>>>>>>> The old code just aborts the CDS dump. The fix is to 
>>>>>>>>> unconditionally expand the
>>>>>>>>> metaspace without doing a GC.
>>>>>>>>>
>>>>>>>>> (During GC dump time, no classes are unloaded, so even if we 
>>>>>>>>> force a GC, it would
>>>>>>>>> not free up any metaspace anyway).
>>>>>>>>>
>>>>>>>>> Testing:
>>>>>>>>>
>>>>>>>>> Before the fix, the test case (jruby) would fail to dump once 
>>>>>>>>> a couple of times. Now
>>>>>>>>> it has dumped hundreds of times without failure.
>>>>>>>>>
>>>>>>>>> I am also running hs-tier 1/2
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the hotspot-dev mailing list