RFR(XS) 8213250 CDS archive creation aborts due to metaspace object allocation failure

Ioi Lam ioi.lam at oracle.com
Sat Nov 3 22:07:05 UTC 2018



On 11/2/18 10:42 PM, Jiangli Zhou wrote:
> Hi Ioi,
>
>
> On 11/2/18 5:27 PM, Ioi Lam wrote:
>> Hi Jiangli,
>>
>> Thanks for the review
>>
>>
>> On 11/2/18 1:24 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> - src/hotspot/share/classfile/verifier.cpp
>>>
>>> You moved 
>>> SystemDictionaryShared::finalize_verification_constraints(_klass) to 
>>> ClassVerifier::verify_class(). We should be very careful in this 
>>> area. Have you tested with cases where classes fail verification?
>>
>> void ClassVerifier::verify_class(TRAPS) {
>>   log_info(verification)("Verifying class %s with new format", 
>> _klass->external_name());
>>
>>   Array<Method*>* methods = _klass->methods();
>>   int num_methods = methods->length();
>>
>>   for (int index = 0; index < num_methods; index++) {
>>     // Check for recursive re-verification before each method.
>>     if (was_recursively_verified())  return;
>>
>>     Method* m = methods->at(index);
>>     if (m->is_native() || m->is_abstract() || m->is_overpass()) {
>>       // If m is native or abstract, skip it.  It is checked in class 
>> file
>>       // parser that methods do not override a final method. Overpass 
>> methods
>>       // are trusted since the VM generates them.
>>       continue;
>>     }
>>     verify_method(methodHandle(THREAD, m), CHECK_VERIFY(this));
>>   }
>>
>>   if (DumpSharedSpaces) {
>> SystemDictionaryShared::finalize_verification_constraints(_klass);
>>   }
>>
>> If verification failed, the CHECK_VERIFY macro will force the 
>> function to return. So we will call finalize_verification_constraints 
>> only when verification has succeeded.
>
> Ok, thanks for checking.
>
> An alternative is to allocate from the RO space directly after 
> relocation. That way you can avoid adding another CDS specific in the 
> verifier code. There is also a FIXME for using the symbol offset in 
> SharedDictionaryEntry::finalize_verification_constraints, which you 
> could handle together.

I'll leave this patch simple so it can easily be backported to JDK 11.

I'll do what you suggested and deal with the FIXME in a separate RFE 
(https://bugs.openjdk.java.net/browse/JDK-8213346).

>>
>>
>>> I haven't ran your code yet, we should avoid copying the constraints 
>>> for classes that fails verification at dump time.
>>
>> If a class has failed verification, the class will not be archived. 
>> So whether we have called finalize_verification_constraints on it 
>> really doesn't matter.
>
> Since it doesn't do the copying when verification fails, it is okay in 
> this case. With the work for dynamic archiving, we want to avoid 
> unnecessary work at dump time, since performance/footprint will become 
> higher priority with archiving at runtime.
>>
>>> I also need to think through any potential impact on dynamic 
>>> archiving (possibly none).
>>>
>>> - src/hotspot/share/memory/metaspace.cpp
>>>
>>> -    if (is_init_completed() && !(DumpSharedSpaces && 
>>> THREAD->is_VM_thread())) {
>>> +    if (is_init_completed()) {
>>>
>>> Please also add an assert to make sure we don't allocate metaspace 
>>> objects from the VM_thread at dump time now.
>>>
>>
>> I think it's better to add a blanket assert(!THREAD->is_VM_thread()), 
>> because no one should be allocating metaspace objects inside VM thread.
>>
>> Adding this has a wider impact, so I will do it in a separate RFE 
>> (https://bugs.openjdk.java.net/browse/JDK-8213331), in case we need 
>> to back it out.
>
> I'd suggest including the assert for dump time with the fix so we are 
> sure it is safe. Then you can follow up with the broader assert that 
> you are planning for JDK-8213331.
>

I see. I've added

MetaWord* Metaspace::allocate(ClassLoaderData* loader_data, size_t 
word_size,
                               MetaspaceObj::Type type, TRAPS) {
   assert(!_frozen, "sanity");
+ assert(!(DumpSharedSpaces && THREAD->is_VM_thread()), "sanity");

I'll just remove the (DumpSharedSpaces &&) in JDK-8213331.

Thanks
- Ioi

> Thanks,
> Jiangli
>>
>> Thanks
>> - Ioi
>>> Thanks,
>>>
>>> Jiangli
>>>
>>>
>>> On 11/1/18 9:47 PM, Ioi Lam wrote:
>>>> Hi,
>>>>
>>>> I've revised the webrev to get to the the root of the problem -- we 
>>>> shouldn't allocate metaspace objects from within the VM Thread!
>>>>
>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v02/ 
>>>>
>>>>
>>>> The original bug that caused the "!(DumpSharedSpaces && 
>>>> THREAD->is_VM_thread())" check to be added in Metaspace::allocate() 
>>>> was
>>>>
>>>>     https://bugs.openjdk.java.net/browse/JDK-8196626
>>>>     UseAppCDS.java crashes with "VM thread using lock Heap_lock 
>>>> (not allowed to block on)"
>>>>
>>>> The fix in JDK-8196626 is just a band-aid solution -- avoiding the 
>>>> GC. The proper fix is not to do any allocation inside the VM 
>>>> thread. So instead of calling finalize_verification_constraints, 
>>>> which allocates MetaspaceObjs, very late in the CDS dumping 
>>>> process, now we call be as soon as a class has been verified.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 11/1/18 11:07 AM, Ioi Lam wrote:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8213250
>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v01/ 
>>>>>
>>>>>
>>>>> When Metaspace::allocate() fails, usually we first try to GC to 
>>>>> see if
>>>>> some MetaspaceObjs can be freed. This is an optimization to avoid 
>>>>> prematurely
>>>>> expanding the metaspace.
>>>>>
>>>>> However, GC cannot run in the last phase of CDS which runs inside 
>>>>> the VM thread.
>>>>> The old code just aborts the CDS dump. The fix is to 
>>>>> unconditionally expand the
>>>>> metaspace without doing a GC.
>>>>>
>>>>> (During GC dump time, no classes are unloaded, so even if we force 
>>>>> a GC, it would
>>>>> not free up any metaspace anyway).
>>>>>
>>>>> Testing:
>>>>>
>>>>> Before the fix, the test case (jruby) would fail to dump once a 
>>>>> couple of times. Now
>>>>> it has dumped hundreds of times without failure.
>>>>>
>>>>> I am also running hs-tier 1/2
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>
>>>
>>
>



More information about the hotspot-dev mailing list