RFR(XS) 8213250 CDS archive creation aborts due to metaspace object allocation failure
Ioi Lam
ioi.lam at oracle.com
Thu Nov 8 04:10:47 UTC 2018
Hi Jiangli,
Thanks for the review. I will do more testing and push.
- Ioi
On 11/7/18 5:46 PM, Jiangli Zhou wrote:
> Hi Ioi,
>
> This approach looks cleaner to me.
>
> Thanks!
>
> Jiangli
>
>
> On 11/7/18 2:34 PM, Ioi Lam wrote:
>> Hi Jiangli,
>>
>> To address your concern about changing the verifier, I have updated
>> the patch to remove the change in verifier.cpp. Instead, I just moved
>> the finalize_verification_constraints() call slightly ahead (to
>> immediately before we enter the VM thread).
>>
>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v03/
>>
>>
>> Note that I have to use a different way of iterating over all classes
>> in SystemDictionaryShared::finalize_verification_constraints_for,
>> because at this point we haven't combined all the shared classes into
>> a single directory yet.
>>
>> Thanks
>>
>> -Ioi
>>
>>
>>
>> On 11/4/2018 2:29 PM, Ioi Lam wrote:
>>>
>>>
>>> On 11/4/18 12:12 PM, Jiangli Zhou wrote:
>>>> On 11/3/18 3:07 PM, Ioi Lam wrote:
>>>>
>>>>>
>>>>>
>>>>> On 11/2/18 10:42 PM, Jiangli Zhou wrote:
>>>>>> Hi Ioi,
>>>>>>
>>>>>>
>>>>>> On 11/2/18 5:27 PM, Ioi Lam wrote:
>>>>>>> Hi Jiangli,
>>>>>>>
>>>>>>> Thanks for the review
>>>>>>>
>>>>>>>
>>>>>>> On 11/2/18 1:24 PM, Jiangli Zhou wrote:
>>>>>>>> Hi Ioi,
>>>>>>>>
>>>>>>>> - src/hotspot/share/classfile/verifier.cpp
>>>>>>>>
>>>>>>>> You moved
>>>>>>>> SystemDictionaryShared::finalize_verification_constraints(_klass)
>>>>>>>> to ClassVerifier::verify_class(). We should be very careful in
>>>>>>>> this area. Have you tested with cases where classes fail
>>>>>>>> verification?
>>>>>>>
>>>>>>> void ClassVerifier::verify_class(TRAPS) {
>>>>>>> log_info(verification)("Verifying class %s with new format",
>>>>>>> _klass->external_name());
>>>>>>>
>>>>>>> Array<Method*>* methods = _klass->methods();
>>>>>>> int num_methods = methods->length();
>>>>>>>
>>>>>>> for (int index = 0; index < num_methods; index++) {
>>>>>>> // Check for recursive re-verification before each method.
>>>>>>> if (was_recursively_verified()) return;
>>>>>>>
>>>>>>> Method* m = methods->at(index);
>>>>>>> if (m->is_native() || m->is_abstract() || m->is_overpass()) {
>>>>>>> // If m is native or abstract, skip it. It is checked in
>>>>>>> class file
>>>>>>> // parser that methods do not override a final method.
>>>>>>> Overpass methods
>>>>>>> // are trusted since the VM generates them.
>>>>>>> continue;
>>>>>>> }
>>>>>>> verify_method(methodHandle(THREAD, m), CHECK_VERIFY(this));
>>>>>>> }
>>>>>>>
>>>>>>> if (DumpSharedSpaces) {
>>>>>>> SystemDictionaryShared::finalize_verification_constraints(_klass);
>>>>>>> }
>>>>>>>
>>>>>>> If verification failed, the CHECK_VERIFY macro will force the
>>>>>>> function to return. So we will call
>>>>>>> finalize_verification_constraints only when verification has
>>>>>>> succeeded.
>>>>>>
>>>>>> Ok, thanks for checking.
>>>>>>
>>>>>> An alternative is to allocate from the RO space directly after
>>>>>> relocation. That way you can avoid adding another CDS specific in
>>>>>> the verifier code. There is also a FIXME for using the symbol
>>>>>> offset in
>>>>>> SharedDictionaryEntry::finalize_verification_constraints, which
>>>>>> you could handle together.
>>>>>
>>>>> I'll leave this patch simple so it can easily be backported to JDK
>>>>> 11.
>>>>>
>>>>> I'll do what you suggested and deal with the FIXME in a separate
>>>>> RFE (https://bugs.openjdk.java.net/browse/JDK-8213346).
>>>>
>>>> Follow up with a separate RFE sounds good to me. For backporting to
>>>> JDK 11, the risk of the fix seems to be higher than the benefit
>>>> (considering the nature of the issue). Maybe we should have the
>>>> proper fix backported instead (if backporting is required)?
>>>>
>>>
>>> With JDK 11, if you have more than a few thousand classes,
>>> -Xshare:dump will randomly fail. The workaround is to explicitly set
>>> -XX:MetaspaceSize=128m.
>>>
>>> I think we should fix it in JDK 11. The fix in the current RFR is
>>> pretty harmless. It just moves the allocation of the
>>> _verifier_constraints earlier.
>>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I haven't ran your code yet, we should avoid copying the
>>>>>>>> constraints for classes that fails verification at dump time.
>>>>>>>
>>>>>>> If a class has failed verification, the class will not be
>>>>>>> archived. So whether we have called
>>>>>>> finalize_verification_constraints on it really doesn't matter.
>>>>>>
>>>>>> Since it doesn't do the copying when verification fails, it is
>>>>>> okay in this case. With the work for dynamic archiving, we want
>>>>>> to avoid unnecessary work at dump time, since
>>>>>> performance/footprint will become higher priority with archiving
>>>>>> at runtime.
>>>>>>>
>>>>>>>> I also need to think through any potential impact on dynamic
>>>>>>>> archiving (possibly none).
>>>>>>>>
>>>>>>>> - src/hotspot/share/memory/metaspace.cpp
>>>>>>>>
>>>>>>>> - if (is_init_completed() && !(DumpSharedSpaces &&
>>>>>>>> THREAD->is_VM_thread())) {
>>>>>>>> + if (is_init_completed()) {
>>>>>>>>
>>>>>>>> Please also add an assert to make sure we don't allocate
>>>>>>>> metaspace objects from the VM_thread at dump time now.
>>>>>>>>
>>>>>>>
>>>>>>> I think it's better to add a blanket
>>>>>>> assert(!THREAD->is_VM_thread()), because no one should be
>>>>>>> allocating metaspace objects inside VM thread.
>>>>>>>
>>>>>>> Adding this has a wider impact, so I will do it in a separate
>>>>>>> RFE (https://bugs.openjdk.java.net/browse/JDK-8213331), in case
>>>>>>> we need to back it out.
>>>>>>
>>>>>> I'd suggest including the assert for dump time with the fix so we
>>>>>> are sure it is safe. Then you can follow up with the broader
>>>>>> assert that you are planning for JDK-8213331.
>>>>>>
>>>>>
>>>>> I see. I've added
>>>>>
>>>>> MetaWord* Metaspace::allocate(ClassLoaderData* loader_data, size_t
>>>>> word_size,
>>>>> MetaspaceObj::Type type, TRAPS) {
>>>>> assert(!_frozen, "sanity");
>>>>> + assert(!(DumpSharedSpaces && THREAD->is_VM_thread()), "sanity");
>>>>>
>>>>> I'll just remove the (DumpSharedSpaces &&) in JDK-8213331.
>>>>
>>>> Thumbs up.
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/1/18 9:47 PM, Ioi Lam wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've revised the webrev to get to the the root of the problem
>>>>>>>>> -- we shouldn't allocate metaspace objects from within the VM
>>>>>>>>> Thread!
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v02/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The original bug that caused the "!(DumpSharedSpaces &&
>>>>>>>>> THREAD->is_VM_thread())" check to be added in
>>>>>>>>> Metaspace::allocate() was
>>>>>>>>>
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8196626
>>>>>>>>> UseAppCDS.java crashes with "VM thread using lock
>>>>>>>>> Heap_lock (not allowed to block on)"
>>>>>>>>>
>>>>>>>>> The fix in JDK-8196626 is just a band-aid solution -- avoiding
>>>>>>>>> the GC. The proper fix is not to do any allocation inside the
>>>>>>>>> VM thread. So instead of calling
>>>>>>>>> finalize_verification_constraints, which allocates
>>>>>>>>> MetaspaceObjs, very late in the CDS dumping process, now we
>>>>>>>>> call be as soon as a class has been verified.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/1/18 11:07 AM, Ioi Lam wrote:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213250
>>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk12/8213250-cds-dump-run-out-of-metaspace.v01/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> When Metaspace::allocate() fails, usually we first try to GC
>>>>>>>>>> to see if
>>>>>>>>>> some MetaspaceObjs can be freed. This is an optimization to
>>>>>>>>>> avoid prematurely
>>>>>>>>>> expanding the metaspace.
>>>>>>>>>>
>>>>>>>>>> However, GC cannot run in the last phase of CDS which runs
>>>>>>>>>> inside the VM thread.
>>>>>>>>>> The old code just aborts the CDS dump. The fix is to
>>>>>>>>>> unconditionally expand the
>>>>>>>>>> metaspace without doing a GC.
>>>>>>>>>>
>>>>>>>>>> (During GC dump time, no classes are unloaded, so even if we
>>>>>>>>>> force a GC, it would
>>>>>>>>>> not free up any metaspace anyway).
>>>>>>>>>>
>>>>>>>>>> Testing:
>>>>>>>>>>
>>>>>>>>>> Before the fix, the test case (jruby) would fail to dump once
>>>>>>>>>> a couple of times. Now
>>>>>>>>>> it has dumped hundreds of times without failure.
>>>>>>>>>>
>>>>>>>>>> I am also running hs-tier 1/2
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-dev
mailing list