RFR(S): 8251860: ClassLoaderData::loaded_classes_do fails with "assert(ZAddress::is_marked(addr)) failed: Should be marked"

calvin.cheung at oracle.com calvin.cheung at oracle.com
Mon Aug 31 15:57:29 UTC 2020


Hi Ioi,

Thanks for your review!

On 8/28/20 3:54 PM, Ioi Lam wrote:
> Hi Calvin,
>
>  300 void ClassLoaderData::inc_keep_alive() {
>  301   if (has_class_mirror_holder()) {
>  302     if (!Arguments::is_dumping_archive()) {
>  303       assert(_keep_alive > 0, "Invalid keep alive increment count");
>  304     }
>  305     _keep_alive++;
>  306   }
>  307 }
>
> Why is 302 needed? I think in CollectCLDClosure, you wouldn't collect 
> any CLD whose _keep_alive count is zero.

The (!cld->is_unloading()) doesn't mean that the _keep_alive count is 
non-zero.

We currently have a few tests which would hit the assert if line 302 is 
not there. One of them is the appcds/dynamicArchive/LinkClassTest. While 
the "Parent x = new Child()" is being verified during linking, the 
VerificationType::resolve_and_check_assignability will be called. This 
will eventually call the SystemDictionary::register_loader but it will 
not add a class_loader to the ClassLoaderDataGraph. Prior to this patch, 
the _keep_alive will be incremented at 2 places:

1) java_lang_Class::set_mirror_module_field

    In this case, the _keep_alive will be incremented during vm startup.

2) ClassLoaderData::ClassLoaderData

    _keep_alive((has_class_mirror_holder || h_class_loader.is_null()) ? 
1 : 0),

Neither of the above code path will be taken for the assignability check.

>
> I think MetaspaceShared::link_and_cleanup_shared_classes() can be 
> simplified so you don't need to scan the CLDs in two passes.
>
> 1393  while (true) {
>         bool has_linked = false;
>         for (int i = 0; i < _loaded_cld->length(); i++) {
>           ClassLoaderData* cld = _loaded_cld->at(i);
>           for (Klass* klass = cld->klasses(); klass != NULL; klass = 
> klass->next_link()) {
>             if (klass->is_instance_klass()) {
>               InstanceKlass* ik = InstanceKlass::cast(ik);
>               if (inking_required(ik)) {
>                 has_linked = link_class_for_cds(ik, THREAD);
>               }
>             }
>           }
>         }
>
>         if (!has_linked) {
>           break;
>         }
>         // Class linking includes verification which may load more 
> classes.
>         // Keep scanning until we have linked no more classes.
>       }
>
I think the has_linked assignment should be "has_linked |= 
link_class_for_cds(ik, THREAD)" ?

The above single loop is cleaner but I think it is less efficient. For 
the simple case if there's no linking required after the first loop. It 
will need to do another full loop to check if all classes don't need to 
be linked in order to exit the loop. With the 2 passes approach, it only 
needs to inspect the first element of the _klasses list of each CLD to 
determine if we can exit the loop. I'm good with the single loop since 
CDS dump isn't too performance critical and it makes the code easier to 
read.

>
> Also link_class_for_cds should return the value of 
> MetaspaceShared::try_link_class(), so it has the same behavior as before.

Fixed.

Here's an updated webrev:

     http://cr.openjdk.java.net/~ccheung/jdk16/8251860/webrev.02/

thanks,
Calvin
>
> The rest of the changes looks good.
>
> Thanks
> - Ioi
>
> On 8/28/20 10:42 AM, calvin.cheung at oracle.com wrote:
>> Hi Coleen,
>>
>> Thanks for your review.
>>
>> I've re-implemented the fix with your suggestions.
>>
>> Here's an updated webrev:
>>
>>     http://cr.openjdk.java.net/~ccheung/jdk16/8251860/webrev.01/
>>
>> On 8/26/20 3:24 PM, Coleen Phillimore wrote:
>>>
>>> Hi Calvin,
>>>
>>> I think you should change this to call 
>>> ClassLoaderDataGraph::loaded_cld_do(&collect_cld) and collect 
>>> ClassLoaderData in the GrowableArray, since you then walk through 
>>> them again to link the classes in the class loader data.
>> With the ClassLoaderDataGraph::loaded_cld_do approach, some klasses 
>> in the _klasses list may already in the link state. So I have to 
>> account for that and not link those klasses again. It complicates the 
>> fix a little bit.
>>>
>>> Can linking one klass cause a new class to be added to a _klasses 
>>> list of a CLD that you've already walked?  It seems like you need 
>>> the do() loop in the current code wrapped around walking the clds, 
>>> then link the classes until no progress. You can fast exit when you 
>>> find a klass linked, I believe, since the classes are added to the 
>>> front of the _klasses list.
>> Yes, it is possible. The updated patch will check if the first klass 
>> in the _klass list of each CLD hasn't been linked, then it will loop 
>> back again to go through all the CLD's _klasses lists.
>>>
>>> Also with this change, you can remove locked_loaded_classes_do() 
>>> because it was added for CDS and apparently the comment is wrong.
>>>
>>> // This case can block but cannot do unloading (called from CDS)
>>> void ClassLoaderDataGraph::unlocked_loaded_classes_do(KlassClosure* 
>>> klass_closure) {
>>>   for (ClassLoaderData* cld = _head; cld != NULL; cld = cld->next()) {
>>>     cld->loaded_classes_do(klass_closure);
>>>   }
>>> }
>>
>> I've removed the above.
>>
>> thanks,
>>
>> Calvin
>>
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 8/26/20 5:35 PM, calvin.cheung at oracle.com wrote:
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8251860
>>>>
>>>> webrev: http://cr.openjdk.java.net/~ccheung/jdk16/8251860/webrev.00/
>>>>
>>>> Please refer to the bug report for an analysis of the crash.
>>>>
>>>> The proposed fix is to acquire the ClassLoaderDataGraph_lock before 
>>>> calling ClassLoaderDataGraph::loaded_classes_do(&link_closure). The 
>>>> link_closure will not do the actual linking but just to store the 
>>>> InstanceKlass'es in an array and increment the keep_alive counter 
>>>> of the class_loader_data of each InstanceKlass. After the call to 
>>>> ClassLoaderDataGraph::loaded_classes_do, each InstanceKlass in the 
>>>> array will be linked and the keep_alive counter of the 
>>>> class_loader_data will be decremented.
>>>>
>>>> Testing:
>>>>     - tested about 40 times on the MacOS host where the crash was seen
>>>>        (before the fix, the crash was seen about once in 20 runs)
>>>>     - Tier1 - 4.
>>>> thanks,
>>>> Calvin
>>>
>


More information about the hotspot-runtime-dev mailing list