RFR: 8056240: Investigate increased GC remark time after class unloading changes in CRM Fuse

Fri Oct 10 20:54:03 UTC 2014

Hi Stefan,

Thank you for your patience.  Some review comments:

http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/classfile/metadataOnStackMark.hpp.udiff.html

Why is there all this code here?  In order to make this parallel, can 
you just not have simply a GrowableArray per thread?  I don't see the 
purpose of all this complexity in this place.  Is it trying to add as 
chunked list?   Can we add a utility for this instead of tying it to the 
MetadataOnStackMark functionality?

In this file, what is _doit and why?   This looks really strange. Can 
you add a comment why sometimes doit is false and sometimes doit is 
true?  Is there a way to encapsulate this in a function as it's supposed 
to be a scoped object.

The MetadataOnStackMark object needs to go around the calls to 
purge_previous_versions and free_deallocate_list.  It doesn't have to be 
around the whole walk of the CLDG, it just started out that way and has 
gotten more complicated.  If you add it to the end like:

void ClassLoaderDataGraph::clean_metaspaces() {
   // mark metadata seen on the stack and code cache so we can delete 
unneeded entries.
   bool has_redefined_a_class = JvmtiExport::has_redefined_a_class();
   MetadataOnStackMark md_on_stack(has_redefined_a_class);
   if (has_redefined_a_class) {
     // purge_previous_versions also cleans weak method links. Because
     // one method's MDO can reference another method from another
     // class loader, we need to first clean weak method links for all
     // class loaders here. Below, we can then free redefined methods
     // for all class loaders.
     for (ClassLoaderData* data = _head; data != NULL; data = 
data->next()) {
       data->classes_do(InstanceKlass::purge_previous_versions);
     }
   }
   // Need two loops.
   for (ClassLoaderData* data = _head; data != NULL; data = data->next()) {
     data->free_deallocate_list();
   }
}

And only call it at the end of do_unloading this would work (I've tested 
this)

   if (has_metadata_to_deallocate) {
     clean_metaspaces();
   }

has_metadata_to_deallocate is generally true because of lambda constant 
pools.

http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp.udiff.html

+    if (JvmtiExport::has_redefined_a_class()) {
+      InstanceKlass::purge_previous_versions(ik);
+    }

purge previous versions requires the on_stack bits to be set.  Is this 
true here?  It seems very far away from the other code to me to be able 
to tell.   Why can't this go in the same place as the free_deallocate_lists?

http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/code/nmethod.cpp.udiff.html

I really don't like that the on-stack bit marks are done at the same 
time as clean_ic_if_metadata_is_dead.   They are for different 
purposes.  Also, this seems to mark metadata for methods that are 
unloaded as well as ones that survive unloading.  I guess this wouldn't 
prevent methods from being unloaded but seems unnecessary callbacks to 
mark_on_stack.   Maybe I'm not reading this right though.  I thought you 
couldn't do the unloading walk with the mark-on-stack walk?   Also, we 
actually only have to mark OLD methods in the code cache which would 
greatly reduce the callbacks. I could make this in a separate change as 
I have this change in a private repository.

I think you should have a compiler reviewer comment on the nmethod code 
though.

http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/utilities/accessFlags.cpp.udiff.html

I don't know why this needs to be different than the atomic_set_bits 
function.  I should have made on_stack be a flag in Method* rather than 
in access flags.

There's a lot here.  I'm glad it helps performance and making the code 
cache walk parallel is a really good change.

Thanks,
Coleen

On 10/2/14, 6:53 AM, Stefan Karlsson wrote:
> Hi all,
>
> (The following patch changes HotSpot code in areas concerning GC, RT, 
> and Compiler. So, it would be good to get reviews from all three teams.)
>
> Please, review this patch to optimize and parallelize the CodeCache 
> part of MetadaOnStackMark.
>
> G1 performance measurements showed longer than expected remark times 
> on an application using a lot of nmethods and Metadata. The cause for 
> this performance regression was the call to 
> CodeCache::alive_nmethods_do(nmethod::mark_on_stack); in 
> MetadataOnStackMark. This code path is only taken when class 
> redefinition is used. Class redefinition is typically used in 
> monitoring and diagnostic frameworks.
>
> With this patch we now:
> 1) Filter out duplicate Metadata* entries instead of storing a 
> Metadata* per visited metadata reference.
> 2) Collect the set of Metadata* entries in parallel. The code 
> piggy-backs on the parallel CodeCache cleaning task.
>
> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8056240
>
> Functional testing:
> JPRT, Kitchensink, parallel_class_unloading, unit tests
>
> Performance testing:
> CRM Fuse - where the regression was found
>
> The patch changes HotSpot code in areas concerning GC, RT, Compiler, 
> and Serviceability. It would be good to get some reviews from the 
> other teams, and not only from the GC team.
>
> thanks,
> StefanK