RFR: 8056240: Investigate increased GC remark time after class unloading changes in CRM Fuse

Mon Oct 13 09:15:48 UTC 2014

Hi Coleen,

On 2014-10-10 22:54, Coleen Phillimore wrote:
>
> Hi Stefan,
>
> Thank you for your patience.  Some review comments:
>
> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/classfile/metadataOnStackMark.hpp.udiff.html 
>
>
> Why is there all this code here?

It's a 45 lines data structure to collect marked Metadata pointers.

> In order to make this parallel, can you just not have simply a 
> GrowableArray per thread?

I wouldn't be able to easily reuse the memory, since the sizes of the 
GrowableArrays could be varying. Having a chunked list, makes this much 
easier.

> I don't see the purpose of all this complexity in this place.

I guess you are opposing code in metadataOnStackMark, rather than saying 
that this is complicated code?

> Is it trying to add as chunked list?

Yes.

>   Can we add a utility for this instead of tying it to the 
> MetadataOnStackMark functionality?

We could. That would make it possible to reuse the code, but the code 
would be a tiny bit more non-obvious. I can try to implement this as a 
generic template data structure.

>
> In this file, what is _doit and why?   This looks really strange. Can 
> you add a comment why sometimes doit is false and sometimes doit is 
> true?  Is there a way to encapsulate this in a function as it's 
> supposed to be a scoped object.

There is a code path where I need to not use the MetadataOnStackMark. 
Since it is a scoped object, I can't just use something like this to 
turn off the MetadataOnStackMark::
{
   if (mark_on_stack) {
     MetadataOnStackMark m;
   }
   // Execute the code in ClassLoaderDataGraph::do_unloading
}

I need to do:
{
   MetadataOnStackMark m(mark_on_stack):
   // Execute the code in ClassLoaderDataGraph::do_unloading
}

This is what _doit is used for. The alternatives I saw was:
1) Do what I did
2) Restructure ClassLoaderDataGraph::do_unloading
3) Stop using MetadataOnStackMark as a scoped object.

I thought (1) was the least intrusive change.

I can change to whatever you want here. I can rename _doit to _enabled, 
if you think that's a better name.

>
> The MetadataOnStackMark object needs to go around the calls to 
> purge_previous_versions and free_deallocate_list.  It doesn't have to 
> be around the whole walk of the CLDG, it just started out that way and 
> has gotten more complicated.  If you add it to the end like:
>
> void ClassLoaderDataGraph::clean_metaspaces() {
>   // mark metadata seen on the stack and code cache so we can delete 
> unneeded entries.
>   bool has_redefined_a_class = JvmtiExport::has_redefined_a_class();
>   MetadataOnStackMark md_on_stack(has_redefined_a_class);
>   if (has_redefined_a_class) {
>     // purge_previous_versions also cleans weak method links. Because
>     // one method's MDO can reference another method from another
>     // class loader, we need to first clean weak method links for all
>     // class loaders here. Below, we can then free redefined methods
>     // for all class loaders.
>     for (ClassLoaderData* data = _head; data != NULL; data = 
> data->next()) {
>       data->classes_do(InstanceKlass::purge_previous_versions);
>     }
>   }
>   // Need two loops.
>   for (ClassLoaderData* data = _head; data != NULL; data = 
> data->next()) {
>     data->free_deallocate_list();
>   }
> }
>
> And only call it at the end of do_unloading this would work (I've 
> tested this)
>
>   if (has_metadata_to_deallocate) {
>     clean_metaspaces();
>   }
>
> has_metadata_to_deallocate is generally true because of lambda 
> constant pools.

Sounds good to me. Should has_metadata_to_deallocate always be true in 
the current code? I don't want to add code to check if Metadata has been 
allocated. I think that should be handled as a separate RFE.

>
> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp.udiff.html 
>
>
> +    if (JvmtiExport::has_redefined_a_class()) {
> +      InstanceKlass::purge_previous_versions(ik);
> +    }
>
>
> purge previous versions requires the on_stack bits to be set.  Is this 
> true here?

Yes, that's what the MetadataOnStackMark in 
ConcurrentMark::weakRefsWorkParallelPart (concurrentMark.cpp) is used for.

>   It seems very far away from the other code to me to be able to tell.

Yes, I agree. It's non-obvious, but I couldn't figure out a good way to 
solve it. In similar situations, I've passed down the instance of the 
class, in this case MetadataOnStackMark, to the code that uses it. Just 
to ensure that we have an instance setup when calling the code. We've 
discussed that approach a couple of years ago, but you didn't like it 
then. We have similar problems with ResourceMarks and HandleMarks, but 
code using Handles or Resource allocated memory asserts that those are 
in place. Maybe we should do the same for on_stack?

> Why can't this go in the same place as the free_deallocate_lists?

Because purge_previous_versions can be called in parallel, whereas 
free_deallocate_lists needs to be called when we are sure that all calls 
to purge_previous_versions have been called. So, free_deallocate_lists 
was added as a separate phase after the parallel unloading code. I can 
add comments about this.

>
> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/code/nmethod.cpp.udiff.html 
>
>
> I really don't like that the on-stack bit marks are done at the same 
> time as clean_ic_if_metadata_is_dead.   They are for different purposes.

Yes, I agree. It's an abstraction violation, but I've done so that we 
don't have to execute the expensive reloc iter lookup code more than 
once. Maybe I should use a callback instead, so that we don't hard-wire 
the MetadataOnStackMark code into the clean_ic_if_metadata_is_dead function?

> Also, this seems to mark metadata for methods that are unloaded as 
> well as ones that survive unloading.  I guess this wouldn't prevent 
> methods from being unloaded but seems unnecessary callbacks to 
> mark_on_stack.   Maybe I'm not reading this right though.

I think you are right. On the other hand, previously we visited all 
Metadata in the entire code cache, and we didn't do this separation. I'd 
like this to be changes as a follow-up fix, so that I don't introduce a 
regression here.

>   I thought you couldn't do the unloading walk with the mark-on-stack 
> walk?

There are two separate phases of the unloading walk and a barrier in 
between those phases. The mark-on-stack walk happens in the first phase, 
and the purge_previous_version happens in the second phase. The 
free_deallocate_lists is called in a serial phase after that.

> Also, we actually only have to mark OLD methods in the code cache 
> which would greatly reduce the callbacks. I could make this in a 
> separate change as I have this change in a private repository.

Sounds good. You mean OLD as in a previous version plus the Methods that 
were put on the deallocate_free_list?

>
> I think you should have a compiler reviewer comment on the nmethod 
> code though.
>
> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/utilities/accessFlags.cpp.udiff.html 
>
>
> I don't know why this needs to be different than the atomic_set_bits 
> function.

Previously, we didn't care if the Metadata had already been marked. They 
were just blindly recorded by MetadataOnStackMark. That caused the 
number of stored elements to be orders of magnitude larger than they had 
to be. To fix this, I only record Metadata if they have not already been 
added. I need this new function to tell if the current call, by the 
current thread, was setting the mark bit. The thread that succeeds with 
this marking, will be the one that records the Metadata in the array (or 
now, chunked list).

Bertrand had comments about the usage of atomic_try_set_bits, so I have 
a version that uses a slightly altered copy of AccessFlags::atomic_set_bits.

> I should have made on_stack be a flag in Method* rather than in access 
> flags.

Sounds like a good follow-up fix.

>
> There's a lot here.  I'm glad it helps performance and making the code 
> cache walk parallel is a really good change.

Thanks Coleen.

StefanK

>
> Thanks,
> Coleen
>
>
> On 10/2/14, 6:53 AM, Stefan Karlsson wrote:
>> Hi all,
>>
>> (The following patch changes HotSpot code in areas concerning GC, RT, 
>> and Compiler. So, it would be good to get reviews from all three teams.)
>>
>> Please, review this patch to optimize and parallelize the CodeCache 
>> part of MetadaOnStackMark.
>>
>> G1 performance measurements showed longer than expected remark times 
>> on an application using a lot of nmethods and Metadata. The cause for 
>> this performance regression was the call to 
>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack); in 
>> MetadataOnStackMark. This code path is only taken when class 
>> redefinition is used. Class redefinition is typically used in 
>> monitoring and diagnostic frameworks.
>>
>> With this patch we now:
>> 1) Filter out duplicate Metadata* entries instead of storing a 
>> Metadata* per visited metadata reference.
>> 2) Collect the set of Metadata* entries in parallel. The code 
>> piggy-backs on the parallel CodeCache cleaning task.
>>
>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8056240
>>
>> Functional testing:
>> JPRT, Kitchensink, parallel_class_unloading, unit tests
>>
>> Performance testing:
>> CRM Fuse - where the regression was found
>>
>> The patch changes HotSpot code in areas concerning GC, RT, Compiler, 
>> and Serviceability. It would be good to get some reviews from the 
>> other teams, and not only from the GC team.
>>
>> thanks,
>> StefanK
>