RFR: 8056240: Investigate increased GC remark time after class unloading changes in CRM Fuse

Wed Oct 15 12:08:33 UTC 2014

Hi Coleen,

On 2014-10-14 23:00, Coleen Phillimore wrote:
>
> Stefan,   I've reviewed the new version of the code (02) and have some 
> comments.  Thank you for all the work resolving my concerns.
>
> The comments are inline.
>
> On 10/13/14, 5:15 AM, Stefan Karlsson wrote:
>> Hi Coleen,
>>
>> On 2014-10-10 22:54, Coleen Phillimore wrote:
>>>
>>> Hi Stefan,
>>>
>>> Thank you for your patience.  Some review comments:
>>>
>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/classfile/metadataOnStackMark.hpp.udiff.html 
>>>
>>>
> ...
>> I guess you are opposing code in metadataOnStackMark, rather than 
>> saying that this is complicated code?
>
> Well, both, but yes.  I want to think about MetadataOnStackMark in 
> this file.  Thank you for adding chunkedList.  It seems like it could 
> be generally useful.
>
>>
>>> Is it trying to add as chunked list?
>>
>> Yes.
>>
>>>   Can we add a utility for this instead of tying it to the 
>>> MetadataOnStackMark functionality?
>>
>> We could. That would make it possible to reuse the code, but the code 
>> would be a tiny bit more non-obvious. I can try to implement this as 
>> a generic template data structure.
>>
>>>
>>> In this file, what is _doit and why?   This looks really strange. 
>>> Can you add a comment why sometimes doit is false and sometimes doit 
>>> is true?  Is there a way to encapsulate this in a function as it's 
>>> supposed to be a scoped object.
>>
>>
> ...
>> Sounds good to me. Should has_metadata_to_deallocate always be true 
>> in the current code? I don't want to add code to check if Metadata 
>> has been allocated. I think that should be handled as a separate RFE.
>
> Thank you for adding this code this way.  I think conditional on 
> clean_alive is fine.
>>
>>>
>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/gc_implementation/g1/g1CollectedHeap.cpp.udiff.html 
>>>
>>>
>>> +    if (JvmtiExport::has_redefined_a_class()) {
>>> +      InstanceKlass::purge_previous_versions(ik);
>>> +    }
>>>
>>>
>>> purge previous versions requires the on_stack bits to be set. Is 
>>> this true here?
>>
>> Yes, that's what the MetadataOnStackMark in 
>> ConcurrentMark::weakRefsWorkParallelPart (concurrentMark.cpp) is used 
>> for.
>>
>>>   It seems very far away from the other code to me to be able to tell.
>>
>> Yes, I agree. It's non-obvious, but I couldn't figure out a good way 
>> to solve it. In similar situations, I've passed down the instance of 
>> the class, in this case MetadataOnStackMark, to the code that uses 
>> it. Just to ensure that we have an instance setup when calling the 
>> code. We've discussed that approach a couple of years ago, but you 
>> didn't like it then. We have similar problems with ResourceMarks and 
>> HandleMarks, but code using Handles or Resource allocated memory 
>> asserts that those are in place. Maybe we should do the same for 
>> on_stack?
>
> Yes, I remember some other code from the past.  I don't think it's 
> worth having a metadata-on-stack-mark assert.  It's just far away.  
> We'll have to see how hard this is to maintain going forward.
>>
>>> Why can't this go in the same place as the free_deallocate_lists?
>>
>> Because purge_previous_versions can be called in parallel, whereas 
>> free_deallocate_lists needs to be called when we are sure that all 
>> calls to purge_previous_versions have been called. So, 
>> free_deallocate_lists was added as a separate phase after the 
>> parallel unloading code. I can add comments about this.
>
> If you did purge_previous_versions and free_deallocate lists together 
> in clean_metaspaces() would it really hurt performance?

I can redo that investigation after this change has been pushed.

>
>>
>>>
>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/code/nmethod.cpp.udiff.html 
>>>
>>>
>>> I really don't like that the on-stack bit marks are done at the same 
>>> time as clean_ic_if_metadata_is_dead.   They are for different 
>>> purposes.
>>
>> Yes, I agree. It's an abstraction violation, but I've done so that we 
>> don't have to execute the expensive reloc iter lookup code more than 
>> once. Maybe I should use a callback instead, so that we don't 
>> hard-wire the MetadataOnStackMark code into the 
>> clean_ic_if_metadata_is_dead function?
>>
>>> Also, this seems to mark metadata for methods that are unloaded as 
>>> well as ones that survive unloading.  I guess this wouldn't prevent 
>>> methods from being unloaded but seems unnecessary callbacks to 
>>> mark_on_stack. Maybe I'm not reading this right though.
>>
>> I think you are right. On the other hand, previously we visited all 
>> Metadata in the entire code cache, and we didn't do this separation. 
>> I'd like this to be changes as a follow-up fix, so that I don't 
>> introduce a regression here.
>
> I think since the abstraction has been broken that metadata_do() 
> should be rewritten so that it shares the code that duplicates the 
> walks in
> nmethod::mark_metadata_on_stack_at
> nmethod::mark_metadata_on_stack_non_relocs()

You mean something like this?:
http://cr.openjdk.java.net/~stefank/8056240/webrev.02.do_metadata.delta
http://cr.openjdk.java.net/~stefank/8056240/webrev.02.do_metadata

>
>
> Rather than sharing code with CheckClass, which we didn't like because 
> it had to have a static _is_alive.

We used the static is_alive because nmethod::method_do takes a function 
pointer. We wouldn't need to do that if nmethod::method_do took a 
MetadataClosure, that could hold the _is_alive instance.

>
> Then it would be easier to check that only old methods are marked here.
>
> I can file this bug and fix it, since I have multiple bugs in this 
> right now to fix.

OK.

>>
>>>   I thought you couldn't do the unloading walk with the 
>>> mark-on-stack walk?
>>
>> There are two separate phases of the unloading walk and a barrier in 
>> between those phases. The mark-on-stack walk happens in the first 
>> phase, and the purge_previous_version happens in the second phase. 
>> The free_deallocate_lists is called in a serial phase after that.
>>
>>> Also, we actually only have to mark OLD methods in the code cache 
>>> which would greatly reduce the callbacks. I could make this in a 
>>> separate change as I have this change in a private repository.
>>
>> Sounds good. You mean OLD as in a previous version plus the Methods 
>> that were put on the deallocate_free_list?
>
> No, OLD is only methods that are redefined.  You can get other methods 
> on the deallocate_list from relocating jsr/ret from ancient 
> classfiles.  Also there are constant pools on the deallocate_list from 
> lambdas.   Perhaps it would be good to separate these things.

OK.

>>
>>>
>>> I think you should have a compiler reviewer comment on the nmethod 
>>> code though.
>>>
>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/src/share/vm/utilities/accessFlags.cpp.udiff.html 
>>>
>>>
>>> I don't know why this needs to be different than the atomic_set_bits 
>>> function.
>>
>> Previously, we didn't care if the Metadata had already been marked. 
>> They were just blindly recorded by MetadataOnStackMark. That caused 
>> the number of stored elements to be orders of magnitude larger than 
>> they had to be. To fix this, I only record Metadata if they have not 
>> already been added. I need this new function to tell if the current 
>> call, by the current thread, was setting the mark bit. The thread 
>> that succeeds with this marking, will be the one that records the 
>> Metadata in the array (or now, chunked list).
>>
>> Bertrand had comments about the usage of atomic_try_set_bits, so I 
>> have a version that uses a slightly altered copy of 
>> AccessFlags::atomic_set_bits.
>>
>>> I should have made on_stack be a flag in Method* rather than in 
>>> access flags.
>>
>> Sounds like a good follow-up fix.
>
> Yes, I'll file a bug for that also so I can clean this up.
>
> The only comment I had was to do purge_previous_versions with 
> free_deallocate_list.   But if it's shown to have bad performance, I'm 
> fine with the way it is.

OK. I'll leave it as it is, and then I can investigate this more in a 
separate cleanup CR.

>
> I think purge_previous_versions does work that could be done earlier 
> in class redefinition, which could be a third RFE from this.  If we 
> could make purge_previous_versions do little or nothing, these could 
> go together, right?

Sounds reasonable.

>
> Thanks for the code changes and comments.

Thanks for reviewing.

StefanK

>
> Coleen
>
>
>>
>>>
>>> There's a lot here.  I'm glad it helps performance and making the 
>>> code cache walk parallel is a really good change.
>>
>> Thanks Coleen.
>>
>> StefanK
>>
>>>
>>> Thanks,
>>> Coleen
>>>
>>>
>>> On 10/2/14, 6:53 AM, Stefan Karlsson wrote:
>>>> Hi all,
>>>>
>>>> (The following patch changes HotSpot code in areas concerning GC, 
>>>> RT, and Compiler. So, it would be good to get reviews from all 
>>>> three teams.)
>>>>
>>>> Please, review this patch to optimize and parallelize the CodeCache 
>>>> part of MetadaOnStackMark.
>>>>
>>>> G1 performance measurements showed longer than expected remark 
>>>> times on an application using a lot of nmethods and Metadata. The 
>>>> cause for this performance regression was the call to 
>>>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack); in 
>>>> MetadataOnStackMark. This code path is only taken when class 
>>>> redefinition is used. Class redefinition is typically used in 
>>>> monitoring and diagnostic frameworks.
>>>>
>>>> With this patch we now:
>>>> 1) Filter out duplicate Metadata* entries instead of storing a 
>>>> Metadata* per visited metadata reference.
>>>> 2) Collect the set of Metadata* entries in parallel. The code 
>>>> piggy-backs on the parallel CodeCache cleaning task.
>>>>
>>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8056240
>>>>
>>>> Functional testing:
>>>> JPRT, Kitchensink, parallel_class_unloading, unit tests
>>>>
>>>> Performance testing:
>>>> CRM Fuse - where the regression was found
>>>>
>>>> The patch changes HotSpot code in areas concerning GC, RT, 
>>>> Compiler, and Serviceability. It would be good to get some reviews 
>>>> from the other teams, and not only from the GC team.
>>>>
>>>> thanks,
>>>> StefanK
>>>
>>
>