RFR: 8056240: Investigate increased GC remark time after class unloading changes in CRM Fuse

Stefan Karlsson stefan.karlsson at oracle.com
Fri Oct 3 13:39:59 UTC 2014


On 2014-10-03 15:38, Stefan Karlsson wrote:
> Hi Betrand,

Sorry for misspelling your name, Bertrand.

StefanK

>
> Thanks for taking the time to review this.
>
> On 2014-10-03 14:42, Bertrand Delsart wrote:
>> Hi Stefan,
>>
>> I'll do a more in-depth review next week, but here are a few quick
>> comments to keep us busy in the meantime :-)
>>
>> -1- The fact that set_on_stack() is now implemented by just an
>> atomic_try_set_bits call seems dangerous.
>>
>> We could end up not setting the JVM_ACC_ON_STACK flag.
>>
>> In your case, you may be sure that the only concurrent operation that
>> might happen is a setting of the same flag, but this looks error-prone
>> with respect to possible future usages of that method (or new
>> concurrent operations).
>
> OK. I see your point about future-proofing this. Is this currently
> broken in the sense that two racing threads can both fail to set the
> on_stack bit?
>
>>
>> I would prefer to be sure the flag has really been set when exiting 
>> from set_on_stack(), as was the case before.
>>
>> More generally, the semantics of the new atomic_try_set_bits are IMHO
>> dangerous. It returns false both if the bits were already set and if
>> another flag was changed concurrently (in which case the bits may not
>> be set). This should at least be clearly documented in the .hpp file.
>
>>
>>
>> An alternative would be for atomic_try_set_bits to return the current
>> value when the CAS failed or was not performed. It would then be up to
>> the caller to decide whether to loop or not (by checking whether the
>> bits are set in the returned value).
>
> I can change the code to use a slightly altered version of 
> atomic_set_bit that returns 'true' if it was this thread that 
> succeeded in setting the bit.
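>
> Something along these lines, perhaps. This is only a sketch to show the
> intent, not the final patch; the name atomic_set_bits_and_report is
> made up:
>
>   // Sketch: like AccessFlags::atomic_set_bits, but additionally reports
>   // whether it was this thread that set the bits. It loops on CAS
>   // failures caused by unrelated concurrent flag updates, so the bits
>   // are guaranteed to be set when it returns.
>   bool AccessFlags::atomic_set_bits_and_report(jint bits) {
>     while (true) {
>       jint old_flags = _flags;
>       if ((old_flags & bits) == bits) {
>         return false;  // another thread already set the bits
>       }
>       jint new_flags = old_flags | bits;
>       if (Atomic::cmpxchg(new_flags, &_flags, old_flags) == old_flags) {
>         return true;   // this thread set the bits
>       }
>       // Lost the CAS to a concurrent update of some other flag; retry.
>     }
>   }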
>
>>
>> -2- There is one part of the change that does not seem to be
>> described in the RFR or the CR. You added something to keep some
>> metadata alive in case of klass redefinition (the mark_on_stack stuff
>> in nmethod.cpp, which for instance now marks the contents of the
>> CompiledICHolder).
>>
>> Did you just change when that marking was performed, or is it a
>> marking that we were not doing previously? (Sorry, I cannot currently
>> review it all to see whether that marking was indeed removed elsewhere.)
>
> Before my change, this marking was done by the call to:
> CodeCache::alive_nmethods_do(nmethod::mark_on_stack);
>
> if you look at:
>   static void mark_on_stack(nmethod* nm) {
>     nm->metadata_do(Metadata::mark_on_stack);
>   }
>
> and then:
> // Iterate over metadata calling this function.   Used by RedefineClasses
> void nmethod::metadata_do(void f(Metadata*)) {
>   address low_boundary = verified_entry_point();
>   if (is_not_entrant()) {
>     low_boundary += NativeJump::instruction_size;
>     // %%% Note:  On SPARC we patch only a 4-byte trap, not a full NativeJump.
>     // (See comment above.)
>   }
>   {
>     // Visit all immediate references that are embedded in the instruction stream.
>     RelocIterator iter(this, low_boundary);
>     while (iter.next()) {
>       if (iter.type() == relocInfo::metadata_type ) {
>         metadata_Relocation* r = iter.metadata_reloc();
>         // In this metadata, we must only follow those metadatas directly embedded in
>         // the code.  Other metadatas (oop_index>0) are seen as part of
>         // the metadata section below.
>         assert(1 == (r->metadata_is_immediate()) +
>                (r->metadata_addr() >= metadata_begin() && r->metadata_addr() < metadata_end()),
>                "metadata must be found in exactly one place");
>         if (r->metadata_is_immediate() && r->metadata_value() != NULL) {
>           Metadata* md = r->metadata_value();
>           if (md != _method) f(md);
>         }
>       } else if (iter.type() == relocInfo::virtual_call_type) {
>         // Check compiledIC holders associated with this nmethod
>         CompiledIC *ic = CompiledIC_at(&iter);
>         if (ic->is_icholder_call()) {
>           CompiledICHolder* cichk = ic->cached_icholder();
>           f(cichk->holder_method());
>           f(cichk->holder_klass());
>         } else {
>           Metadata* ic_oop = ic->cached_metadata();
>           if (ic_oop != NULL) {
>             f(ic_oop);
>           }
>         }
>       }
>     }
>   }
>
>   // Visit the metadata section
>   for (Metadata** p = metadata_begin(); p < metadata_end(); p++) {
>     if (*p == Universe::non_oop_word() || *p == NULL)  continue; // skip non-oops
>     Metadata* md = *p;
>     f(md);
>   }
>
>   // Call function Method*, not embedded in these other places.
>   if (_method != NULL) f(_method);
> }
>
> you see that the above calls to f(...) correspond to the added 
> mark_on_stack sections in nmethod.cpp.
>
> So the code is now duplicated and added to places where it doesn't
> really belong, which breaks some of the current abstraction. This is
> done just to avoid an extra pass of the expensive "for all nmethods {
> for all relocInfos {} }" code. If we could make the RelocIterator code
> much cheaper, then we could remove all that added code and just use
> nmethod::metadata_do instead. That would also enable us to clean up
> some of the code complexity that has been added to the
> codecache/nmethod cleaning code.
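>
> To make the duplicate filtering from point 1) of the original RFR
> (quoted below) concrete: with set_on_stack() reporting whether this
> call actually flipped the bit (the direction discussed above), the
> recording can be done roughly like this. The names mark_metadata_once
> and MetadataOnStackMark::record are illustrative, not the exact code
> in the webrev:
>
>   // Illustrative sketch only: record each Metadata* at most once, no
>   // matter how many nmethods or relocations reference it.
>   static void mark_metadata_once(Metadata* md) {
>     if (md->set_on_stack(true)) {        // true only for the first marker
>       MetadataOnStackMark::record(md);   // illustrative recording hook
>     }
>   }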
>
> thanks,
> StefanK
>
>>
>> Regards,
>>
>> Bertrand
>>
>> On 02/10/14 12:53, Stefan Karlsson wrote:
>>> Hi all,
>>>
>>> (The following patch changes HotSpot code in areas concerning GC, RT,
>>> and Compiler. So, it would be good to get reviews from all three 
>>> teams.)
>>>
>>> Please review this patch, which optimizes and parallelizes the
>>> CodeCache part of MetadataOnStackMark.
>>>
>>> G1 performance measurements showed longer-than-expected remark times on
>>> an application using a lot of nmethods and Metadata. The cause of this
>>> performance regression was the call to
>>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack); in
>>> MetadataOnStackMark. This code path is only taken when class
>>> redefinition is used. Class redefinition is typically used by
>>> monitoring and diagnostic frameworks.
>>>
>>> With this patch we now:
>>> 1) Filter out duplicate Metadata* entries instead of storing a
>>> Metadata* per visited metadata reference.
>>> 2) Collect the set of Metadata* entries in parallel. The code
>>> piggy-backs on the parallel CodeCache cleaning task (see the sketch
>>> below).
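>>>
>>> Roughly, the parallel part looks like this. This is only an
>>> illustration of the idea, not the actual task code from the webrev;
>>> _claimed_idx and worker_mark_metadata are made-up names:
>>>
>>>   // Illustrative sketch: a generic claim-based parallel walk over the
>>>   // nmethods, with each worker marking the metadata it finds.
>>>   volatile jint _claimed_idx = 0;
>>>
>>>   void worker_mark_metadata(GrowableArray<nmethod*>* nmethods) {
>>>     while (true) {
>>>       jint idx = Atomic::add(1, &_claimed_idx) - 1;  // claim the next nmethod
>>>       if (idx >= nmethods->length()) {
>>>         return;  // all nmethods have been claimed
>>>       }
>>>       // Visit every Metadata* referenced by the claimed nmethod; the
>>>       // duplicate filtering from 1) happens in the marking callback.
>>>       nmethods->at(idx)->metadata_do(Metadata::mark_on_stack);
>>>     }
>>>   }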
>>>
>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8056240
>>>
>>> Functional testing:
>>> JPRT, Kitchensink, parallel_class_unloading, unit tests
>>>
>>> Performance testing:
>>> CRM Fuse - where the regression was found
>>>
>>> The patch changes HotSpot code in areas concerning GC, RT, Compiler, 
>>> and
>>> Serviceability. It would be good to get some reviews from the other
>>> teams, and not only from the GC team.
>>>
>>> thanks,
>>> StefanK
>>
>>
>


