RFR: 8056240: Investigate increased GC remark time after class unloading changes in CRM Fuse

Stefan Karlsson stefan.karlsson at oracle.com
Fri Oct 3 13:38:18 UTC 2014


Hi Bertrand,

Thanks for taking the time to review this.

On 2014-10-03 14:42, Bertrand Delsart wrote:
> Hi Stefan,
>
> I'll do a more in depth review next week but here are a few quick 
> comments to keep us busy in between :-)
>
> -1- The fact that set_on_stack() is now implemented by just an 
> atomic_try_set_bits call seems dangerous.
>
> We could end up not setting the JVM_ACC_ON_STACK flag.
>
> In your case, you may be sure that the only concurrent operation that 
> might happen is a setting of the same flag but this looks error prone 
> with respect to future possible usages of that method (or new 
> concurrent operations).

OK. I see your point about future-proofing this. Is this currently 
broken, in the sense that two racing threads can both fail to set the 
on_stack bit?

>
> I would prefer to be sure the flag has really been set when exiting 
> from set_on_stack(), as was the case before.
>
> More generally, the semantics of the new atomic_try_set_bits are IMHO 
> dangerous. It returns false both if the bits were already set and if 
> another flag was changed concurrently (in which case the bits may not 
> be set). This should at least be clearly documented in the .hpp file.

>
>
> An alternative would be for atomic_try_set_bits to return the current 
> value when the cas failed or was not performed. It would be up to the 
> caller to decide whether to loop or not (by checking whether the bits 
> are set in the returned value).

I can change the code to use a slightly altered version of 
atomic_set_bit that returns 'true' only if it was this thread that 
succeeded in setting the bit.
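Roughly, I'm thinking of something along these lines (a simplified sketch 
using std::atomic rather than HotSpot's Atomic::cmpxchg; the function name 
and flag word are illustrative, not the actual webrev code):

```cpp
#include <atomic>

// Sketch: atomically set 'bits' in 'flags', retrying on CAS failure so
// the bits are guaranteed to be set when we return. Returns true only
// if it was this thread's CAS that installed the bits, false if they
// were already set (by us earlier, or by a racing thread).
static bool atomic_set_bits(std::atomic<unsigned>& flags, unsigned bits) {
  unsigned old_val = flags.load();
  do {
    if ((old_val & bits) == bits) {
      return false;  // already set; nothing to do
    }
    // compare_exchange_weak reloads old_val on failure, so the
    // already-set check above is re-evaluated on every retry.
  } while (!flags.compare_exchange_weak(old_val, old_val | bits));
  return true;  // our CAS installed the bits
}
```

Unlike a single try-once CAS, the retry loop means a concurrent update to 
an unrelated flag can never cause us to exit without the bit being set.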

>
> -2- There is one part of the change that does not seem to be described 
> in the RFR or the CR. You added something to keep alive some metadata 
> in case of klass redefinition (the mark_on_stack stuff in nmethod.cpp, 
> which for instance now marks the content of the CompiledICHolder).
>
> Did you just change when that marking was performed or is that a 
> marking that we were not doing previously ? (sorry, cannot review it 
> all currently and see whether that marking was indeed removed elsewhere)

Before my change, this marking was done by the call to:
CodeCache::alive_nmethods_do(nmethod::mark_on_stack);

if you look at:
   static void mark_on_stack(nmethod* nm) {
     nm->metadata_do(Metadata::mark_on_stack);
   }

and then:
// Iterate over metadata calling this function.   Used by RedefineClasses
void nmethod::metadata_do(void f(Metadata*)) {
  address low_boundary = verified_entry_point();
  if (is_not_entrant()) {
    low_boundary += NativeJump::instruction_size;
    // %%% Note:  On SPARC we patch only a 4-byte trap, not a full NativeJump.
    // (See comment above.)
  }
  {
    // Visit all immediate references that are embedded in the instruction stream.
    RelocIterator iter(this, low_boundary);
    while (iter.next()) {
      if (iter.type() == relocInfo::metadata_type) {
        metadata_Relocation* r = iter.metadata_reloc();
        // In this metadata, we must only follow those metadatas directly embedded in
        // the code.  Other metadatas (oop_index>0) are seen as part of
        // the metadata section below.
        assert(1 == (r->metadata_is_immediate()) +
               (r->metadata_addr() >= metadata_begin() && r->metadata_addr() < metadata_end()),
               "metadata must be found in exactly one place");
        if (r->metadata_is_immediate() && r->metadata_value() != NULL) {
          Metadata* md = r->metadata_value();
          if (md != _method) f(md);
        }
      } else if (iter.type() == relocInfo::virtual_call_type) {
        // Check compiledIC holders associated with this nmethod
        CompiledIC *ic = CompiledIC_at(&iter);
        if (ic->is_icholder_call()) {
          CompiledICHolder* cichk = ic->cached_icholder();
          f(cichk->holder_method());
          f(cichk->holder_klass());
        } else {
          Metadata* ic_oop = ic->cached_metadata();
          if (ic_oop != NULL) {
            f(ic_oop);
          }
        }
      }
    }
  }

  // Visit the metadata section
  for (Metadata** p = metadata_begin(); p < metadata_end(); p++) {
    if (*p == Universe::non_oop_word() || *p == NULL)  continue;  // skip non-oops
    Metadata* md = *p;
    f(md);
  }

  // Call function Method*, not embedded in these other places.
  if (_method != NULL) f(_method);
}

you see that the above calls to f(...) correspond to the added 
mark_on_stack sections in nmethod.cpp.

So the code is now duplicated in places where it doesn't really belong, 
which breaks some of the current abstraction. This is done just to avoid 
an extra pass of the expensive "for all nmethods { for all relocInfos {} }" 
code. If we could make the RelocIterator code much cheaper, we could 
remove all that added code and just use nmethod::metadata_do instead. 
That would also let us clean up some of the complexity that has been 
added to the code cache/nmethod cleaning code.
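The duplicate filtering mentioned in the RFR (point 1 below, storing each 
distinct Metadata* once rather than once per visited reference) boils down 
to something like this sketch. The class and method names here are made up 
for illustration, not taken from the webrev:

```cpp
#include <cstddef>
#include <unordered_set>

struct Metadata;  // opaque here, as the sketch never dereferences it

// Sketch: instead of recording one entry per visited metadata
// *reference*, keep only the set of distinct Metadata* values. Each
// pointer is then marked on-stack once, no matter how many nmethods
// reference it.
class MetadataMarkSet {
  std::unordered_set<const Metadata*> _seen;
 public:
  // Returns true only the first time a given pointer is added, i.e.
  // when the caller should actually perform the marking work.
  bool add_if_absent(const Metadata* md) {
    return _seen.insert(md).second;
  }
  std::size_t unique_count() const { return _seen.size(); }
};
```

In the parallel case, each worker would collect into its own set (or a 
synchronized shared one) while piggy-backing on the code cache cleaning 
task, as described in the original mail.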

thanks,
StefanK

>
> Regards,
>
> Bertrand
>
> On 02/10/14 12:53, Stefan Karlsson wrote:
>> Hi all,
>>
>> (The following patch changes HotSpot code in areas concerning GC, RT,
>> and Compiler. So, it would be good to get reviews from all three teams.)
>>
>> Please, review this patch to optimize and parallelize the CodeCache part
>> of MetadataOnStackMark.
>>
>> G1 performance measurements showed longer than expected remark times on
>> an application using a lot of nmethods and Metadata. The cause for this
>> performance regression was the call to
>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack); in
>> MetadataOnStackMark. This code path is only taken when class
>> redefinition is used. Class redefinition is typically used in monitoring
>> and diagnostic frameworks.
>>
>> With this patch we now:
>> 1) Filter out duplicate Metadata* entries instead of storing a Metadata*
>> per visited metadata reference.
>> 2) Collect the set of Metadata* entries in parallel. The code
>> piggy-backs on the parallel CodeCache cleaning task.
>>
>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8056240
>>
>> Functional testing:
>> JPRT, Kitchensink, parallel_class_unloading, unit tests
>>
>> Performance testing:
>> CRM Fuse - where the regression was found
>>
>> The patch changes HotSpot code in areas concerning GC, RT, Compiler, and
>> Serviceability. It would be good to get some reviews from the other
>> teams, and not only from the GC team.
>>
>> thanks,
>> StefanK
>
>



More information about the hotspot-dev mailing list