RFR: 8056240: Investigate increased GC remark time after class unloading changes in CRM Fuse

Stefan Karlsson stefan.karlsson at oracle.com
Fri Oct 3 15:13:49 UTC 2014


On 2014-10-03 16:37, Bertrand Delsart wrote:
> On 03/10/14 15:38, Stefan Karlsson wrote:
>> Hi Bertrand,
>>
>> Thanks for taking the time to review this.
>>
>> On 2014-10-03 14:42, Bertrand Delsart wrote:
>>> Hi Stefan,
>>>
>>> I'll do a more in-depth review next week, but here are a few quick
>>> comments to keep us busy in the meantime :-)
>>>
>>> -1- The fact that set_on_stack() is now implemented by just an
>>> atomic_try_set_bits call seems dangerous.
>>>
>>> We could end up not setting the JVM_ACC_ON_STACK flag.
>>>
>>> In your case, you may be sure that the only concurrent operation that
>>> might happen is a setting of the same flag, but this looks error-prone
>>> with respect to possible future usages of that method (or new
>>> concurrent operations).
>>
>> OK. I see your point about future-proofing this. Is this currently
>> broken in the sense that two racing threads can both fail to set the
>> on_stack bit?
>
> cmpxchg is supposed to be robust against spurious CAS failures. Hence,
> the only danger is if one of our threads concurrently modifies
> the flags for another reason while someone is using MetadataOnStackMark.
>
> This is not likely but IMHO, the gains of avoiding a loop may not be 
> worth the risks.
>
> As stated in my following e-mail, I would also agree if you just add 
> an assert(on_stack()) after the atomic_try_set_bits in set_on_stack. 
> That would allow us to detect use cases we had not anticipated.
>
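For what it's worth, a minimal sketch of what that could look like (an
illustration only, not the actual patch; it assumes the bit twiddling sits
in AccessFlags behind the JVM_ACC_ON_STACK flag):

void AccessFlags::set_on_stack(const bool value) {
  if (value) {
    // A single CAS attempt: it can leave the bit unset if another flag
    // is flipped concurrently, which is exactly the unanticipated use
    // case the assert below would catch.
    atomic_try_set_bits(JVM_ACC_ON_STACK);
    assert(on_stack(), "JVM_ACC_ON_STACK must be set on exit");
  } else {
    atomic_clear_bits(JVM_ACC_ON_STACK);
  }
}
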
>>>
>>> I would prefer to be sure the flag has really been set when exiting
>>> from set_on_stack(), as was the case before.
>>>
>>> More generally, the semantics of the new atomic_try_set_bits are IMHO
>>> dangerous. It returns false both when the bits were already set and when
>>> another flag was changed concurrently (in which case the bits may not
>>> be set). This should at least be clearly documented in the .hpp file.
>>
>>>
>>>
>>> An alternative would be for atomic_try_set_bits to return the current
>>> value when the CAS failed or was not performed. It would be up to the
>>> caller to decide whether to loop or not (by checking whether the bits
>>> are set in the returned value).
>>
>> I can change the code to use a slightly altered version of
>> atomic_set_bit that returns 'true' if it was this thread that succeeded
>> in setting the bit.
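Roughly along these lines (a hypothetical sketch; the name
atomic_set_bits_if_unset and its exact shape are made up for illustration):

// Set the given bits, looping on CAS contention; return true only if
// this thread performed the unset -> set transition.
bool AccessFlags::atomic_set_bits_if_unset(jint bits) {
  jint old_flags, new_flags;
  do {
    old_flags = _flags;
    if ((old_flags & bits) == bits) {
      return false;  // bits already set by another thread
    }
    new_flags = old_flags | bits;
  } while (Atomic::cmpxchg(new_flags, &_flags, old_flags) != old_flags);
  return true;  // this thread set the bits
}
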
>>
>>> -2- There is one part of the change that does not seem to be described
>>> in the RFR or the CR. You added something to keep alive some metadata
>>> in case of klass redefinition (the mark_on_stack stuff in nmethod.cpp,
>>> which for instance now marks the content of the CompiledICHolder).
>>>
>>> Did you just change when that marking was performed or is that a
>>> marking that we were not doing previously ? (sorry, cannot review it
>>> all currently and see whether that marking was indeed removed 
>>> elsewhere)
>>
>> Before my change, this marking was done by the call to:
>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack);
>>
>> if you look at:
>>    static void mark_on_stack(nmethod* nm) {
>>      nm->metadata_do(Metadata::mark_on_stack);
>>    }
>>
>> and then:
>> // Iterate over metadata calling this function.  Used by RedefineClasses
>> void nmethod::metadata_do(void f(Metadata*)) {
>>    address low_boundary = verified_entry_point();
>>    if (is_not_entrant()) {
>>      low_boundary += NativeJump::instruction_size;
>>      // %%% Note:  On SPARC we patch only a 4-byte trap, not a full NativeJump.
>>      // (See comment above.)
>>    }
>>    {
>>      // Visit all immediate references that are embedded in the instruction stream.
>>      RelocIterator iter(this, low_boundary);
>>      while (iter.next()) {
>>        if (iter.type() == relocInfo::metadata_type ) {
>>          metadata_Relocation* r = iter.metadata_reloc();
>>          // In this metadata, we must only follow those metadatas directly embedded in
>>          // the code.  Other metadatas (oop_index>0) are seen as part of
>>          // the metadata section below.
>>          assert(1 == (r->metadata_is_immediate()) +
>>                 (r->metadata_addr() >= metadata_begin() && r->metadata_addr() < metadata_end()),
>>                 "metadata must be found in exactly one place");
>>          if (r->metadata_is_immediate() && r->metadata_value() != NULL) {
>>            Metadata* md = r->metadata_value();
>>            if (md != _method) f(md);
>>          }
>>        } else if (iter.type() == relocInfo::virtual_call_type) {
>>          // Check compiledIC holders associated with this nmethod
>>          CompiledIC *ic = CompiledIC_at(&iter);
>>          if (ic->is_icholder_call()) {
>>            CompiledICHolder* cichk = ic->cached_icholder();
>>            f(cichk->holder_method());
>>            f(cichk->holder_klass());
>>          } else {
>>            Metadata* ic_oop = ic->cached_metadata();
>>            if (ic_oop != NULL) {
>>              f(ic_oop);
>>            }
>>          }
>>        }
>>      }
>>    }
>>
>>    // Visit the metadata section
>>    for (Metadata** p = metadata_begin(); p < metadata_end(); p++) {
>>      if (*p == Universe::non_oop_word() || *p == NULL) continue;  // skip non-oops
>>      Metadata* md = *p;
>>      f(md);
>>    }
>>
>>    // Call function Method*, not embedded in these other places.
>>    if (_method != NULL) f(_method);
>> }
>>
>> you see that the above calls to f(...) correspond to the added
>> mark_on_stack sections in nmethod.cpp.
>>
>> So the code is now duplicated and added to places where it doesn't
>> really belong, which breaks some of the current abstraction. This is
>> done just to be able to skip an extra pass of the expensive
>> "for all nmethods { for all relocInfos {} }" code. If we could make the
>> RelocIterator code much cheaper, then we could remove all that
>> added code and just use nmethod::metadata_do instead. That would also
>> enable us to clean up some of the complexity that has been added to
>> the codecache/nmethod cleaning code.
>
> OK. Thanks for pointing me to the original code.
>
> I was surprised to see that we needed to mark both the holder_method() 
> and the holder_klass(). There might be ways to mark only the klass but 
> this is outside the scope of this RFR, where you should stick to what 
> was done before.

When calling mark_on_stack on a Klass we end up in the empty default 
implementation:
void Metadata::set_on_stack(const bool value) {
   // nothing to set for most metadata
   // Can't inline because this materializes the vtable on some C++ compilers.
}

So this is a bit unnecessary ATM. I don't know if this will be used in 
the future or not.
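For reference, Metadata::mark_on_stack just forwards to that virtual
set_on_stack, roughly (paraphrased, not the exact source):

class Metadata : public MetaspaceObj {
  // ...
  virtual void set_on_stack(const bool value);  // empty default, see above
  static void mark_on_stack(Metadata* m) { m->set_on_stack(true); }
};

Since Klass doesn't override set_on_stack, marking a Klass falls through
to that empty default and is a no-op today.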

>
> Anyway, the gains of removing one mark_on_stack call may not be worth the
> risks. I'm not familiar enough with RedefineClasses to know how the
> reclaimed space can be reused and be sure that the extra marking is
> useless.
>
> Hence, I'm OK with that part... but I'll continue my review next week.

OK.

Thanks,
StefanK

>
> Bertrand.
>
>>
>> thanks,
>> StefanK
>>
>>>
>>> Regards,
>>>
>>> Bertrand
>>>
>>> On 02/10/14 12:53, Stefan Karlsson wrote:
>>>> Hi all,
>>>>
>>>> (The following patch changes HotSpot code in areas concerning GC, RT,
>>>> and Compiler. So, it would be good to get reviews from all three 
>>>> teams.)
>>>>
>>>> Please review this patch to optimize and parallelize the CodeCache part
>>>> of MetadataOnStackMark.
>>>>
>>>> G1 performance measurements showed longer-than-expected remark times on
>>>> an application using a lot of nmethods and Metadata. The cause of this
>>>> performance regression was the call to
>>>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack); in
>>>> MetadataOnStackMark. This code path is only taken when class
>>>> redefinition is used. Class redefinition is typically used by monitoring
>>>> and diagnostic frameworks.
>>>>
>>>> With this patch we now:
>>>> 1) Filter out duplicate Metadata* entries instead of storing a Metadata*
>>>> per visited metadata reference (a sketch of this filtering follows below).
>>>> 2) Collect the set of Metadata* entries in parallel. The code
>>>> piggy-backs on the parallel CodeCache cleaning task.
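
To make point 1 concrete, the filtering could be sketched like this
(illustrative placeholders only: try_set_on_stack() and record() are
made-up names, not the actual patch code):

void Metadata::mark_on_stack(Metadata* m) {
  // Only the thread that wins the race to flip the on_stack bit records
  // the entry, so each Metadata* ends up in the collected set at most
  // once, even when several GC workers visit references to it.
  if (m->try_set_on_stack()) {
    MetadataOnStackMark::record(m);
  }
}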
>>>>
>>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8056240
>>>>
>>>> Functional testing:
>>>> JPRT, Kitchensink, parallel_class_unloading, unit tests
>>>>
>>>> Performance testing:
>>>> CRM Fuse - where the regression was found
>>>>
>>>> The patch changes HotSpot code in areas concerning GC, RT, 
>>>> Compiler, and
>>>> Serviceability. It would be good to get some reviews from the other
>>>> teams, and not only from the GC team.
>>>>
>>>> thanks,
>>>> StefanK
>>>
>>>
>>
>
>


