RFR: 8056240: Investigate increased GC remark time after class unloading changes in CRM Fuse
Stefan Karlsson
stefan.karlsson at oracle.com
Fri Oct 3 15:13:49 UTC 2014
On 2014-10-03 16:37, Bertrand Delsart wrote:
> On 03/10/14 15:38, Stefan Karlsson wrote:
>> Hi Bertrand,
>>
>> Thanks for taking the time to review this.
>>
>> On 2014-10-03 14:42, Bertrand Delsart wrote:
>>> Hi Stefan,
>>>
>>> I'll do a more in depth review next week but here are a few quick
>>> comments to keep us busy in between :-)
>>>
>>> -1- The fact that set_on_stack() is now implemented by just an
>>> atomic_try_set_bits seems dangerous.
>>>
>>> We could end up not setting the JVM_ACC_ON_STACK flag.
>>>
>>> In your case, you may be sure that the only concurrent operation that
>>> might happen is a setting of the same flag, but this looks error-prone
>>> with respect to possible future usages of that method (or new
>>> concurrent operations).
>>
>> OK. I see your point about future-proofing this. Is this currently
>> broken, in the sense that two racing threads can both fail to set the
>> on_stack bit?
>
> cmpxchg is supposed to be robust against spurious CAS failure. Hence,
> the only danger is if indeed one of our threads concurrently modifies
> the flags for another reason while someone is using MetadataOnStackMark.
>
> This is not likely, but IMHO the gains of avoiding a loop may not be
> worth the risks.
>
> As stated in my following e-mail, I would also agree if you just add
> an assert(on_stack()) after the atomic_try_set_bits in set_on_stack.
> That would allow us to detect use cases we had not anticipated.
>
>>>
>>> I would prefer to be sure the flag has really been set when exiting
>>> from set_on_stack(), as was the case before.
>>>
>>> More generally, the semantics of the new atomic_try_set_bits are IMHO
>>> dangerous. It returns false both when the bits were already set and
>>> when another flag was changed concurrently (in which case the bits may
>>> not be set). This should at least be clearly documented in the .hpp file.
>>
>>>
>>>
>>> An alternative would be for atomic_try_set_bits to return the current
>>> value when the CAS failed or was not performed. It would be up to the
>>> caller to decide whether to loop or not (by checking whether the bits
>>> are set in the returned value).
>>
>> I can change the code to use a slightly altered version of
>> atomic_set_bit that returns 'true' if it was this thread that succeeded
>> in setting the bit.
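
Roughly what I have in mind - just a sketch using plain C++11 std::atomic
rather than HotSpot's Atomic class; the function below and the way it is
used are only illustrative, not the actual patch code:

#include <atomic>

// Loop on the compare-and-swap so the bits are guaranteed to be set on
// return, and report whether it was *this* thread that set them.
static bool atomic_set_bits(std::atomic<int>& flags, int bits) {
  int old_value = flags.load();
  do {
    if ((old_value & bits) == bits) {
      return false;                       // already set by another thread
    }
  } while (!flags.compare_exchange_weak(old_value, old_value | bits));
  return true;                            // this thread won the race
}

// set_on_stack() would then always leave the bit set, and Bertrand's
// suggested assert becomes cheap to add, e.g.:
//   atomic_set_bits(_access_flags, JVM_ACC_ON_STACK);
//   assert(on_stack(), "flag must be set when set_on_stack() returns");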
>>
>>> -2- There is one part of the change that does not seem to be described
>>> in the RFR or the CR. You added something to keep alive some metadata
>>> in case of klass redefinition (the mark_on_stack stuff in nmethod.cpp,
>>> which for instance now marks the content of the CompiledICHolder).
>>>
>>> Did you just change when that marking was performed, or is that a
>>> marking that we were not doing previously? (sorry, cannot review it
>>> all currently and see whether that marking was indeed removed
>>> elsewhere)
>>
>> Before my change, this marking was done by the call to:
>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack);
>>
>> if you look at:
>>   static void mark_on_stack(nmethod* nm) {
>>     nm->metadata_do(Metadata::mark_on_stack);
>>   }
>>
>> and then:
>> // Iterate over metadata calling this function.  Used by RedefineClasses
>> void nmethod::metadata_do(void f(Metadata*)) {
>>   address low_boundary = verified_entry_point();
>>   if (is_not_entrant()) {
>>     low_boundary += NativeJump::instruction_size;
>>     // %%% Note: On SPARC we patch only a 4-byte trap, not a full NativeJump.
>>     // (See comment above.)
>>   }
>>   {
>>     // Visit all immediate references that are embedded in the instruction stream.
>>     RelocIterator iter(this, low_boundary);
>>     while (iter.next()) {
>>       if (iter.type() == relocInfo::metadata_type ) {
>>         metadata_Relocation* r = iter.metadata_reloc();
>>         // In this metadata, we must only follow those metadatas directly embedded in
>>         // the code.  Other metadatas (oop_index>0) are seen as part of
>>         // the metadata section below.
>>         assert(1 == (r->metadata_is_immediate()) +
>>                (r->metadata_addr() >= metadata_begin() &&
>>                 r->metadata_addr() < metadata_end()),
>>                "metadata must be found in exactly one place");
>>         if (r->metadata_is_immediate() && r->metadata_value() != NULL) {
>>           Metadata* md = r->metadata_value();
>>           if (md != _method) f(md);
>>         }
>>       } else if (iter.type() == relocInfo::virtual_call_type) {
>>         // Check compiledIC holders associated with this nmethod
>>         CompiledIC *ic = CompiledIC_at(&iter);
>>         if (ic->is_icholder_call()) {
>>           CompiledICHolder* cichk = ic->cached_icholder();
>>           f(cichk->holder_method());
>>           f(cichk->holder_klass());
>>         } else {
>>           Metadata* ic_oop = ic->cached_metadata();
>>           if (ic_oop != NULL) {
>>             f(ic_oop);
>>           }
>>         }
>>       }
>>     }
>>   }
>>
>>   // Visit the metadata section
>>   for (Metadata** p = metadata_begin(); p < metadata_end(); p++) {
>>     if (*p == Universe::non_oop_word() || *p == NULL) continue; // skip non-oops
>>     Metadata* md = *p;
>>     f(md);
>>   }
>>
>>   // Call function Method*, not embedded in these other places.
>>   if (_method != NULL) f(_method);
>> }
>>
>> you see that the above calls to f(...) correspond to the added
>> mark_on_stack sections in nmethod.cpp.
>>
>> So the code is now duplicated and added to places where it doesn't
>> really belong, which breaks some of the current abstraction. This is
>> done just to avoid an extra pass of the expensive
>> "for all nmethods { for all relocInfos {} }" code. If we could make the
>> RelocIterator code much cheaper, then we could remove all that added
>> code and just use nmethod::metadata_do instead. That would also let us
>> clean up some of the complexity that has been added to the
>> codecache/nmethod cleaning code.
>
> OK. Thanks for pointing me to the original code.
>
> I was surprised to see that we needed to mark both the holder_method()
> and the holder_klass(). There might be ways to mark only the klass but
> this is outside the scope of this RFR, where you should stick to what
> was done before.
When calling mark_on_stack on a Klass we end up in the empty default
implementation:
void Metadata::set_on_stack(const bool value) {
  // nothing to set for most metadata
  // Can't inline because this materializes the vtable on some C++ compilers.
}
So this is a bit unnecessary ATM. I don't know if this will be used in
the future or not.
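
To make the dispatch concrete, here is an illustrative sketch (this is
not the real Metadata/Method/Klass hierarchy; the _on_stack field and the
main() driver below are made up for the example):

#include <cstdio>

class Metadata {
 public:
  virtual ~Metadata() {}
  static void mark_on_stack(Metadata* m) { m->set_on_stack(true); }
  virtual void set_on_stack(bool /*value*/) {
    // nothing to set for most metadata; a Klass ends up in this default
  }
};

class Method : public Metadata {
  bool _on_stack = false;
 public:
  void set_on_stack(bool value) override { _on_stack = value; }
  bool on_stack() const { return _on_stack; }
};

int main() {
  Method m;
  Metadata::mark_on_stack(&m);   // reaches the Method override
  Metadata k;                    // stands in for a Klass
  Metadata::mark_on_stack(&k);   // hits the empty default, so it is a no-op
  std::printf("method marked: %d\n", m.on_stack());
  return 0;
}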
>
> Anyway, the gains of removing one mark_on_stack may not be worth the
> risks. I'm not familiar enough with RedefineClasses to know how the
> reclaimed space can be reused and be sure that the extra marking is
> useless.
>
> Hence, I'm OK with that part... but I'll continue my review next week.
OK.
Thanks,
StefanK
>
> Bertrand.
>
>>
>> thanks,
>> StefanK
>>
>>>
>>> Regards,
>>>
>>> Bertrand
>>>
>>> On 02/10/14 12:53, Stefan Karlsson wrote:
>>>> Hi all,
>>>>
>>>> (The following patch changes HotSpot code in areas concerning GC, RT,
>>>> and Compiler. So, it would be good to get reviews from all three
>>>> teams.)
>>>>
>>>> Please review this patch to optimize and parallelize the CodeCache part
>>>> of MetadataOnStackMark.
>>>>
>>>> G1 performance measurements showed longer than expected remark
>>>> times on
>>>> an application using a lot of nmethods and Metadata. The cause for
>>>> this
>>>> performance regression was the call to
>>>> CodeCache::alive_nmethods_do(nmethod::mark_on_stack); in
>>>> MetadataOnStackMark. This code path is only taken when class
>>>> redefinition is used. Class redefinition is typically used in
>>>> monitoring
>>>> and diagnostic frameworks.
>>>>
>>>> With this patch we now:
>>>> 1) Filter out duplicate Metadata* entries instead of storing a
>>>> Metadata*
>>>> per visited metadata reference.
>>>> 2) Collect the set of Metadata* entries in parallel. The code
>>>> piggy-backs on the parallel CodeCache cleaning task.
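
As an aside, the duplicate filtering in 1) boils down to a claim bit per
Metadata plus a per-worker buffer. A rough sketch of the idea, with
made-up names and plain C++11 atomics rather than the actual patch code:

#include <atomic>
#include <vector>

struct Meta {
  std::atomic<bool> claimed{false};      // stands in for the on-stack bit
};

class OnStackBuffer {
  std::vector<Meta*> _entries;           // one buffer per worker thread
 public:
  void record_if_first(Meta* m) {
    bool expected = false;
    // Only the thread that wins the claim stores the pointer, so merging
    // the per-worker buffers afterwards yields a duplicate-free set.
    if (m->claimed.compare_exchange_strong(expected, true)) {
      _entries.push_back(m);
    }
  }
  const std::vector<Meta*>& entries() const { return _entries; }
};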
>>>>
>>>> http://cr.openjdk.java.net/~stefank/8056240/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8056240
>>>>
>>>> Functional testing:
>>>> JPRT, Kitchensink, parallel_class_unloading, unit tests
>>>>
>>>> Performance testing:
>>>> CRM Fuse - where the regression was found
>>>>
>>>> The patch changes HotSpot code in areas concerning GC, RT,
>>>> Compiler, and
>>>> Serviceability. It would be good to get some reviews from the other
>>>> teams, and not only from the GC team.
>>>>
>>>> thanks,
>>>> StefanK
>>>
>>>
>>
>
>