Request for review (m) 8008966: NPG: Inefficient Metaspace counter functions cause large young GC regressions

Thu Mar 28 18:36:40 UTC 2013

Mikael,

Thanks for the review.  Replies below.

On 03/28/13 03:45, Mikael Gerdin wrote:
> Jon,
>
> I have some comments and I've replied to your discussion with Coleen 
> inline.
>
> In MetaspaceGC::should_expand it looks like you reverted part of the 
> change I did for JDK-8004241.
> I still believe that checking the value of capacity when deciding 
> weather to expand is wrong because should_expand is used to decide 
> when we should reserve more address space, not commit more memory out 
> of already reserved address space.

I've changed that code to be the same as in the baseline.

> bool MetaspaceGC::should_expand(VirtualSpaceList* vsl, size_t word_size) {
>
>   size_t committed_capacity_bytes = 
> MetaspaceAux::allocated_capacity_bytes();
>   // If the user wants a limit, impose one.
>   size_t max_metaspace_size_bytes = MaxMetaspaceSize;
>   size_t metaspace_size_bytes = MetaspaceSize;
>   if (!FLAG_IS_DEFAULT(MaxMetaspaceSize) &&
>       MetaspaceAux::reserved_in_bytes() >= MaxMetaspaceSize) {
>     return false;
>   }

> In addition to that, checking MaxMetaspaceSize against 
> MetaspaceAux::capacity_byte() seems unintuitive compared MaxHeapSize.
> MaxHeapSize limits the amount of address space we reserve for the Java 
> heap, that of course implies that the amount of memory we commit for 
> the Java heap is also bounded. I think that MaxMetaspaceSize should 
> work in a similar manner. It should limit the amount of address space 
> we reserve and thereby it will enforce a limit of how much memory we 
> commit as well.

This is not a separate issue, right?  Reverting to the above fixes this.
>
> In psParallelCompact, PreGCValues::fill you call 
> MetaspaceAux::capacity_words() and multiply by BytesPerWord instead of 
> calling capacity_bytes()

Fixed.

>
> Instead of adding calls to verify_capacity() after calls to purge(), 
> can't you call verify_capacity() from inside purge()?

I cannot do that for CMS because the purge() is being done
during a concurrent phase and class loading can be happening.
I didn't want to add more locking and I think that if the verification
works for the other GC's then there isn't anything special
about CMS with regard to metadata allocation.

>
> I think that the atomics you added in
> MetaspaceAux::{inc,dec}_capacity and 
> SpaceManager::inc_allocation_metrics are unnecessary since it looks 
> like all the callers hold the expand_lock(). That can probably be 
> asserted with assert_lock_strong calls.

I'll try it.

>
> For the overall used versus capacity in MetaspaceCounters I think it's 
> ok to that we will report the amount of memory in chunks which have 
> are not on the free list as the "used" value, at least for now.

I'm working on fixing this in response to Coleen's comments.

>
>
>
>
> On 2013-03-28 00:13, Jon Masamitsu wrote:
>>
>> On 3/27/2013 2:14 PM, Coleen Phillimore wrote:
>>>
>>> Hi Jon,
>>>
>>> In metaspace.hpp maybe capacity_in_bytes should be called
>>> capacity_bytes_slow() or something like that to distinguish it from
>>> capacity_bytes() which is fast.
>>>
>>> *+   // Total capacity in all Metaspaces*
>>>     static size_t capacity_in_bytes() {
>>> *       return capacity_in_bytes(Metaspace::ClassType) +*
>>> *              capacity_in_bytes(Metaspace::NonClassType);*
>>> *+  #ifdef PRODUCT*
>>> *+      // Use capacity_bytes() in PRODUCT instead of this function.*
>>> *+     guarantee(false, "Should not call capacity_in_bytes() in the
>>> PRODUCT");*
>>> *+ #endif
>>>
>>> *
>>
>> I like the name capacity_bytes_slow() so made the change
>> capacity_in_bytes -> capacity_bytes_slow
>>>
>>> Or maybe since capacity_in_bytes() is used for the other counters,
>>> change the capacity_bytes() name to allocated_capacity() or something
>>> like that.   Just so the two names are more different than the
>>> presence of "_in".
>>
>> I will keep this second suggestion in mind.  I'm going to be making some
>> other name changes
>> so may use allocated_capacity.
>>
>>>
>>> metaspace.cpp:
>>>
>>> line 270 and 283 are missing an 'n' in accounting.   I like the
>>> promise of a cleanup.   Even with the comment, it's hard to keep these
>>> straight.
>>>
>>> 1199-1201 is the same code as above it.
>>
>> Fixed.
>>
>>>
>>> 2520 capacity_in_bytes(mdtype) is still called for PrintGCDetails
>>> which iterates over the CLD graph.   This seems too expensive for GC
>>> printing.   It also calls used_in_bytes() so iterates twice. Then x2
>>> for class vs. data metaspace.   This wasn't part of the GC slowdown
>>> that was observed?
>>
>> Yes this this is part of the slow down.  I don't have a replacement yet
>> for used_in_bytes() yet so thought I
>> would fix this when I replaced used_in_bytes().   I can diff out this 
>> code
>> (the code that does data_space and class_space output separately) or
>> I can fix it.  Should I just fix it now?
>
> How about this:
>
> Move the running count of allocated chunk bytes to the ChunkManager or 
> VirtualSpaceList classes, then allocated chunks  of ClassType and 
> NonClassType are accounted separately and we can simply add them 
> together for the total.

I think I'm fixing the this problem.  Let me know when I
send out the fix.

>
>
>
>>
>>>
>>> 2935.  I don't understand why we are checking UseMallocOnly since we
>>> don't use malloc for metaspaces ever.
>>
>> That was probably a mis-merge when I started with the malloc-chunks
>> patch.  I'll
>> delete it.
>
> No, that's much much older :)
> As I remember it's from before the metachunk allocation scheme was 
> integrated into hotspot-metadata.
> But I agree that it should be removed, now it doesn't make any sense 
> whatsoever.

Done.

>
>>
>>>
>>> I can't comment on the CMS change.  It looks like you just moved it.
>>
>> Yes, I just moved a class up so that I could use it in the "Resizing"
>> phase.
>
> For JDK-8009894 I'm planning on adding some more code to the 
> "Resizing" case. As part of that Stefan wanted me to break out part of 
> that case in to a method in order to not make collect_in_background 
> even bigger.

I was (holding my breathe) looking for your changes when I merged.
I guested that you were holding them back.

> Do you think it's time for a CMSCollector::resize() and moving of some 
> of the code in the case there?

I originally put the code to trace the resizing in there because
I was doing the purge() at that point and wanted to have some
indication when the purge was happening.  When I merged I
accepted the change that had the purge() in  sweep() so adding
the tracing code is less interesting.  I can delete it and wait for
you changes.  In fact I will delete it.

Jon
>
> /Mikael
>
>>
>> Thanks.
>>
>> Jon
>>
>>>
>>> Coleen
>>>
>>>
>>> On 3/27/2013 12:13 AM, Jon Masamitsu wrote:
>>>>
>>>> Replace the use of a method that calculated the total capacity of
>>>> the Metaspaces by iterating over all the Metaspaces by maintaining
>>>> the sum of Metaspace capacities as the Metachunks are
>>>> allocated to each Metaspace.  Also maintain a sum for each
>>>> Metaspace as the Metachunks are allocated to that Metaspace.
>>>>
>>>> Change should_expand() and compute_new_space() to
>>>> calculate quantities in bytes.
>>>>
>>>> Remove calls to methods that calculated totals for
>>>> "used" and "free" by iterating over the Metaspaces
>>>> in product mode.  In some cases substitute the use
>>>> of capacity for used.
>>>>
>>>> http://cr.openjdk.java.net/~jmasa/8008966/webrev.00/
>>>>
>>>> Thanks.
>>>>
>>>> Jon
>>>
>>>
>>