RFR: 8034852: Shrinking of Metaspace high-water-mark causes incorrect OutOfMemoryErrors or back-to-back GCs

Jon Masamitsu jon.masamitsu at oracle.com
Tue May 6 03:31:32 UTC 2014


On 5/5/2014 7:06 AM, Erik Helin wrote:
> Hi Jon,
>
> thanks for having a look at the patch!
>
> On 2014-05-01 19:34, Jon Masamitsu wrote:
>> Erik,
>>
>> Change looks good.
>
> Thanks!
>
> On 2014-05-01 19:34, Jon Masamitsu wrote:
>> Please add a comment explaining why we're using committed now and why
>> capacity did not work.  Basically an abbreviated version of your
>> explanation
>> below.
>
> The old code in compute_new_size was the last code in metaspace making 
> use of "capacity" as a concept. My idea was to remove "capacity" 
> completely from the metaspace code after this change.
>
> Do you think we should still add a comment discussing a concept
> (capacity) that the code will no longer use? Is there a risk that
> people who are new to the code will become confused?

Capacity is a concept that is present in the GC code.
I really, really don't want to see capacity come back
into the metaspace code, and adding a comment is
the only way I can think of to prevent it.

Jon

>
> Thanks,
> Erik
>
> On 2014-05-01 19:34, Jon Masamitsu wrote:
>> Thanks.
>>
>> Jon
>>
>> On 4/30/2014 4:18 AM, Erik Helin wrote:
>>> Hi all,
>>>
>>> this patch solves a rather tricky problem with the sizing of Metaspace.
>>>
>>> The issue happens when the GC threshold for Metaspace (called
>>> "capacity_until_GC" in the code) becomes less than the committed
>>> memory for Metaspace. Any call to Metaspace::allocate that requires
>>> committing more memory will then fail in
>>> MetaspaceGC::allowed_expansion, because capacity_until_GC() <
>>> MetaspaceAux::committed_memory(). The effect is a full GC, and after
>>> the GC we try to expand and allocate. After the expansion and before
>>> the allocation, one of two things can happen:
>>>  1. capacity_until_GC is larger than the committed memory after the
>>>     expansion. The allocation will now succeed, but the next allocation
>>>     requiring a new chunk will *again* trigger a full GC. This pattern
>>>     will repeat itself for each new allocation request requiring a new
>>>     chunk.
>>>  2. capacity_until_GC is still less than the committed memory even
>>>     after the expansion. We throw a Java OOME (incorrectly).
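>>>
>>> To make the failing check concrete, here is a small standalone sketch
>>> (this is not the HotSpot code itself; the two variables just model
>>> capacity_until_GC and MetaspaceAux::committed_memory(), and the
>>> numbers are made up):
>>>
>>>   #include <cstddef>
>>>   #include <cstdio>
>>>
>>>   int main() {
>>>     std::size_t capacity_until_gc = 8 * 1024 * 1024;   // GC threshold
>>>     std::size_t committed_bytes   = 10 * 1024 * 1024;  // committed Metaspace
>>>
>>>     // Model of the check in MetaspaceGC::allowed_expansion: once the
>>>     // threshold is at or below committed memory, no expansion is
>>>     // allowed, so every allocation that needs to commit more memory
>>>     // falls back to a full GC (and possibly an incorrect OOME).
>>>     std::size_t allowed = capacity_until_gc > committed_bytes
>>>                               ? capacity_until_gc - committed_bytes
>>>                               : 0;
>>>     std::printf("allowed expansion: %zu bytes\n", allowed);
>>>     return 0;
>>>   }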
>>>
>>> How can the GC threshold for Metaspace be less than the committed
>>> memory? The problem is that MetaspaceGC::compute_new_size uses the
>>> field _allocated_capacity for describing the amount of memory in
>>> Metaspace that is "in use". _allocated_capacity does not consider the
>>> memory in the chunk free lists to be "in use", since memory in the
>>> chunk free lists is supposed to be available for new allocations. The
>>> problem is that the chunk free lists can become fragmented, and then
>>> that memory is not available for all kinds of allocations.
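>>>
>>> As a made-up numeric illustration: if 60 MB is committed and 20 MB of
>>> that sits in fragmented chunk free lists, _allocated_capacity reports
>>> only 40 MB "in use". compute_new_size can then shrink
>>> capacity_until_GC toward, say, 50 MB, which is below the 60 MB that
>>> is actually committed.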
>>>
>>> This patch changes MetaspaceGC::compute_new_size to use
>>> MetaspaceAux::committed_memory for describing how much memory is
>>> "in use". The effect is that memory in the chunk free lists will now
>>> be considered "in use" (it of course remains available for future
>>> allocations where possible). This prevents capacity_until_GC from
>>> shrinking below the committed memory "by definition", since
>>> capacity_until_GC can't be lower than the memory that is "in use".
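>>>
>>> A simplified standalone sketch of the idea (made-up constants; the
>>> real MetaspaceGC::compute_new_size also takes MinMetaspaceFreeRatio,
>>> MaxMetaspaceFreeRatio and shrink throttling into account):
>>>
>>>   #include <cstddef>
>>>   #include <cstdio>
>>>
>>>   int main() {
>>>     // "In use" is now the committed memory, which includes the chunk
>>>     // free lists.
>>>     std::size_t committed_bytes = 60 * 1024 * 1024;
>>>     double min_free_ratio = 0.40;  // stand-in for MinMetaspaceFreeRatio / 100
>>>
>>>     // Keep at least min_free_ratio of the new threshold free:
>>>     //   new_threshold * (1 - min_free_ratio) >= committed_bytes
>>>     std::size_t new_capacity_until_gc = static_cast<std::size_t>(
>>>         committed_bytes / (1.0 - min_free_ratio));
>>>
>>>     // By construction the threshold can never end up below the
>>>     // committed memory.
>>>     std::printf("new capacity_until_GC: %zu (committed: %zu)\n",
>>>                 new_capacity_until_gc, committed_bytes);
>>>     return 0;
>>>   }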
>>>
>>> Based on the results from the perf testing (see below), this change
>>> has no performance impact.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~ehelin/8034852/webrev.00/
>>>
>>> Testing:
>>> - JPRT
>>> - Ad-hoc testing:
>>>   - Kitchensink
>>>   - Dacapo
>>>   - Medrec
>>>   - runThese
>>>   - Parallel Class Loading testlist
>>>   - Metaspace testlist
>>>   - GC nightly testlist
>>> - Perf testing:
>>>   - SPECjbb2005
>>>   - SPECjbb2013
>>>   - Derby
>>> - Derby regression tests
>>>
>>> Thanks,
>>> Erik
>>


