Request for review (m) - 8008508: CMS does not correctly reduce heap size after a Full GC

Thu Mar 7 07:42:23 UTC 2013

Thanks Jon. Please proceed as appropriate wrt the high-level comment
in my review. If and when CMS policy wants to be different,
I suppose this could be suitably revisited.

thanks!
- ramki

On Mon, Mar 4, 2013 at 3:18 PM, Jon Masamitsu <jon.masamitsu at oracle.com> wrote:
> Ramki,
>
> I think that originally CMSGen and TenuredGen shared
> nearly the same policy but didn't share enough code.
>
> The existing CMS policy is
>
> 1) if the incremental collection failed, expand to reserved space.
>
> else
>
> 2) use MinHeapFreeRatio to decide if generation
> should be expanded and by how much
>
> else
>
> 3) don't know how to shrink so don't
>
> I think that the CardGeneration::compute_new_size()
> (formerly the TenuredGeneration::compute_new_size())
> does the same as 2) and that for a full GC  we
> know how to shrink the generation and the
> CardGeneration::compute_new_size() does that
> when CMS does a full GC.
>
> So for the CMS compute_new_size(), keep 1) and
> replace 2) and 3) with a call to CardGeneration::compute_new_size().
>
> I'll go back and verify that the original
> CMS compute_new_size() uses MinHeapFreeRatio
> the same way as CardGeneration::compute_new_size()
> does.  If they are the same, I'll try to refactor it so
> that they actually share more code.
>
> A user that wants more head room in the CMS generation would
> have to specify a MinHeapFreeRatio and that value would
> be used by CardGeneration::compute_new_size() with my
> change just as it is used in the original CMS compute_new_size().
>
> Jon
>
>
>
> On 03/04/13 11:02, Srinivas Ramakrishna wrote:
>>
>> KInd of.... What I am saying is that the sizing policy should be based
>> on what CMS wants, not
>> what CardGeneration might want. Besides how does card generation know
>> what specific policy
>> a concrete subclass would want. The policy should be determined by the
>> "leaf class" (or suitably
>> over-ridden as needed) -- in this case by CMS. I see that you have
>> here decided that the policy
>> provided by CardGeneration is sufficient for when a concurrent mode
>> failure happens. I was not sure
>> if that would necessarily be the case, and perhaps more thought is
>> needed to come up with
>> a good _policy_ specific for CMS, and use the _mechanics_ provided by
>> CardGeneration to
>> affect that change. CMS _should_ distinguish between whether a
>> concurrent mode failure occurred
>> or not to determine  how its heap should be sized, but it should
>> probably be a policy that is
>> cognizant of the historical metrics of CMS, which may or may not be
>> used by or available
>> to CardGeneration, and make decisions specific to CMS, which
>> CardGeneration does not know about.
>>
>> May be I am splitting hairs here, but this is basically where
>> mechanism, if common, might be
>> provided by superclasses but policy by concrete subclasses (or
>> suitably overridden when
>> appropriate -- may be in this case you decided that when there is a
>> compacting collection,
>> the policy provided by the superclass is good enough; I was not sure
>> if that was a deliberate
>> decision -- I have not looked closely at either policy to see if it
>> makes sense. Consider, for example,
>> that CMS usually has larger free space needs than contiguous space
>> collectors, so it makes
>> sense to me that its sizing policies would be different and keep more
>> free space available based
>> on the application's historical behaviour, than would contiguously
>> allocating collectors.)
>>
>> Does my concern make sense?
>>
>> -- ramki
>>
>> On Mon, Mar 4, 2013 at 10:21 AM, Jon Masamitsu<jon.masamitsu at oracle.com>
>> wrote:
>>>
>>> Ramki,
>>>
>>> Thanks for taking a look.  Let's talk about the high level
>>> comment before the details.
>>>
>>> Are you saying that the amount of desired free space in the
>>> cms generation after a collection should be independent of
>>> whether the collection was a concurrent collection or a
>>> full STW collection?
>>>
>>> Jon
>>>
>>>
>>>
>>> On 03/02/13 00:42, Srinivas Ramakrishna wrote:
>>>>
>>>> Hi Jon --
>>>>
>>>> I'll give a couple of code-level comments, but there's a high-level
>>>> comment at the end, which
>>>> might be worth pondering over.
>>>>
>>>> On Tue, Feb 26, 2013 at 3:47 PM, Jon Masamitsu<jon.masamitsu at oracle.com>
>>>> wrote:
>>>>>
>>>>> I've updated the webrev for review comments (thanks, JohnCu)
>>>>>
>>>>> http://cr.openjdk.java.net/~jmasa/8008508/webrev.01/
>>>>>
>>>> concurrent MarkSweepGeneration.cpp:
>>>>
>>>>    949   // compute expansion delta needed for reaching desired free
>>>> percentage
>>>>    950   if (free_percentage<   desired_free_percentage) {
>>>>    951     size_t desired_capacity = (size_t)(used() / ((double) 1 -
>>>> desired_free_percentage));
>>>>    952     assert(desired_capacity>= capacity(), "invalid expansion
>>>> size");
>>>>    953     size_t expand_bytes = MAX2(desired_capacity - capacity(),
>>>> MinHeapDeltaBytes);
>>>>
>>>>    954     if (expand_bytes>   0) {
>>>>
>>>> The if test at 954 (in your changed code) will always be true because
>>>> MinHeapDeltaBytes>   0
>>>> and we are taking the max at line 953.
>>>>
>>>> On reading your shrinking code further below at lines 989-994, it
>>>> struck me that there's a small
>>>> existing bug in the shrink/expand code here. It's probably more of a
>>>> factor in the shrink code though
>>>> because we can live with more free space than we'd ideally want, but
>>>> not less than we ideally want,
>>>> and we certainly do not want to chatter around the desired free
>>>> percentage (hence the delta).
>>>>
>>>> Your shrinking arm looks like this:-
>>>>
>>>>    989   } else {
>>>>    990     size_t desired_capacity = (size_t)(used() / ((double) 1 -
>>>> desired_free_percentage));
>>>>    991     assert(desired_capacity<= capacity(), "invalid expansion
>>>> size");
>>>>    992     size_t shrink_bytes = MAX2(capacity() - desired_capacity,
>>>> MinHeapDeltaBytes);
>>>>    993     shrink_free_list_by(shrink_bytes);
>>>>    994   }
>>>>
>>>> I'd modify it to:-
>>>>
>>>>    989   } else {
>>>>    990     size_t desired_capacity = (size_t)(used() / ((double) 1 -
>>>> desired_free_percentage));
>>>>    991     assert(desired_capacity<= capacity(), "invalid expansion
>>>> size");
>>>>               size_t shrink_bytes = capacity() - desired_capacity;
>>>>               // Don't shrink unless the delta is greater than the
>>>> minimum shrink we want
>>>>    992     if (shrink_bytes>= MinHeapDeltaBytes) {
>>>>    993         shrink_free_list_by(shrink_bytes);
>>>>               }
>>>>    994   }
>>>>
>>>> Note the asymmetry between expansion and shrinking, where we expand by
>>>> the min if we feel we are below the desired free, but we don't shrink
>>>> even if we are above desired free unless the gap exceeds the min
>>>> delta.
>>>>
>>>> This seems to be the correct policy.  (Of course none of that is moot
>>>> until shrink_free_list_by() starts doing shrinking, but probably best
>>>> to fix the policy now, so that it doesn't cause chatter later.
>>>>
>>>> concurrentMarkSweepGeneration.hpp:
>>>>
>>>> 1298   virtual void compute_new_size();
>>>> 1299   void compute_new_size_free_list();
>>>>
>>>> I think it would be good to add a 1-line doc comment for each of the
>>>> above methods.
>>>>
>>>> vmStructs.cpp:
>>>>
>>>> I think you also have to adjust the fields _capacity_at_prologue and
>>>> _used_at_prologue into the superclass. I'd also move the decls up to
>>>> CardGeneration, since that's how the original code is laid out.
>>>>
>>>> Finally, a high level comment. While I see the reason for these
>>>> mechanics, I worry a little bit
>>>> about the somewhat schizophrenic expansion/shrinkage policy, as it
>>>> swings between one policy
>>>> used when we are presumably doing well and keeping up with the
>>>> application's
>>>> allocation/promotion rates and not running into concurrent mode
>>>> failure, but then suddenly switching
>>>> to a completely different expansion/shrinking policy when we do a
>>>> compaction (presumably because
>>>> we had a concurrent mode failure). I'd much rather have divorced the
>>>> two, by using a more uniform
>>>> policy specifically for CMS, perhaps predicated by whether this was
>>>> happening after a
>>>> compaction or not, but keeping it separate from the policy that might
>>>> be used for stop-world
>>>>    kind of collectors (hence TenuredGeneraion).
>>>>
>>>> I'd appreciate it if you could give some thought to this final high
>>>> level comment regarding the
>>>> general policy approach taken here. (Also it is somewhat troubling to
>>>> push policy so high up
>>>> into the class hierarchy, so that every subclass must live with it;
>>>> I'd much rather the policy
>>>> were separated from the mechanics of what would be done when an
>>>> expansion or shrinkage
>>>> was needed.)
>>>>
>>>> -- ramki