CMS cycles triggered by Perm almost being full

Wed Oct 2 17:11:32 UTC 2013

Hi Kirk -- Inline below ...

On Wed, Oct 2, 2013 at 8:14 AM, Kirk Pepperdine <kirk at kodewerk.com> wrote:
> Hi Ramki, (and Thomas)
>
> The IOF is set to 80% and tenured is not close to that value. Here is a
> typical record.
>
> 33.910: [GC [1 CMS-initial-mark: 187905K(360448K)] 218048K(507904K),
> 0.0121320 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>
> So 19 of 30 is below the 80% mark.

That's around 50%. This might be the very first CMS cycle. That cycle is
initiated "early", at a default setting of 50% I believe, to bootstrap the
statistics for the "automatic trigger levelling" that CMS otherwise does.
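
For reference, the occupancy figure can be read straight off the
initial-mark record. The following is only a sketch: the class and method
names are made up for illustration, and it assumes the record has been
joined onto a single line in the format quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any JVM tooling.
class CmsInitialMark {
    // Captures the old-gen "used(capacity)" pair of a CMS-initial-mark record.
    private static final Pattern OLD_GEN =
            Pattern.compile("CMS-initial-mark: (\\d+)K\\((\\d+)K\\)");

    /** Returns old-gen occupancy in percent, or -1 if the line does not match. */
    static double oldGenOccupancyPercent(String logLine) {
        Matcher m = OLD_GEN.matcher(logLine);
        if (!m.find()) {
            return -1.0;
        }
        long usedK = Long.parseLong(m.group(1));
        long capacityK = Long.parseLong(m.group(2));
        return 100.0 * usedK / capacityK;
    }

    public static void main(String[] args) {
        String line = "33.910: [GC [1 CMS-initial-mark: 187905K(360448K)] "
                + "218048K(507904K), 0.0121320 secs]";
        System.out.printf("old gen occupancy: %.1f%%%n",
                oldGenOccupancyPercent(line));
    }
}
```

For the record above this comes out at roughly 52%, well under an IOF of 80%.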

Also, as Mikael stated, if UseCMSInitiatingOccupancyOnly is not set, the
IOF becomes a high-water mark for triggering collections. If the ergonomics
decide that a cycle must be triggered earlier, they may do so. I'd suggest
looking at the verbose option that tells you why CMS is initiating a cycle.
Perhaps, as Mikael noted, the rate of promotion is deemed high enough that
a cycle needs to be started sooner than you might otherwise expect.
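
To confirm which of these settings are actually in effect in the running
JVM, one option is to read them through the HotSpot diagnostic MXBean. This
is a rough sketch: the class name is invented, and it assumes a HotSpot JVM
that exposes com.sun.management.HotSpotDiagnosticMXBean. On a JVM that does
not know a flag, the lookup fails and the sketch reports that instead.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
class CmsFlagCheck {
    /** Returns the current value of a VM flag, or a note if it is unavailable. */
    static String read(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "(no HotSpot diagnostic bean)";
        }
        try {
            return bean.getVMOption(flagName).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when this JVM has no flag of that name.
            return "(not available on this JVM)";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "CMSInitiatingOccupancyFraction",
            "UseCMSInitiatingOccupancyOnly",
            "CMSClassUnloadingEnabled"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + read(flag));
        }
    }
}
```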

>
>  (concurrent mode interrupted): 195585K->188751K(360448K), 1.1356885 secs]
> 321137K->188751K(507904K), [CMS Perm : 50152K->49867K(50240K)], 1.1360110
> secs] [Times: user=1.02 sys=0.01, real=1.14 secs]
>
> Shows that Perm is quite full. This CMF was a result of a System.gc() during
> the abortable-preclean.

Was the perm gen expanded? Could you include a more complete GC log that
shows the issue?

>
> PrintFlagsFinal shows
>
>     uintx MaxPermHeapExpansion          = 5439488         {product}
>     uintx MaxPermSize                               = 85983232        {pd
> product}
>
> To Mikael's point, I looked at that bit of code this morning and it seemed
> that everything would work if CMSClassUnloadingEnabled was set. But we seem
> to degenerate into this endless cycle of CMS collections that can't solve
> the problem when CMSClassUnloadingEnabled is not set. Of course the answer
> is to set CMSClassUnloadingEnabled however... without it being set the
> behaviour is confusing for people to diagnose and it seems as if the JVM
> should call for a full gc with the reason being perm space needs to be
> resized but CMS can't do that for whatever reason.

I don't understand your point about "perm space needs to be resized". That
assumes that perm occupancy is feeding into the decision to trigger a CMS
cycle but, as Mikael pointed out, it isn't.

>
> On the interesting distraction, I believe the 7.390 second cycle time is due
> to the constant unit of work that CMS has facing it since the JVM/app is
> idle. That 7.390 seconds includes ~2 seconds between reset and initial-mark.

That seems reasonable. I think, as Mikael pointed out, the data on
promotions stops being updated once the app goes idle, so a sudden increase
in the promotion rate just before the app went idle locks the decision into
place for as long as the stats aren't updated. My guess is that if a young
gc were to occur, the stats would be updated and CMS would snap out of it.

So the real fix would be to have the stats decay even when no minor gcs are
occurring. In fact, IIRC, there's a bug filed on this against CMS many
years ago. It took me a while to remember.

So Mikael is spot on when he suspected "I'm not sure how well that code
scales back if an application suddenly goes completely idle." I don't think
it does. There is no time-decay built into the ergonomics: if no new data
is collected because there is no young gc, we stay locked into an earlier
decision. The fix would be to decay that value with time (perhaps at each
new CMS cycle by looking at when the last sample was collected for
promotion rate, or something like that).
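
To make the idea concrete, here is a minimal sketch of what such a decay
could look like. It is not HotSpot code: the class, its methods, and the
half-life parameter are all assumptions made up for illustration. The last
promotion-rate sample is halved for every half-life that has passed since
it was recorded, so a stale spike fades even if no young gc arrives to
replace it.

```java
// Hypothetical illustration of time-based decay; not taken from HotSpot.
class DecayingPromotionRate {
    private final double halfLifeMillis; // assumed tuning knob
    private double lastSample;           // last recorded promotion rate
    private long lastSampleMillis;       // when that sample was recorded

    DecayingPromotionRate(double halfLifeMillis) {
        this.halfLifeMillis = halfLifeMillis;
    }

    /** Called when a young gc produces a fresh promotion-rate sample. */
    void record(double rate, long nowMillis) {
        lastSample = rate;
        lastSampleMillis = nowMillis;
    }

    /** Called at each CMS trigger check; older samples count for less. */
    double estimate(long nowMillis) {
        long elapsed = Math.max(0L, nowMillis - lastSampleMillis);
        return lastSample * Math.pow(0.5, elapsed / halfLifeMillis);
    }

    public static void main(String[] args) {
        DecayingPromotionRate rate = new DecayingPromotionRate(10_000.0);
        rate.record(100.0, 0L);                    // spike just before going idle
        System.out.println(rate.estimate(0L));     // full weight right away
        System.out.println(rate.estimate(10_000L)); // halved after one half-life
        System.out.println(rate.estimate(20_000L)); // quartered after two
    }
}
```

With something along these lines, the estimate used by the trigger would
drift back toward zero during an idle period instead of staying pinned at
the last spike.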

-- ramki

>
> Thanks
> Kirk
>
> On 2013-10-02, at 10:51 AM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:
>
> Kirk,
>
> Did you confirm that this isn't just a case of back-to-back CMS
> collections (hence the periodicity?) because of the old gen being "too
> full"? Recall that the expansion of the old gen is not well coordinated
> with the initiating occupancy, so we may be in a regime where we get
> back-to-back collections because of exceeding the initiating occupancy,
> but the heap isn't full enough to need to grow per the MinHeapFreeRatio
> figure.
>
> If perm gen collection is disabled, perhaps we stay at this high
> occupancy. However, when you do an explicit full gc that collects the
> perm gen, it also causes some old gen objects to be collected that had
> been kept alive by references from dead but not yet collected ("zombie")
> perm gen objects, and the occupancy of the old gen drops down.
>
> If that were so, of course you would have seen the change (drop) in the
> occupancy of the old generation (with the change in occupancy of the perm
> gen being just a red herring).
>
> -- ramki
>
>
>
> On Wed, Oct 2, 2013 at 1:41 AM, Srinivas Ramakrishna <ysr1729 at gmail.com>
> wrote:
>>
>> Thomas, my recollection is from a while back and I haven't looked at the
>> code recently, but CMS does increase perm size as the heap fills up.
>> However, as you may be implying, the heap size increase is not tied to
>> the CMS perm trigger setting. Thus, it's possible that the occupancy of
>> the perm gen is such that it exceeds the perm trigger ratio, but is not
>> sufficiently large that the perm gen will be increased in size.
>>
>> As to Kirk's original question, well, it seems strange that the perm
>> trigger ratio is being used as a CMS trigger when CMS class unloading is
>> disabled. I thought that the CMS initiating trigger was tied to the
>> occupancy of the perm only when class unloading was enabled, and was
>> otherwise only affected by the occupancy of the old generation. You can
>> enable a flag that makes CMS more talkative about why it's starting a CMS
>> collection and that might provide a clue. Perhaps there's a bug in the
>> trigger.
>>
>> -- ramki
>>
>>
>> On Wed, Oct 2, 2013 at 12:12 AM, Thomas Viessmann
>> <thomas.viessmann at oracle.com> wrote:
>>>
>>> Hi Kirk,
>>>
>>> Disclaimer: I'm in support and all I can share is my experience:
>>>
>>> You always need a full GC if heap resizing is needed.
>>> CMS  AFAIHS cannot initiate an increase of perm.
>>> Sooner or later it will bail out.
>>> I always start with the heap sizing first. Making sure that
>>> all areas have sufficient capacity and a fixed size in order
>>> to avoid constant CMS runs or even bail outs.
>>>
>>> Thanks and Regards
>>>
>>> Thomas
>>>
>>>
>>> On 10/02/13 04:38, Kirk Pepperdine wrote:
>>>
>>> Hi,
>>>
>>> I've just witnessed in 1.7.0_17-b02 (Solaris AMD) CMS cycles being
>>> initiated every 7.390 seconds. The system is idle and there are no
>>> foreground (ParNew) collections running. Perm occupancy is quite close to
>>> its configured size so it's quite likely that the cause of the CMS cycle is
>>> this. However, Class unloading is not enabled and thus the CMS cycle doesn't
>>> "fix" the trigger by cleaning out perm or being able to enlarge it
>>> (configured size < max size) and there isn't any pressure for a Full
>>> collection (CMF??). Triggering a collection (System.gc()) of course "fixes"
>>> the problem (facilitates a perm space expansion).
>>>
>>> Ok, so there are workarounds for this but it really confused the person
>>> who contacted me with the problem and he's no slouch when it comes to GC.
>>> I've advised him to turn on perm space sweeping with class unloading. That
>>> said, it seems that CMS should know that it's not going to be able to fix
>>> the problem that triggered it to run and it should degrade into a CMF,
>>> reason perm space needs resizing. My questions are, have I missed something?
>>> Should this be filed as a bug? Or, is this as intended?
>>>
>>> On a side note I found the 7.390 second period to be an interesting
>>> distraction.
>>>
>>> Regards,
>>> Kirk
>>>
>>>
>>
>>
>
>