CMS cycles triggered by Perm almost being full

Wed Oct 2 19:44:14 UTC 2013

Hi,

comments inlined
>>> 
>>> 
>>> Also, as Mikael stated, if UseCMSInitiatingOccupancyOnly is not set,
>>> the IOF becomes a high water
>>> mark for triggering collections. If the autonomics decides it must
>>> trigger a cycle earlier, it might do
>>> so. I'd suggest looking at the verbose option that tells you why CMS
>>> is initiating a cycle. Perhaps
>>> as Mikael noted, the rate of promotion is deemed high enough that a
>>> cycle needs to be started
>>> sooner than you might otherwise expect.
>> 
>> Right, but there is no promotion happening because the app is 100% idle. There are no ParNew collections happening, only back-to-back CMS cycles that last for about 5 seconds and are triggered after a 2 second pause. Hence the 7 second cycle time.

Gak, silly me, I completely forgot about CMSWaitDuration and I'd bet you'd never guess that the default value is 2000.
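
To spell out the arithmetic: if the idle CMS thread waits CMSWaitDuration before re-checking its trigger, the observed period is just that wait plus the concurrent work. A toy model follows (not HotSpot code; the class and method names are made up for illustration, and the 7390 ms figure is the cycle time quoted further down the thread):

```java
// Toy model, not HotSpot code: period of back-to-back CMS cycles on an idle JVM.
final class IdleCmsCycleModel {
    // Default wait in milliseconds, as quoted in this thread for CMSWaitDuration.
    static final long CMS_WAIT_DURATION_MS = 2000;

    // One period = the wait before the trigger check + the concurrent cycle itself.
    static long periodMs(long waitMs, long cycleWorkMs) {
        return waitMs + cycleWorkMs;
    }

    // Back out the concurrent work from an observed period.
    static long cycleWorkMs(long observedPeriodMs, long waitMs) {
        return observedPeriodMs - waitMs;
    }

    public static void main(String[] args) {
        long work = cycleWorkMs(7390, CMS_WAIT_DURATION_MS);
        System.out.println("concurrent work ~" + work + " ms");
        System.out.println("period " + periodMs(CMS_WAIT_DURATION_MS, work) + " ms");
    }
}
```

That leaves roughly 5.4 seconds of actual CMS work per cycle, which matches the "about 5 seconds" above.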

> 

> Ah, sorry, I wouldn't be familiar with the perm (metaspace)-related
> triggers in 8.
> There may have been something in the Q&A following Coleen's perm
> removal talk at J1
> perhaps. I haven't looked at the code, so I'll let Jon/Mikael etc.
> speak to that.

Code looks very familiar. The difference is that instead of permspace, I see metaspace. So a 3 character change and cool, we have a whole new technology to sort through ;-)

- Kirk

> 
>> 
>>> 
>>>> 
>>>> On the interesting distraction, I believe the 7.390 second cycle time is due
>>>> to the constant unit of work that CMS has facing it since the JVM/app is
>>>> idle. That 7.390 seconds includes ~2 seconds between reset and initial-mark.
>>> 
>>> That seems reasonable. I think as Mikael pointed out the data on promotions
>>> stops being updated once the app goes idle, so a sudden increase in
>>> the promotion
>>> rates just before the app went idle locks the decision into place for as long
>>> as the stats aren't updated. My guess is that if a young gc were to occur, the
>>> stats would be updated and CMS would snap out of it.
>> 
>> Makes sense, but how does this fit with what is happening? At the end of the cycle the collector decides when it needs to run again? And if it's wrong (i.e. promotion rates are higher than anticipated) the cycle can be triggered sooner?
>> 
> 
> Yes. The decision is made either at the end of a young gc or, if a
> young gc doesn't happen soon enough, then the
> CMS thread checks after a certain waiting period (CMSWaitDuration?)
> whether it should run based on the triggering criteria.
> Of course the input data to those criteria is stuck in the past
> because no new samples on
> promotion rate have been collected in the absence of minor gc's. So we
> keep relying on
> obsolete data and keep running periodically.
> 
>>> 
>>> So the real fix would be to have the stats decay even when no minor
>>> gc's are occurring.
>>> In fact, IIRC, there's a bug filed on this on CMS many years ago. It
>>> took me a while to
>>> remember.
>>> 
>>> So Mikael is spot on when he suspected "I'm not sure how well that
>>> code scales back if an application suddenly goes completely idle." I
>>> don't think it does. There is no time-decay built into the ergonomics:
>>> if no new data is collected because there is no young gc, we stay
>>> locked into an earlier decision. The fix
>>> would be to decay that value with time (perhaps at each new CMS cycle
>>> by looking at when the last sample was collected for promotion rate,
>>> or something like that).
>> 
>> Again, makes sense and seems reasonable....
>> 
>> -- Kirk
>> 
> 
> -- ramki
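
For what it's worth, the time-decay idea above could be sketched roughly as follows. This is only an illustration under assumptions, not the HotSpot ergonomics: the class, the exponential half-life, and the simplified trigger (start a cycle when the old generation is predicted to fill before a cycle could finish) are all hypothetical.

```java
// Hypothetical sketch of decaying a promotion-rate estimate with time.
// None of these names come from HotSpot; the trigger rule is a simplification.
final class DecayingPromotionRate {
    private final double halfLifeSeconds; // assumed tuning knob
    private double bytesPerSecond;        // last sampled promotion rate
    private double lastSampleSeconds;     // when that sample was taken

    DecayingPromotionRate(double halfLifeSeconds) {
        this.halfLifeSeconds = halfLifeSeconds;
    }

    // Called at the end of a young gc, the only place new samples arrive.
    void sample(double bytesPerSecond, double nowSeconds) {
        this.bytesPerSecond = bytesPerSecond;
        this.lastSampleSeconds = nowSeconds;
    }

    // Behaviour described in the thread: the last sample is used forever.
    double staleRate() {
        return bytesPerSecond;
    }

    // Proposed behaviour: halve the estimate for every half-life without a sample.
    double decayedRate(double nowSeconds) {
        double age = Math.max(0.0, nowSeconds - lastSampleSeconds);
        return bytesPerSecond * Math.pow(0.5, age / halfLifeSeconds);
    }

    // Simplified trigger: start if promotion would fill the free space
    // before a concurrent cycle of the given length could complete.
    static boolean shouldStartCycle(double rate, double freeBytes, double cycleSeconds) {
        return rate * cycleSeconds >= freeBytes;
    }

    public static void main(String[] args) {
        DecayingPromotionRate r = new DecayingPromotionRate(10.0);
        r.sample(100e6, 0.0); // burst of promotion just before the app goes idle
        double free = 200e6, cycle = 5.0;
        System.out.println("stale:   " + shouldStartCycle(r.staleRate(), free, cycle));
        System.out.println("decayed: " + shouldStartCycle(r.decayedRate(60.0), free, cycle));
    }
}
```

With the stale estimate the trigger keeps firing for as long as the app is idle. With the decayed estimate it stops firing once enough time has passed without a new sample, which is the "snap out of it" behaviour without needing a young gc.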