G1 performance

Thu May 9 05:46:58 UTC 2013

Hi Ramki,

I've noticed quite a few problems with G1 performance, but I've not been able to quantify them in a way that lets me report what is actually happening. In one instance I calculated the GC frequency, and hence the Eden size, needed to meet the required pause time without putting too much pressure on context switching. The application had a nearly constant rate of memory churn, but it would occasionally and necessarily experience spikes, at which time we were looking for Eden to be able to expand to cope. So the strategy was to start with a 100MB Eden and then let it adapt to 20MB. It always stopped at 40MB, and consequently we were never able to hit our pause time goals. From this app I wrote a bench that I've been running periodically to see if I can understand why it's not adapting. So far I've simply not been able to get enough runs in to see what is causing the overhead.
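To make the sizing arithmetic above concrete, here is a rough sketch only. It assumes a steady churn rate and that a young collection fires each time Eden fills. The 10 MB/s churn figure and the class and method names are hypothetical, not measurements or code from the application; only the 100MB, 40MB and 20MB Eden sizes come from the description above.

```java
public class EdenSizing {
    /** Eden size (MB) needed so that Eden fills once every gcIntervalSeconds at a steady churn rate. */
    static double requiredEdenMb(double churnMbPerSec, double gcIntervalSeconds) {
        return churnMbPerSec * gcIntervalSeconds;
    }

    /** Young GC frequency (collections per second) implied by a given Eden size. */
    static double gcFrequencyPerSec(double churnMbPerSec, double edenMb) {
        return churnMbPerSec / edenMb;
    }

    public static void main(String[] args) {
        double churnMbPerSec = 10.0; // hypothetical churn rate, for illustration only
        // Eden sizes mentioned above: the starting size, where it stopped, and the target.
        for (double edenMb : new double[] {100.0, 40.0, 20.0}) {
            System.out.printf("Eden %.0f MB -> %.2f young GCs per second%n",
                    edenMb, gcFrequencyPerSec(churnMbPerSec, edenMb));
        }
    }
}
```

Under these assumptions, an Eden that stops at 40MB instead of 20MB halves the young GC frequency, so each collection has roughly twice as much newly allocated memory to process.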

As for the application, we went with CMS because I was able to manipulate the configuration to have it meet the pause time goal, but the fear is CMF and/or OOME due to the occasional spikes in load. We tried the parallel collector, but the problem there is that the adaptive sizing policy does not take premature promotion rates into account, and thus it always leaves survivors undersized, leading to too-frequent full GCs. Last year I mentioned that I would be interested in looking at an adaptive size policy rewrite that corrected this problem, and it was indicated that this had already been done. Unfortunately it hasn't shown up AFAICT. So I'll reiterate the offer to fix adaptive sizing, given that it is needed but quite often has to be turned off because of the premature promotion/too-frequent full GC problem.

So, back to the app: the conclusion I had to come to is that there isn't a suitable collector for this particular application in OpenJDK today. CMS offered the fewest problems, but each is very worrisome given the environment in which the application has been deployed. I'm happy to report findings from my bench when I finally get them sorted.

Regards,
Kirk

On 2013-05-08, at 10:31 PM, Srinivas Ramakrishna <ysr1729 at gmail.com> wrote:

> I have been playing with G1 a bit, and have noticed a nearly 10%
> overhead with G1, which is independent of the measured GC overheads.
> It's possible that this is because of the G1 write barriers inhibiting
> certain JIT optimizations. I have also seen, although this needs to be
> established more thoroughly, that G1's performance degrades the longer
> it runs, in the sense that minor GC pause times become progressively
> worse.
> 
> Has this kind of performance behaviour been observed by others on this
> list? Or internally at Oracle in performance testing of G1? Or by
> other power users of G1 out there?
> 
> Basically, all of the experiments I have done seem to indicate that
> CMS performs better than G1, but unfortunately the potential
> fragmentation problem with CMS (and the promotion failure
> handling and single-threaded compaction that follow) makes it
> unsuitable in certain situations.
> 
> thanks!
> -- ramki