YGC time increasing suddenly

Srinivas Ramakrishna ysr1729 at gmail.com
Fri Dec 20 01:26:37 PST 2013


Hi Luciano --

Look at the rate of promotion. It sounds as if the promotion and survival
rates jump up suddenly at the point where you start seeing the longer
pauses. If you believe the promotion and survivor volumes should not
increase like that, one possibility is "nepotism", where objects that were
prematurely promoted and have since died keep the objects they refer to
alive. Try the following experiment.

(1) In one case, let the system run as it has always run. Notice the
     increase in young gc time. Let the system continue to run until a
     full gc occurs, and see if the young gc time drops.

(2) In a second case, let the system run until you see the increased
     young gc times. At this time induce an explicit full gc (via
     jconsole, or programmatically as sketched after this list) and see
     if the young gc times fall.

(3) Repeat (2), but with -XX:+ExplicitGCInvokesConcurrent
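
As an aside, if clicking around in jconsole isn't convenient in your test
environment, the same full gc can be requested over JMX from a small
client -- that is what jconsole's "Perform GC" button does under the hood.
A minimal sketch, assuming the app was started with the
com.sun.management.jmxremote options; the host, port and class name below
are placeholders, not taken from your setup:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RemoteFullGC {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; point this at the app's JMX port.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn =
                    connector.getMBeanServerConnection();
                // Proxy for the target VM's java.lang:type=Memory MXBean.
                MemoryMXBean memory = JMX.newMXBeanProxy(conn,
                    new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME),
                    MemoryMXBean.class);
                // Equivalent to System.gc() in the target VM. With
                // -XX:+ExplicitGCInvokesConcurrent (experiment 3) this
                // triggers a concurrent cycle instead of a stop-the-world
                // full gc; make sure -XX:+DisableExplicitGC is not set,
                // or the call becomes a no-op.
                memory.gc();
            } finally {
                connector.close();
            }
        }
    }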

Finally, one question: do you see the increase in young gc times even
after an explicit full gc? Or are you saying that the minor gc at 1992
seconds that you show below is the last "quick" minor GC, and every one
from the next onwards is slow?

If you share a complete GC log, we might be able to help. The age 15
cohort in the last quick minor GC probably holds the clue: once promoted,
it allows nepotism to set in. If that is the case, then -XX:+NeverPromote
(if the flag still exists) might be worth experimenting with (although it
carries its own dangers, and isn't really a fix, just a means of figuring
out whether this might be the issue).

Basically, I think something like a member of a linked list is getting
into the old generation before it dies, and then proceeds to keep its
successor alive, and so on. A full gc may or may not fix that kind of
issue.
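
To make that picture concrete, a contrived sketch (the names are mine,
purely for illustration, not from your app):

    // A trivially simple singly linked node. If an instance is promoted
    // prematurely and then becomes garbage in the old generation, its
    // stale "next" field still looks like a live old-to-young reference
    // to the minor collector, so the successor survives the young gc,
    // ages, gets promoted in turn, and then keeps *its* successor alive,
    // and so on down the chain -- nepotism.
    final class Node {
        Object payload;
        Node next;
    }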
You could try clearing the links of list members when you pull them out
of the list; that might reduce the occurrence of such phenomena. I vaguely
recall this behaviour with LinkedBlockingQueue, because of which Doug Lea
was forced to hold his nose and clear the references when removing members
from the list, to prevent this kind of false retention. (Ah, I found a
reference:
http://thread.gmane.org/gmane.comp.java.jsr.166-concurrency/5758
and https://bugs.openjdk.java.net/browse/JDK-6805775 )
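
If you have list-like structures of your own, the equivalent defensive
move would look roughly like the sketch below (again, my names, not your
code); the idea is simply that a removed node should retain nothing:

    final class SimpleQueue<T> {
        private static final class Node<T> {
            T item;
            Node<T> next;
            Node(T item) { this.item = item; }
        }

        private Node<T> head;   // oldest element, null when empty

        // Remove and return the oldest element, nulling out the removed
        // node's fields so that, even if it was promoted and lingers in
        // the old generation, it cannot keep its successor or its payload
        // reachable (this mirrors what the LinkedBlockingQueue fix
        // referenced above does on dequeue).
        T removeFirst() {
            Node<T> first = head;
            if (first == null) return null;
            head = first.next;
            T item = first.item;
            first.item = null;   // drop the payload reference
            first.next = null;   // drop the link -- no false retention
            return item;
        }
    }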

-- ramki


On Wed, Dec 18, 2013 at 10:58 AM, Luciano Molinari <lucmolinari at gmail.com> wrote:

> Hi everybody,
>
> We have a standalone Java app that receives requests through RMI, and
> almost all the objects created by it are short-lived (< ~100ms) objects.
> This app is running on a 24-core server with 16 GB RAM (Red Hat Linux).
> During our performance tests (10k requests/second) we started to face a
> problem where the throughput decreases suddenly just a few minutes
> after the app was started.
> So, I started to investigate GC behaviour and to make some adjustments
> (increase memory, use CMS...) and now we are able to run our app
> properly for about 35 minutes. At this point the time spent during young
> collections grows sharply although no Full GC is executed (old gen is only
> ~4% full).
>
> I've done tests with many different parameters, but currently I'm using
> the following ones:
> java -server -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps
> -XX:PrintFLSStatistics=1 -XX:SurvivorRatio=4
> -XX:ParallelGCThreads=8 -XX:PermSize=256m -XX:+UseParNewGC
> -XX:MaxPermSize=256m -Xms7g -Xmx7g -XX:NewSize=4608m -XX:MaxNewSize=4608m
> -XX:MaxTenuringThreshold=15 -Dsun.rmi.dgc.client.gcInterval=3600000
> -Dsun.rmi.dgc.server.gcInterval=3600000
>  -Djava.rmi.server.hostname=IP_ADDRESS
>
> If I use this same configuration (without CMS) the same problem occurs
> after 20 minutes, so it doesn't seem to be related to CMS. Actually, as I
> mentioned above, CMS (Full GC) isn't executed during the tests.
>
> Some logs I've collected:
>
> 1992.748: [ParNew
> Desired survivor size 402653184 bytes, new threshold 15 (max 15)
> - age   1:    9308728 bytes,    9308728 total
> - age   2:       3448 bytes,    9312176 total
> - age   3:       1080 bytes,    9313256 total
> - age   4:         32 bytes,    9313288 total
> - age   5:      34768 bytes,    9348056 total
> - age   6:         32 bytes,    9348088 total
> - age  15:       2712 bytes,    9350800 total
> : 3154710K->10313K(3932160K), 0.0273150 secs] 3215786K->71392K(6553600K)
>
> //14 YGC happened during this window
>
> 2021.165: [ParNew
> Desired survivor size 402653184 bytes, new threshold 15 (max 15)
> - age   1:    9459544 bytes,    9459544 total
> - age   2:    3648200 bytes,   13107744 total
> - age   3:    3837976 bytes,   16945720 total
> - age   4:    3472448 bytes,   20418168 total
> - age   5:    3586896 bytes,   24005064 total
> - age   6:    3475560 bytes,   27480624 total
> - age   7:    3520952 bytes,   31001576 total
> - age   8:    3612088 bytes,   34613664 total
> - age   9:    3355160 bytes,   37968824 total
> - age  10:    3823032 bytes,   41791856 total
> - age  11:    3304576 bytes,   45096432 total
> - age  12:    3671288 bytes,   48767720 total
> - age  13:    3558696 bytes,   52326416 total
> - age  14:    3805744 bytes,   56132160 total
> - age  15:    3429672 bytes,   59561832 total
> : 3230658K->77508K(3932160K), 0.1143860 secs] 3291757K->142447K(6553600K)
>
> Besides the longer time to perform collections, I also noticed that all
> 15 ages started to show larger values.
>
> I must say I'm a little confused about this scenario. Does anyone have
> some tip?
>
> Thanks in advance,
> --
> Luciano
>