Very long young gc pause (ParNew with CMS)

Vitaly Davidovich vitalyd at gmail.com
Tue Jan 10 05:43:36 UTC 2012


Apparently pgpgin/pgpgout may not be accurate enough for determining swap
usage:
http://help.lockergnome.com/linux/pgpgin-pgpgout-measure--ftopict506279.html

You may need to use vmstat and look at the si/so (swap-in/swap-out) columns instead.
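
For example, something like this, sampled while a long pause is happening
(the output below is purely illustrative; si/so are the amounts swapped in
and out per second, so anything persistently non-zero there points at real
swap activity, and note that the first line vmstat prints is an average
since boot):

    $ vmstat 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     1  2 524288 101340  12364 498812  310  450  1180  1520  950 2100 25  5 58 12
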
On Jan 10, 2012 12:24 AM, "Chi Ho Kwok" <chkwok at digibites.nl> wrote:

> Hi Florian,
>
> Uh, you might want to try sar -r as well; that reports memory usage (see
> man sar for other reporting options, and -f /var/log/sysstat/saXX, where
> XX is the day of the month, is useful for older data as well). Page in /
> out means reading from or writing to the swap file; the usual cause here
> is one or more huge background tasks or cron jobs taking up too much
> memory and forcing other things to swap out. You can try reducing the
> size of the heap and see if it helps if you're just a little bit short,
> but otherwise I don't think you can solve this with just VM options.
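>
> For example, something like this would pull up the memory and swap figures
> for the 9th (just an illustration; on some distributions the files live
> under /var/log/sa/ instead of /var/log/sysstat/):
>
>     sar -r -f /var/log/sysstat/sa09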
>
>
> Here's the relevant section from the manual:
>
>>        -B     Report paging statistics. Some of the metrics below are
>>               available only with post 2.5 kernels. The following values
>>               are displayed:
>>
>>               pgpgin/s
>>                      Total number of kilobytes the system paged in from
>>                      disk per second. Note: With old kernels (2.2.x) this
>>                      value is a number of blocks per second (and not
>>                      kilobytes).
>>
>>               pgpgout/s
>>                      Total number of kilobytes the system paged out to
>>                      disk per second. Note: With old kernels (2.2.x) this
>>                      value is a number of blocks per second (and not
>>                      kilobytes).
>>
>>               fault/s
>>                      Number of page faults (major + minor) made by the
>>                      system per second. This is not a count of page faults
>>                      that generate I/O, because some page faults can be
>>                      resolved without I/O.
>>
>>               majflt/s
>>                      Number of major faults the system has made per second,
>>                      those which have required loading a memory page from
>>                      disk.
>
> I'm not sure what kernel you're on, but pgpgin/pgpgout being high is a bad
> thing. sar seems to report that all faults are minor, but that conflicts
> with the first two columns.
>
>
> Chi Ho Kwok
>
> On Mon, Jan 9, 2012 at 8:47 PM, Florian Binder <java at java4.info> wrote:
>
>>  Yes!
>> You are right!
>> I see a lot of page faults while the GC is taking so much time.
>>
>> For example (sar -B):
>> 00:00:01     pgpgin/s pgpgout/s   fault/s  majflt/s
>> 00:50:01         0,01     45,18    162,29      0,00
>> 01:00:01         0,02     46,58    170,45      0,00
>> 01:10:02     25313,71  27030,39  27464,37      0,02
>> 01:20:02     23456,85  25371,28  13621,92      0,01
>> 01:30:01     22778,76  22918,60  10136,71      0,03
>> 01:40:11     19020,44  22723,65   8617,42      0,15
>> 01:50:01         5,52     44,22    147,26      0,05
>>
>> What does this mean and how can I avoid it?
>>
>>
>> Flo
>>
>>
>>
>> On 09.01.2012 20:33, Chi Ho Kwok wrote:
>>
>> Just making sure the obvious case is covered: is it just me, or is
>> 6s real > 3.5s user+sys with 13 threads just plain weird? That means
>> on average only about half a thread was actually running during that
>> collection.
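>>
>> (A rough sanity check using the numbers from the log below: user + sys =
>> 3.08 + 0.51 ≈ 3.6 seconds of CPU time over 6.18 seconds of wall clock,
>> i.e. about 3.6 / 6.18 ≈ 0.58 of one thread busy on average, out of the
>> 13 ParNew workers available.)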
>>
>>  Do a sar -B (requires the sysstat package) and see if there were any
>> major page faults, or check indirectly via cacti and other monitoring
>> tools (memory usage, load average, etc.), or even via cat /proc/vmstat
>> and the pgmajfault counter. I've seen those cause this kind of pause
>> time during GC.
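>>
>> For example, something like this (the counter names are as they appear
>> in /proc/vmstat; the values below are made up):
>>
>>     $ grep -E 'pswpin|pswpout|pgmajfault' /proc/vmstat
>>     pswpin 184023
>>     pswpout 2310411
>>     pgmajfault 52117
>>
>> The counters are cumulative since boot, so take two samples a few
>> seconds apart and look at the difference.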
>>
>>
>>  Chi Ho Kwok
>>
>>  On Mon, Jan 9, 2012 at 12:08 PM, Florian Binder <java at java4.info> wrote:
>>
>>> Hi everybody,
>>>
>>> I am using the CMS (with ParNew) GC and have very long (> 6 seconds)
>>> young GC pauses.
>>> As you can see in the log below, the old-gen heap consists of one large
>>> free block, the new generation size is 256m, the collection uses 13
>>> worker threads, and it has to copy 27505761 words (~210mb) directly
>>> from eden to the old gen.
>>> I have seen that this problem occurs only after about one week of
>>> uptime, even though we do a full (compacting) GC every night.
>>> Since real time > user time, I assume it might be a synchronization
>>> problem. Can this be true?
>>>
>>> Do you have any ideas how I can speed up these GCs?
>>>
>>> Please let me know if you need more information.
>>>
>>> Thank you,
>>> Flo
>>>
>>>
>>> ##### java -version #####
>>> java version "1.6.0_29"
>>> Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
>>> Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)
>>>
>>> ##### The startup parameters: #####
>>> -Xms28G -Xmx28G
>>> -XX:+UseConcMarkSweepGC \
>>> -XX:CMSMaxAbortablePrecleanTime=10000 \
>>> -XX:SurvivorRatio=8 \
>>> -XX:TargetSurvivorRatio=90 \
>>> -XX:MaxTenuringThreshold=31 \
>>> -XX:CMSInitiatingOccupancyFraction=80 \
>>> -XX:NewSize=256M \
>>>
>>> -verbose:gc \
>>> -XX:+PrintFlagsFinal \
>>> -XX:PrintFLSStatistics=1 \
>>> -XX:+PrintGCDetails \
>>> -XX:+PrintGCDateStamps \
>>> -XX:-TraceClassUnloading \
>>> -XX:+PrintGCApplicationConcurrentTime \
>>> -XX:+PrintGCApplicationStoppedTime \
>>> -XX:+PrintTenuringDistribution \
>>> -XX:+CMSClassUnloadingEnabled \
>>> -Dsun.rmi.dgc.server.gcInterval=9223372036854775807 \
>>> -Dsun.rmi.dgc.client.gcInterval=9223372036854775807 \
>>>
>>> -Djava.awt.headless=true
>>>
>>> ##### From the out-file (as reported by +PrintFlagsFinal): #####
>>> ParallelGCThreads                         = 13
>>>
>>> ##### The gc.log-excerpt: #####
>>> Application time: 20,0617700 seconds
>>> 2011-12-22T12:02:03.289+0100: [GC Before GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 1183290963
>>> Max   Chunk Size: 1183290963
>>> Number of Blocks: 1
>>> Av.  Block  Size: 1183290963
>>> Tree      Height: 1
>>> Before GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 0
>>> Max   Chunk Size: 0
>>> Number of Blocks: 0
>>> Tree      Height: 0
>>> [ParNew
>>> Desired survivor size 25480392 bytes, new threshold 1 (max 31)
>>> - age   1:   28260160 bytes,   28260160 total
>>> : 249216K->27648K(249216K), 6,1808130 secs]
>>> 20061765K->20056210K(29332480K)After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 1155785202
>>> Max   Chunk Size: 1155785202
>>> Number of Blocks: 1
>>> Av.  Block  Size: 1155785202
>>> Tree      Height: 1
>>> After GC:
>>> Statistics for BinaryTreeDictionary:
>>> ------------------------------------
>>> Total Free Space: 0
>>> Max   Chunk Size: 0
>>> Number of Blocks: 0
>>> Tree      Height: 0
>>> , 6,1809440 secs] [Times: user=3,08 sys=0,51, real=6,18 secs]
>>> Total time for which application threads were stopped: 6,1818730 seconds
>>>
>>
>>
>>
>
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


More information about the hotspot-gc-dev mailing list