How to read the output of +PrintTenuringDistribution

Fri Oct 25 02:43:27 PDT 2013

Thanks Tao and Srinivas for the responses.

Yes, it's from two consecutive GCs.

If this is a bug in jvm itself, here are some more details if that helps.

The log is from Apache Cassandra, a distributed database, running on an
8-core machine. The jvm options are shown as below:

================
root     10267 31.7 58.9 139587620 9683720 ?   SLl  Oct17 3562:09
/usr/lib/jvm/java-7-oracle/bin/java -ea
-javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G -Xmn1000M
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -verbose:gc *-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC* -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=4
-XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
-XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
-Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=cassandra-1.lan
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true -cp
bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.2.10.jar:bin/../lib/apache-cassandra-clientutil-1.2.10.jar:bin/../lib/apache-cassandra-thrift-1.2.10.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.6.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:bin/../lib/guava-13.0.1.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jbcrypt-0.3m.jar:bin/../lib/jline-1.0.jar:bin/../lib/jna-platform.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/lz4-1.1.0.jar:bin/../lib/metrics-core-2.2.0.jar:bin/../lib/netty-3.6.6.Final.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.7.2.jar:bin/../lib/slf4j-log4j12-1.7.2.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.5.jar:bin/../lib/snaptree-0.1.jar
org.apache.cassandra.service.CassandraDaemon

================

Actually this problem happens quite often. All the GCs below are
consecutive. I just copied the output of tail -f here.

2013-10-25T16:42:08.768+0800: 677135.570: [GC2013-10-25T16:42:08.769+0800:
677135.570: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
 - age   1:   16930360 bytes,   16930360 total
*- age   2:   10648456 bytes,  * 27578816 total
- age   3:   13359440 bytes,   40938256 total
- age   4:   10518080 bytes,   51456336 total
: 737954K->56968K(853376K), 0.1159900 secs] 4369426K->3692926K(6120832K),
0.1168410 secs] [Times: user=0.70 sys=0.00, real=0.12 secs]
2013-10-25T16:43:10.423+0800: 677197.225: [GC2013-10-25T16:43:10.424+0800:
677197.225: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:   12756896 bytes,   12756896 total
- age   2:   10679104 bytes,   23436000 total
*- age   3:   11557408 bytes,  * 34993408 total
- age   4:   12170432 bytes,   47163840 total
: 739720K->53771K(853376K), 0.1204730 secs] 4375678K->3697974K(6120832K),
0.1213100 secs] [Times: user=0.72 sys=0.00, real=0.12 secs]
2013-10-25T16:43:15.412+0800: 677202.213: [GC2013-10-25T16:43:15.412+0800:
677202.214: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:    2347520 bytes,    2347520 total
- age   2:   10569072 bytes,   12916592 total
*- age   3:   10655296 bytes, *  23571888 total
- age   4:   10500536 bytes,   34072424 total
: 736523K->41743K(853376K), 0.1057610 secs] 4380726K->3694848K(6120832K),
0.1064420 secs] [Times: user=0.72 sys=0.00, real=0.11 secs]
2013-10-25T16:43:20.705+0800: 677207.507: [GC2013-10-25T16:43:20.706+0800:
677207.507: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:    2224608 bytes,    2224608 total
- age   2:    1864384 bytes,    4088992 total
- age   3:   11297576 bytes,   15386568 total
*- age   4:   16933488 bytes,   *32320056 total
: 724495K->40915K(853376K), 0.1025910 secs] 4377600K->3702271K(6120832K),
0.1033820 secs] [Times: user=0.78 sys=0.00, real=0.10 secs]
2013-10-25T16:44:55.234+0800: 677302.036: [GC2013-10-25T16:44:55.235+0800:
677302.036: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:   23557744 bytes,   23557744 total
- age   2:     576560 bytes,   24134304 total
- age   3:     694352 bytes,   24828656 total
- age   4:   11104504 bytes,   35933160 total
: 723667K->41493K(853376K), 0.1091820 secs] 4385023K->3712159K(6120832K),
0.1100190 secs] [Times: user=0.79 sys=0.00, real=0.11 secs]
2013-10-25T16:46:24.655+0800: 677391.456: [GC2013-10-25T16:46:24.655+0800:
677391.457: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:   16052176 bytes,   16052176 total
*- age   2:   13515752 bytes,  * 29567928 total
- age   3:     568304 bytes,   30136232 total
- age   4:     685288 bytes,   30821520 total
: 724245K->35895K(853376K), 0.1193780 secs] 4394911K->3715432K(6120832K),
0.1201740 secs] [Times: user=0.71 sys=0.00, real=0.12 secs]
2013-10-25T16:48:08.928+0800: 677495.730: [GC2013-10-25T16:48:08.929+0800:
677495.730: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:   11795376 bytes,   11795376 total
- age   2:   16954104 bytes,   28749480 total
*- age   3:   16548488 bytes,   *45297968 total
- age   4:     564904 bytes,   45862872 total
: 718647K->50696K(853376K), 0.1075270 secs] 4398184K->3730910K(6120832K),
0.1083250 secs] [Times: user=0.71 sys=0.00, real=0.11 secs]
2013-10-25T16:49:49.994+0800: 677596.795: [GC2013-10-25T16:49:49.994+0800:
677596.795: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:   10775376 bytes,   10775376 total
*- age   2:    8955800 bytes,   *19731176 total
- age   3:   11600936 bytes,   31332112 total
- age   4:   15432856 bytes,   46764968 total
: 733448K->52432K(853376K), 0.1129030 secs] 4413662K->3733199K(6120832K),
0.1136890 secs] [Times: user=0.68 sys=0.00, real=0.11 secs]
2013-10-25T16:51:30.850+0800: 677697.651: [GC2013-10-25T16:51:30.851+0800:
677697.652: [ParNew
Desired survivor size 87359488 bytes, new threshold 4 (max 4)
- age   1:   13071608 bytes,   13071608 total
- age   2:    7039704 bytes,   20111312 total
*- age   3:    9879160 bytes,  * 29990472 total
- age   4:   10465456 bytes,   40455928 total
: 735184K->52264K(853376K), 0.1201120 secs] 4415951K->3746066K(6120832K),
0.1210270 secs] [Times: user=0.70 sys=0.00, real=0.12 secs]

Thanks,

Ji Cheng

On Fri, Oct 25, 2013 at 3:50 PM, Srinivas Ramakrishna <ysr1729 at gmail.com>wrote:

>
> Sorry, time for bed... I am not thinking straight...
>
> Scratch the part about the old generation being CMS reducing the incidence
> of the problem.
> The age table is only incremented when the target goes to a survivor
> space, so the probability of the race is independent of whether the old
> generation is CMS or not.
>
> I think the fix outlined above should take care of the bad accounting. And
> would avoid (in extreme cases)
> the adaprtive tenuring algorithm from being too pessimistic and using a
> lower tenuring threshold than the
> more accurate accounting would have yielded.
>
> -- ramki
>
>
>
> On Fri, Oct 25, 2013 at 12:46 AM, Srinivas Ramakrishna <ysr1729 at gmail.com>wrote:
>
>> Never mind my question, born of a bit of confusion on my part.
>>
>> I think my explanation works in both cases, although the probability of
>> the event is increased if the old gen collector is not
>> CMS.
>>
>> Here's the issue: In the code for copying a target object into the
>> survivor space or into old gen, several threads may race to
>> claim an object. In the case where the object's age is under the tenuring
>> threshold, or if the older generation is not CMS,
>> we will first copy the object then claim the object by swapping in the
>> forwarding pointer to the copy. The other copies
>> are discarded and the winning thread continues. The problem is that the
>> age table is incremented by all of the threads
>> racing to do the copying. The fix is that only the winner of the race
>> should increment the age table to avoid multiple increments.
>>
>> That should fix the problem you are seeing. The problem could be more
>> acute in certain kinds of object graph structures, and also when the old
>> generation is not CMS then the possibility of such races is slightly
>> increased because it's present also when
>> copying into the old generation.
>>
>> (I can't recall why we don't always first claim the object and then do
>> the copying and then update the forwarding pointer, as is
>> done when the target space is the CMS space. I'll let others reconstruct
>> that reason, if that reason, probably a performance reason, is still
>> relevant today....)
>>
>> -- ramki
>>
>>
>>
>> On Fri, Oct 25, 2013 at 12:27 AM, Srinivas Ramakrishna <ysr1729 at gmail.com
>> > wrote:
>>
>>> Hi Ji --
>>>
>>> Are you using ParNew by itself without CMS in the old gen, or are you
>>> using CMS in the old gen.
>>> If the former, I have a possible explanation (although you will need to
>>> evaluate the statistical probability
>>> of an event based on the configuration and object demographics to
>>> determine if it's plausible in your
>>> case).
>>>
>>> If, however, you are using CMS in the old gen, then I don't have an
>>> explanation.
>>>
>>> So, what is your config? :-)
>>>
>>> -- ramki
>>>
>>>
>>>
>>> On Thu, Oct 24, 2013 at 11:40 PM, Tao Mao <tao.mao at oracle.com> wrote:
>>>
>>>>  Hi Ji,
>>>>
>>>> From what you've reported, it definitely looks weird. Are these two
>>>> GC's consecutive two GC's (i.e. no other GC/s in between)?
>>>>
>>>> Thanks.
>>>> Tao
>>>>
>>>>
>>>> On 10/24/13 9:45 PM, Ji Cheng wrote:
>>>>
>>>> Hello,
>>>>
>>>>  I have gc log enabled with -XX:+PrintTenuringDistribution. But I'm
>>>> quite confused with the tenuring distribution below.
>>>>
>>>>  =============
>>>>  2013-10-19T19:46:30.244+0800: 169797.045:
>>>> [GC2013-10-19T19:46:30.244+0800: 169797.045: [ParNew
>>>> Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>>>> - age   1:   10532656 bytes,   10532656 total
>>>> - age   2:   14082976 bytes,   24615632 total
>>>> - age   3:   15155296 bytes,   39770928 total
>>>> - age   4:   13938272 bytes,   53709200 total
>>>> : 758515K->76697K(853376K), 0.0748620 secs]
>>>> 4693076K->4021899K(6120832K), 0.0756370 secs] [Times: user=0.42 sys=0.00,
>>>> real=0.07 secs]
>>>> 2013-10-19T19:47:10.909+0800: 169837.710:
>>>> [GC2013-10-19T19:47:10.909+0800: 169837.711: [ParNew
>>>> Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>>>> - age   1:    9167144 bytes,    9167144 total
>>>> - age   2:    9178824 bytes,   18345968 total
>>>> - age   3:   16101552 bytes,   34447520 total
>>>> - age   4:   21369776 bytes,   55817296 total
>>>> : 759449K->63442K(853376K), 0.0776450 secs]
>>>> 4704651K->4020310K(6120832K), 0.0783500 secs] [Times: user=0.43 sys=0.00,
>>>> real=0.07 secs]
>>>>  =============
>>>>
>>>>  From What I read, there are 10532656 bytes in age 1 (survived from 1
>>>> GC) in the first gc. In the second gc, 9178824 bytes in age 2 (survived
>>>> from 2 GCs). This is fine since some objects died between the first and
>>>> second GC.
>>>>
>>>>  But in the second GC, 16101552 bytes are in age 3 while only 14082976
>>>> bytes in age 2 in the first GC. I don't why this number is increasing.
>>>> Shouldn't all bytes in age n come from age n-1 in the previous GC? Or I
>>>> misinterpreted those numbers?
>>>>
>>>>  btw, the jvm version is 1.7.0_40.
>>>>
>>>>  Thanks.
>>>>
>>>>  Ji Cheng
>>>>
>>>>
>>>> _______________________________________________
>>>> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>
>>>>
>>>> _______________________________________________
>>>> hotspot-gc-use mailing list
>>>> hotspot-gc-use at openjdk.java.net
>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131025/e7d64394/attachment-0001.html