How to read the output of +PrintTenuringDistribution

Ji Cheng memoleaf at gmail.com
Sun Oct 27 05:36:57 PDT 2013


Hi Srinivas,

Thanks a lot for your help and detailed explanation. =)

I didn't file a bug report. I think it's easier for Tao or your colleagues
to file it on the new jira system (I can't create an account).


Ji Cheng


On Sat, Oct 26, 2013 at 2:07 AM, Srinivas Ramakrishna <ysr1729 at gmail.com>wrote:

> Hi Ji, Tao --
>
> Yes, as I outlined this is a bug in the JVM's parnew object copying and
> age table book-keeping
> because of which the agetable entries can end up being incremented more
> than they should be
> when two or more GC threads race to copy an object to a survivor space and
> only one succeeds
> in doing the copy (but all the racers increment in their local tables).
> This causes the counts to go awry.
> (There is no corresponding bug in Parallel Scavenge because it doesn't
> keep an age table.)
>
> Since the incrementing is done to a per-thread local age table, there are
> two possible solutions to avoid
> this overcounting: Either have the thread decrement the its count when it
> loses a race, or have the
> thread increment its count only when it is sure that it has won the race.
>
> One can look at the structure of the code to see which of these looks like
> a better solution to fix the
> book-keeping badness.
>
> I am assuming Tao or colleagues will create a bug for this and fix it. Or
> that you have already called in
> an official bug for this.
>
> Thanks for reporting the problem!
> -- ramki
>
>
>
> On Fri, Oct 25, 2013 at 2:43 AM, Ji Cheng <memoleaf at gmail.com> wrote:
>
>> Thanks Tao and Srinivas for the responses.
>>
>> Yes, it's from two consecutive GCs.
>>
>> If this is a bug in jvm itself, here are some more details if that helps.
>>
>> The log is from Apache Cassandra, a distributed database, running on an
>> 8-core machine. The jvm options are shown as below:
>>
>> ================
>> root     10267 31.7 58.9 139587620 9683720 ?   SLl  Oct17 3562:09
>> /usr/lib/jvm/java-7-oracle/bin/java -ea
>> -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>> -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G -Xmn1000M
>> -XX:+HeapDumpOnOutOfMemoryError -Xss256k -verbose:gc *-XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC* -XX:+CMSParallelRemarkEnabled
>> -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=4
>> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+UseTLAB -XX:+UseCondCardMark -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
>> -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc.log
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
>> -Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=cassandra-1.lan
>> -Dcom.sun.management.jmxremote.port=7199
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dlog4j.configuration=log4j-server.properties
>> -Dlog4j.defaultInitOverride=true -cp
>> bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.2.10.jar:bin/../lib/apache-cassandra-clientutil-1.2.10.jar:bin/../lib/apache-cassandra-thrift-1.2.10.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.6.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:bin/../lib/guava-13.0.1.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jbcrypt-0.3m.jar:bin/../lib/jline-1.0.jar:bin/../lib/jna-platform.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/lz4-1.1.0.jar:bin/../lib/metrics-core-2.2.0.jar:bin/../lib/netty-3.6.6.Final.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.7.2.jar:bin/../lib/slf4j-log4j12-1.7.2.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.5.jar:bin/../lib/snaptree-0.1.jar
>> org.apache.cassandra.service.CassandraDaemon
>>
>> ================
>>
>> Actually this problem happens quite often. All the GCs below are
>> consecutive. I just copied the output of tail -f here.
>>
>>
>> 2013-10-25T16:42:08.768+0800: 677135.570:
>> [GC2013-10-25T16:42:08.769+0800: 677135.570: [ParNew
>> Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>>  - age   1:   16930360 bytes,   16930360 total
>> *- age   2:   10648456 bytes,  * 27578816 total
>> - age   3:   13359440 bytes,   40938256 total
>> - age   4:   10518080 bytes,   51456336 total
>> : 737954K->56968K(853376K), 0.1159900 secs] 4369426K->3692926K(6120832K),
>> 0.1168410 secs] [Times: user=0.70 sys=0.00, real=0.12 secs]
>> 2013-10-25T16:43:10.423+0800: 677197.225:
>> [GC2013-10-25T16:43:10.424+0800: 677197.225: [ParNew
>>  Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:   12756896 bytes,   12756896 total
>> - age   2:   10679104 bytes,   23436000 total
>> *- age   3:   11557408 bytes,  * 34993408 total
>> - age   4:   12170432 bytes,   47163840 total
>> : 739720K->53771K(853376K), 0.1204730 secs] 4375678K->3697974K(6120832K),
>> 0.1213100 secs] [Times: user=0.72 sys=0.00, real=0.12 secs]
>> 2013-10-25T16:43:15.412+0800: 677202.213:
>> [GC2013-10-25T16:43:15.412+0800: 677202.214: [ParNew
>>  Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:    2347520 bytes,    2347520 total
>> - age   2:   10569072 bytes,   12916592 total
>> *- age   3:   10655296 bytes, *  23571888 total
>> - age   4:   10500536 bytes,   34072424 total
>> : 736523K->41743K(853376K), 0.1057610 secs] 4380726K->3694848K(6120832K),
>> 0.1064420 secs] [Times: user=0.72 sys=0.00, real=0.11 secs]
>> 2013-10-25T16:43:20.705+0800: 677207.507:
>> [GC2013-10-25T16:43:20.706+0800: 677207.507: [ParNew
>>  Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:    2224608 bytes,    2224608 total
>> - age   2:    1864384 bytes,    4088992 total
>> - age   3:   11297576 bytes,   15386568 total
>> *- age   4:   16933488 bytes,   *32320056 total
>> : 724495K->40915K(853376K), 0.1025910 secs] 4377600K->3702271K(6120832K),
>> 0.1033820 secs] [Times: user=0.78 sys=0.00, real=0.10 secs]
>> 2013-10-25T16:44:55.234+0800: 677302.036:
>> [GC2013-10-25T16:44:55.235+0800: 677302.036: [ParNew
>> Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:   23557744 bytes,   23557744 total
>> - age   2:     576560 bytes,   24134304 total
>> - age   3:     694352 bytes,   24828656 total
>> - age   4:   11104504 bytes,   35933160 total
>> : 723667K->41493K(853376K), 0.1091820 secs] 4385023K->3712159K(6120832K),
>> 0.1100190 secs] [Times: user=0.79 sys=0.00, real=0.11 secs]
>> 2013-10-25T16:46:24.655+0800: 677391.456:
>> [GC2013-10-25T16:46:24.655+0800: 677391.457: [ParNew
>>  Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:   16052176 bytes,   16052176 total
>> *- age   2:   13515752 bytes,  * 29567928 total
>> - age   3:     568304 bytes,   30136232 total
>> - age   4:     685288 bytes,   30821520 total
>> : 724245K->35895K(853376K), 0.1193780 secs] 4394911K->3715432K(6120832K),
>> 0.1201740 secs] [Times: user=0.71 sys=0.00, real=0.12 secs]
>> 2013-10-25T16:48:08.928+0800: 677495.730:
>> [GC2013-10-25T16:48:08.929+0800: 677495.730: [ParNew
>>  Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:   11795376 bytes,   11795376 total
>> - age   2:   16954104 bytes,   28749480 total
>> *- age   3:   16548488 bytes,   *45297968 total
>> - age   4:     564904 bytes,   45862872 total
>> : 718647K->50696K(853376K), 0.1075270 secs] 4398184K->3730910K(6120832K),
>> 0.1083250 secs] [Times: user=0.71 sys=0.00, real=0.11 secs]
>> 2013-10-25T16:49:49.994+0800: 677596.795:
>> [GC2013-10-25T16:49:49.994+0800: 677596.795: [ParNew
>>  Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:   10775376 bytes,   10775376 total
>> *- age   2:    8955800 bytes,   *19731176 total
>> - age   3:   11600936 bytes,   31332112 total
>> - age   4:   15432856 bytes,   46764968 total
>> : 733448K->52432K(853376K), 0.1129030 secs] 4413662K->3733199K(6120832K),
>> 0.1136890 secs] [Times: user=0.68 sys=0.00, real=0.11 secs]
>> 2013-10-25T16:51:30.850+0800: 677697.651:
>> [GC2013-10-25T16:51:30.851+0800: 677697.652: [ParNew
>>  Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>> - age   1:   13071608 bytes,   13071608 total
>> - age   2:    7039704 bytes,   20111312 total
>> *- age   3:    9879160 bytes,  * 29990472 total
>> - age   4:   10465456 bytes,   40455928 total
>> : 735184K->52264K(853376K), 0.1201120 secs] 4415951K->3746066K(6120832K),
>> 0.1210270 secs] [Times: user=0.70 sys=0.00, real=0.12 secs]
>>
>>
>>
>> Thanks,
>>
>> Ji Cheng
>>
>>
>>
>> On Fri, Oct 25, 2013 at 3:50 PM, Srinivas Ramakrishna <ysr1729 at gmail.com>wrote:
>>
>>>
>>> Sorry, time for bed... I am not thinking straight...
>>>
>>> Scratch the part about the old generation being CMS reducing the
>>> incidence of the problem.
>>> The age table is only incremented when the target goes to a survivor
>>> space, so the probability of the race is independent of whether the old
>>> generation is CMS or not.
>>>
>>> I think the fix outlined above should take care of the bad accounting.
>>> And would avoid (in extreme cases)
>>> the adaprtive tenuring algorithm from being too pessimistic and using a
>>> lower tenuring threshold than the
>>> more accurate accounting would have yielded.
>>>
>>> -- ramki
>>>
>>>
>>>
>>> On Fri, Oct 25, 2013 at 12:46 AM, Srinivas Ramakrishna <
>>> ysr1729 at gmail.com> wrote:
>>>
>>>> Never mind my question, born of a bit of confusion on my part.
>>>>
>>>> I think my explanation works in both cases, although the probability of
>>>> the event is increased if the old gen collector is not
>>>> CMS.
>>>>
>>>> Here's the issue: In the code for copying a target object into the
>>>> survivor space or into old gen, several threads may race to
>>>> claim an object. In the case where the object's age is under the
>>>> tenuring threshold, or if the older generation is not CMS,
>>>> we will first copy the object then claim the object by swapping in the
>>>> forwarding pointer to the copy. The other copies
>>>> are discarded and the winning thread continues. The problem is that the
>>>> age table is incremented by all of the threads
>>>> racing to do the copying. The fix is that only the winner of the race
>>>> should increment the age table to avoid multiple increments.
>>>>
>>>> That should fix the problem you are seeing. The problem could be more
>>>> acute in certain kinds of object graph structures, and also when the old
>>>> generation is not CMS then the possibility of such races is slightly
>>>> increased because it's present also when
>>>> copying into the old generation.
>>>>
>>>> (I can't recall why we don't always first claim the object and then do
>>>> the copying and then update the forwarding pointer, as is
>>>> done when the target space is the CMS space. I'll let others
>>>> reconstruct that reason, if that reason, probably a performance reason, is
>>>> still relevant today....)
>>>>
>>>> -- ramki
>>>>
>>>>
>>>>
>>>> On Fri, Oct 25, 2013 at 12:27 AM, Srinivas Ramakrishna <
>>>> ysr1729 at gmail.com> wrote:
>>>>
>>>>> Hi Ji --
>>>>>
>>>>> Are you using ParNew by itself without CMS in the old gen, or are you
>>>>> using CMS in the old gen.
>>>>> If the former, I have a possible explanation (although you will need
>>>>> to evaluate the statistical probability
>>>>> of an event based on the configuration and object demographics to
>>>>> determine if it's plausible in your
>>>>> case).
>>>>>
>>>>> If, however, you are using CMS in the old gen, then I don't have an
>>>>> explanation.
>>>>>
>>>>> So, what is your config? :-)
>>>>>
>>>>> -- ramki
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 24, 2013 at 11:40 PM, Tao Mao <tao.mao at oracle.com> wrote:
>>>>>
>>>>>>  Hi Ji,
>>>>>>
>>>>>> From what you've reported, it definitely looks weird. Are these two
>>>>>> GC's consecutive two GC's (i.e. no other GC/s in between)?
>>>>>>
>>>>>> Thanks.
>>>>>> Tao
>>>>>>
>>>>>>
>>>>>> On 10/24/13 9:45 PM, Ji Cheng wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>>  I have gc log enabled with -XX:+PrintTenuringDistribution. But I'm
>>>>>> quite confused with the tenuring distribution below.
>>>>>>
>>>>>>  =============
>>>>>>  2013-10-19T19:46:30.244+0800: 169797.045:
>>>>>> [GC2013-10-19T19:46:30.244+0800: 169797.045: [ParNew
>>>>>> Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>>>>>> - age   1:   10532656 bytes,   10532656 total
>>>>>> - age   2:   14082976 bytes,   24615632 total
>>>>>> - age   3:   15155296 bytes,   39770928 total
>>>>>> - age   4:   13938272 bytes,   53709200 total
>>>>>> : 758515K->76697K(853376K), 0.0748620 secs]
>>>>>> 4693076K->4021899K(6120832K), 0.0756370 secs] [Times: user=0.42 sys=0.00,
>>>>>> real=0.07 secs]
>>>>>> 2013-10-19T19:47:10.909+0800: 169837.710:
>>>>>> [GC2013-10-19T19:47:10.909+0800: 169837.711: [ParNew
>>>>>> Desired survivor size 87359488 bytes, new threshold 4 (max 4)
>>>>>> - age   1:    9167144 bytes,    9167144 total
>>>>>> - age   2:    9178824 bytes,   18345968 total
>>>>>> - age   3:   16101552 bytes,   34447520 total
>>>>>> - age   4:   21369776 bytes,   55817296 total
>>>>>> : 759449K->63442K(853376K), 0.0776450 secs]
>>>>>> 4704651K->4020310K(6120832K), 0.0783500 secs] [Times: user=0.43 sys=0.00,
>>>>>> real=0.07 secs]
>>>>>>  =============
>>>>>>
>>>>>>  From What I read, there are 10532656 bytes in age 1 (survived from
>>>>>> 1 GC) in the first gc. In the second gc, 9178824 bytes in age 2 (survived
>>>>>> from 2 GCs). This is fine since some objects died between the first and
>>>>>> second GC.
>>>>>>
>>>>>>  But in the second GC, 16101552 bytes are in age 3 while only
>>>>>> 14082976 bytes in age 2 in the first GC. I don't why this number is
>>>>>> increasing. Shouldn't all bytes in age n come from age n-1 in the previous
>>>>>> GC? Or I misinterpreted those numbers?
>>>>>>
>>>>>>  btw, the jvm version is 1.7.0_40.
>>>>>>
>>>>>>  Thanks.
>>>>>>
>>>>>>  Ji Cheng
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> hotspot-gc-use mailing listhotspot-gc-use at openjdk.java.nethttp://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> hotspot-gc-use mailing list
>>>>>> hotspot-gc-use at openjdk.java.net
>>>>>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20131027/8f6c5a36/attachment-0001.html 


More information about the hotspot-gc-use mailing list