Unreasonably long ParNew, CMS pause times
Eagle Kessler
eagle.kessler at gmail.com
Thu Jul 22 11:36:50 PDT 2010
We've switched to the throughput collector with a max pause time goal, and
that seems to be behaving well. I'm still curious about the behavior that
I'm seeing with CMS, though.
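For reference, the throughput collector setup looks roughly like this, in the
same style as our config file quoted below; the exact pause goal value here is
illustrative rather than what we deployed:

# Throughput collector with a max pause time goal (values illustrative)
JVM_GC_TYPE="-XX:+UseParallelGC -XX:+UseParallelOldGC"
JVM_GC_OPTS="-XX:MaxGCPauseMillis=100"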
I ran some tests with -XX:+PrintTenuringDistribution, one run keeping
MaxTenuringThreshold=0 and one run without setting it at all, and I'm still
seeing the behavior. Both tests were executed under constant load.
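The test JVMs were configured roughly as follows; this is a sketch
reconstructed from the sizes above rather than the literal config, with the
survivor settings elided:

# Test runs: 512MB heap, 128MB young gen, CMS/ParNew
JVM_SIZE="-server -Xms512m -Xmx512m"
JVM_SIZE_NEW="-XX:NewSize=128m -XX:MaxNewSize=128m"
JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
# First run only; the second run leaves MaxTenuringThreshold at its default
JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:+PrintTenuringDistribution"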
512MB heap (128MB young gen), MTT=0:
62.537: [GC 62.537: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0064317 secs] 143985K->16058K(523328K), 0.0065156
secs] [Times: user=0.03 sys=0.01, real=0.01 secs]
95.633: [GC 95.633: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0030186 secs] 145210K->16614K(523328K), 0.0030929
secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
127.882: [GC 127.882: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0042069 secs] 145766K->17285K(523328K), 0.0042783
secs] [Times: user=0.01 sys=0.02, real=0.00 secs]
158.086: [GC 158.086: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0045986 secs] 146437K->17955K(523328K), 0.0046896
secs] [Times: user=0.02 sys=0.02, real=0.01 secs]
Rising to
5021.415: [GC 5021.415: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0604269 secs] 225694K->97083K(523328K), 0.0605133
secs] [Times: user=0.10 sys=0.61, real=0.06 secs]
5053.611: [GC 5053.611: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0569357 secs] 226235K->97636K(523328K), 0.0570316
secs] [Times: user=0.10 sys=0.60, real=0.05 secs]
5084.401: [GC 5084.401: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0591064 secs] 226788K->98088K(523328K), 0.0591840
secs] [Times: user=0.09 sys=0.62, real=0.06 secs]
Similarly, 512MB heap, 128MB young gen, and no MTT:
83.708: [GC 83.708: [ParNew
Desired survivor size 3702784 bytes, new threshold 15 (max 15)
- age 1: 626008 bytes, 626008 total
- age 2: 736336 bytes, 1362344 total
: 120669K->1759K(123840K), 0.0034469 secs] 135502K->16593K(517056K),
0.0035234 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
109.627: [GC 109.627: [ParNew
Desired survivor size 3702784 bytes, new threshold 15 (max 15)
- age 1: 634688 bytes, 634688 total
- age 2: 268624 bytes, 903312 total
- age 3: 736016 bytes, 1639328 total
: 118367K->2275K(123840K), 0.0036776 secs] 133201K->17109K(517056K),
0.0037455 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
137.495: [GC 137.495: [ParNew
Desired survivor size 3702784 bytes, new threshold 15 (max 15)
- age 1: 479728 bytes, 479728 total
- age 2: 269064 bytes, 748792 total
- age 3: 267904 bytes, 1016696 total
- age 4: 735952 bytes, 1752648 total
: 118883K->2396K(123840K), 0.0040557 secs] 133717K->17230K(517056K),
0.0041302 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
165.090: [GC 165.090: [ParNew
Desired survivor size 3702784 bytes, new threshold 15 (max 15)
- age 1: 403856 bytes, 403856 total
- age 2: 267736 bytes, 671592 total
- age 3: 268616 bytes, 940208 total
- age 4: 267904 bytes, 1208112 total
- age 5: 729072 bytes, 1937184 total
: 119004K->2668K(123840K), 0.0046744 secs] 133838K->17501K(517056K),
0.0047473 secs] [Times: user=0.04 sys=0.02, real=0.00 secs]
Rising to
4981.917: [GC 4981.918: [ParNew
Desired survivor size 3702784 bytes, new threshold 14 (max 15)
- age 1: 533872 bytes, 533872 total
- age 2: 269336 bytes, 803208 total
- age 3: 268048 bytes, 1071256 total
- age 4: 268272 bytes, 1339528 total
- age 5: 265880 bytes, 1605408 total
- age 6: 241704 bytes, 1847112 total
- age 7: 241112 bytes, 2088224 total
- age 8: 239680 bytes, 2327904 total
- age 9: 233632 bytes, 2561536 total
- age 10: 231040 bytes, 2792576 total
- age 11: 231040 bytes, 3023616 total
- age 12: 232256 bytes, 3255872 total
- age 13: 232256 bytes, 3488128 total
- age 14: 231040 bytes, 3719168 total
: 122652K->4440K(123840K), 0.0654827 secs] 173869K->56113K(517056K),
0.0655671 secs] [Times: user=0.17 sys=0.62, real=0.07 secs]
5009.679: [GC 5009.679: [ParNew
Desired survivor size 3702784 bytes, new threshold 14 (max 15)
- age 1: 673136 bytes, 673136 total
- age 2: 271704 bytes, 944840 total
- age 3: 269232 bytes, 1214072 total
- age 4: 268088 bytes, 1482160 total
- age 5: 264528 bytes, 1746688 total
- age 6: 244280 bytes, 1990968 total
- age 7: 238320 bytes, 2229288 total
- age 8: 241112 bytes, 2470400 total
- age 9: 233416 bytes, 2703816 total
- age 10: 231040 bytes, 2934856 total
- age 11: 231040 bytes, 3165896 total
- age 12: 231040 bytes, 3396936 total
- age 13: 232256 bytes, 3629192 total
- age 14: 232256 bytes, 3861448 total
: 121048K->5058K(123840K), 0.0675964 secs] 172721K->56957K(517056K),
0.0677232 secs] [Times: user=0.17 sys=0.66, real=0.06 secs]
5037.742: [GC 5037.742: [ParNew
Desired survivor size 3702784 bytes, new threshold 14 (max 15)
- age 1: 582296 bytes, 582296 total
- age 2: 268128 bytes, 850424 total
- age 3: 271528 bytes, 1121952 total
- age 4: 269192 bytes, 1391144 total
- age 5: 264952 bytes, 1656096 total
- age 6: 242496 bytes, 1898592 total
- age 7: 240752 bytes, 2139344 total
- age 8: 238320 bytes, 2377664 total
- age 9: 234776 bytes, 2612440 total
- age 10: 231040 bytes, 2843480 total
- age 11: 231040 bytes, 3074520 total
- age 12: 231040 bytes, 3305560 total
- age 13: 231040 bytes, 3536600 total
- age 14: 232256 bytes, 3768856 total
: 121666K->5991K(123840K), 0.0649960 secs] 173565K->58116K(517056K),
0.0650795 secs] [Times: user=0.17 sys=0.64, real=0.06 secs]
I'll look into why ~232KB survives through seven minutes of collections, but
in both cases a full collection (triggered through JMX) brought the heap down
to ~12MB and removed the "accumulated" ParNew time. Any idea why I'm seeing
the slow increase in ParNew collection times, and/or more tests I should be
running to diagnose it? I can provide the full logs if you'd like; I've
abbreviated them here to avoid sending long messages to the entire list.
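In the meantime, I'm planning to re-run with extra GC logging along these
lines. This is just my guess at a useful set of flags, and the JVM_GC_LOG
variable name simply follows the style of our config file:

# Extra diagnostics for the next test run (flag set is my guess)
JVM_GC_LOG="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC"
# CMS free-list statistics, in case the old-gen allocator is implicated
JVM_GC_LOG="$JVM_GC_LOG -XX:PrintFLSStatistics=1"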
On Wed, Jul 21, 2010 at 2:16 PM, Y. S. Ramakrishna <
y.s.ramakrishna at oracle.com> wrote:
>
>
> On 07/21/10 13:41, Eagle Kessler wrote:
>
>> Checking the configs, it looks like we are explicitly setting MTT to 0:
>>
>> # Min, max, total JVM size (-Xms -Xmx)
>> JVM_SIZE="-server -Xms2g -Xmx2g"
>>
>> # New Generation Sizes (-XX:NewSize -XX:MaxNewSize)
>>
>> JVM_SIZE_NEW="-XX:NewSize=512m -XX:MaxNewSize=512m"
>>
>> # Perm Generation Sizes (-XX:PermSize -XX:MaxPermSize)
>> JVM_SIZE_PERM="-XX:PermSize=128m -XX:MaxPermSize=128m"
>>
>> # Type of Garbage Collector to use
>>
>> JVM_GC_TYPE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
>>
>> JVM_GC_OPTS="-XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128"
>>
>> I agree that in this case we definitely want to be using survivor spaces
>> (I'm unfortunately not in charge of the default GC settings yet). However,
>> I didn't know that running without survivor spaces could cause this kind
>> of behavior. Why does it cause such a large performance issue?
>>
>
> See below.
>
>
>
>> Regardless, I'll ask that -XX:+PrintTenuringDistribution be added to the
>> configs, along with a non-zero MTT, and see if the issue persists. The
>>
>
> And of course SurvivorRatio a more reasonable value like 6 or 8,
> depending on expected survival rate etc.
>
>
>> rising ParNew times seem like they would be unrelated to the tenuring
>> threshold, though, wouldn't they?
>>
>
> No, it's related in the sense that MTT=0 was causing very short-lived
> data to be promoted into the old gen. This can cause object demographics
> (size and age distribution) to be very non-stationary, and confuse the
> heuristics for sizing the free list inventory. It of course also places
> heavy pressure on the old gen allocator, increases fragmentation,
> increases mutation rates, and so on.
>
> The only thing it might not affect much is the initial mark pauses,
> which will probably stay as they were before.
>
> -- ramki
>
>
>> On Wed, Jul 21, 2010 at 12:40 PM, Y. S. Ramakrishna <
>> y.s.ramakrishna at oracle.com <mailto:y.s.ramakrishna at oracle.com>> wrote:
>>
>> Oh, and the "(max 0)" seems to imply you have somehow ended up asking
>> for MaxTenuringThreshold=0, which can be disastrous. It is possible that
>> the default settings for CMS on certain very old JVMs did indeed result
>> in MTT=0 (though my memory of that era is fading), so if you are using
>> JVMs of such older vintage, explicitly setting MTT and SurvivorRatio
>> higher is a good idea.
>>
>> -- ramki
>>
>>
>> On 07/21/10 12:34, Y. S. Ramakrishna wrote:
>>
>> Yes, like what I think Matt is getting at, I'd configure sufficiently
>> large survivor spaces. Even if you expect most of your objects to die
>> young, you'd want survivor spaces large enough to keep at least age 1
>> objects in the survivor space. If, as you state, no medium- to long-lived
>> state is retained, your data is mostly short-lived and you'll be able to
>> do without any promotion at all. Your problem here is that somehow your
>> survivor spaces may have disappeared. (+PrintHeapAtGC will tell you, and
>> of course examining your JVM options should shed more light on that
>> apparent disappearance.)
>>
>> -- ramki
>>
>> On 07/21/10 10:09, Eagle Kessler wrote:
>>
>> Hello all,
>> I've got a web service that I'm seeing some very strange behavior
>> around its garbage collection, and was wondering if someone here could
>> explain why it might be happening. The service itself is fairly simple:
>> take in data from a few sources and merge it with the existing data in
>> the database. It stores nearly no state while doing this, and indeed
>> heap dumps taken 1, 24, and 72 hours after boot indicate that we have a
>> consistent ~12MB of live data (in a 2GB heap, but I hope that's not
>> what is causing this).
>>
>> The GC logs, though, don't look very happy at all. After our start-up
>> period, they settle into a fairly consistent pattern:
>>
>> 1041.159: [GC 1041.159: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 0.0170322 secs] 537266K->22659K(2093120K),
>> 0.0171330 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
>> 1606.500: [GC 1606.500: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 0.0173235 secs] 538883K->24214K(2093120K),
>> 0.0174138 secs] [Times: user=0.04 sys=0.03, real=0.02 secs]
>> 2040.773: [GC 2040.773: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 0.0196319 secs] 540438K->25737K(2093120K),
>> 0.0197275 secs] [Times: user=0.05 sys=0.02, real=0.02 secs]
>> It would be very nice if it kept going like that. However, by the
>> time the CMS collector first runs, things aren't working nearly as well:
>>
>> 214182.439: [GC 214182.439: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 1.0146996 secs]
>> 1297278K->782575K(2093120K), 1.0148799 secs] [Times:
>> user=1.21 sys=0.58, real=1.01 secs]
>> 214247.437: [GC 214247.438: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 1.2310274 secs]
>> 1298799K->784188K(2093120K), 1.2311534 secs] [Times:
>> user=1.46 sys=0.69, real=1.23 secs]
>> 214313.642: [GC 214313.642: [ParNew
>> Desired survivor size 2064384 bytes, new threshold 0 (max 0)
>> : 516224K->0K(520256K), 1.2183258 secs]
>> 1300412K->785710K(2093120K), 1.2184848 secs] [Times:
>> user=1.45 sys=0.65, real=1.22 secs]
>>
>> The increasing sys time is a bit worrying, but it seems like the
>> actual GC time is rising as well, even though we aren't collecting any
>> more young-gen garbage. At this point, CMS went off:
>>
>> 214380.695: [GC [1 CMS-initial-mark: 787188K(1572864K)]
>> 787195K(2093120K), 1.8929842 secs] [Times: user=1.50 sys=0.02,
>> real=1.89 secs]
>> 214382.589: [CMS-concurrent-mark-start]
>> 214383.056: [CMS-concurrent-mark: 0.467/0.467 secs] [Times: user=1.81
>> sys=0.01, real=0.47 secs]
>> 214383.056: [CMS-concurrent-preclean-start]
>> 214383.064: [CMS-concurrent-preclean: 0.008/0.008 secs] [Times:
>> user=0.01 sys=0.00, real=0.01 secs]
>> 214383.064: [CMS-concurrent-abortable-preclean-start]
>> CMS: abort preclean due to time
>> 214388.133: [CMS-concurrent-abortable-preclean: 0.242/5.069 secs]
>> [Times: user=5.02 sys=0.02, real=5.07 secs]
>> 214388.159: [GC[YG occupancy: 51943 K (520256 K)]214388.159: [Rescan
>> (parallel) , 1.5403455 secs]214389.699: [weak refs processing,
>> 0.0050170 secs] [1 CMS-remark: 787188K(1572864K)] 839132K(2093120K),
>> 1.5458536 secs] [Times: user=1.80 sys=0.71, real=1.55 secs]
>> 214389.705: [CMS-concurrent-sweep-start]
>> 214390.439: [CMS-concurrent-sweep: 0.671/0.734 secs] [Times: user=1.35
>> sys=0.00, real=0.73 secs]
>> 214390.439: [CMS-concurrent-reset-start]
>> 214390.621: [CMS-concurrent-reset: 0.183/0.183 secs] [Times: user=0.20
>> sys=0.02, real=0.18 secs]
>>
>> It seems like (a) initial-mark shouldn't take 1.8 seconds, (b) if we
>> really do only have 12MB of live data, CMS should have collected a lot
>> more than it did (the next ParNew collection reported ~545MB of old gen
>> in use), and (c) 50% heap usage with very little promotion seems very
>> early for the collector to go off.
>>
>> The next CMS cycle is at 434,973 seconds, by which point the young gen
>> collections are taking 3 seconds (user 3.59, sys 1.60, real 3.09). The
>> initial mark takes 4.82 seconds (user 3.82, sys 0.02, real 4.82), and
>> the sweep brings the old gen down to 1.1GB used. I haven't yet
>> confirmed it, but given the previous heap dumps I'd guess they will
>> show ~12MB of live objects and ~1.1GB of dead objects. The current
>> young gen collections (at 497,601 seconds) are taking ~3.7 seconds
>> (4.33 user, 2.03 sys). Any idea what could be going on here? We're
>> running JDK 1.6.0_16.
>>
>> -- Eagle Kessler
>>
>>
>>
>> --
>> Eagle Kessler
>>
>
--
Eagle Kessler