Spikes in duration of the minor GC caused by some unknown JVM activity before GC
Bartek Markocki
bartosz.markocki at gmail.com
Tue Dec 3 04:49:58 PST 2013
Hello all,
We have been performance testing a new release of one of our
applications. What caught our attention were sporadic spikes in the minor
GC times.
By default the application uses ParallelGC, so at first we could not tell
whether the pauses came from long minor GCs or from something else. We
therefore switched to ParNew and enabled printing of safepoint statistics
together with tracing of safepoint cleanup time.
Based on the above we got data showing that the duration of the STW pause
is not driven by the duration of the minor GC but by something else.
Here is the extract from the GC log file:
2013-12-02T02:16:29.993-0600: 45476.110: Total time for which application
threads were stopped: 0.0304740 seconds
2013-12-02T02:16:30.772-0600: 45476.889: Application time: 0.7794150 seconds
{Heap before GC invocations=53652 (full 13):
par new generation total 629120K, used 565117K [0x538f0000, 0x7e390000,
0x7e390000)
eden space 559232K, 100% used [0x538f0000, 0x75b10000, 0x75b10000)
from space 69888K, 8% used [0x75b10000, 0x760cf500, 0x79f50000)
to space 69888K, 0% used [0x79f50000, 0x79f50000, 0x7e390000)
tenured generation total 1398144K, used 1077208K [0x7e390000,
0xd38f0000, 0xd38f0000)
the space 1398144K, 77% used [0x7e390000, 0xbff86080, 0xbff86200,
0xd38f0000)
compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000,
0xf38f0000)
the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800,
0xe38f0000)
No shared spaces configured.
2013-12-02T02:16:30.787-0600: 45476.904: [GC2013-12-02T02:16:32.776-0600: 45478.893: [ParNew
Desired survivor size 35782656 bytes, new threshold 15 (max 15)
- age 1: 1856072 bytes, 1856072 total
- age 2: 170120 bytes, 2026192 total
- age 3: 232696 bytes, 2258888 total
- age 4: 180136 bytes, 2439024 total
- age 5: 235120 bytes, 2674144 total
- age 6: 242976 bytes, 2917120 total
- age 7: 231728 bytes, 3148848 total
- age 8: 149976 bytes, 3298824 total
- age 9: 117904 bytes, 3416728 total
- age 10: 126936 bytes, 3543664 total
- age 11: 126624 bytes, 3670288 total
- age 12: 114256 bytes, 3784544 total
- age 13: 146760 bytes, 3931304 total
- age 14: 163808 bytes, 4095112 total
- age 15: 171664 bytes, 4266776 total
: 565117K->5057K(629120K), 0.0629070 secs] 1642325K->1082393K(2027264K), 2.0523080 secs] [Times: user=0.07 sys=0.00, real=2.05 secs]
Heap after GC invocations=53653 (full 13):
par new generation total 629120K, used 5057K [0x538f0000, 0x7e390000,
0x7e390000)
eden space 559232K, 0% used [0x538f0000, 0x538f0000, 0x75b10000)
from space 69888K, 7% used [0x79f50000, 0x7a4404b8, 0x7e390000)
to space 69888K, 0% used [0x75b10000, 0x75b10000, 0x79f50000)
tenured generation total 1398144K, used 1077336K [0x7e390000,
0xd38f0000, 0xd38f0000)
the space 1398144K, 77% used [0x7e390000, 0xbffa6080, 0xbffa6200,
0xd38f0000)
compacting perm gen total 262144K, used 97801K [0xd38f0000, 0xe38f0000,
0xf38f0000)
the space 262144K, 37% used [0xd38f0000, 0xd9872690, 0xd9872800,
0xe38f0000)
No shared spaces configured.
}
2013-12-02T02:16:32.839-0600: 45478.957: Total time for which application threads were stopped: 2.0675060 seconds
Please notice the difference between the start of the STW pause (45476.904)
and the start of the GC itself (45478.893).
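To make the discrepancy concrete, here is a small cross-check of the numbers
copied from the log and safepoint statistics above (assuming, as the HotSpot
output suggests, that the reported sync time already includes the spin and
block components):

```python
# All values are copied from the log excerpt above.
safepoint_begin = 45476.904   # timestamp of the STW / safepoint start
parnew_begin    = 45478.893   # timestamp inside the "[GC ... [ParNew" line
gap_s = parnew_begin - safepoint_begin
print(f"gap before ParNew starts: {gap_s:.3f} s")        # ~1.989 s

# Safepoint statistics time columns for this event: spin block sync cleanup vmop
sync_ms, cleanup_ms, vmop_ms = 13, 1, 2052
# Assumption: sync already covers spin and block, so the total stop is
# sync + cleanup + vmop.
stop_ms = sync_ms + cleanup_ms + vmop_ms
print(f"stop per safepoint stats: {stop_ms} ms")         # 2066 ms vs 2067.5 ms reported

parnew_ms = 62.9  # the ParNew collection itself took 0.0629070 secs
print(f"unaccounted inside vmop:  {vmop_ms - parnew_ms:.0f} ms")  # ~1989 ms
```

The missing ~1.99 s therefore falls inside the vmop phase but before the
ParNew collection actually does any work.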
And here is the output from the safepoint statistics:
45476.086: [deflating idle monitors, 0.0010070 secs]
45476.087: [updating inline caches, 0.0000000 secs]
45476.087: [compilation policy safepoint handler, 0.0005410 secs]
45476.088: [sweeping nmethods, 0.0000020 secs]
          vmop  [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop]  page_trap_count
45476.078: GenCollectForAllocation  [ 294 3 6 ]  [ 4 1 6 1 21 ]  3
45476.902: [deflating idle monitors, 0.0010780 secs]
45476.903: [updating inline caches, 0.0000010 secs]
45476.903: [compilation policy safepoint handler, 0.0005200 secs]
45476.904: [sweeping nmethods, 0.0000020 secs]
          vmop  [threads: total initially_running wait_to_block]  [time: spin block sync cleanup vmop]  page_trap_count
45476.891: GenCollectForAllocation  [ 294 2 3 ]  [ 9 3 13 1 2052 ]  2
So if my reading is correct, we lose around 1.9 seconds before the GC even
starts.
Based on the above it is unclear to me where this time was spent.
The test ran for 40 hours.
Over those 40 hours we had 27 incidents where the difference was > 200 ms.
The above incident is the longest.
Here they are:
 3534.445: GenCollectForAllocation  [ 307 0  2 ]  [  0 0  0 1 1935 ]  0
 6403.822: GenCollectForAllocation  [ 294 2  4 ]  [  0 0  0 1 1683 ]  1
 9146.663: GenCollectForAllocation  [ 292 2  3 ]  [ 24 0 25 2 1773 ]  2
12351.684: GenCollectForAllocation  [ 293 4 10 ]  [  2 0  2 1 2044 ]  2
15746.592: GenCollectForAllocation  [ 294 1  2 ]  [  0 2  2 1 1697 ]  0
16574.963: GenCollectForAllocation  [ 295 1  1 ]  [  0 0  0 1 1224 ]  0
18337.686: GenCollectForAllocation  [ 293 1  2 ]  [  1 0  1 1  418 ]  0
19142.518: GenCollectForAllocation  [ 295 0  1 ]  [  0 0  0 1 1626 ]  0
20563.826: GenCollectForAllocation  [ 296 6  6 ]  [  7 2  9 1  233 ]  4
22611.752: GenCollectForAllocation  [ 294 4  4 ]  [  0 0  0 1 1584 ]  1
26043.520: GenCollectForAllocation  [ 295 2  6 ]  [  6 4 11 1 1883 ]  1
29584.480: GenCollectForAllocation  [ 292 3  5 ]  [  0 0  0 1 1788 ]  2
33119.441: GenCollectForAllocation  [ 293 2  4 ]  [  3 0  3 1 1853 ]  2
34800.660: GenCollectForAllocation  [ 294 2  4 ]  [  0 0  0 1  725 ]  0
36444.246: GenCollectForAllocation  [ 293 1  0 ]  [  0 0  4 1 1815 ]  0
36656.730: GenCollectForAllocation  [ 294 1  3 ]  [  0 0  0 1  905 ]  0
39751.609: GenCollectForAllocation  [ 294 2  4 ]  [  3 0  3 1 2207 ]  1
41836.305: GenCollectForAllocation  [ 293 2  2 ]  [  1 0  1 1  286 ]  1
43323.840: GenCollectForAllocation  [ 293 0  1 ]  [  0 0  0 1 2006 ]  0
45476.891: GenCollectForAllocation  [ 294 2  3 ]  [  9 3 13 1 2052 ]  2
46288.453: GenCollectForAllocation  [ 295 0  2 ]  [  0 4  5 1  211 ]  0
47016.430: GenCollectForAllocation  [ 294 4  4 ]  [  0 0  0 1 2408 ]  0
48662.230: GenCollectForAllocation  [ 293 1  4 ]  [  0 0  0 1  315 ]  1
48907.250: GenCollectForAllocation  [ 296 3  6 ]  [  0 0  0 1  421 ]  0
50662.195: GenCollectForAllocation  [ 294 3  4 ]  [  0 0  0 1 2043 ]  1
54128.828: GenCollectForAllocation  [ 295 2  2 ]  [  1 1  3 2 2660 ]  2
57729.141: GenCollectForAllocation  [ 298 1  4 ]  [  0 0  0 1 1926 ]  0
We are on Java 7u40, 32-bit, on RHEL 5.5.
The JVM settings are:
-server -Xmx2048m -Xms2048m -XX:PermSize=256m -XX:MaxPermSize=512m
-Dsun.rmi.dgc.client.gcInterval=1800000
-Dsun.rmi.dgc.server.gcInterval=1800000 -XX:+DisableExplicitGC
-XX:+UseParNewGC -XX:MaxGCPauseMillis=3000
-verbose:gc -Xloggc:... -XX:+PrintGCDetails -XX:+PrintHeapAtGC
-XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
-XX:+TraceSafepointCleanupTime -XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-Djava.net.preferIPv4Stack=true
The question of the day is: what else can we do to diagnose the above?
Thanks in advance,
Bartek