Long safepoint pause directly after GC log file rotation in 1.7.0_80

Rainer Jung rainer.jung at kippdata.de
Tue Nov 10 09:13:35 UTC 2015


Addition: the longest pause that was experienced was more than 2400 
seconds ...

And: platform is Solaris Sparc (T4). But we don't know whether it is 
platform dependent.

It also happens on test systems, so I'll write a script that calls 
pstack when detection is detected to find out, what the threads are 
doing or where they are hanging.

Regards,

Rainer

Am 09.11.2015 um 21:23 schrieb Rainer Jung:
> Hi,
>
> after upgrading from 1.7.0_76 to 1.7.0_80 we experience long pauses
> directly after a GC log rotation.
>
> The pause duration varies due to application and load but is in the
> range of 6 seconds to 60 seconds. There is no GC involved, i.e. no GC
> output is written related to these pauses.
>
> Example:
>
> Previous file ends with:
>
> 2015-11-09T01:28:36.832+0100: 38461,486: Application time: 5,2840810
> seconds
> {Heap before GC invocations=7366 (full 8):
> par new generation   total 458752K, used 442678K [0xfffffffe00400000,
> 0xfffffffe20400000, 0xfffffffe20400000)
>    eden space 393216K, 100% used [0xfffffffe00400000,
> 0xfffffffe18400000, 0xfffffffe18400000)
>    from space 65536K,  75% used [0xfffffffe18400000, 0xfffffffe1b44d998,
> 0xfffffffe1c400000)
>    to   space 65536K,   0% used [0xfffffffe1c400000, 0xfffffffe1c400000,
> 0xfffffffe20400000)
> concurrent mark-sweep generation total 3670016K, used 1887085K
> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
> concurrent-mark-sweep perm gen total 524288K, used 453862K
> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
> 2015-11-09T01:28:36.839+0100: 38461,493:
> [GC2015-11-09T01:28:36.840+0100: 38461,493: [ParNew
> Desired survivor size 33554432 bytes, new threshold 16 (max 31)
> - age   1:    2964800 bytes,    2964800 total
> - age   2:    2628048 bytes,    5592848 total
> - age   3:    1415792 bytes,    7008640 total
> - age   4:    1354008 bytes,    8362648 total
> - age   5:    1132056 bytes,    9494704 total
> - age   6:    1334072 bytes,   10828776 total
> - age   7:    1407336 bytes,   12236112 total
> - age   8:    3321304 bytes,   15557416 total
> - age   9:    1531064 bytes,   17088480 total
> - age  10:    2453024 bytes,   19541504 total
> - age  11:    2797616 bytes,   22339120 total
> - age  12:    1698584 bytes,   24037704 total
> - age  13:    1870064 bytes,   25907768 total
> - age  14:    2211528 bytes,   28119296 total
> - age  15:    3626888 bytes,   31746184 total
> : 442678K->37742K(458752K), 0,0802687 secs]
> 2329763K->1924827K(4128768K), 0,0812120 secs] [Times: user=0,90
> sys=0,03, real=0,08 secs]
> Heap after GC invocations=7367 (full 8):
> par new generation   total 458752K, used 37742K [0xfffffffe00400000,
> 0xfffffffe20400000, 0xfffffffe20400000)
>    eden space 393216K,   0% used [0xfffffffe00400000,
> 0xfffffffe00400000, 0xfffffffe18400000)
>    from space 65536K,  57% used [0xfffffffe1c400000, 0xfffffffe1e8db9a0,
> 0xfffffffe20400000)
>    to   space 65536K,   0% used [0xfffffffe18400000, 0xfffffffe18400000,
> 0xfffffffe1c400000)
> concurrent mark-sweep generation total 3670016K, used 1887085K
> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
> concurrent-mark-sweep perm gen total 524288K, used 453862K
> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
> }
> ....
> 2015-11-09T01:28:36.921+0100: 38461,575: Total time for which
> application threads were stopped: 0,0888232 seconds, Stopping threads
> took: 0,0005420 seconds
> 2015-11-09T01:28:59.821+0100: 38484,474: Application time: 0,0002954
> seconds
> 2015-11-09T01:28:59.823+0100: 38484,477: Total time for which
> application threads were stopped: 0,0026081 seconds, Stopping threads
> took: 0,0004146 seconds
> 2015-11-09T01:28:59.824+0100: 38484,477: Application time: 0,0003073
> seconds
> 2015-11-09T01:28:59.826+0100: 38484,480: Total time for which
> application threads were stopped: 0,0025411 seconds, Stopping threads
> took: 0,0004064 seconds
> 2015-11-09T01:28:59.827+0100: 38484,480: Application time: 0,0002885
> seconds
> 2015-11-09 01:28:59 GC log file has reached the maximum size. Saved as
> ./application/logs-a/mkb_gc.log.2
>
> This output looks normal. Last timestamp is 2015-11-09T01:28:59.827
>
> Now the next file begins:
>
> 2015-11-09 01:28:59 GC log file created ./application/logs-a/mkb_gc.log.3
> Java HotSpot(TM) 64-Bit Server VM (24.80-b11) for solaris-sparc JRE
> (1.7.0_80-b15), built on Apr 10 2015 18:47:18 by "" with Sun Studio 12u1
> Memory: 8k page, physical 133693440k(14956840k free)
> CommandLine flags: -XX:AllocateInstancePrefetchLines=2
> -XX:AllocatePrefetchInstr=1 -XX:AllocatePrefetchLines=6
> -XX:AllocatePrefetchStyle=3 -XX:+CMSClassUnloadingEnabled
> -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
> -XX:GCLogFileSize=10485760 -XX:InitialHeapSize=4294967296
> -XX:MaxHeapSize=4294967296 -XX:MaxNewSize=536870912
> -XX:MaxPermSize=536870912 -XX:MaxTenuringThreshold=31
> -XX:NewSize=536870912 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16
> -XX:ParallelGCThreads=16 -XX:PermSize=536870912 -XX:+PrintGC
> -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:SurvivorRatio=6
> -XX:-UseAdaptiveSizePolicy -XX:+UseCompressedOops
> -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC
> 2015-11-09T01:29:55.640+0100: 38540,292: Total time for which
> application threads were stopped: 55,8119519 seconds, Stopping threads
> took: 0,0003857 seconds
> 2015-11-09T01:29:55.648+0100: 38540,299: Application time: 0,0076173
> seconds
>
> Note the 55.8 seconds pause directly after printing the flags and the
> consistent timestamp jump from 01:28:59 to 01:29:55. There's no GC
> output, although verbose GC is active and works. For some other reason
> there is a very long safepoint. Note also, that the time is not due to
> waiting until the safepoint is reached. At least the log claims that
> reaching the safepoint only took 0.008 seconds. Also at that timeof day
> the servers are not very busy.
>
> Is there any idea, what happens here? Anything that rings a bell between
> 1.7.0_76 and 1.7.0_80? Why should there be a long safepoint directly
> after GC rotation opened a new file?
>
> I searched the bug parade, but didn't find a good hit. There was also
> nothing in the change for JDK-7164841 that seemed immediately
> responsible for a long pause.
>
> Unfortunately this happens on a production system and the first thing
> was to roll back to the old Java version.Not sure, how good this will be
> reproducible on a test system (will check tomorrow).
>
> Thanks for any hint,
>
> Rainer


More information about the hotspot-gc-use mailing list