Long safepoint pause directly after GC log file rotation in 1.7.0_80
Rainer Jung
rainer.jung at kippdata.de
Tue Nov 10 12:47:34 UTC 2015
The pause is due to the call "(void) check_addr0(st)" in
os::print_memory_info().
The call reads "/proc/self/map". In our case it has for instance 1400
entries, and each read takes about 40 ms.
The same function check_addr0() is also used in
os::run_periodic_checks(). Not sure why it is also done directly after
each GC log rotation.
Regards,
Rainer
Am 10.11.2015 um 10:13 schrieb Rainer Jung:
> Addition: the longest pause that was experienced was more than 2400
> seconds ...
>
> And: platform is Solaris Sparc (T4). But we don't know whether it is
> platform dependent.
>
> It also happens on test systems, so I'll write a script that calls
> pstack when detection is detected to find out, what the threads are
> doing or where they are hanging.
>
> Regards,
>
> Rainer
>
> Am 09.11.2015 um 21:23 schrieb Rainer Jung:
>> Hi,
>>
>> after upgrading from 1.7.0_76 to 1.7.0_80 we experience long pauses
>> directly after a GC log rotation.
>>
>> The pause duration varies due to application and load but is in the
>> range of 6 seconds to 60 seconds. There is no GC involved, i.e. no GC
>> output is written related to these pauses.
>>
>> Example:
>>
>> Previous file ends with:
>>
>> 2015-11-09T01:28:36.832+0100: 38461,486: Application time: 5,2840810
>> seconds
>> {Heap before GC invocations=7366 (full 8):
>> par new generation total 458752K, used 442678K [0xfffffffe00400000,
>> 0xfffffffe20400000, 0xfffffffe20400000)
>> eden space 393216K, 100% used [0xfffffffe00400000,
>> 0xfffffffe18400000, 0xfffffffe18400000)
>> from space 65536K, 75% used [0xfffffffe18400000, 0xfffffffe1b44d998,
>> 0xfffffffe1c400000)
>> to space 65536K, 0% used [0xfffffffe1c400000, 0xfffffffe1c400000,
>> 0xfffffffe20400000)
>> concurrent mark-sweep generation total 3670016K, used 1887085K
>> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
>> concurrent-mark-sweep perm gen total 524288K, used 453862K
>> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
>> 2015-11-09T01:28:36.839+0100: 38461,493:
>> [GC2015-11-09T01:28:36.840+0100: 38461,493: [ParNew
>> Desired survivor size 33554432 bytes, new threshold 16 (max 31)
>> - age 1: 2964800 bytes, 2964800 total
>> - age 2: 2628048 bytes, 5592848 total
>> - age 3: 1415792 bytes, 7008640 total
>> - age 4: 1354008 bytes, 8362648 total
>> - age 5: 1132056 bytes, 9494704 total
>> - age 6: 1334072 bytes, 10828776 total
>> - age 7: 1407336 bytes, 12236112 total
>> - age 8: 3321304 bytes, 15557416 total
>> - age 9: 1531064 bytes, 17088480 total
>> - age 10: 2453024 bytes, 19541504 total
>> - age 11: 2797616 bytes, 22339120 total
>> - age 12: 1698584 bytes, 24037704 total
>> - age 13: 1870064 bytes, 25907768 total
>> - age 14: 2211528 bytes, 28119296 total
>> - age 15: 3626888 bytes, 31746184 total
>> : 442678K->37742K(458752K), 0,0802687 secs]
>> 2329763K->1924827K(4128768K), 0,0812120 secs] [Times: user=0,90
>> sys=0,03, real=0,08 secs]
>> Heap after GC invocations=7367 (full 8):
>> par new generation total 458752K, used 37742K [0xfffffffe00400000,
>> 0xfffffffe20400000, 0xfffffffe20400000)
>> eden space 393216K, 0% used [0xfffffffe00400000,
>> 0xfffffffe00400000, 0xfffffffe18400000)
>> from space 65536K, 57% used [0xfffffffe1c400000, 0xfffffffe1e8db9a0,
>> 0xfffffffe20400000)
>> to space 65536K, 0% used [0xfffffffe18400000, 0xfffffffe18400000,
>> 0xfffffffe1c400000)
>> concurrent mark-sweep generation total 3670016K, used 1887085K
>> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
>> concurrent-mark-sweep perm gen total 524288K, used 453862K
>> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
>> }
>> ....
>> 2015-11-09T01:28:36.921+0100: 38461,575: Total time for which
>> application threads were stopped: 0,0888232 seconds, Stopping threads
>> took: 0,0005420 seconds
>> 2015-11-09T01:28:59.821+0100: 38484,474: Application time: 0,0002954
>> seconds
>> 2015-11-09T01:28:59.823+0100: 38484,477: Total time for which
>> application threads were stopped: 0,0026081 seconds, Stopping threads
>> took: 0,0004146 seconds
>> 2015-11-09T01:28:59.824+0100: 38484,477: Application time: 0,0003073
>> seconds
>> 2015-11-09T01:28:59.826+0100: 38484,480: Total time for which
>> application threads were stopped: 0,0025411 seconds, Stopping threads
>> took: 0,0004064 seconds
>> 2015-11-09T01:28:59.827+0100: 38484,480: Application time: 0,0002885
>> seconds
>> 2015-11-09 01:28:59 GC log file has reached the maximum size. Saved as
>> ./application/logs-a/mkb_gc.log.2
>>
>> This output looks normal. Last timestamp is 2015-11-09T01:28:59.827
>>
>> Now the next file begins:
>>
>> 2015-11-09 01:28:59 GC log file created ./application/logs-a/mkb_gc.log.3
>> Java HotSpot(TM) 64-Bit Server VM (24.80-b11) for solaris-sparc JRE
>> (1.7.0_80-b15), built on Apr 10 2015 18:47:18 by "" with Sun Studio 12u1
>> Memory: 8k page, physical 133693440k(14956840k free)
>> CommandLine flags: -XX:AllocateInstancePrefetchLines=2
>> -XX:AllocatePrefetchInstr=1 -XX:AllocatePrefetchLines=6
>> -XX:AllocatePrefetchStyle=3 -XX:+CMSClassUnloadingEnabled
>> -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
>> -XX:GCLogFileSize=10485760 -XX:InitialHeapSize=4294967296
>> -XX:MaxHeapSize=4294967296 -XX:MaxNewSize=536870912
>> -XX:MaxPermSize=536870912 -XX:MaxTenuringThreshold=31
>> -XX:NewSize=536870912 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16
>> -XX:ParallelGCThreads=16 -XX:PermSize=536870912 -XX:+PrintGC
>> -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
>> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:SurvivorRatio=6
>> -XX:-UseAdaptiveSizePolicy -XX:+UseCompressedOops
>> -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC
>> 2015-11-09T01:29:55.640+0100: 38540,292: Total time for which
>> application threads were stopped: 55,8119519 seconds, Stopping threads
>> took: 0,0003857 seconds
>> 2015-11-09T01:29:55.648+0100: 38540,299: Application time: 0,0076173
>> seconds
>>
>> Note the 55.8 seconds pause directly after printing the flags and the
>> consistent timestamp jump from 01:28:59 to 01:29:55. There's no GC
>> output, although verbose GC is active and works. For some other reason
>> there is a very long safepoint. Note also, that the time is not due to
>> waiting until the safepoint is reached. At least the log claims that
>> reaching the safepoint only took 0.008 seconds. Also at that timeof day
>> the servers are not very busy.
>>
>> Is there any idea, what happens here? Anything that rings a bell between
>> 1.7.0_76 and 1.7.0_80? Why should there be a long safepoint directly
>> after GC rotation opened a new file?
>>
>> I searched the bug parade, but didn't find a good hit. There was also
>> nothing in the change for JDK-7164841 that seemed immediately
>> responsible for a long pause.
>>
>> Unfortunately this happens on a production system and the first thing
>> was to roll back to the old Java version.Not sure, how good this will be
>> reproducible on a test system (will check tomorrow).
>>
>> Thanks for any hint,
>>
>> Rainer
More information about the hotspot-gc-use
mailing list