Long safepoint pause directly after GC log file rotation in 1.7.0_80
Alex Bagehot
ceeaspb at gmail.com
Tue Nov 10 13:31:49 UTC 2015
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6478739 is related (and
states it applies only to Solaris)
On Tue, Nov 10, 2015 at 12:47 PM, Rainer Jung <rainer.jung at kippdata.de> wrote:
> The pause is due to the call "(void) check_addr0(st)" in
> os::print_memory_info().
>
> The call reads "/proc/self/map". In our case that file has about 1400
> entries, and each read takes about 40 ms.
>
> The same function check_addr0() is also used in os::run_periodic_checks().
> Not sure why it is also done directly after each GC log rotation.
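For illustration, a simplified sketch of the pattern (not the exact HotSpot
source): check_addr0() scans the Solaris /proc map one prmap_t entry per
read(2), looking for a mapping at address 0, the condition bug 6478739 is
about. At ~40 ms per read, ~1400 entries account for roughly 56 seconds,
which matches the 55.8 second pause quoted below. The "Memory: 8k page,
physical ..." header line in the rotated log below is os::print_memory_info()
output, i.e. exactly where this scan gets triggered.

    // Simplified sketch of the check_addr0() pattern (not the exact
    // HotSpot source): each mapping entry costs one read(2) syscall.
    #include <fcntl.h>
    #include <procfs.h>   // Solaris: prmap_t
    #include <stdio.h>
    #include <unistd.h>

    static void check_addr0_sketch() {
      int fd = open("/proc/self/map", O_RDONLY);
      if (fd < 0) return;
      prmap_t p;
      // One syscall per mapping; with ~1400 entries at ~40 ms each this
      // loop alone runs for roughly 56 seconds, all inside a safepoint.
      while (read(fd, &p, sizeof(p)) == (ssize_t)sizeof(p)) {
        if (p.pr_vaddr == 0) {   // mapping at address 0 (bug 6478739)
          printf("Warning: Address: 0x0, Size: %ldK\n",
                 (long)(p.pr_size / 1024));
        }
      }
      close(fd);
    }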
>
> Regards,
>
> Rainer
>
>
> Am 10.11.2015 um 10:13 schrieb Rainer Jung:
>>
>> Addition: the longest pause we experienced was more than 2400
>> seconds ...
>>
>> And: the platform is Solaris SPARC (T4). But we don't know whether the
>> problem is platform dependent.
>>
>> It also happens on test systems, so I'll write a script that calls
>> pstack when a pause is detected, to find out what the threads are
>> doing or where they are hanging.
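A minimal periodic sampler is one possible shape for such a script (a
hypothetical sketch, not the actual script from this thread; running
pstack(1) on the JVM pid at a one-second interval is an assumption). Since
the interesting stacks are the ones taken during the stall, sampling
continuously and correlating the timestamped output with the GC log
afterwards may be simpler than triggering on individual log lines.

    // Hypothetical watcher (not the script from this thread): snapshot
    // the JVM's stacks with pstack(1) once per second and save
    // timestamped output for later correlation with the GC log.
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <string>
    #include <unistd.h>

    int main(int argc, char** argv) {
      if (argc != 2) {
        fprintf(stderr, "usage: %s <jvm-pid>\n", argv[0]);
        return 1;
      }
      for (;;) {
        // e.g. "pstack 12345 > pstack.1447077109.txt"
        std::string cmd = std::string("pstack ") + argv[1] +
                          " > pstack." +
                          std::to_string(std::time(nullptr)) + ".txt";
        system(cmd.c_str());
        sleep(1);
      }
    }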
>>
>> Regards,
>>
>> Rainer
>>
>> Am 09.11.2015 um 21:23 schrieb Rainer Jung:
>>>
>>> Hi,
>>>
>>> after upgrading from 1.7.0_76 to 1.7.0_80 we experience long pauses
>>> directly after a GC log rotation.
>>>
>>> The pause duration varies with application and load, but is in the
>>> range of 6 seconds to 60 seconds. No GC is involved, i.e. no GC
>>> output related to these pauses is written.
>>>
>>> Example:
>>>
>>> Previous file ends with:
>>>
>>> 2015-11-09T01:28:36.832+0100: 38461,486: Application time: 5,2840810
>>> seconds
>>> {Heap before GC invocations=7366 (full 8):
>>> par new generation total 458752K, used 442678K [0xfffffffe00400000,
>>> 0xfffffffe20400000, 0xfffffffe20400000)
>>> eden space 393216K, 100% used [0xfffffffe00400000,
>>> 0xfffffffe18400000, 0xfffffffe18400000)
>>> from space 65536K, 75% used [0xfffffffe18400000, 0xfffffffe1b44d998,
>>> 0xfffffffe1c400000)
>>> to space 65536K, 0% used [0xfffffffe1c400000, 0xfffffffe1c400000,
>>> 0xfffffffe20400000)
>>> concurrent mark-sweep generation total 3670016K, used 1887085K
>>> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
>>> concurrent-mark-sweep perm gen total 524288K, used 453862K
>>> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
>>> 2015-11-09T01:28:36.839+0100: 38461,493:
>>> [GC2015-11-09T01:28:36.840+0100: 38461,493: [ParNew
>>> Desired survivor size 33554432 bytes, new threshold 16 (max 31)
>>> - age 1: 2964800 bytes, 2964800 total
>>> - age 2: 2628048 bytes, 5592848 total
>>> - age 3: 1415792 bytes, 7008640 total
>>> - age 4: 1354008 bytes, 8362648 total
>>> - age 5: 1132056 bytes, 9494704 total
>>> - age 6: 1334072 bytes, 10828776 total
>>> - age 7: 1407336 bytes, 12236112 total
>>> - age 8: 3321304 bytes, 15557416 total
>>> - age 9: 1531064 bytes, 17088480 total
>>> - age 10: 2453024 bytes, 19541504 total
>>> - age 11: 2797616 bytes, 22339120 total
>>> - age 12: 1698584 bytes, 24037704 total
>>> - age 13: 1870064 bytes, 25907768 total
>>> - age 14: 2211528 bytes, 28119296 total
>>> - age 15: 3626888 bytes, 31746184 total
>>> : 442678K->37742K(458752K), 0,0802687 secs]
>>> 2329763K->1924827K(4128768K), 0,0812120 secs] [Times: user=0,90
>>> sys=0,03, real=0,08 secs]
>>> Heap after GC invocations=7367 (full 8):
>>> par new generation total 458752K, used 37742K [0xfffffffe00400000,
>>> 0xfffffffe20400000, 0xfffffffe20400000)
>>> eden space 393216K, 0% used [0xfffffffe00400000,
>>> 0xfffffffe00400000, 0xfffffffe18400000)
>>> from space 65536K, 57% used [0xfffffffe1c400000, 0xfffffffe1e8db9a0,
>>> 0xfffffffe20400000)
>>> to space 65536K, 0% used [0xfffffffe18400000, 0xfffffffe18400000,
>>> 0xfffffffe1c400000)
>>> concurrent mark-sweep generation total 3670016K, used 1887085K
>>> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
>>> concurrent-mark-sweep perm gen total 524288K, used 453862K
>>> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
>>> }
>>> ....
>>> 2015-11-09T01:28:36.921+0100: 38461,575: Total time for which
>>> application threads were stopped: 0,0888232 seconds, Stopping threads
>>> took: 0,0005420 seconds
>>> 2015-11-09T01:28:59.821+0100: 38484,474: Application time: 0,0002954
>>> seconds
>>> 2015-11-09T01:28:59.823+0100: 38484,477: Total time for which
>>> application threads were stopped: 0,0026081 seconds, Stopping threads
>>> took: 0,0004146 seconds
>>> 2015-11-09T01:28:59.824+0100: 38484,477: Application time: 0,0003073
>>> seconds
>>> 2015-11-09T01:28:59.826+0100: 38484,480: Total time for which
>>> application threads were stopped: 0,0025411 seconds, Stopping threads
>>> took: 0,0004064 seconds
>>> 2015-11-09T01:28:59.827+0100: 38484,480: Application time: 0,0002885
>>> seconds
>>> 2015-11-09 01:28:59 GC log file has reached the maximum size. Saved as
>>> ./application/logs-a/mkb_gc.log.2
>>>
>>> This output looks normal. Last timestamp is 2015-11-09T01:28:59.827
>>>
>>> Now the next file begins:
>>>
>>> 2015-11-09 01:28:59 GC log file created ./application/logs-a/mkb_gc.log.3
>>> Java HotSpot(TM) 64-Bit Server VM (24.80-b11) for solaris-sparc JRE
>>> (1.7.0_80-b15), built on Apr 10 2015 18:47:18 by "" with Sun Studio 12u1
>>> Memory: 8k page, physical 133693440k(14956840k free)
>>> CommandLine flags: -XX:AllocateInstancePrefetchLines=2
>>> -XX:AllocatePrefetchInstr=1 -XX:AllocatePrefetchLines=6
>>> -XX:AllocatePrefetchStyle=3 -XX:+CMSClassUnloadingEnabled
>>> -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
>>> -XX:GCLogFileSize=10485760 -XX:InitialHeapSize=4294967296
>>> -XX:MaxHeapSize=4294967296 -XX:MaxNewSize=536870912
>>> -XX:MaxPermSize=536870912 -XX:MaxTenuringThreshold=31
>>> -XX:NewSize=536870912 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16
>>> -XX:ParallelGCThreads=16 -XX:PermSize=536870912 -XX:+PrintGC
>>> -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
>>> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:SurvivorRatio=6
>>> -XX:-UseAdaptiveSizePolicy -XX:+UseCompressedOops
>>> -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC
>>> 2015-11-09T01:29:55.640+0100: 38540,292: Total time for which
>>> application threads were stopped: 55,8119519 seconds, Stopping threads
>>> took: 0,0003857 seconds
>>> 2015-11-09T01:29:55.648+0100: 38540,299: Application time: 0,0076173
>>> seconds
>>>
>>> Note the 55.8 second pause directly after printing the flags, and the
>>> corresponding timestamp jump from 01:28:59 to 01:29:55. There is no GC
>>> output, although verbose GC is active and works, so the long safepoint
>>> must have some other cause. Note also that the time is not spent
>>> waiting for the safepoint to be reached: the log claims that stopping
>>> threads only took 0.0004 seconds, so essentially the whole 55.8 seconds
>>> was spent inside the safepoint operation itself. Also, at that time of
>>> day the servers are not very busy.
>>>
>>> Does anyone have an idea what happens here? Does anything ring a bell
>>> between 1.7.0_76 and 1.7.0_80? Why should there be a long safepoint
>>> directly after GC log rotation opened a new file?
>>>
>>> I searched the bug parade, but didn't find a good hit. There was also
>>> nothing in the change for JDK-7164841 that seemed immediately
>>> responsible for a long pause.
>>>
>>> Unfortunately this happens on a production system, and the first step
>>> was to roll back to the old Java version. Not sure how well this will
>>> be reproducible on a test system (will check tomorrow).
>>>
>>> Thanks for any hint,
>>>
>>> Rainer
>
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use