Long safepoint pause directly after GC log file rotation in 1.7.0_80
Rainer Jung
rainer.jung at kippdata.de
Tue Nov 10 13:24:02 UTC 2015
On 10.11.2015 at 13:47, Rainer Jung wrote:
> The pause is due to the call "(void) check_addr0(st)" in
> os::print_memory_info().
>
> The call reads "/proc/self/map". In our case it has, for instance,
> 1400 entries, and each read takes about 40 ms.
>
> The same function check_addr0() is also used in
> os::run_periodic_checks(). Not sure why it is also done directly after
> each GC log rotation.
Note that the Solaris command pmap reads the same file in one go using
a pread() call, instead of reading lots of entries one by one.
Therefore pmap executes very quickly, but check_addr0() does not.
If I copy the check_addr0() code into a separate executable and run it
from there, reading the map file of the Java process, I can reproduce
the long runtime.
So it seems this is unfortunately an example of bad (very inefficient)
programming.
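The difference between the two read patterns can be sketched as follows. This is a minimal, hypothetical illustration using an ordinary temp file and a made-up record size as a stand-in for sizeof(prmap_t); it is not HotSpot's actual check_addr0() code, only the I/O pattern it uses compared with the single-pread approach pmap takes:

```python
import os
import tempfile

RECORD_SIZE = 128      # stand-in for sizeof(prmap_t); assumption, not the real size
NUM_RECORDS = 1400     # entry count mentioned above

def make_demo_file():
    """Create a temp file with NUM_RECORDS fixed-size records."""
    fd, path = tempfile.mkstemp()
    for i in range(NUM_RECORDS):
        os.write(fd, bytes([97 + i % 26]) * RECORD_SIZE)
    return fd, path

def count_per_record_reads(fd):
    """Pattern resembling check_addr0(): one read(2) syscall per record."""
    os.lseek(fd, 0, os.SEEK_SET)
    calls = 0
    while len(os.read(fd, RECORD_SIZE)) == RECORD_SIZE:
        calls += 1
    return calls

def read_in_one_go(fd):
    """Pattern resembling pmap: a single pread(2) for the whole file."""
    return os.pread(fd, RECORD_SIZE * NUM_RECORDS, 0)

if __name__ == "__main__":
    fd, path = make_demo_file()
    print("per-record loop:", count_per_record_reads(fd), "syscalls")
    print("single pread:   ", len(read_in_one_go(fd)), "bytes in 1 syscall")
    os.close(fd)
    os.unlink(path)
```

With 1400 entries the first pattern issues 1400 syscalls; at roughly 40 ms per read of the real /proc file, that adds up to about 56 seconds, which matches the observed pause.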
The only thing I could not yet analyze is what the difference from
1.7.0_76 is, where the problem doesn't happen. Work in progress.
It looks like we should open a bug?
Regards,
Rainer
> On 10.11.2015 at 10:13, Rainer Jung wrote:
>> Addition: the longest pause that was experienced was more than 2400
>> seconds ...
>>
>> And: platform is Solaris Sparc (T4). But we don't know whether it is
>> platform dependent.
>>
>> It also happens on test systems, so I'll write a script that calls
>> pstack when the pause is detected, to find out what the threads are
>> doing or where they are hanging.
>>
>> Regards,
>>
>> Rainer
>>
>> On 09.11.2015 at 21:23, Rainer Jung wrote:
>>> Hi,
>>>
>>> after upgrading from 1.7.0_76 to 1.7.0_80 we experience long pauses
>>> directly after a GC log rotation.
>>>
>>> The pause duration varies with application and load but is in the
>>> range of 6 seconds to 60 seconds. There is no GC involved, i.e. no GC
>>> output is written related to these pauses.
>>>
>>> Example:
>>>
>>> Previous file ends with:
>>>
>>> 2015-11-09T01:28:36.832+0100: 38461,486: Application time: 5,2840810
>>> seconds
>>> {Heap before GC invocations=7366 (full 8):
>>> par new generation total 458752K, used 442678K [0xfffffffe00400000,
>>> 0xfffffffe20400000, 0xfffffffe20400000)
>>> eden space 393216K, 100% used [0xfffffffe00400000,
>>> 0xfffffffe18400000, 0xfffffffe18400000)
>>> from space 65536K, 75% used [0xfffffffe18400000, 0xfffffffe1b44d998,
>>> 0xfffffffe1c400000)
>>> to space 65536K, 0% used [0xfffffffe1c400000, 0xfffffffe1c400000,
>>> 0xfffffffe20400000)
>>> concurrent mark-sweep generation total 3670016K, used 1887085K
>>> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
>>> concurrent-mark-sweep perm gen total 524288K, used 453862K
>>> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
>>> 2015-11-09T01:28:36.839+0100: 38461,493:
>>> [GC2015-11-09T01:28:36.840+0100: 38461,493: [ParNew
>>> Desired survivor size 33554432 bytes, new threshold 16 (max 31)
>>> - age 1: 2964800 bytes, 2964800 total
>>> - age 2: 2628048 bytes, 5592848 total
>>> - age 3: 1415792 bytes, 7008640 total
>>> - age 4: 1354008 bytes, 8362648 total
>>> - age 5: 1132056 bytes, 9494704 total
>>> - age 6: 1334072 bytes, 10828776 total
>>> - age 7: 1407336 bytes, 12236112 total
>>> - age 8: 3321304 bytes, 15557416 total
>>> - age 9: 1531064 bytes, 17088480 total
>>> - age 10: 2453024 bytes, 19541504 total
>>> - age 11: 2797616 bytes, 22339120 total
>>> - age 12: 1698584 bytes, 24037704 total
>>> - age 13: 1870064 bytes, 25907768 total
>>> - age 14: 2211528 bytes, 28119296 total
>>> - age 15: 3626888 bytes, 31746184 total
>>> : 442678K->37742K(458752K), 0,0802687 secs]
>>> 2329763K->1924827K(4128768K), 0,0812120 secs] [Times: user=0,90
>>> sys=0,03, real=0,08 secs]
>>> Heap after GC invocations=7367 (full 8):
>>> par new generation total 458752K, used 37742K [0xfffffffe00400000,
>>> 0xfffffffe20400000, 0xfffffffe20400000)
>>> eden space 393216K, 0% used [0xfffffffe00400000,
>>> 0xfffffffe00400000, 0xfffffffe18400000)
>>> from space 65536K, 57% used [0xfffffffe1c400000, 0xfffffffe1e8db9a0,
>>> 0xfffffffe20400000)
>>> to space 65536K, 0% used [0xfffffffe18400000, 0xfffffffe18400000,
>>> 0xfffffffe1c400000)
>>> concurrent mark-sweep generation total 3670016K, used 1887085K
>>> [0xfffffffe20400000, 0xffffffff00400000, 0xffffffff00400000)
>>> concurrent-mark-sweep perm gen total 524288K, used 453862K
>>> [0xffffffff00400000, 0xffffffff20400000, 0xffffffff20400000)
>>> }
>>> ....
>>> 2015-11-09T01:28:36.921+0100: 38461,575: Total time for which
>>> application threads were stopped: 0,0888232 seconds, Stopping threads
>>> took: 0,0005420 seconds
>>> 2015-11-09T01:28:59.821+0100: 38484,474: Application time: 0,0002954
>>> seconds
>>> 2015-11-09T01:28:59.823+0100: 38484,477: Total time for which
>>> application threads were stopped: 0,0026081 seconds, Stopping threads
>>> took: 0,0004146 seconds
>>> 2015-11-09T01:28:59.824+0100: 38484,477: Application time: 0,0003073
>>> seconds
>>> 2015-11-09T01:28:59.826+0100: 38484,480: Total time for which
>>> application threads were stopped: 0,0025411 seconds, Stopping threads
>>> took: 0,0004064 seconds
>>> 2015-11-09T01:28:59.827+0100: 38484,480: Application time: 0,0002885
>>> seconds
>>> 2015-11-09 01:28:59 GC log file has reached the maximum size. Saved as
>>> ./application/logs-a/mkb_gc.log.2
>>>
>>> This output looks normal. Last timestamp is 2015-11-09T01:28:59.827
>>>
>>> Now the next file begins:
>>>
>>> 2015-11-09 01:28:59 GC log file created
>>> ./application/logs-a/mkb_gc.log.3
>>> Java HotSpot(TM) 64-Bit Server VM (24.80-b11) for solaris-sparc JRE
>>> (1.7.0_80-b15), built on Apr 10 2015 18:47:18 by "" with Sun Studio 12u1
>>> Memory: 8k page, physical 133693440k(14956840k free)
>>> CommandLine flags: -XX:AllocateInstancePrefetchLines=2
>>> -XX:AllocatePrefetchInstr=1 -XX:AllocatePrefetchLines=6
>>> -XX:AllocatePrefetchStyle=3 -XX:+CMSClassUnloadingEnabled
>>> -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
>>> -XX:GCLogFileSize=10485760 -XX:InitialHeapSize=4294967296
>>> -XX:MaxHeapSize=4294967296 -XX:MaxNewSize=536870912
>>> -XX:MaxPermSize=536870912 -XX:MaxTenuringThreshold=31
>>> -XX:NewSize=536870912 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16
>>> -XX:ParallelGCThreads=16 -XX:PermSize=536870912 -XX:+PrintGC
>>> -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
>>> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:SurvivorRatio=6
>>> -XX:-UseAdaptiveSizePolicy -XX:+UseCompressedOops
>>> -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC
>>> 2015-11-09T01:29:55.640+0100: 38540,292: Total time for which
>>> application threads were stopped: 55,8119519 seconds, Stopping threads
>>> took: 0,0003857 seconds
>>> 2015-11-09T01:29:55.648+0100: 38540,299: Application time: 0,0076173
>>> seconds
>>>
>>> Note the 55.8 second pause directly after printing the flags, and the
>>> corresponding timestamp jump from 01:28:59 to 01:29:55. There is no GC
>>> output, although verbose GC is active and works. For some other reason
>>> there is a very long safepoint. Note also that the time is not spent
>>> waiting until the safepoint is reached: at least the log claims that
>>> stopping the threads only took 0.0004 seconds. Also, at that time of
>>> day the servers are not very busy.
>>>
>>> Is there any idea, what happens here? Anything that rings a bell between
>>> 1.7.0_76 and 1.7.0_80? Why should there be a long safepoint directly
>>> after GC rotation opened a new file?
>>>
>>> I searched the bug parade, but didn't find a good hit. There was also
>>> nothing in the change for JDK-7164841 that seemed immediately
>>> responsible for a long pause.
>>>
>>> Unfortunately this happens on a production system, and the first thing
>>> we did was roll back to the old Java version. Not sure how well this
>>> will be reproducible on a test system (will check tomorrow).
>>>
>>> Thanks for any hint,
>>>
>>> Rainer
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use