Solaris specific bug in GC log rotation - extremely long pause
Poonam Bajaj Parhar
poonam.bajaj at oracle.com
Tue Nov 10 23:05:45 UTC 2015
Hello Rainer,
Could you please file a bug for this problem here:
http://bugreport.java.com/
Please provide the problem description there and attach the reproducible
program, and let us know the created bug id.
Thanks,
Poonam
On 11/10/2015 2:15 PM, Rainer Jung wrote:
> In case anyone is interested, I'll attach some reproduction code.
> You can run it as a standalone Java program on Solaris. It is pretty
> small and simple. A README is included.
>
> I hope the list accepts attachments.
>
> Regards,
>
> Rainer
>
> Am 10.11.2015 um 16:45 schrieb Rainer Jung:
>> There's a bug in the Solaris specific HotSpot code which exhibits itself
>> as extremely long pauses directly after GC log rotation.
>>
>> The bug can be seen in 1.7.0 patchlevel 80 and above, 1.8.0 patchlevel
>> 20 and above and 1.9.0. It was noticed using
>> -XX:PrintGCApplicationStoppedTime.
>>
>> During GC log rotation the function os::print_memory_info() in
>> src/os/solaris/vm/os_solaris.cpp is called, which itself calls
>> check_addr0().
>>
>> Now check_addr0() reads the file /proc/self/pmap, but does this entry by
>> entry. Productive Java processes often have 1000 and more map entries
>> (e.g. each thread stack is one entry, often each jar etc.
>>
>> Reading one entry takes about 35-40ms in a zone on a T4 system. Reading
>> e.g. 1000 entries takes 35-40 seconds during which the JVM is stopped.
>> We experienced even stop times up until 2400 seconds in extreme cases.
>>
>> Note that this doesn't have to do with GC. It only happens directly
>> after GC log file rotation.
>>
>> The bug is reading the file entry by entry instead of using pread() to
>> read it in one go. Look for instance at the source code of the Solaris
>> command "pmap" which uses pread() to read the file at once.
>>
>> It might also be questioned, whether this map file check must happen
>> during a safepoint or could also be done outside, only synchronizing in
>> case a hit was found and a warning actually needs to be written to the
>> log file.
>>
>> The bug was shadowed by another bug until and including 1.7.0_76 and
>> 1.8.0_11: the close() that closes the map file after reading the entries
>> was erroneously put inside the read loop instead of after the read loop.
>> Thus these versions only read the first entry, then closed the file and
>> erred out when trying to read the next entry. So reading was quick.
>>
>> The responsible change was:
>>
>> Java 8:
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/diff/7d28f4e15b61/src/os/solaris/vm/os_solaris.cpp
>>
>>
>>
>> Java 7:
>> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/diff/e50eb3195734/src/os/solaris/vm/os_solaris.cpp
>>
>>
>>
>> Example code for reading using pread() (going back to OpenSolaris):
>>
>> https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/ptools/pmap/pmap.c
>>
>>
>>
>> Look e.g. at function xmapping_iter() and there look for pread().
>>
>> See also the discussion thread starting at
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2015-November/002351.html.
>>
>>
>>
>> The analysis was done using source code review and realtime checks with
>> pstack and truss. It would be nice if you could instruct me how to get
>> this fixed. It is a bit specific because it only happens on Solaris and
>> when GC log rotation is active, but Solaris should be an important
>> platform for Oracle and GC log rotation will be used more and more.
>>
>> I do *not* have a support contract.
>>
>> Regards,
>>
>> Rainer
More information about the hotspot-gc-dev
mailing list