[8u] RFR: 8144483: One long Safepoint pause directly after each GC log rotation
Rainer Jung
rainer.jung at kippdata.de
Thu Jan 7 03:43:04 UTC 2016
Hi all,
I'm the original reporter of the problem.
Am 06.01.2016 um 17:18 schrieb Yumin Qi:
> Cheleswer,
>
> If the call to check_addr0() is unnecessary, remove it is a better
> choice I think. In fatal error, means VM will exit, the error handler
> will print memory info, so it is redundant here. I don't know why it is
> added to log rotation code in the first place.
When I first investigated the problem, I noticed that the same function
check_addr0() is also used in os::run_periodic_checks(). The whole code
goes back to
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6478739
I also wondered why it was called during GC log rotation but didn't
investigate further.
Note that in addition to the problem of the long safepoint pause during
GC log file rotation, this bug has a second aspect, namely that reading
the maps file is done in a very inefficient and slow way. So even after
removing the call to check_addr0() during gc log file rotation - which
would fix this bug here - whatever code remains to call check_addr0(),
will still notice that the call takes a very long time (e.g. more than
30 seconds if your process has lots of threads and open jar files).
This aspect was triggered by an older fix of the code which reads the
map file. Before the fix, the file was closed after reading the first
record from it. That was a bug but it terminated the read loop early and
the call returned quickly. After the old fix, the close was moved behind
the read loop and now the full read loop executed and took too long. As
an example for the old fix see
http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/diff/e50eb3195734/src/os/solaris/vm/os_solaris.cpp
I think the correct (and fast) way of reading the maps file can be seen
by looking at the code of the Solaris "pmap" command. It reads the file
in one go using the pread() call instead of reading lots of structs one
by one. Therefore pmap executes very fast, much faster than
check_addr0() for the same maps file. I did check that on the same map
file, that showed the slowness bug in check_addr0().
Since the pmap code isn't available in the open, I looked at the Illumos
code based on older OpenSolaris. There one can see:
In
https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/cmd/ptools/pmap/pmap.c
- calls Pmapping_iter_resolved() to iterate over the maps
In
https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/lib/libproc/common/Psymtab.c
- Pmapping_iter_resolved() calls i_Pmapping_iter()
- i_Pmapping_iter() calls Prd_agent()
- Prd_agent() calls Pupdate_maps()
- Pupdate_maps() calls Preadmaps()
- Preadmaps() calls P->ops.pop_read_maps()
In
https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/lib/libproc/common/Pcontrol.c
- P->ops.pop_read_maps is Pread_maps_live()
- Pread_maps_live() (starting at line 100) uses pread() to read the
whole file in one go into a buffer, that was sized using the st_size
field in the stat returned from fstat() called on the map file.
So I suggest you also fix the slowness in check_addr0() by adjusting the
code that reads the map file in a similar way. If you think that's to
dangerous for a huge file, then you might consider at least reading the
file in big enough chunks (e.g. 100 structs at a time).
Regards,
Rainer
> On 1/6/2016 3:37 AM, cheleswer sahu wrote:
>> Hi,
>>
>> Please review the code changes for
>> "https://bugs.openjdk.java.net/browse/JDK-8144483".
>>
>> webrev link: http://cr.openjdk.java.net/~kevinw/8144483/webrev.00/
>> JBS link: https://bugs.openjdk.java.net/browse/JDK-8144483
>>
>> Bug brief:
>> A long pause is observed after each gc log rotation in Solaris.
>>
>> Problem Identified:
>> In each GC log rotation "print_memory_info()" is getting called
>> through dump_loggc_header().
>> "print_memory_info()" of solaris version calls check_addr0(st);
>>
>> File:
>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/80959a760b85/src/os/solaris/vm/os_solaris.cpp
>>
>>
>> void os::print_memory_info(outputStream* st) {
>> st->print("Memory:");
>>
>> st->print(" %dk page", os::vm_page_size()>>10);
>>
>> st->print(", physical " UINT64_FORMAT "k", os::physical_memory()>>10);
>>
>> st->print("(" UINT64_FORMAT "k free)", os::available_memory() >> 10);
>>
>> st->cr();
>> (void) check_addr0(st);
>> }
>>
>> Now check_addr0(st) function do a lot of read call to read the data
>> from /proc/self/map.
>> and check if virtual address is mapped to 0x0. These read calls take
>> lot of time which results in GC rotation pause.
>> Here calling check_addr0() seems unnecessary for every log rotation.
>> It will be more logical if this function gets called only
>> when an error is reported.
>>
>> Solution proposed:
>> Before GC log rotation print_memory_info() is ever getting called from
>> Vm_error.cpp during error reporting. And in case of error reporting
>> checking for address mapping to 0x0 looks fine. So the proposed
>> solution is to do an extra check inside print_memory_info().
>>
>> - (void) check_addr0(st);
>> + if (VMError::fatal_error_in_progress()){
>> + (void) check_addr0(st);
>> + }
>> }
>>
>> This check doesn't fit well inside printing function, but at this
>> point I don't see the need to create a new os:: method and change all
>> the OS classes just for that check.
>>
>> Regards,
>> Cheleswer
More information about the hotspot-runtime-dev
mailing list