[8u] RFR: 8144483: One long Safepoint pause directly after each GC log rotation
Rainer Jung
rainer.jung at kippdata.de
Fri Jan 8 12:45:29 UTC 2016
Hello Kevin,
Am 08.01.2016 um 13:25 schrieb Kevin Walls:
>
> Hello Rainer, thanks for getting in touch -
>
> Cheleswer and I have been talking about how to do this - currently we
> plan to push the change below to 8u as it doesn't affect 9 and it solves
> the specific issue raised about log rotation. However as you say, the
> read itself should be more efficient: we'd like to handle that in a
> separate CR, which we'll log, as it makes more sense from our point of
> view when what we want in 8 and 9 are different.
>
> In 9, we don't have a problem at log rotation due to other changes.
> (Maybe when using -Xcheck:jni we can see the same pause, as
> run_periodic_checks() calls check_addr0(). I'm not sure offhand if slow
> -Xcheck:jni behaviour will block other work as the gc rotation pause did).
>
> I created https://bugs.openjdk.java.net/browse/JDK-8146683 for 9, and
> we'll consider backporting.
thanks a bunch for the feedback and for opening the separate CR. Sounds
like a good plan!
Regards,
Rainer
> On 07/01/2016 03:43, Rainer Jung wrote:
>> Hi all,
>>
>> I'm the original reporter of the problem.
>>
>> Am 06.01.2016 um 17:18 schrieb Yumin Qi:
>>> Cheleswer,
>>>
>>> If the call to check_addr0() is unnecessary, remove it is a better
>>> choice I think. In fatal error, means VM will exit, the error handler
>>> will print memory info, so it is redundant here. I don't know why it is
>>> added to log rotation code in the first place.
>>
>> When I first investigated the problem, I noticed that the same
>> function check_addr0() is also used in os::run_periodic_checks(). The
>> whole code goes back to
>>
>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6478739
>>
>> I also wondered why it was called during GC log rotation but didn't
>> investigate further.
>>
>> Note that in addition to the problem of the long safepoint pause
>> during GC log file rotation, this bug has a second aspect, namely that
>> reading the maps file is done in a very inefficient and slow way. So
>> even after removing the call to check_addr0() during gc log file
>> rotation - which would fix this bug here - whatever code remains to
>> call check_addr0(), will still notice that the call takes a very long
>> time (e.g. more than 30 seconds if your process has lots of threads
>> and open jar files).
>>
>> This aspect was triggered by an older fix of the code which reads the
>> map file. Before the fix, the file was closed after reading the first
>> record from it. That was a bug but it terminated the read loop early
>> and the call returned quickly. After the old fix, the close was moved
>> behind the read loop and now the full read loop executed and took too
>> long. As an example for the old fix see
>>
>> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/diff/e50eb3195734/src/os/solaris/vm/os_solaris.cpp
>>
>>
>> I think the correct (and fast) way of reading the maps file can be
>> seen by looking at the code of the Solaris "pmap" command. It reads
>> the file in one go using the pread() call instead of reading lots of
>> structs one by one. Therefore pmap executes very fast, much faster
>> than check_addr0() for the same maps file. I did check that on the
>> same map file, that showed the slowness bug in check_addr0().
>>
>> Since the pmap code isn't available in the open, I looked at the
>> Illumos code based on older OpenSolaris. There one can see:
>>
>> In
>> https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/cmd/ptools/pmap/pmap.c
>>
>>
>> - calls Pmapping_iter_resolved() to iterate over the maps
>>
>> In
>> https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/lib/libproc/common/Psymtab.c
>>
>>
>> - Pmapping_iter_resolved() calls i_Pmapping_iter()
>> - i_Pmapping_iter() calls Prd_agent()
>> - Prd_agent() calls Pupdate_maps()
>> - Pupdate_maps() calls Preadmaps()
>> - Preadmaps() calls P->ops.pop_read_maps()
>>
>> In
>> https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/lib/libproc/common/Pcontrol.c
>>
>>
>> - P->ops.pop_read_maps is Pread_maps_live()
>> - Pread_maps_live() (starting at line 100) uses pread() to read the
>> whole file in one go into a buffer, that was sized using the st_size
>> field in the stat returned from fstat() called on the map file.
>>
>> So I suggest you also fix the slowness in check_addr0() by adjusting
>> the code that reads the map file in a similar way. If you think that's
>> to dangerous for a huge file, then you might consider at least reading
>> the file in big enough chunks (e.g. 100 structs at a time).
>>
>> Regards,
>>
>> Rainer
>>
>>> On 1/6/2016 3:37 AM, cheleswer sahu wrote:
>>>> Hi,
>>>>
>>>> Please review the code changes for
>>>> "https://bugs.openjdk.java.net/browse/JDK-8144483".
>>>>
>>>> webrev link: http://cr.openjdk.java.net/~kevinw/8144483/webrev.00/
>>>> JBS link: https://bugs.openjdk.java.net/browse/JDK-8144483
>>>>
>>>> Bug brief:
>>>> A long pause is observed after each gc log rotation in Solaris.
>>>>
>>>> Problem Identified:
>>>> In each GC log rotation "print_memory_info()" is getting called
>>>> through dump_loggc_header().
>>>> "print_memory_info()" of solaris version calls check_addr0(st);
>>>>
>>>> File:
>>>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/80959a760b85/src/os/solaris/vm/os_solaris.cpp
>>>>
>>>>
>>>>
>>>> void os::print_memory_info(outputStream* st) {
>>>> st->print("Memory:");
>>>>
>>>> st->print(" %dk page", os::vm_page_size()>>10);
>>>>
>>>> st->print(", physical " UINT64_FORMAT "k",
>>>> os::physical_memory()>>10);
>>>>
>>>> st->print("(" UINT64_FORMAT "k free)", os::available_memory() >> 10);
>>>>
>>>> st->cr();
>>>> (void) check_addr0(st);
>>>> }
>>>>
>>>> Now check_addr0(st) function do a lot of read call to read the data
>>>> from /proc/self/map.
>>>> and check if virtual address is mapped to 0x0. These read calls take
>>>> lot of time which results in GC rotation pause.
>>>> Here calling check_addr0() seems unnecessary for every log rotation.
>>>> It will be more logical if this function gets called only
>>>> when an error is reported.
>>>>
>>>> Solution proposed:
>>>> Before GC log rotation print_memory_info() is ever getting called from
>>>> Vm_error.cpp during error reporting. And in case of error reporting
>>>> checking for address mapping to 0x0 looks fine. So the proposed
>>>> solution is to do an extra check inside print_memory_info().
>>>>
>>>> - (void) check_addr0(st);
>>>> + if (VMError::fatal_error_in_progress()){
>>>> + (void) check_addr0(st);
>>>> + }
>>>> }
>>>>
>>>> This check doesn't fit well inside printing function, but at this
>>>> point I don't see the need to create a new os:: method and change all
>>>> the OS classes just for that check.
>>>>
>>>> Regards,
>>>> Cheleswer
More information about the hotspot-runtime-dev
mailing list