[8u] RFR: 8144483: One long Safepoint pause directly after each GC log rotation

Fri Jan 8 12:25:45 UTC 2016

Hello Rainer, thanks for getting in touch -

Cheleswer and I have been talking about how to do this - currently we 
plan to push the change below to 8u as it doesn't affect 9 and it solves 
the specific issue raised about log rotation.  However as you say, the 
read itself should be more efficient: we'd like to handle that in a 
separate CR, which we'll log, as it makes more sense from our point of 
view when what we want in 8 and 9 are different.

In 9, we don't have a problem at log rotation due to other changes. 
(Maybe when using -Xcheck:jni we can see the same pause, as 
run_periodic_checks() calls check_addr0().  I'm not sure offhand if slow 
-Xcheck:jni behaviour will block other work as the gc rotation pause did).

I created https://bugs.openjdk.java.net/browse/JDK-8146683 for 9, and 
we'll consider backporting.

Many thanks
Kevin

On 07/01/2016 03:43, Rainer Jung wrote:
> Hi all,
>
> I'm the original reporter of the problem.
>
> Am 06.01.2016 um 17:18 schrieb Yumin Qi:
>> Cheleswer,
>>
>>    If the call to check_addr0() is unnecessary, remove it is a better
>> choice I think. In fatal error, means VM will exit, the error handler
>> will print memory info, so it is redundant here. I don't know why it is
>> added to log rotation code in the first place.
>
> When I first investigated the problem, I noticed that the same 
> function check_addr0() is also used in os::run_periodic_checks(). The 
> whole code goes back to
>
> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6478739
>
> I also wondered why it was called during GC log rotation but didn't 
> investigate further.
>
> Note that in addition to the problem of the long safepoint pause 
> during GC log file rotation, this bug has a second aspect, namely that 
> reading the maps file is done in a very inefficient and slow way. So 
> even after removing the call to check_addr0() during gc log file 
> rotation - which would fix this bug here - whatever code remains to 
> call check_addr0(), will still notice that the call takes a very long 
> time (e.g. more than 30 seconds if your process has lots of threads 
> and open jar files).
>
> This aspect was triggered by an older fix of the code which reads the 
> map file. Before the fix, the file was closed after reading the first 
> record from it. That was a bug but it terminated the read loop early 
> and the call returned quickly. After the old fix, the close was moved 
> behind the read loop and now the full read loop executed and took too 
> long. As an example for the old fix see
>
> http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/diff/e50eb3195734/src/os/solaris/vm/os_solaris.cpp 
>
>
> I think the correct (and fast) way of reading the maps file can be 
> seen by looking at the code of the Solaris "pmap" command. It reads 
> the file in one go using the pread() call instead of reading lots of 
> structs one by one. Therefore pmap executes very fast, much faster 
> than check_addr0() for the same maps file. I did check that on the 
> same map file, that showed the slowness bug in check_addr0().
>
> Since the pmap code isn't available in the open, I looked at the 
> Illumos code based on older OpenSolaris. There one can see:
>
> In 
> https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/cmd/ptools/pmap/pmap.c
>
> - calls Pmapping_iter_resolved() to iterate over the maps
>
> In 
> https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/lib/libproc/common/Psymtab.c
>
> - Pmapping_iter_resolved() calls i_Pmapping_iter()
> - i_Pmapping_iter() calls Prd_agent()
> - Prd_agent() calls Pupdate_maps()
> - Pupdate_maps() calls Preadmaps()
> - Preadmaps() calls P->ops.pop_read_maps()
>
> In 
> https://github.com/illumos/illumos-gate/blob/139510fb6efa97dbe5f5479594b308d940cab8d1/usr/src/lib/libproc/common/Pcontrol.c
>
> - P->ops.pop_read_maps is Pread_maps_live()
> - Pread_maps_live() (starting at line 100) uses pread() to read the 
> whole file in one go into a buffer, that was sized using the st_size 
> field in the stat returned from fstat() called on the map file.
>
> So I suggest you also fix the slowness in check_addr0() by adjusting 
> the code that reads the map file in a similar way. If you think that's 
> to dangerous for a huge file, then you might consider at least reading 
> the file in big enough chunks (e.g. 100 structs at a time).
>
> Regards,
>
> Rainer
>
>> On 1/6/2016 3:37 AM, cheleswer sahu wrote:
>>> Hi,
>>>
>>> Please review the code changes for
>>> "https://bugs.openjdk.java.net/browse/JDK-8144483".
>>>
>>> webrev link: http://cr.openjdk.java.net/~kevinw/8144483/webrev.00/
>>> JBS link: https://bugs.openjdk.java.net/browse/JDK-8144483
>>>
>>> Bug brief:
>>> A long pause is observed after each gc log rotation in Solaris.
>>>
>>> Problem Identified:
>>> In each GC log rotation "print_memory_info()" is getting called
>>> through dump_loggc_header().
>>> "print_memory_info()" of solaris version calls check_addr0(st);
>>>
>>> File:
>>> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/80959a760b85/src/os/solaris/vm/os_solaris.cpp 
>>>
>>>
>>>
>>> void os::print_memory_info(outputStream* st) {
>>>   st->print("Memory:");
>>>
>>>   st->print(" %dk page", os::vm_page_size()>>10);
>>>
>>>   st->print(", physical " UINT64_FORMAT "k", 
>>> os::physical_memory()>>10);
>>>
>>>   st->print("(" UINT64_FORMAT "k free)", os::available_memory() >> 10);
>>>
>>>   st->cr();
>>>  (void) check_addr0(st);
>>> }
>>>
>>> Now check_addr0(st) function do a lot of read call to read the data
>>> from /proc/self/map.
>>> and check if virtual address is mapped to 0x0. These read calls take
>>> lot of time which results in GC rotation pause.
>>> Here calling check_addr0() seems unnecessary for every log rotation.
>>> It will be more logical if this function gets called only
>>> when an error is reported.
>>>
>>> Solution proposed:
>>> Before GC log rotation print_memory_info() is ever getting called from
>>> Vm_error.cpp during error reporting. And in case of error reporting
>>> checking for address mapping to 0x0 looks fine. So the proposed
>>> solution is to do an extra check inside print_memory_info().
>>>
>>> -  (void) check_addr0(st);
>>> +  if (VMError::fatal_error_in_progress()){
>>> +     (void) check_addr0(st);
>>> +  }
>>>  }
>>>
>>>  This check doesn't fit well inside printing function, but at this
>>> point I don't see the need to create a new os:: method and change all
>>> the OS classes just for that check.
>>>
>>> Regards,
>>> Cheleswer