RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1

Mon Jul 6 09:11:55 UTC 2020

Hi,

On 05.07.20 18:21, Ruslan Synytsky wrote:
>>
>>
>>> question regarding the naming: did we agree on how this parameter should
>> be
>>> called? What happens when heap usage goes higher than SoftMaxHeapSize -
>>> OOMError or JVM gets a little bit more memory? If JVM throws OOMError I
>>> believe the right naming should be HardMaxHeapSize. Sorry in advance if I
>>> missed this point in the previous conversations.
>>
>> SoftMaxHeapSize is not what you describe here - SoftMaxHeapSize is only
>> an internal goal for heap sizing without guarantees. Hence the name
>> *Soft*MaxHeapSize.
>>
>> See https://bugs.openjdk.java.net/browse/JDK-8222145.
>>
>> There has been no progress on anything like Current/HardMaxHeapSize.
>>
> Thomas and Liang, is there a possibility to easily add an optional
> manageable parameter that regulates behaviour of JVM when memory
> consumption goes above SoftMaxHeapSize? For example, by default JVM
> allocates more memory when it can't keep memory usage
> below SoftMaxHeapSize, and if the optional parameter was specified then JVM
> throws OutOfMemoryError. In this case we will cover two cases with the same
> code base.

I think there is still the patch from Rodrigo. From my understanding
from last time, there were two open issues. One is when you are allowed
to change the value (as Current/HardMaxHeapSize is read at "arbitrary"
locations, you need to make sure the gc has a consistent view of it).
The other is naming: I am not sure, but I think CurrentMaxHeapSize has
been the favorite.

>>> Also, some news regarding analysis automation of memory usage efficiency
>>> I'm working on in the background. We built a relatively small script that
>>> collects memory usage metrics from many containers running inside the
>> same
>>> large host machine. After executing it in one of our dev environments
>> with
>>> about 150 containers we got interesting results - the used heap is very
>>> close to the committed heap while Xmx is much higher compared to
>> committed
>>> value. Please note, almost all containers use JEP 346 improvement or
>>> javaagent which triggers GC at idle state in the older JDK versions.
>>>
>>> [image: Screenshot 2020-06-18 at 13.20.19.jpg]
>>> Zoomed
>>>
>>> [image: Screenshot 2020-06-18 at 14.40.18.jpg]
>>
>> While the screenshots have been scrubbed by the mailing list it's very
>> nice to hear that you have had success with these approaches.
>>
> I plan to share more details on this soon with a link to dynamic charts for
> a more convenient analysis.

Looking forward to this. :)

>>> However, enabling JMX ManagementAgent via jcmd and connecting to JVM
>> with a
>>> JMX client is a quite complex operation for getting such a simple metric
>>> about heap memory usage. Also, some java processes may already
>>> start ManagementAgent on a custom port with auth protection, so we can't
>>> collect statistics from such java processes without contacting the
>>> application owner (you can see the gaps on the chart). Do you know any
>>> other way for collecting accurate heap usage statistics from a running
>> java
>>> process? We plan to run this analysis tool on productions with a large
>>> number of containers, so we can get a more realistic picture.
>>>
>>
>> Jcmd with the GC.heap_info command provides some information, probably
>> not enough (I filed JDK-8248136) though. More information can be
>> retrieved with the "VM.info" command, the detailed per-region printout
>> which might be too much information.
>>
>> There is also JFR with its event streaming API that could be an option,
>> however it is JDK14 only (https://openjdk.java.net/jeps/349).
>>
>> Finally, there is jstat to gather some information.
>>
> Unfortunately GC.heap_info and VM.info do not provide information about
> COMMITTED heap. And jstat documentation
> <https://docs.oracle.com/en/java/javase/11/tools/jstat.html> does not
> mention options for collecting committed heap as well. Analyzing used and
> max without understanding committed heap is useless in this context as the
> "lost memory" is located between used and committed.
> 

Sorry. However, VM.info prints the whole heap map with G1, i.e. for
every region what type it is. For uncommitted regions it does not print
a line in the "Heap regions" section (*)(**), so you could count the
lines and compare with the number of regions you would expect.

(*) the lines that look like this:

|   0|0x0000000600000000, 0x0000000600400000, 0x0000000600400000|100%| O|  |TAMS 0x0000000600400000, 0x0000000600000000| Untracked
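
A minimal sketch of the line counting in Python, assuming the region lines start with the index between two "|" characters followed by the bottom address, as in the sample line above. The function names are made up for illustration, and the exact line layout may differ between JDK versions, so treat the pattern as an assumption to check against your own VM.info output.

```python
import re

# Assumed shape of a G1 region line, taken from the sample line above:
# "|   0|0x0000000600000000, 0x..., 0x...|100%| O|  |TAMS ..."
REGION_LINE = re.compile(r"^\|\s*(\d+)\|0x[0-9a-fA-F]+,")


def printed_region_indices(vm_info_text):
    """Return the sorted region indices that have a line in the output."""
    indices = set()
    for line in vm_info_text.splitlines():
        match = REGION_LINE.match(line.strip())
        if match:
            indices.add(int(match.group(1)))
    return sorted(indices)


def missing_region_count(vm_info_text, max_heap_bytes, region_bytes):
    """Expected regions (max heap / region size) minus the printed ones."""
    expected = max_heap_bytes // region_bytes
    return expected - len(printed_region_indices(vm_info_text))
```

The text to feed in would be the captured output of "jcmd <pid> VM.info"; the maximum heap size and region size have to be supplied by the caller.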

(**) that is not completely correct: it prints "available" regions, i.e.
regions that the topmost layer of g1 thinks are there, but in some
cases this does not match what is actually committed. I.e. if page size
is > region size, one region on a page may be "unavailable" while the
other one on the same page is available, and in that case the whole
page, i.e. the space for both regions, is obviously committed.

But you can reconstruct the data, as the indices in that list are fixed.
(When using small (4k) pages this situation can't happen, because region
size is always > page size, at least on x86.) E.g., assuming 1m region
and 2m page size, if VM.info shows you:

|  0| ...
|  1| ...
|  3| ...
|  4| ...
|  5| ...
|  8| ...
....

then page 0, where regions 0/1 are located, is committed; page 1, where
regions 2/3 are located, is committed (because g1 obviously can't
uncommit half pages); page 2, where regions 4/5 are located, is
committed; page 3, where regions 6/7 are located, is NOT committed
because both are missing; and so on.
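
The reconstruction above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names are hypothetical, the page size is a whole multiple of the region size, and region 0 starts on a page boundary.

```python
def committed_pages(region_indices, region_bytes, page_bytes):
    """Return the sorted indices of pages that back at least one printed region."""
    # Assumption: page size is a whole multiple of region size and
    # region 0 starts at a page boundary.
    if page_bytes < region_bytes or page_bytes % region_bytes != 0:
        raise ValueError("sketch only covers page size as a multiple of region size")
    regions_per_page = page_bytes // region_bytes
    # A page is committed as soon as any one of its regions is printed.
    return sorted({index // regions_per_page for index in region_indices})


def committed_bytes(region_indices, region_bytes, page_bytes):
    """Committed heap estimate: number of committed pages times page size."""
    return len(committed_pages(region_indices, region_bytes, page_bytes)) * page_bytes


MB = 1024 * 1024
# Region indices from the listing above, with 1m regions and 2m pages.
pages = committed_pages([0, 1, 3, 4, 5, 8], 1 * MB, 2 * MB)
```

With the indices from the listing, pages 0, 1, 2 and 4 come out as committed and page 3 does not, which matches the description above.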

I hope this makes sense to you.

While a somewhat cumbersome way to find this out, this works since 
jdk8u40 (implemented in JDK-8038423) iirc.

I recently filed JDK-8248136 for improving the heap info output for G1.

Thanks,
   Thomas