Container-aware heap sizing for OpenJDK

Mon Sep 26 08:43:49 UTC 2022

Hi,

On 22.09.22 11:16, Man Cao wrote:
> Hi all,
> 
> Great to see so many responses in this thread! I work with Jonathan and 
> reviewed all design and implementation aspects of AHS.
> 
> Adding a few more comments to Jonathan's response:
[...]
> 
> Now it seems a great time to get [1] and [2] implemented, so AHS can 
> actually converge. Are there recent prototypes for them? What are the 
> main problems in those prototypes? Jonathan and I could take a look and 
> help get them implemented.
> 

From what I understand, the main issue is thinking through and defining 
the interaction with the existing Min/Initial/MaxHeapSize flags (e.g. [1]).

SoftMaxHeapSize otherwise seems to be relatively unproblematic.

IIRC, specifically for the Current/HardMaxHeapSize flag, the 
implementation needs to be thought through in more detail. There is an 
initial list of problems ([2]) that should be worked through and resolved, 
since that flag has much more external impact.
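
As a rough illustration of what adjusting such a flag at runtime looks 
like from inside the process (for example from a Java agent), the sketch 
below uses the standard HotSpotDiagnosticMXBean API. It assumes a JDK 
where SoftMaxHeapSize exists and is manageable (JDK 13 or later, as far 
as I know). The class and method names are made up for this example, and 
it is not the AHS logic discussed in this thread.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
class SoftMaxHeapSketch {

    // Keep the requested soft limit at or below the given maximum.
    static long clampToMax(long requestedBytes, long maxBytes) {
        return Math.min(requestedBytes, maxBytes);
    }

    // Try to set SoftMaxHeapSize; returns false if the flag is missing,
    // not writeable, or the value is rejected by the JVM.
    static boolean trySetSoftMax(long bytes) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return false; // not a HotSpot-style JVM
        }
        try {
            VMOption opt = bean.getVMOption("SoftMaxHeapSize");
            if (!opt.isWriteable()) {
                return false;
            }
            bean.setVMOption("SoftMaxHeapSize", Long.toString(bytes));
            return true;
        } catch (IllegalArgumentException | UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        // Assumption for the example: ask for half of the maximum heap.
        long target = clampToMax(max / 2, max);
        boolean ok = trySetSoftMax(target);
        System.out.println("SoftMaxHeapSize=" + target + " applied: " + ok);
    }
}
```

Whether the collector in use actually honors the new value is a separate 
question; the call only changes the flag.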

> For the AHS main logic that sets the manageable JVM flags, currently it 
> resides in a native thread started by our own Java launcher. [3] has a 
> description on what our launcher does.
> However, the main logic could be implemented in other places, such as a 
> Java agent, a JNI library, or another process. We could do some rework 
> to make the main logic a JNI library, then open-source it.
> We also agree with Thomas that the AHS main logic is not suitable to be 
> integrated into the JVM, as it varies greatly depending on the 
> deployment environment (e.g., the container technology, OS, additional 
> external signals that should be taken into account).
> 
> Besides [1] and [2], another required JVM feature is monitoring metrics 
> for GC CPU time. We added hsperfdata counters in our JDK 11 for this, 
> and I will create RFEs to upstream them. After all, it seems unlikely 
> that AHS needs another JEP.

I have no strong opinion about doing a JEP for this.
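
For context on the GC CPU time metric mentioned in the quoted text: the 
closest signal in the standard management API that I am aware of is 
GarbageCollectorMXBean.getCollectionTime(), which reports approximate 
accumulated elapsed time, not CPU time. The sketch below only shows that 
approximation; it is not the hsperfdata counters described above, and 
the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
class GcTimeSketch {

    // Sum the accumulated collection time (ms) over all collectors.
    // A collector may report -1 when the value is undefined; skip those.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of JVM uptime attributed to GC, by elapsed time.
    static double gcTimeFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.println("GC time fraction: " + gcTimeFraction(gc, uptime));
    }
}
```

Because this is elapsed time rather than CPU time, it can differ 
considerably from the real GC CPU cost with parallel or concurrent 
collectors, which is presumably why dedicated counters are needed.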

> @Thomas
>  > SPECjbb2015 is a standard industry Java VM performance benchmark
>  > (https://www.spec.org/jbb2015/). 
> Unfortunately it's commercial.
>  > ...
>  > b) and during measurement phase, SPECjbb2015 has a peculiar way of
>  > trying to find maximum sustainable load.
>  > ...
> Thanks for the details on specjbb2015!
> We did try SPECjbb2015 for benchmarking earlier (Jonathan was not 
> involved at that time), but did not pursue running it regularly as we 
> did not find it representative enough to our typical server workload.
> The peculiar way of finding maximum load (sudden allocation spike after 
> a quiescence period) does sound problematic for most GC heuristics. It 
> does NOT sound like an important case to optimize in the real world. 
> Real-world server workload typically ramps up and ramps down gradually, 
> corresponding to daily traffic patterns. Batch and desktop applications 
> are more likely to have phased behavior, but such phase changes should 
> only last for a short, temporary period.

Thanks,
   Thomas

[1] https://bugs.openjdk.org/browse/JDK-8236073?focusedCommentId=14368061&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14368061
[2] https://mail.openjdk.org/pipermail/hotspot-gc-dev/2018-June/022472.html