Container-aware heap sizing for OpenJDK

Thomas Schatzl thomas.schatzl at oracle.com
Fri Sep 16 14:12:28 UTC 2022


Hi Jonathan,

   great to hear from you again :)

On 13.09.22 21:52, Jonathan Joo wrote:
> Hello hotspot-dev and hotspot-gc-dev,
> 
> 
> My name is Jonathan, and I'm working on the Java Platform Team at 
> Google. Here, we are working on a project to address Java container 
> memory issues, as we noticed that a significant number of Java servers 
[...]
> 
> Below (under the dotted line) is a more detailed explanation of our 
> initial approach. Does this sound like something that may be useful for 
> the general OpenJDK community? If so, would some of you be open to 
> further discussion? I would also like to better understand what 

Most of these suggestions seem to be fairly consistent with existing 
RFEs (e.g. [1], [2], [3], ...) that have been discussed before with you 
(e.g. in [4]) and been considered really nice to have iirc.

> container environments look like outside of Google, to see how we could 
> modify our approach for the more general case.
> 
[...]
>   *
> 
>     A separate thread runs alongside the JVM, querying:
>
> [...] 

I am not convinced that having a thread inside the JVM is really the 
best solution. Constantly querying the _environment_ for changes seems 
to be traditionally outside of the scope of the JVM.

Doing so also opens quite a big can of worms. To reiterate the concerns
raised in other comments:

  * How is the container memory limit being determined? Does that 
process take into account non-Java processes running in the container as 
well? (Ashutosh Mehra)

  * If you have two (multiple) JVM processes running inside the 
container, how do they coordinate? (Ioi Lam)

  * The properties queried and the policies (e.g. which process should 
get preference if there are multiple?) are likely fairly specific to the 
deployment too, so is there really some one-size-fits-all policy here? 
(Volker Simonis)

  * I assume that so far you were only talking about G1/Linux support,
and are hoping that the rest of the community will jump in...

So at first glance, I question the advantages of putting the coordinator
inside the JVM. The only one I can come up with on the spot is: you do
not have to deploy something extra.

Using some external process (however it is distributed) seems to be a
much more flexible option (not only in customizability but also in terms
of the release cycle for it). I would suggest at least separating this
effort from improving the JVM capabilities.
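
To make the external-process option concrete, here is a minimal sketch of
the "query the environment" half of such a coordinator. It assumes a
cgroup v2 hierarchy mounted at /sys/fs/cgroup (the files memory.max and
memory.current are part of the cgroup v2 interface); the class and method
names are made up for illustration and are not part of any proposal in
this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for an external coordinator process; not an OpenJDK API.
final class CgroupHeadroom {

    /** Parses a cgroup v2 memory value: a byte count, or "max" for "no limit". */
    static long parseBytes(String raw) {
        String s = raw.trim();
        return s.equals("max") ? Long.MAX_VALUE : Long.parseLong(s);
    }

    /** Bytes left below the limit, never negative; an unlimited cgroup stays unlimited. */
    static long headroom(long limitBytes, long usedBytes) {
        if (limitBytes == Long.MAX_VALUE) {
            return Long.MAX_VALUE;
        }
        return Math.max(0L, limitBytes - usedBytes);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: cgroup v2 is mounted here; pass another directory as args[0] if not.
        Path dir = Path.of(args.length > 0 ? args[0] : "/sys/fs/cgroup");
        Path limitFile = dir.resolve("memory.max");
        Path usageFile = dir.resolve("memory.current");
        if (!Files.isReadable(limitFile) || !Files.isReadable(usageFile)) {
            System.out.println("No cgroup v2 memory files under " + dir);
            return;
        }
        long limit = parseBytes(Files.readString(limitFile));
        long used = parseBytes(Files.readString(usageFile));
        System.out.println("headroom bytes: " + headroom(limit, used));
    }
}
```

Note that memory.current covers every process in the cgroup, not only one
JVM, so this number alone does not answer the coordination questions
raised in the list above.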

>   *
> 
>     This thread then uses this information to calculate new values for
>     the two new JVM flags, and continually updates them at runtime.

Since these flags were planned as manageable afair, any process could 
already change them as needed.
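
As an illustration of what "manageable" means in practice, the sketch
below updates a flag at runtime through the existing
com.sun.management.HotSpotDiagnosticMXBean. The flags proposed in this
thread do not exist, so the example uses MaxHeapFreeRatio, an existing
manageable flag, as a stand-in and simply re-applies its current value;
the class and helper names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical demo class; only the MXBean calls are real JDK API.
final class ManageableFlagDemo {

    /** Sets the flag only if the JVM reports it as writeable (manageable). */
    static boolean setIfManageable(HotSpotDiagnosticMXBean bean, String name, String value) {
        VMOption option = bean.getVMOption(name); // throws IllegalArgumentException for unknown flags
        if (!option.isWriteable()) {
            return false;
        }
        bean.setVMOption(name, value);
        return true;
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Stand-in flag: a real controller would pass a value it computed.
        String flag = "MaxHeapFreeRatio";
        String current = bean.getVMOption(flag).getValue();
        boolean changed = setIfManageable(bean, flag, current);
        System.out.println(flag + "=" + current + " re-applied: " + changed);
    }
}
```

The same bean can be reached from another process over a JMX connection,
and tools such as jinfo -flag can set manageable flags from the command
line, which is why no in-JVM thread is strictly needed for this part.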

> 
>   *
> 
>     The `Current maximum heap expansion size` informs the JVM what is
>     the maximum amount we can expand the heap by, while staying within
>     container limits. This is a hard limit, and trying to expand more
>     than this amount results in behavior equivalent to hitting the Xmx
>     limit.

This sounds like [2].

> 
>   *
> 
>     The `Current target heap size` is a soft target value, which is used
>     to resize the heap (when possible) so as to bring GC CPU overhead
>     toward its target value.
> 

See [1]. (As Stefan Karlsson mentioned, this functionality is already 
available in ZGC).
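
For readers who want a feel for how a soft target could follow GC
overhead, here is a rough sketch. It is not the algorithm from the
proposal, from [1], or from ZGC: the step-based adjustment rule and all
names and numbers are assumptions for illustration. It also approximates
"GC CPU overhead" by the accumulated collection time reported by the
standard GarbageCollectorMXBean, which is elapsed time and not CPU time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch; the adjustment policy is an assumption, not a JVM feature.
final class GcOverheadSketch {

    /** Fraction of an interval spent in GC, clamped to [0, 1]. */
    static double overhead(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / elapsedMillis));
    }

    /** Grows the target when GC overhead is above its goal, shrinks it otherwise. */
    static long nextTarget(long current, double measured, double goal,
                           long step, long min, long max) {
        long proposed = measured > goal ? current + step : current - step;
        return Math.max(min, Math.min(max, proposed));
    }

    /** Sums collection time over all collectors; -1 ("undefined") is skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double measured = overhead(totalGcMillis(), uptime);
        long mb = 1024L * 1024L;
        // Illustrative numbers only: 512 MB current target, 5% goal, 64 MB steps.
        long next = nextTarget(512 * mb, measured, 0.05, 64 * mb, 128 * mb, 2048 * mb);
        System.out.println("overhead=" + measured + " next target bytes=" + next);
    }
}
```

A fixed step like this reacts to every sample, which is exactly the kind
of behavior that can hurt applications with strongly varying load.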

> Caveats:
> 
>   * Enabling this feature might require tuning of the newly
>     introduced default GC CPU overhead target to avoid regressions.
> 
>   * Time spent doing GC for an application may increase significantly
>     (though generally we've seen in practice that even if this is the
>     case, end-to-end latency does not increase a noticeable amount)
> 

From the discussion in [4], JDK-8244603 has actually already been
integrated. I had some time to re-baseline the impact of [2] a few weeks
ago, and I was planning to pick up some of that old work in the near
future. The impact ranges from nothing to very significant (-15%
throughput for "simple" applications), and applications with very
"phased" behavior in particular show very bad results: I am seeing
something like -50% in critical-jOPS for SPECjbb2015 due to G1 being
more aggressive about giving back memory to other users.

I am aware that nobody is running SPECjbb all the time; I just want to
mention that this work is not to be underestimated.

Maybe your implementation of similar functionality fares much better in 
that area though. I think everyone is now already really curious to see 
your changes :)


>   * Enabling AHS results in frequent heap resizings, but we have not
>     seen evidence of any negative effects as a result of these more
>     frequent heap resizings.

See above.

> 
>   * AHS is not necessarily a replacement for proper JVM tuning, but
>     should generally work better than an untuned or improperly tuned
>     configuration.

Thanks,
   Thomas

[1] https://bugs.openjdk.org/browse/JDK-8236073
[2] https://bugs.openjdk.org/browse/JDK-8204088
[3] https://bugs.openjdk.org/browse/JDK-8238687
[4] https://mail.openjdk.org/pipermail/hotspot-gc-dev/2021-May/035092.html

