RFR: 8292083: Detected container memory limit may exceed physical machine memory [v6]

Jonathan Dowland jdowland at openjdk.org
Thu Aug 18 15:56:02 UTC 2022


> We discovered some systems configured with cgroups v1 that report a bogus container memory limit above the physical memory of the host. OpenJDK then calculates flags such as InitialHeapSize from this invalid value. The result can be larger than the available memory, which can lead to the OS terminating the process due to OOM.
> 
> HotSpot's container awareness attempts to sanity-check the limit value by ensuring it is below `_unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size()`, but that still leaves a large range of potentially invalid values between physical RAM and that ceiling value.
> 
> Cgroups v1 in particular returns an uninitialised value for the memory limit when one has not been explicitly set. Cgroups v2 does not suffer from the same problem. However, it is possible for any value to be set for the max memory, including values exceeding the available physical memory, in either v1 or v2.
> 
> This fixes the problem in two places. Further work may be required in the area of Java metrics / MXBeans. I'd also look again at whether the existing ceiling value `_unlimited_memory` serves any useful purpose. I personally don't feel those improvements should hold up this fix.
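
As an illustration of the check described in the quoted summary, here is a minimal sketch in Java. It is not the HotSpot code (which is C++), and the class name, method names and parameters are hypothetical. It assumes the intended behaviour is to fall back to the host's physical memory whenever the reported limit is non-positive, at or above the page-aligned ceiling, or above physical memory.

```java
public class MemoryLimitCheck {

    /**
     * Page-aligned ceiling, mirroring the expression
     * (LONG_MAX / page_size) * page_size from the summary above.
     */
    public static long unlimitedCeiling(long pageSize) {
        return (Long.MAX_VALUE / pageSize) * pageSize;
    }

    /**
     * Returns the memory size that heap sizing should be based on.
     * All three parameters are hypothetical inputs for this sketch:
     * the limit read from the cgroup, the host's physical memory,
     * and the VM page size, all in bytes.
     */
    public static long effectiveMemory(long reportedLimit,
                                       long physicalMemory,
                                       long pageSize) {
        // Existing check: values at or above the ceiling mean "no limit set".
        if (reportedLimit <= 0 || reportedLimit >= unlimitedCeiling(pageSize)) {
            return physicalMemory;
        }
        // Additional check: a limit above physical RAM cannot be honoured,
        // so it is ignored in favour of the host value.
        if (reportedLimit > physicalMemory) {
            return physicalMemory;
        }
        return reportedLimit;
    }

    public static void main(String[] args) {
        long pageSize = 4096L;
        long physical = 16L << 30; // assume a 16 GiB host
        System.out.println(effectiveMemory(8L << 30, physical, pageSize));
        System.out.println(effectiveMemory(physical * 10, physical, pageSize));
    }
}
```

With a 4096-byte page the ceiling is 9223372036854771712, so without the second check any value between physical RAM and that number would be accepted as a valid limit.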

Jonathan Dowland has updated the pull request incrementally with two additional commits since the last revision:

 - Simplify testContainerMemExceedsPhysical, avoid OperatingSystemMXBean
   
   Rewrite the test to run two containers. First time, capture the logging
   to get the reported physical memory size. Derive a bad value from this
   (*10). Second run, set the container memory limit to the bad value.
   Check the trace log for a line indicating this was detected and ignored.
 - debug log physical memory (not cgroup constrained)
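
The two-run approach in the first commit message can be sketched roughly as below. This is not the test from the pull request: the class and method names are hypothetical, and the log line formats matched here ("total physical memory: N" and "ignored: N") are assumptions made for illustration, so the real trace output may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BadLimitSketch {

    // ASSUMPTION: hypothetical format of the line that reports
    // the host's physical memory in the first run's log.
    private static final Pattern PHYSICAL_MEMORY =
            Pattern.compile("total physical memory: (\\d+)");

    /** First run: extract the reported physical memory size from the log. */
    public static long parsePhysicalMemory(String log) {
        Matcher m = PHYSICAL_MEMORY.matcher(log);
        if (!m.find()) {
            throw new IllegalStateException("no physical memory line in log");
        }
        return Long.parseLong(m.group(1));
    }

    /** Derive a deliberately bad limit: ten times the physical memory. */
    public static long deriveBadLimit(long physicalMemory) {
        return Math.multiplyExact(physicalMemory, 10L);
    }

    /**
     * Second run: check that the log shows the bad limit was detected
     * and ignored. ASSUMPTION: the "ignored: N" wording is hypothetical.
     */
    public static boolean limitWasIgnored(String log, long badLimit) {
        return log.contains("ignored: " + badLimit);
    }
}
```

In a real container test the two log strings would come from two separate container runs, with the second run started using the derived bad value as its memory limit.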

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9880/files
  - new: https://git.openjdk.org/jdk/pull/9880/files/fc2ae1b9..ff57cf41

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9880&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9880&range=04-05

  Stats: 24 lines in 2 files changed: 7 ins; 13 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/9880.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9880/head:pull/9880

PR: https://git.openjdk.org/jdk/pull/9880

