RFR: 8292083: Detected container memory limit may exceed physical machine memory [v9]
Jonathan Dowland
jdowland at openjdk.org
Mon Aug 22 14:03:52 UTC 2022
> We discovered some systems configured with cgroups v1 that report a bogus container memory limit above the physical memory of the host. OpenJDK then calculates flags such as InitialHeapSize from this invalid value; the result can be larger than the available memory, which can lead to the OS terminating the process due to OOM.
>
> HotSpot's container awareness attempts to sanity check the limit value by ensuring it's below `_unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size()`, but that still leaves a large range of potentially invalid values between physical RAM and that ceiling value.
>
> Cgroups v1 in particular returns an uninitialised value for the memory limit when one has not been explicitly set. Cgroups v2 does not suffer from the same problem. However, it's possible for any value to be set for the max memory, including values exceeding the available physical memory, in either v1 or v2.
>
> This fixes the problem in two places. Further work may be required in the area of Java metrics / MXBeans. I'd also look again at whether the existing ceiling value `_unlimited_memory` serves any useful purpose. I personally don't feel those improvements should hold up this fix.
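To illustrate the kind of check described above, here is a minimal sketch, not the code from the pull request. The names `kNoLimit` and `sanitize_memory_limit` are hypothetical, and the choice to treat a limit exactly equal to physical memory as valid is an assumption.

```cpp
#include <cstdint>

// Hypothetical sentinel meaning "no usable container limit".
// This is an illustration only, not a HotSpot constant.
constexpr int64_t kNoLimit = -1;

// Accept a cgroup-reported memory limit only if it is positive and does not
// exceed the host's physical memory; otherwise report "no limit" so callers
// fall back to sizing against the host.
int64_t sanitize_memory_limit(int64_t reported_limit_bytes,
                              int64_t host_physical_bytes) {
  if (reported_limit_bytes <= 0) {
    return kNoLimit;  // unset or unreadable value
  }
  if (reported_limit_bytes > host_physical_bytes) {
    return kNoLimit;  // bogus: larger than the machine itself
  }
  return reported_limit_bytes;
}
```

With this shape, a value just under the old `LONG_MAX`-based ceiling is rejected on any real machine, which is the gap the description points out.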
Jonathan Dowland has updated the pull request incrementally with one additional commit since the last revision:
Replace _unlimited_memory with calls to os::Linux

_unlimited_memory was a constant in cgroupV1Subsystem_linux, initialised
to a very large number and used as a ceiling sanity check when reading a
number of memory-related cgroup limits. This was not sufficient to rule
out all possible bad values from cgroups, so a lower ceiling, set to the
host's physical RAM, was needed (8292083).

Eliminate _unlimited_memory, which is superfluous, and use the host
physical memory instead.

For memory_and_swap_limit_in_bytes we need a higher limit than
physical RAM, so extend os::Linux to report the host's configured
swap value and combine the two.
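
As a rough sketch of the combined ceiling (an illustration, not the os::Linux change itself), the host values can be read with the Linux `sysinfo(2)` call. The `HostMemory` struct and the function names below are hypothetical.

```cpp
#include <sys/sysinfo.h>
#include <cstdint>

// Hypothetical holder for the host totals, in bytes.
struct HostMemory {
  uint64_t physical_bytes;
  uint64_t swap_bytes;
};

// Read total RAM and total swap from the kernel via sysinfo(2).
// Returns false if the call fails.
bool read_host_memory(HostMemory* out) {
  struct sysinfo si;
  if (sysinfo(&si) != 0) {
    return false;
  }
  // totalram and totalswap are expressed in units of mem_unit bytes.
  out->physical_bytes = static_cast<uint64_t>(si.totalram) * si.mem_unit;
  out->swap_bytes = static_cast<uint64_t>(si.totalswap) * si.mem_unit;
  return true;
}

// Ceiling for a memory-and-swap limit: physical RAM plus configured swap.
uint64_t memory_and_swap_ceiling(const HostMemory& host) {
  return host.physical_bytes + host.swap_bytes;
}

// A reported memory-and-swap limit is plausible only if it is non-zero
// and does not exceed the combined ceiling.
bool memory_and_swap_limit_is_plausible(uint64_t reported_bytes,
                                        const HostMemory& host) {
  return reported_bytes > 0 && reported_bytes <= memory_and_swap_ceiling(host);
}
```

A memory-only limit would be compared against `physical_bytes` alone; only the memory-and-swap limit uses the sum.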
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/9880/files
- new: https://git.openjdk.org/jdk/pull/9880/files/caa7913c..66bb149d
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=9880&range=08
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=9880&range=07-08
Stats: 22 lines in 4 files changed: 12 ins; 4 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/9880.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9880/head:pull/9880
PR: https://git.openjdk.org/jdk/pull/9880