RFR: 8292083: Detected container memory limit may exceed physical machine memory

Thomas Stuefe stuefe at openjdk.org
Wed Aug 17 13:13:20 UTC 2022


On Mon, 15 Aug 2022 14:51:51 GMT, Jonathan Dowland <jdowland at openjdk.org> wrote:

> We discovered some systems configured with cgroups v1 that report a bogus container memory limit above the physical memory of the host. OpenJDK then calculates flags such as InitialHeapSize based on this invalid value; the result can be larger than the available memory, which can lead to the OS terminating the process due to OOM.
> 
> HotSpot's container awareness attempts to sanity check the limit value by ensuring it's below `_unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size()`, but that still leaves a large range of potentially invalid values between physical RAM and that ceiling value.
> 
> Cgroups v1 in particular returns an uninitialised value for the memory limit when one has not been explicitly set. Cgroups v2 does not suffer the same problem; however, it's possible for any value to be set for the max memory, including values exceeding the available physical memory, in either v1 or v2.
> 
> This fixes the problem in two places. Further work may be required in the area of Java metrics / MXBeans. I'd also look again at whether the existing ceiling value `_unlimited_memory` serves any useful purpose. I personally don't feel those improvements should hold up this fix.
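
For illustration, the check described in the quoted text could be sketched roughly as below. This is a minimal standalone sketch, not the code from the PR: `jlong` is aliased locally, and `unlimited_memory` and `effective_memory_limit` are hypothetical helper names. It assumes a non-positive limit means "no usable limit", and treats a limit at or above host memory as invalid.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Local stand-in for HotSpot's jlong (assumption for this sketch).
using jlong = std::int64_t;

// Mirrors the ceiling formula quoted above: LONG_MAX rounded down
// to a multiple of the page size.
jlong unlimited_memory(jlong page_size) {
  assert(page_size > 0);
  return (static_cast<jlong>(LONG_MAX) / page_size) * page_size;
}

// Hypothetical helper: use the container limit only if it is positive
// and strictly below host memory; otherwise fall back to host memory.
jlong effective_memory_limit(jlong container_limit, jlong host_memory) {
  assert(host_memory > 0);
  if (container_limit > 0 && container_limit < host_memory) {
    return container_limit;
  }
  return host_memory;
}
```

A value just below the ceiling passes the old "below `_unlimited_memory`" check but is still far above any real machine's RAM; the comparison against host memory catches that case.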

src/hotspot/os/linux/osContainer_linux.cpp line 74:

> 72:     log_trace(os, container)("Container memory limit exceeded or equal to physical"
> 73:                              " memory! container mem: " JLONG_FORMAT ", host mem: " JLONG_FORMAT,
> 74:                              mem_limit, host_memory);

Not sure we need the exclamation mark :-) Also, I'd reduce this to one line.

Unless Severin objects, I would however log this at least at debug level, maybe even info, mirroring the branch above. This section gets executed exactly once, so there is no danger of drowning in output.

src/hotspot/os/linux/os_linux.cpp line 223:

> 221:   if (OSContainer::is_containerized()) {
> 222:     jlong mem_limit;
> 223:     if ((mem_limit = OSContainer::memory_limit_in_bytes()) > 0 && mem_limit < phys_mem) {

I don't understand this, but I don't understand the preexisting code either.

Did we not just *overwrite* `os::Linux::_physical_memory` during the initialization of the container subsystem, depending on the limit? Why is added logic even needed here? Why not just return `os::Linux::_physical_memory`?

Maybe the answer is that `os::physical_memory()` is supposed to mirror cgroup limit changes during the VM's lifetime (is it?). But then the question is the other way around: why bother correcting the value during initialization if we need to correct it in `os::physical_memory()` anyway?
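
To make the two alternatives concrete, here is a rough standalone model. It is only a sketch under assumptions: `MemoryView`, `clamp_to_host`, `cached_at_init` and `queried_now` are hypothetical names, and the `read_limit` callback stands in for reading the cgroup limit. It does not reproduce what HotSpot actually does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Local stand-in for HotSpot's jlong (assumption for this sketch).
using jlong = std::int64_t;

// Clamp a reported limit to host memory; a non-positive value is
// treated as "no usable limit".
inline jlong clamp_to_host(jlong limit, jlong host_memory) {
  return (limit > 0 && limit < host_memory) ? limit : host_memory;
}

// Hypothetical model of the two strategies discussed above.
class MemoryView {
 public:
  MemoryView(jlong host_memory, std::function<jlong()> read_limit)
      : host_memory_(host_memory),
        read_limit_(std::move(read_limit)),
        cached_(clamp_to_host(read_limit_(), host_memory)) {
    assert(host_memory > 0);
  }

  // Strategy A: value corrected once during initialization, then reused.
  jlong cached_at_init() const { return cached_; }

  // Strategy B: limit re-read and corrected on every query, so later
  // changes to the limit are observed.
  jlong queried_now() const {
    return clamp_to_host(read_limit_(), host_memory_);
  }

 private:
  jlong host_memory_;
  std::function<jlong()> read_limit_;
  jlong cached_;
};
```

If the limit can change while the VM runs, only the per-query variant follows it, and the value cached at initialization adds nothing for that caller. If the limit cannot change, the per-query correction is redundant.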

-------------

PR: https://git.openjdk.org/jdk/pull/9880

