RFR: 8292083: Detected container memory limit may exceed physical machine memory [v17]

Severin Gehwolf sgehwolf at openjdk.org
Tue Aug 23 14:39:43 UTC 2022


On Tue, 23 Aug 2022 13:45:56 GMT, Jonathan Dowland <jdowland at openjdk.org> wrote:

>> We discovered some systems configured with cgroups v1 that report a bogus container memory limit above the physical memory of the host. OpenJDK then calculates flags such as InitialHeapSize from this invalid value; the result can exceed the available memory, which can lead to the OS terminating the process due to OOM.
>> 
>> HotSpot's container awareness attempts to sanity-check the limit value by ensuring it is below `_unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size()`, but that still leaves a large range of potentially invalid values between physical RAM and that ceiling value.
>> 
>> Cgroups v1 in particular returns an uninitialised value for the memory limit when one has not been explicitly set. Cgroups v2 does not suffer from the same problem; however, any value can be set for the max memory, including values exceeding the available physical memory, in either v1 or v2.
>> 
>> This fixes the problem in two places. Further work may be required in the area of Java metrics / MXBeans. I'd also look again at whether the existing ceiling value `_unlimited_memory` serves any useful purpose. I personally don't feel those improvements should hold up this fix.
>
> Jonathan Dowland has updated the pull request incrementally with one additional commit since the last revision:
> 
>   avoid calling OSContainer::memory_usage_in_bytes
>   
>   if mem_limit isn't set, avoid calling OSContainer::memory_usage_in_bytes

Looks mostly good to me. Thanks for the perseverance! The test will need some adjustment, though.

test/hotspot/jtreg/containers/docker/TestMemoryAwareness.java line 122:

> 120:             .addDockerOpts("--memory", badMem);
> 121:         Common.run(opts)
> 122:             .shouldMatch("container memory limit ignored: "+badMem+", using host value "+goodMem);

Test now fails with:

java.lang.RuntimeException: 'container memory limit ignored: 332580331520, using host value 33258033152' missing from stdout/stderr


Log lines on cgroups v1 (cg1) look like:

[0.065s][debug][os,container] container memory limit unlimited: -1, using host value 33258033152


So the test should be more lenient about what it expects. For example:


        Common.run(opts)
            .shouldMatch("container memory limit (ignored|unlimited): (-1|" + badMem + "), using host value " + goodMem);
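To sanity-check the relaxed pattern against both message variants outside of jtreg, here is a minimal, self-contained sketch. It assumes that `shouldMatch` treats its argument as a regular expression searched for anywhere in the output. The class and method names (`LenientLimitPattern`, `pattern`, `matches`) are hypothetical, and the sample values are the ones from the failure message and the cg1 log line above.

```java
import java.util.regex.Pattern;

public class LenientLimitPattern {
    // Builds the relaxed expectation: "ignored" or "unlimited", followed by
    // either -1 or the bad limit, followed by the host memory value.
    static Pattern pattern(String badMem, String goodMem) {
        return Pattern.compile("container memory limit (ignored|unlimited): (-1|"
                + badMem + "), using host value " + goodMem);
    }

    // Returns true if the pattern occurs anywhere in the given output
    // (assumption: this mirrors how the test helper searches the output).
    static boolean matches(String output, String badMem, String goodMem) {
        return pattern(badMem, goodMem).matcher(output).find();
    }

    public static void main(String[] args) {
        String badMem = "332580331520";
        String goodMem = "33258033152";
        // Log line as seen on cgroups v1.
        String cg1Line = "[0.065s][debug][os,container] container memory limit "
                + "unlimited: -1, using host value 33258033152";
        // Line the test originally expected.
        String ignoredLine = "container memory limit ignored: 332580331520, "
                + "using host value 33258033152";
        System.out.println(matches(cg1Line, badMem, goodMem));
        System.out.println(matches(ignoredLine, badMem, goodMem));
    }
}
```

Note that the alternation pairs either keyword with either value, so it would also accept "ignored: -1". That is looser than strictly necessary, but it keeps the expectation to a single line.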

-------------

Changes requested by sgehwolf (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9880


More information about the hotspot-runtime-dev mailing list