RFR: 8292083: Detected container memory limit may exceed physical machine memory [v2]

Thomas Stuefe stuefe at openjdk.org
Wed Aug 17 13:39:52 UTC 2022


On Wed, 17 Aug 2022 13:36:27 GMT, Jonathan Dowland <jdowland at openjdk.org> wrote:

>> We discovered some systems configured with cgroups v1 that report a bogus container memory limit above the physical memory of the host. OpenJDK then calculates flags such as InitialHeapSize from this invalid value; the result can be larger than the available memory, which can lead to the OS terminating the process due to OOM.
>> 
>> HotSpot's container awareness attempts to sanity-check the limit value by ensuring it's below `_unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size()`, but that still leaves a large range of potentially invalid values between physical RAM and that ceiling value.
>> 
>> Cgroups v1 in particular returns an uninitialised value for the memory limit when one has not been explicitly set. Cgroups v2 does not suffer from the same problem; however, in either v1 or v2 any value can be set for the max memory, including values exceeding the available physical memory.
>> 
>> This fixes the problem in two places. Further work may be required in the area of Java metrics / MXBeans. I'd also look again at whether the existing ceiling value `_unlimited_memory` serves any useful purpose. I personally don't feel those improvements should hold up this fix.
>
> Jonathan Dowland has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Separate out debug logging for three invalid memory limit scenarios
>    
>    Refactor the ternary expression into an if/else chain and expand it
>    to the third case (memory limit equal to or exceeding physical RAM)
>    
>    Format the trace log message for that case to match that of the other
>    two
>    
>    Adjust the other two to incorporate physical RAM into the log message
>  - Ensure trace log is enabled before trace logging
>    
>    Thanks Severin

src/hotspot/os/linux/osContainer_linux.cpp line 65:

> 63:   }
> 64:   if ((mem_limit = cgroup_subsystem->memory_limit_in_bytes()) > 0 &&
> 65:        mem_limit < host_memory) {

I would really like these two conditions to be nested; that would be easier to understand.

if (memlimit valid and > 0) {
  if (memlimit <= phys) {
    phys = memlimit
  } else {
    log
  }
}
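
For illustration, the nesting above could look roughly like the following stand-alone C++ sketch. This is not the HotSpot code: the function name `effective_memory`, its parameters, and the `fprintf` call standing in for the trace logging are assumptions made only for this example. It follows the `<=` comparison from the pseudocode; when the limit equals host memory the result is the same either way.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not the HotSpot function): picks the memory size to
// report, given the host's physical memory and the limit read from the
// cgroup controller. A limit <= 0 is treated as "no usable limit".
int64_t effective_memory(int64_t host_memory, int64_t mem_limit) {
  assert(host_memory > 0);
  int64_t phys = host_memory;
  if (mem_limit > 0) {                // limit is valid and positive
    if (mem_limit <= host_memory) {   // plausible limit: use it
      phys = mem_limit;
    } else {                          // limit exceeds physical RAM: log, ignore
      std::fprintf(stderr,
                   "container memory limit %lld exceeds physical memory %lld, "
                   "using physical memory\n",
                   static_cast<long long>(mem_limit),
                   static_cast<long long>(host_memory));
    }
  }
  return phys;
}
```

With the outer check isolated, the "limit too large" case gets its own branch, which is where a dedicated log message fits naturally.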

-------------

PR: https://git.openjdk.org/jdk/pull/9880

