RFR: 8292083: Detected container memory limit may exceed physical machine memory [v15]
Severin Gehwolf
sgehwolf at openjdk.org
Tue Aug 23 13:33:11 UTC 2022
On Tue, 23 Aug 2022 13:28:32 GMT, Jonathan Dowland <jdowland at openjdk.org> wrote:
>> We discovered some systems configured with cgroups v1 which report a bogus container memory limit value which is above the physical memory of the host. OpenJDK then calculates flags such as InitialHeapSize based on this invalid value; this can be larger than the available memory which can result in the OS terminating the process due to OOM.
>>
>> hotspot's container awareness attempts to sanity check the limit value by ensuring it's below `_unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size()`, but that still leaves a large range of potential invalid values between physical RAM and that ceiling value.
>>
>> Cgroups V1 in particular returns an uninitialised value for the memory limit when one has not been explicitly set. Cgroups v2 does not suffer the same problem: however, it's possible for any value to be set for the max memory, including values exceeding the available physical memory, in either v1 or v2.
>>
>> This fixes the problem in two places. Further work may be required in the area of Java metrics / MXBeans. I'd also look again at whether the existing ceiling value `_unlimited_memory` serves any useful purpose. I personally don't feel those improvements should hold up this fix.
>
> Jonathan Dowland has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove superfluous log line from os::Linux::available_memory
>
> Now that os::Linux::available_memory calls
> OSContainer::memory_limit_in_bytes, we can remove a log_debug
> that replicates the logging that takes place there.
src/hotspot/os/linux/os_linux.cpp line 199:
> 197: jlong mem_limit = OSContainer::memory_limit_in_bytes();
> 198: jlong mem_usage = OSContainer::memory_usage_in_bytes();
> 199: if (mem_limit > 0 && mem_usage < 1) {
This will run into the problem that `OSContainer::memory_usage_in_bytes()` is not cached (you should notice that the logs are noisier). A pattern like:
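```cpp
jlong mem_limit = OSContainer::memory_limit_in_bytes();
jlong mem_usage;
if (mem_limit > 0 && (mem_usage = OSContainer::memory_usage_in_bytes()) < 1) {
  [...]
}
```

works better, thanks to the short-circuiting `&&` expression: `OSContainer::memory_usage_in_bytes()` is then only called when a memory limit is actually set.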
-------------
PR: https://git.openjdk.org/jdk/pull/9880
More information about the hotspot-runtime-dev mailing list