RFR: 8292083: Detected container memory limit may exceed physical machine memory [v19]
Thomas Stuefe
stuefe at openjdk.org
Thu Aug 25 17:47:41 UTC 2022
On Wed, 24 Aug 2022 10:35:18 GMT, Jonathan Dowland <jdowland at openjdk.org> wrote:
>> We discovered some systems configured with cgroups v1 which report a bogus container memory limit value which is above the physical memory of the host. OpenJDK then calculates flags such as `InitialHeapSize` based on this invalid value; this can be larger than the available memory, which can result in the OS terminating the process due to OOM.
>>
>> HotSpot's container awareness attempts to sanity-check the limit value by ensuring it's below `_unlimited_memory = (LONG_MAX / os::vm_page_size()) * os::vm_page_size()`, but that still leaves a large range of potential invalid values between physical RAM and that ceiling value.
>>
>> Cgroups v1 in particular returns an uninitialised value for the memory limit when one has not been explicitly set. Cgroups v2 does not suffer from the same problem; however, in either v1 or v2 it is possible to set any value as the maximum memory, including values exceeding the available physical memory.
>>
>> This fixes the problem in two places. Further work may be required in the area of Java metrics / MXBeans. I'd also look again at whether the existing ceiling value `_unlimited_memory` serves any useful purpose. I personally don't feel those improvements should hold up this fix.
>
> Jonathan Dowland has updated the pull request incrementally with one additional commit since the last revision:
>
> Address style nit
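The arithmetic behind the existing ceiling, and the extra check against host RAM that the description argues for, can be modelled in standalone C++. This is a minimal sketch, not HotSpot code: the function names are hypothetical, `int64_t` stands in for `long` (the same width on 64-bit Linux), and -1 is assumed as the "unlimited" sentinel.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sentinel for "no usable limit" (assumed to be -1 here).
constexpr int64_t kUnlimited = -1;

// Mirrors the ceiling from the description: LONG_MAX rounded down to a
// multiple of the page size. int64_t replaces long (same on 64-bit Linux).
int64_t unlimited_ceiling(int64_t page_size) {
  assert(page_size > 0);
  return (std::numeric_limits<int64_t>::max() / page_size) * page_size;
}

// Ceiling check only: any value below the ceiling is accepted as a limit.
int64_t check_ceiling_only(int64_t limit, int64_t page_size) {
  return limit >= unlimited_ceiling(page_size) ? kUnlimited : limit;
}

// Ceiling check plus host RAM: a limit at or above physical memory cannot
// constrain the process, so it is treated as "unlimited" as well.
int64_t check_with_physical(int64_t limit, int64_t page_size,
                            int64_t physical_mem) {
  const int64_t checked = check_ceiling_only(limit, page_size);
  if (checked == kUnlimited || checked >= physical_mem) {
    return kUnlimited;
  }
  return checked;
}
```

With a 4 KiB page the ceiling works out to 0x7FFFFFFFFFFFF000, so a bogus 64 GiB limit on a 16 GiB host passes the ceiling check but is rejected once physical memory is taken into account.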
I think the gist of my remark is that I would like the layers to behave consistently.
I see that `CgroupSubsystem::memory_limit_in_bytes()` is only used in two places: `os::Linux::available_memory()` and `os::physical_memory()`.
I would say: let the `os` layer lie, and let `os::Linux` and `CgroupSubsystem` be the truth. Then we end up with a clear hierarchy:
- let `os::Linux::available_memory()` and `os::Linux::physical_memory()` return the pure host values
- let the cgroup system return the pure cgroup values
- let `os::available_memory()` and `os::physical_memory()` return either one or the other depending on what makes sense.
In addition, let the cgroup subsystem return defined values for "invalid" (if that is possible).
Would that make sense? I don't think this would be a huge effort. We could also do this in a separate RFE.
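A rough model of the proposed hierarchy, as a sketch only: none of these types or functions exist in HotSpot. `HostInfo`, `CgroupInfo`, `os_physical_memory` and `os_available_memory` are hypothetical stand-ins for `os::Linux`, `CgroupSubsystem` and the `os` layer, and `std::nullopt` plays the role of the defined "invalid/unlimited" value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Host layer (cf. os::Linux): pure host values.
struct HostInfo {
  uint64_t physical_memory;   // total RAM of the machine
  uint64_t available_memory;  // free RAM of the machine
};

// Cgroup layer (cf. CgroupSubsystem): pure cgroup values.
// std::nullopt is the defined value for "no usable limit".
struct CgroupInfo {
  std::optional<uint64_t> memory_limit;
  uint64_t memory_usage;  // bytes currently charged to the cgroup
};

// Returns the cgroup limit only if it actually constrains the host.
std::optional<uint64_t> effective_limit(const HostInfo& host,
                                        const CgroupInfo* cg) {
  if (cg != nullptr && cg->memory_limit &&
      *cg->memory_limit < host.physical_memory) {
    return cg->memory_limit;
  }
  return std::nullopt;
}

// os layer (cf. os::physical_memory): picks one view or the other.
uint64_t os_physical_memory(const HostInfo& host, const CgroupInfo* cg) {
  const auto limit = effective_limit(host, cg);
  return limit ? *limit : host.physical_memory;
}

// os layer (cf. os::available_memory).
uint64_t os_available_memory(const HostInfo& host, const CgroupInfo* cg) {
  const auto limit = effective_limit(host, cg);
  if (!limit) {
    return host.available_memory;
  }
  const uint64_t in_container =
      *limit > cg->memory_usage ? *limit - cg->memory_usage : 0;
  // Never report more than the host actually has free.
  return std::min(in_container, host.available_memory);
}
```

Capping the container-available figure at the host's free memory is a design choice made for this sketch, not something the proposal specifies.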
>
> > > The gist of this patch is code like this:
> > > ```
> > > jlong CgroupV1Subsystem::read_memory_limit_in_bytes() {
> > >   GET_CONTAINER_INFO(julong, _memory->controller(), "/memory.limit_in_bytes",
> > >                      "Memory Limit is: " JULONG_FORMAT, JULONG_FORMAT, memlimit);
> > >   if (memlimit >= _unlimited_memory) {
> > >     // At or above the ceiling: treated as "unlimited" (details elided)
> > >     ...
> > >   } else {
> > >     return (jlong)memlimit;
> > >   }
> > > }
> > > ```
> Because you don't know? There is nothing in the cg1 interface files which would tell you that. So you have to come up with a heuristic for "unlimited". For cg2 you have `max`; cg1 just contains random large numbers (if unset).
Oh, ok. Fair enough. Then my only question is at what layer we want the heuristic to happen.
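One possible shape for that heuristic, sketched as a free function that parses the contents of a memory limit interface file: cgroup v2 reports the literal `max` when no limit is set, whereas for cgroup v1 the sketch assumes that any value at or above host RAM means "unlimited". The name `parse_memory_limit` and its signature are hypothetical; which layer it would live in is exactly the open question.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Returns the limit in bytes, or std::nullopt for "unlimited" or for
// input that cannot be parsed as a plain number.
std::optional<uint64_t> parse_memory_limit(std::string_view contents,
                                           bool cgroup_v2,
                                           uint64_t physical_memory) {
  // Interface files end with a newline; strip trailing whitespace.
  while (!contents.empty() &&
         (contents.back() == '\n' || contents.back() == ' ')) {
    contents.remove_suffix(1);
  }
  // cgroup v2 states "no limit" explicitly.
  if (cgroup_v2 && contents == "max") {
    return std::nullopt;
  }
  uint64_t value = 0;
  const char* end = contents.data() + contents.size();
  auto [ptr, ec] = std::from_chars(contents.data(), end, value);
  if (ec != std::errc() || ptr != end) {
    return std::nullopt;  // not a plain number
  }
  // cgroup v1 has no explicit marker, so fall back to a heuristic:
  // a value at or above host RAM cannot be a real constraint.
  if (value >= physical_memory) {
    return std::nullopt;
  }
  return value;
}
```

Unparsable input also yields `std::nullopt` here; a real implementation would probably want to distinguish "unlimited" from "error", in line with the earlier suggestion of defined values for invalid results.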
-------------
PR: https://git.openjdk.org/jdk/pull/9880
More information about the hotspot-runtime-dev mailing list