RFR: 8292984: Refactor internal container-related interfaces for clarity [v2]
Severin Gehwolf
sgehwolf at openjdk.org
Tue Sep 30 09:49:11 UTC 2025
On Tue, 30 Sep 2025 08:45:38 GMT, Casper Norrbin <cnorrbin at openjdk.org> wrote:
>> Hi everyone,
>>
>> The current memory-related code paths in Linux are unclear and convoluted, with responsibilities and data flow crossing between `os::Linux` and various container-related layers.
>>
>> For example, consider the call sequence for `os::available_memory()`:
>>
>>     os::available_memory()
>>         |
>>         v
>>     os::Linux::available_memory()
>>         |--------------------------------------------
>>         v                                           v
>>     OSContainer::memory_limit_in_bytes()    or    return host physical memory
>>         |
>>         v
>>     CgroupSubsystem::memory_limit_in_bytes()
>>         |--------------------------------------------
>>         v                                           v
>>     return os::Linux::physical_memory()     or    return cgroup v1/v2 limit
>>
>>
>> This structure is difficult to follow. Calls move between `os::Linux` and container subsystems in a confusing manner. Ideally, each component should be responsible only for its relevant functionality:
>> * `os::Linux` should focus solely on actual machine memory values.
>> * `CgroupSubsystem` should focus exclusively on cgroup memory limits.
>> * The selection of which value to use should occur at the `os` layer, based on whether the environment is containerized.
>>
>>
>> A revised structure separates these responsibilities:
>>
>>     os::available_memory()
>>         |--------------------------------------------
>>         v                                           v
>>     OSContainer::memory_limit_in_bytes()    or    os::Linux::available_memory()
>>         |--------------------------------------------
>>         v                                           v
>>     CgroupSubsystem::memory_limit_in_bytes()      os::Linux::physical_memory()
>>         |
>>         v
>>     return bounded cgroup v1/v2 limit
>>
>>
>> With these changes:
>> * `os::Linux` only retrieves machine values.
>> * `CgroupSubsystem` works exclusively with cgroup limits.
>> * `OSContainer` fetches and passes bounds for the cgroup values.
>> * The decision of container or machine value is done in the `os` layer.
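The layering listed above can be sketched in a few lines of standalone C++. This is a minimal illustration under stated assumptions, not the HotSpot code: the namespaces `machine`, `cgroup`, and `container`, the `State` struct, and the fixed memory values are all hypothetical stand-ins for `os::Linux`, `CgroupSubsystem`, `OSContainer`, and real system reads. It follows the simplified diagram, in which the container branch returns the bounded limit directly.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the layers described above; not the HotSpot sources.

namespace machine {  // plays the role of os::Linux: machine values only
  inline uint64_t physical_memory()  { return 16ULL << 30; }  // fixed demo value, 16 GiB
  inline uint64_t available_memory() { return 8ULL << 30; }   // fixed demo value, 8 GiB
}

namespace cgroup {   // plays the role of CgroupSubsystem: cgroup limits only
  // Returns the limit if one is set and lies below the caller-supplied bound.
  // This layer never queries the machine layer itself.
  inline std::optional<uint64_t> memory_limit(std::optional<uint64_t> raw,
                                              uint64_t upper_bound) {
    if (!raw || *raw >= upper_bound) return std::nullopt;  // unlimited or out of bounds
    return raw;
  }
}

namespace container {  // plays the role of OSContainer: fetches and passes the bound
  struct State {
    bool containerized;
    std::optional<uint64_t> raw_limit;  // value a cgroup file would report, if any
  };

  inline std::optional<uint64_t> memory_limit(const State& s) {
    if (!s.containerized) return std::nullopt;
    return cgroup::memory_limit(s.raw_limit, machine::physical_memory());
  }
}

// Plays the role of the os layer: the only place that chooses between
// the container value and the machine value.
inline uint64_t available_memory(const container::State& s) {
  if (auto limit = container::memory_limit(s)) return *limit;
  return machine::available_memory();
}
```

In this sketch the bound travels downward as a parameter, so the cgroup layer has no dependency on the machine layer, and the fallback to the machine value happens in exactly one place.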
>>
>> The concrete code changes include:
>> * Moving container selection logic from `os::Linux::{available/free}_memory()` to `os::{available/free}_memory()`, so `os::Linux` now only deals with machine values (this was already the case for `os::physical_memory()`).
>> * Moving `os::Linux::available_memory_in_container()` to `OSContainer` instead, removing container-specific logic from `os::Linux`. Also ref...
>
> Casper Norrbin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits:
>
> - Merge branch 'master' into linux-container-mem-restructure
> - keep cgroupsubsystem separate
> - keep os::linux separate
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 674:
> 672: return memory_limit->value();
> 673: }
> 674: jlong mem_limit = contrl->controller()->read_memory_limit_in_bytes(upper_bound);
This removes the logging at the trace level for the upper bound. Intentional?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/27470#discussion_r2390614627