RFR: 8292984: Refactor internal container-related interfaces for clarity
Casper Norrbin
cnorrbin at openjdk.org
Wed Sep 24 14:04:37 UTC 2025
Hi everyone,
The current memory-related code paths on Linux are unclear and convoluted, with responsibilities and data flow crossing back and forth between `os::Linux` and the various container-related layers.
For example, consider the call sequence for `os::available_memory()`:
    os::available_memory()
    |
    v
    os::Linux::available_memory()
    |---------------------------------------------
    v                                            v
    OSContainer::memory_limit_in_bytes()   or    return host physical memory
    |
    v
    CgroupSubsystem::memory_limit_in_bytes()
    |---------------------------------------------
    v                                            v
    return os::Linux::physical_memory()    or    return cgroup v1/v2 limit
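For reference, the current flow can be modeled in a few lines of standalone C++. This is a simplified sketch that follows the diagram, not the HotSpot code: `os`, `os::Linux`, `OSContainer`, and `CgroupSubsystem` are modeled as plain namespaces, `julong` is assumed to be an unsigned 64-bit integer, and `FakeEnv`/`env` are hypothetical stand-ins for the values the real code reads from the system.

```cpp
#include <cassert>
#include <cstdint>

using julong = std::uint64_t;  // assumption: julong modeled as an unsigned 64-bit integer

// Hypothetical stand-ins for values the real code reads from the system.
struct FakeEnv {
  bool containerized = false;
  bool cgroup_has_limit = false;
  julong cgroup_limit = 0;
  julong host_physical = 16ULL << 30;  // 16 GiB
};
static FakeEnv env;

namespace os {
namespace Linux {
// Machine value.
julong physical_memory() { return env.host_physical; }
}  // namespace Linux
}  // namespace os

namespace CgroupSubsystem {
// Old shape: when the cgroup sets no limit, the cgroup layer calls
// back into os::Linux for the host value.
julong memory_limit_in_bytes() {
  return env.cgroup_has_limit ? env.cgroup_limit : os::Linux::physical_memory();
}
}  // namespace CgroupSubsystem

namespace OSContainer {
julong memory_limit_in_bytes() {
  assert(env.containerized);  // only reached on the container path
  return CgroupSubsystem::memory_limit_in_bytes();
}
}  // namespace OSContainer

namespace os {
namespace Linux {
// Old shape: the "machine" function also makes the container decision.
julong available_memory() {
  return env.containerized ? OSContainer::memory_limit_in_bytes()
                           : physical_memory();  // "return host physical memory"
}
}  // namespace Linux

julong available_memory() { return Linux::available_memory(); }
}  // namespace os
```

With `containerized = true` and no cgroup limit, a call to `os::available_memory()` goes `os` → `os::Linux` → `OSContainer` → `CgroupSubsystem` and then back into `os::Linux::physical_memory()`.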
This structure is difficult to follow. Calls move from `os::Linux` into the container layers and then back into `os::Linux` (for the host's physical memory), so the data flow runs in both directions. Ideally, each component should be responsible only for its own functionality:
* `os::Linux` should focus solely on actual machine memory values.
* `CgroupSubsystem` should focus exclusively on cgroup memory limits.
* The selection of which value to use should occur at the `os` layer, based on whether the environment is containerized.
A revised structure separates these responsibilities:
    os::available_memory()
    |---------------------------------------------
    v                                            v
    OSContainer::memory_limit_in_bytes()   or    os::Linux::available_memory()
    |---------------------------------------------
    v                                            v
    CgroupSubsystem::memory_limit_in_bytes()     os::Linux::physical_memory()
    |
    v
    return bounded cgroup v1/v2 limit
With these changes:
* `os::Linux` only retrieves machine values.
* `CgroupSubsystem` works exclusively with cgroup limits.
* `OSContainer` fetches and passes bounds for the cgroup values.
* The decision of container or machine value is done in the `os` layer.
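The revised layering can be sketched the same way. As before, this is a hypothetical standalone model rather than the patch itself: the namespaces, `FakeEnv`/`env`, and the constant values are assumptions. The container branch returns the bounded limit as in the diagram, and "bounded" is assumed here to mean the cgroup limit clamped to the host's physical memory, with an unset limit reported as the bound itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using julong = std::uint64_t;  // assumption: julong modeled as an unsigned 64-bit integer

// Hypothetical stand-ins for values the real code reads from the system.
struct FakeEnv {
  bool containerized = false;
  bool cgroup_has_limit = false;
  julong cgroup_limit = 0;
  julong host_physical = 16ULL << 30;   // 16 GiB
  julong host_available = 10ULL << 30;  // 10 GiB
};
static FakeEnv env;

namespace os {
namespace Linux {
// Machine values only; no container checks in this layer.
julong physical_memory() { return env.host_physical; }
julong available_memory() { return env.host_available; }
}  // namespace Linux
}  // namespace os

namespace CgroupSubsystem {
// Cgroup values only. The caller supplies a generic upper bound instead of
// this layer fetching the host value itself.
julong memory_limit_in_bytes(julong upper_mem_bound) {
  if (!env.cgroup_has_limit) {
    return upper_mem_bound;  // assumption: "no limit" is reported as the bound
  }
  return std::min(env.cgroup_limit, upper_mem_bound);  // assumption: "bounded" = clamped
}
}  // namespace CgroupSubsystem

namespace OSContainer {
bool is_containerized() { return env.containerized; }

// Fetches the host value and passes it down as the bound.
julong memory_limit_in_bytes() {
  assert(is_containerized());
  return CgroupSubsystem::memory_limit_in_bytes(os::Linux::physical_memory());
}
}  // namespace OSContainer

namespace os {
// The container-versus-machine decision is made here, and only here.
julong available_memory() {
  return OSContainer::is_containerized() ? OSContainer::memory_limit_in_bytes()
                                         : Linux::available_memory();
}
}  // namespace os
```

Here `CgroupSubsystem` only ever sees a number passed in by its caller, so it no longer depends on `os::Linux`, and every call goes in one direction.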
The concrete code changes include:
* Moving the container selection logic from `os::Linux::{available/free}_memory()` to `os::{available/free}_memory()`, so that `os::Linux` now only deals with machine values (this was already the case for `os::physical_memory()`).
* Moving `os::Linux::available_memory_in_container()` to `OSContainer` instead, removing container-specific logic from `os::Linux`. Also refactored to use the new bool and reference interface introduced in [JDK-8357086](https://bugs.openjdk.org/browse/JDK-8357086).
* Moving the retrieval of host values from `CgroupSubsystem` to `OSContainer`, and generalizing `CgroupSubsystem` to take generic limits (e.g. `upper_mem_bound`) instead of referring directly to a system value (e.g. `host_mem`).
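The "bool and reference" style mentioned above can be illustrated with a sketch of the container-side available-memory query. The function name `available_memory_in_bytes`, the helper `available_memory_or`, and `FakeCgroup` are hypothetical and not taken from the patch or from JDK-8357086; computing available memory as limit minus usage is also an assumption made for the example. The function returns `false` when no container value exists and leaves the out-parameter untouched, so the caller decides what to fall back to.

```cpp
#include <cassert>
#include <cstdint>

using julong = std::uint64_t;  // assumption: julong modeled as an unsigned 64-bit integer

// Hypothetical cgroup readings for the sketch.
struct FakeCgroup {
  bool readable = true;   // could the controller files be read?
  bool has_limit = true;  // false models an unlimited cgroup
  julong limit = 0;
  julong usage = 0;
};
static FakeCgroup cgroup;

namespace OSContainer {
// Hypothetical name and signature in the "return bool, write the result
// through a reference" style. Returns false when no container value is
// available, leaving `value` untouched.
bool available_memory_in_bytes(julong& value) {
  if (!cgroup.readable || !cgroup.has_limit) {
    return false;
  }
  // Guard against unsigned underflow if usage momentarily exceeds the limit.
  value = cgroup.usage < cgroup.limit ? cgroup.limit - cgroup.usage : 0;
  return true;
}
}  // namespace OSContainer

// Hypothetical caller: prefer the container value, else use the machine value.
julong available_memory_or(julong machine_value) {
  julong container_value = 0;
  if (OSContainer::available_memory_in_bytes(container_value)) {
    return container_value;
  }
  return machine_value;
}
```

Returning a `bool` keeps "no value" separate from a legitimate result of 0, which a single `julong` return value cannot express without a sentinel.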
Note: I intentionally kept the `julong` parameter types unchanged. I think it is better to update all of these types at once in [JDK-8365606](https://bugs.openjdk.org/browse/JDK-8365606), so that the change is complete and consistent.
Testing:
* Oracle tiers 1-3.
* Container tests on cgroup v1 and v2 systems.
-------------
Commit messages:
- keep cgroupsubsystem separate
- keep os::linux separate
Changes: https://git.openjdk.org/jdk/pull/27470/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27470&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8292984
Stats: 135 lines in 9 files changed: 32 ins; 27 del; 76 mod
Patch: https://git.openjdk.org/jdk/pull/27470.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/27470/head:pull/27470
PR: https://git.openjdk.org/jdk/pull/27470
More information about the hotspot-runtime-dev mailing list