RFR: 8343191: Cgroup v1 subsystem fails to set subsystem path
Sergey Chernyshev
schernyshev at openjdk.org
Thu Oct 31 15:04:57 UTC 2024
The Cgroup V1 subsystem fails to initialize mounted controllers properly in certain cases, which may leave controllers undetected or inactive. We observed this behavior in CloudFoundry deployments; it also affects host systems.
The relevant /proc/self/mountinfo line is
2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct
/proc/self/cgroup:
11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
Here, Java runs inside a containerized process that is being moved between cgroups due to load balancing.
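To make the fields in the two lines above easier to follow, here is a minimal sketch of splitting a `/proc/self/mountinfo` line into the pieces discussed in this post. The names `MountInfoEntry`, `parse_mountinfo_line` and `has_option` are hypothetical and are not taken from HotSpot; the field layout follows the proc(5) description of mountinfo.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the mountinfo fields referred to in this post.
struct MountInfoEntry {
    std::string root;           // 4th field: root of the mount within the hierarchy
    std::string mount_point;    // 5th field: where the hierarchy is mounted
    std::string mount_options;  // 6th field: per-mount options (ro, nosuid, ...)
    std::string fs_type;        // 1st field after the "-" separator
    std::string super_options;  // 3rd field after the "-" separator
};

// Splits one mountinfo line on whitespace; returns false if it is too short.
bool parse_mountinfo_line(const std::string& line, MountInfoEntry& out) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) {
        tokens.push_back(tok);
    }
    // Zero or more optional fields (such as "master:25") precede the "-" separator.
    std::size_t sep = 6;
    while (sep < tokens.size() && tokens[sep] != "-") {
        ++sep;
    }
    if (sep + 3 >= tokens.size()) {
        return false;
    }
    out.root = tokens[3];
    out.mount_point = tokens[4];
    out.mount_options = tokens[5];
    out.fs_type = tokens[sep + 1];
    out.super_options = tokens[sep + 3];
    return true;
}

// Returns true if `wanted` is one of the comma-separated options.
bool has_option(const std::string& options, const std::string& wanted) {
    std::istringstream in(options);
    for (std::string opt; std::getline(in, opt, ',');) {
        if (opt == wanted) {
            return true;
        }
    }
    return false;
}
```

For the mountinfo line quoted above, this sketch yields a root ending in `garden/good/2f57368b-0eda-4e52-64d8-af5c`, the mount point `/sys/fs/cgroup/cpu,cpuacct`, and per-mount options that include `ro`, while the cgroup path in `/proc/self/cgroup` points at `garden/bad/...`.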
Let's examine the condition at line 64 here https://github.com/openjdk/jdk/blob/55a7cf14453b6cd1de91362927b2fa63cba400a1/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L59-L72
It is always FALSE and the branch is never taken. The issue was spotted earlier by @jerboaa in [JDK-8288019](https://bugs.openjdk.org/browse/JDK-8288019).
The original logic was intended to find the common prefix of `_root` and `cgroup_path` and concatenate the remaining suffix to the `_mount_point` (lines 67-68). That would have produced results such as the following:
Example input
_root = "/a"
cgroup_path = "/a/b"
_mount_point = "/sys/fs/cgroup/cpu,cpuacct"
result _path
"/sys/fs/cgroup/cpu,cpuacct/b"
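The intended prefix-stripping behavior can be sketched as follows. This is an illustration only, not the HotSpot code: the function name is hypothetical, a `_root` of `/` is treated as an empty prefix, and an empty string is returned when `_root` is not a prefix of `cgroup_path` (both are choices made for this sketch).

```cpp
#include <string>

// Hypothetical helper: strips `root` from the front of `cgroup_path` and
// appends what is left to `mount_point`.
// Note: this is a plain string-prefix test; it does not check that the
// prefix ends on a path-component boundary.
std::string path_by_stripping_root(const std::string& mount_point,
                                   const std::string& root,
                                   const std::string& cgroup_path) {
    // Treat "/" as an empty prefix so the suffix keeps its leading slash.
    const std::string prefix = (root == "/") ? "" : root;
    if (cgroup_path.compare(0, prefix.size(), prefix) != 0) {
        return "";  // sketch choice: no common prefix, no path
    }
    std::string suffix = cgroup_path.substr(prefix.size());
    if (suffix == "/") {
        suffix.clear();  // avoid a trailing slash on the mount point
    }
    return mount_point + suffix;
}
```

With the example input above (`/a`, `/a/b`, `/sys/fs/cgroup/cpu,cpuacct`), the sketch returns `/sys/fs/cgroup/cpu,cpuacct/b`. With the CloudFoundry lines from the start of this post, the root (`.../good/...`) is not a prefix of the cgroup path (`.../bad/...`), so the sketch returns an empty string.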
Here, `cgroup_path` comes from the third column of `/proc/self/cgroup`. The man page (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control groups states:
...
/proc/pid/cgroup (since Linux 2.6.24)
This file describes control groups to which the process
with the corresponding PID belongs. The displayed
information differs for cgroups version 1 and version 2
hierarchies.
For each cgroup hierarchy of which the process is a
member, there is one entry containing three colon-
separated fields:
hierarchy-ID:controller-list:cgroup-path
For example:
5:cpuacct,cpu,cpuset:/daemons
...
[3] This field contains the pathname of the control group
in the hierarchy to which the process belongs. This
pathname is relative to the mount point of the
hierarchy.
This explicitly states the "pathname is relative to the mount point of the hierarchy". Hence, the correct result could have been
/sys/fs/cgroup/cpu,cpuacct/a/b
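A minimal sketch of the man-page reading: split a `/proc/self/cgroup` entry into its three colon-separated fields and append the third field to the mount point. `CgroupEntry`, `parse_cgroup_line` and `join_mount_and_cgroup` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for one /proc/<pid>/cgroup entry:
// hierarchy-ID:controller-list:cgroup-path
struct CgroupEntry {
    std::string hierarchy_id;
    std::string controllers;
    std::string path;
};

// Splits on the first two colons only; everything after the second colon
// is kept as the cgroup path.
bool parse_cgroup_line(const std::string& line, CgroupEntry& out) {
    const std::size_t first = line.find(':');
    if (first == std::string::npos) {
        return false;
    }
    const std::size_t second = line.find(':', first + 1);
    if (second == std::string::npos) {
        return false;
    }
    out.hierarchy_id = line.substr(0, first);
    out.controllers = line.substr(first + 1, second - first - 1);
    out.path = line.substr(second + 1);
    return true;
}

// The cgroup path is relative to the mount point of the hierarchy, so the
// two are concatenated; "/" maps to the mount point itself.
std::string join_mount_and_cgroup(const std::string& mount_point,
                                  const std::string& cgroup_path) {
    if (cgroup_path == "/") {
        return mount_point;
    }
    return mount_point + cgroup_path;
}
```

Under this reading, the man-page example `5:cpuacct,cpu,cpuset:/daemons` has the cgroup path `/daemons`, and joining `/sys/fs/cgroup/cpu,cpuacct` with `/a/b` gives `/sys/fs/cgroup/cpu,cpuacct/a/b`.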
However, if Java runs in a container, `/proc/self/cgroup` and `/proc/self/mountinfo` are mapped (read-only) from the host, because Docker uses `--cgroupns=host` by default on cgroup v1 hosts. In that case `_root` and `cgroup_path` refer to host paths that do not exist in the container, so in containers Java must fall back to the `_mount_point` of the corresponding cgroup controller.
When `--cgroupns=private` is used, `_root` and `cgroup_path` are always equal to `/`.
On hosts, the `cgroup_path` should always be appended to the mount point, no matter how it compares to `_root`.
The patch uses the result of `is_containerized()` to select the correct path. It is suggested to change the semantics of `is_read_only()` so that it returns the combined read-only flag for all mounted controllers. Currently the only use of the `_read_only` flag is to determine whether the V1 subsystem `is_containerized()`. The `_read_only` flags are available in advance, before any CgroupV1SubsystemController objects are initialized.
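The selection rule described above could look roughly like the sketch below. It is not the code from the patch: `ControllerMount`, `all_read_only` and `select_subsystem_path` are hypothetical names, and combining the per-controller flags with a logical AND is an assumption about what "combined read-only flag" means.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one mounted cgroup v1 controller.
struct ControllerMount {
    std::string name;
    std::string mount_point;
    bool read_only;
};

// Assumption: the combined flag is true only if every mounted controller is
// read-only. An empty list is treated as "not read-only".
bool all_read_only(const std::vector<ControllerMount>& mounts) {
    if (mounts.empty()) {
        return false;
    }
    for (std::size_t i = 0; i < mounts.size(); ++i) {
        if (!mounts[i].read_only) {
            return false;
        }
    }
    return true;
}

// In a container, fall back to the mount point; on a host, append the cgroup
// path from /proc/self/cgroup regardless of how it compares to the root.
std::string select_subsystem_path(const std::string& mount_point,
                                  const std::string& cgroup_path,
                                  bool containerized) {
    if (containerized || cgroup_path == "/") {
        return mount_point;
    }
    return mount_point + cgroup_path;
}
```

Because the read-only flags are known before any controller object is set up, a caller could compute `all_read_only(...)` once and pass the result to `select_subsystem_path` for each controller.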
The Java side is updated to follow the same logic.
-------------
Commit messages:
- 8343191: Cgroup v1 subsystem fails to set subsystem path
Changes: https://git.openjdk.org/jdk/pull/21808/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21808&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8343191
Stats: 229 lines in 10 files changed: 157 ins; 30 del; 42 mod
Patch: https://git.openjdk.org/jdk/pull/21808.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21808/head:pull/21808
PR: https://git.openjdk.org/jdk/pull/21808