RFR: 8343191: Cgroup v1 subsystem fails to set subsystem path [v3]
Sergey Chernyshev
schernyshev at openjdk.org
Tue Nov 12 19:13:15 UTC 2024
On Thu, 7 Nov 2024 22:31:21 GMT, Sergey Chernyshev <schernyshev at openjdk.org> wrote:
>> The Cgroup V1 subsystem fails to initialize mounted controllers properly in certain cases, which may leave controllers undetected or inactive. We observed this behavior in CloudFoundry deployments; it also affects host systems.
>>
>> The relevant /proc/self/mountinfo line is
>>
>>
>> 2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct
>>
>>
>> /proc/self/cgroup:
>>
>>
>> 11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
>>
>>
>> Here, Java runs inside a containerized process that is being moved between cgroups due to load balancing.
>>
>> Let's examine the condition at line 64 here https://github.com/openjdk/jdk/blob/55a7cf14453b6cd1de91362927b2fa63cba400a1/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L59-L72
>> It is always FALSE and the branch is never taken. The issue was spotted earlier by @jerboaa in [JDK-8288019](https://bugs.openjdk.org/browse/JDK-8288019).
>>
>> The original logic was intended to find the common prefix of `_root` and `cgroup_path` and append the remaining suffix to `_mount_point` (lines 67-68). That would produce results such as the following:
>>
>> Example input
>>
>> _root = "/a"
>> cgroup_path = "/a/b"
>> _mount_point = "/sys/fs/cgroup/cpu,cpuacct"
>>
>>
>> result _path
>>
>> "/sys/fs/cgroup/cpu,cpuacct/b"
>>
>>
>> Here, `cgroup_path` comes from the third column of /proc/self/cgroup. The man page (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control groups states:
>>
>>
>> ...
>> /proc/pid/cgroup (since Linux 2.6.24)
>> This file describes control groups to which the process
>> with the corresponding PID belongs. The displayed
>> information differs for cgroups version 1 and version 2
>> hierarchies.
>> For each cgroup hierarchy of which the process is a
>> member, there is one entry containing three colon-
>> separated fields:
>>
>> hierarchy-ID:controller-list:cgroup-path
>>
>> For example:
>>
>> 5:cpuacct,cpu,cpuset:/daemons
>> ...
>> [3] This field contains the pathname of the control group
>> in the hierarchy to which the process belongs. This
>> pathname is relative to the mount point of the
>> hierarchy.
>>
>>
>> This explicitly states the "pathname is relative t...
>
> Sergey Chernyshev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8343191
> - patch reimplemented
> - fix the logic that skips duplicate controller's mount points
> - 8343191: Cgroup v1 subsystem fails to set subsystem path
>
> Edit: Yet, cg v2 will get into trouble, since there (for example, with rootless podman on cg v2) you'd end up with this instead:
>
> ```
> [0.008s][trace][os,container] OSContainer::init: Initializing Container Support
> [0.008s][debug][os,container] Detected optional pids controller entry in /proc/cgroups
> [0.008s][debug][os,container] Detected cgroups v2 unified hierarchy
> [0.008s][trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/../../../../../../test
> [0.008s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/../../../../../../test/memory.max
> [0.008s][debug][os,container] Open of file /sys/fs/cgroup/../../../../../../test/memory.max failed, No such file or directory
> [0.008s][trace][os,container] Memory Limit failed: -2
> ...
> [0.009s][trace][os,container] Path to /memory.max is /sys/fs/cgroup/memory.max
> [0.009s][debug][os,container] Open of file /sys/fs/cgroup/memory.max failed, No such file or directory
> [0.009s][trace][os,container] Memory Limit failed: -2
> [0.009s][trace][os,container] Memory Limit is: -2
> ```
Here, the path `/sys/fs/cgroup/memory.max` would normally point to the actual memory limit inside the container. In this particular case, the directory `/sys/fs/cgroup` is empty.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/21808#issuecomment-2471350885
More information about the core-libs-dev mailing list