RFR: 8343191: Cgroup v1 subsystem fails to set subsystem path [v6]

Severin Gehwolf sgehwolf at openjdk.org
Wed Nov 27 11:10:47 UTC 2024


On Wed, 27 Nov 2024 09:11:22 GMT, Sergey Chernyshev <schernyshev at openjdk.org> wrote:

>> Cgroup V1 subsustem fails to initialize mounted controllers properly in certain cases, that may lead to controllers left undetected/inactive. We observed the behavior in CloudFoundry deployments, it affects also host systems.
>> 
>> The relevant /proc/self/mountinfo line is
>> 
>> 
>> 2207 2196 0:43 /system.slice/garden.service/garden/good/2f57368b-0eda-4e52-64d8-af5c /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct
>> 
>> 
>> /proc/self/cgroup:
>> 
>> 
>> 11:cpu,cpuacct:/system.slice/garden.service/garden/bad/2f57368b-0eda-4e52-64d8-af5c
>> 
>> 
>> Here, Java runs inside containerized process that is being moved cgroups due to load balancing.
>> 
>> Let's examine the condition at line 64 here https://github.com/openjdk/jdk/blob/55a7cf14453b6cd1de91362927b2fa63cba400a1/src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp#L59-L72
>> It is always FALSE and the branch is never taken. The issue was spotted earlier by @jerboaa in [JDK-8288019](https://bugs.openjdk.org/browse/JDK-8288019). 
>> 
>> The original logic was intended to find the common prefix of `_root`and `cgroup_path` and concatenate the remaining suffix to the `_mount_point` (lines 67-68). That could lead to the following results: 
>> 
>> Example input
>> 
>> _root = "/a"
>> cgroup_path = "/a/b"
>> _mount_point = "/sys/fs/cgroup/cpu,cpuacct"
>> 
>> 
>> result _path
>> 
>> "/sys/fs/cgroup/cpu,cpuacct/b"
>> 
>> 
>> Here, cgroup_path comes from /proc/self/cgroup 3rd column. The man page (https://man7.org/linux/man-pages/man7/cgroups.7.html#NOTES) for control groups states:
>> 
>> 
>> ...
>>        /proc/pid/cgroup (since Linux 2.6.24)
>>               This file describes control groups to which the process
>>               with the corresponding PID belongs.  The displayed
>>               information differs for cgroups version 1 and version 2
>>               hierarchies.
>>               For each cgroup hierarchy of which the process is a
>>               member, there is one entry containing three colon-
>>               separated fields:
>> 
>>                   hierarchy-ID:controller-list:cgroup-path
>> 
>>               For example:
>> 
>>                   5:cpuacct,cpu,cpuset:/daemons
>> ...
>>               [3]  This field contains the pathname of the control group
>>                    in the hierarchy to which the process belongs. This
>>                    pathname is relative to the mount point of the
>>                    hierarchy.
>> 
>> 
>> This explicitly states the "pathname is relative t...
>
> Sergey Chernyshev has updated the pull request incrementally with one additional commit since the last revision:
> 
>   adjust path suffix in cgroup (v1) version specific code, when root != cgroup

src/hotspot/os/linux/cgroupUtil_linux.cpp line 70:

> 68:     os::free(limit_cg_path); // handles nullptr
> 69:     limit_cg_path = os::strdup(cg_path);
> 70:   }

We can avoid the duplicate copy of the original cgroup path, which is already captured in `orig` by using:


jlong lowest_limit = limit < 0 ? phys_mem : limit;
julong orig_limit = ((julong)lowest_limit) != phys_mem ? lowest_limit : phys_mem;


And on line 91 we change the condition from:


if ((julong)lowest_limit != phys_mem) {


to:


if ((julong)lowest_limit != orig_limit) {

src/hotspot/os/linux/cgroupUtil_linux.cpp line 129:

> 127:     lowest_limit = cpus;
> 128:     os::free(limit_cg_path); // handles nullptr
> 129:     limit_cg_path = os::strdup(cg_path);

Same here with the extra allocation of `cg_path`;

src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 320:

> 318:     ss.print_raw(cgroup_path);
> 319:     if (strstr((char*)cgroup_path, "../") != nullptr) {
> 320:       log_warning(os, container)("Cgroup v2 path at [%s] is [%s], cgroup limits can be wrong.",

Suggestion:

      log_warning(os, container)("Cgroup cpu/memory controller path includes '../', detected limits won't be accurate",

src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 322:

> 320:       log_warning(os, container)("Cgroup v2 path at [%s] is [%s], cgroup limits can be wrong.",
> 321:         mount_path, cgroup_path);
> 322:     }

Why the cast to `char*`?

We should probably move this warning to `CgroupUtil::adjust_controller`, right before we've determined that we actually need to adjust. I wonder, though, if we should just print the warning and set the cgroup_path to `/` and return early. Otherwise, path adjustment will run with no different result.

src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java line 66:

> 64:                         // Rely on path adjustment that determines the actual suffix.
> 65:                         path += cgroupPath;
> 66:                     }

This seems a simpler solution than the hotspot one. While I prefer this one, please make them consistent at the least.

test/hotspot/jtreg/containers/docker/TestMemoryWithSubgroups.java line 39:

> 37:  * @modules java.base/jdk.internal.platform
> 38:  * @library /test/lib
> 39:  * @build jdk.test.whitebox.WhiteBox CheckOperatingSystemMXBean

`CheckOperatingSystemMXBean` seems unused.

test/jdk/jdk/internal/platform/cgroup/CgroupV1SubsystemControllerTest.java line 93:

> 91: 
> 92:     @Test
> 93:     public void testCgPathFallbackToMountPoint() {

Suggestion:

    public void testCgPathToMovedPath() {

test/jdk/jdk/internal/platform/cgroup/TestCgroupSubsystemFactory.java line 479:

> 477:         cgroupv1MemoryController.setPath(cpuInfo.getCgroupPath());
> 478:         String actualPath = cgroupv1MemoryController.path();
> 479:         assertNotNull("Controller path should not have been null", actualPath);

Suggestion:

        assertNotNull(actualPath);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860392728
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860401209
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860438020
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860430948
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860449131
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860453472
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860455992
PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1860458236


More information about the core-libs-dev mailing list