RFR: 8343191: Cgroup v1 subsystem fails to set subsystem path [v3]

Sergey Chernyshev schernyshev at openjdk.org
Fri Nov 22 09:57:44 UTC 2024


On Tue, 12 Nov 2024 23:29:47 GMT, Sergey Chernyshev <schernyshev at openjdk.org> wrote:

>> src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1SubsystemController.java line 46:
>> 
>>> 44:     }
>>> 45: 
>>> 46:     public void setPath(String cgroupPath) {
>> 
>> This should behave the same as Hotspot and also append the cgroup path to the mount point. Then let [JDK-8336881](https://bugs.openjdk.org/browse/JDK-8336881) kick in to reduce it down to the mount point (if necessary).
>
> Correct. I double checked the behavior in cg v2:
> 
> 
> sudo cgcreate -g memory:/test
> sudo cgcreate -g memory:/test/test
> sudo cgset -r memory.max=50m test
> sudo cgset -r memory.swap.max=0 test
> sudo cgset -r memory.max=100m test/test
> sudo cgset -r memory.swap.max=0 test/test
> 
> with the above setup, allocations smaller than the outer group size will work, bigger will be oom-killed.
> 
> 
> $ sudo cgexec -g memory:/test/test sh -c "head -c 45m /dev/zero | tail | wc -c"
> 
> 47185920
> 
> $ sudo cgexec -g memory:/test/test sh -c "head -c 55m /dev/zero | tail | wc -c"
> 
> 0
> Killed
> 
> 
> I'll update the PR next week.

When a process is moved to a subgroup (cg v1), such as with the command

docker run --tty=true --rm --volume=$JAVA_HOME:/jdk --privileged ubuntu:latest \
     sh -c "mkdir -p /sys/fs/cgroup/memory/test ; echo 100m > /sys/fs/cgroup/memory/test/memory.limit_in_bytes ; echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs ; /jdk/bin/java -Xlog:os+container=trace -version"

the hierarchical limits may fail to be detected because of an issue in the code below
https://github.com/openjdk/jdk/blob/b9bf447209db5d7f6bb16a0310421dbe4170500c/src/hotspot/os/linux/cgroupUtil_linux.cpp#L64-L75
Here, `limit` at line 64 is not stored as a possible lowest limit, so if the inner group has lower limit than the outer group, it won't be detected (cg v2 is affected too). The cgroup path will be adjusted to the outer group (when it's limited). Another issue is in the loop at line 66 that reduces the path. In cg v1 in `cgroupns=host` mode (default) the cgroup_path includes the base cgroup and the subgroup (when moved). It may be not quite obvious which part of the path is the actual subgroup (and the CloudFoundry case has demonstrated it). It is suggested to determine the actual subgroup (path suffix) before reducing the group path. Please see the updated patch. I also added a new test case (TestMemoryWithSubgroups.java). The issue affects also Java Metrics. Maybe you could consider adding a similar fix in the path adjustment and a test case in the Metrics update ([JDK-8336881](https://bugs.openjdk.org/browse/JDK-8336881)), here's an example for Metrics test:

    private static void testMemoryLimitSubgroupV1(String innerSize, String outerGroupMemorySize, boolean privateNamespace) throws Exception {
        Common.logNewTestCase("testMemoryLimitSubgroup, innerSize = " + innerSize);
        DockerRunOptions opts =
            new DockerRunOptions(imageName, "sh", "-c");
        opts.javaOpts = new ArrayList<>();
        opts.appendTestJavaOptions = false;
        opts.addDockerOpts("--volume", Utils.TEST_CLASSES + ":/test-classes/")
            .addDockerOpts("--privileged")
            .addDockerOpts("--cgroupns=" + (privateNamespace ? "private" : "host"))
            .addDockerOpts("--memory", outerGroupMemorySize);
        opts.addClassOptions("mkdir -p /sys/fs/cgroup/memory/test ; " +
            "echo " + innerSize + " > /sys/fs/cgroup/memory/test/memory.limit_in_bytes ; " +
            "echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs ; " +
            "/jdk/bin/java -XX:+ErrorFileToStderr -cp /test-classes/ " +
            "--add-exports java.base/jdk.internal.platform=ALL-UNNAMED " +
            "MetricsMemoryTester memory " + innerSize);

        DockerTestUtils.dockerRunJava(opts).shouldHaveExitValue(0).shouldContain("TEST PASSED!!!");
    }

    private static void testMemoryLimitSubgroupV2(String innerSize, String outerGroupMemorySize, boolean privateNamespace) throws Exception {
        Common.logNewTestCase("testMemoryLimitSubgroup, innerSize = " + innerSize);
        DockerRunOptions opts =
            new DockerRunOptions(imageName, "sh", "-c");
        opts.javaOpts = new ArrayList<>();
        opts.appendTestJavaOptions = false;
        opts.addDockerOpts("--volume", Utils.TEST_CLASSES + ":/test-classes/")
            .addDockerOpts("--privileged")
            .addDockerOpts("--cgroupns=" + (privateNamespace ? "private" : "host"))
            .addDockerOpts("--memory", outerGroupMemorySize);
        opts.addClassOptions("mkdir -p /sys/fs/cgroup/memory/test ; " +
            "echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs ; " +
            "echo '+memory' > /sys/fs/cgroup/cgroup.subtree_control ; " +
            "echo '+memory' > /sys/fs/cgroup/memory/cgroup.subtree_control ; " +
            "echo " + innerSize + " > /sys/fs/cgroup/memory/test/memory.max ; " +
            "/jdk/bin/java -XX:+ErrorFileToStderr -cp /test-classes/ " +
            "--add-exports java.base/jdk.internal.platform=ALL-UNNAMED " +
            "MetricsMemoryTester memory " + innerSize);

        DockerTestUtils.dockerRunJava(opts).shouldHaveExitValue(0).shouldContain("TEST PASSED!!!");
    }

In cg v2 the way to enable memory controller is to move the process out of the group.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21808#discussion_r1853650100


More information about the core-libs-dev mailing list