RFR: 8322420: [Linux] cgroup v2: Limits in parent nested control groups are not detected [v2]

Jan Kratochvil jkratochvil at openjdk.org
Wed Jan 31 15:42:32 UTC 2024


On Fri, 12 Jan 2024 17:12:37 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

> Walking the directory hierarchy (**and** interface files) on every call of `physical_memory()` (or `active_processor_count()` for CPU) seems concerning.

I have therefore implemented a hierarchical `memory.stat` extension for cgroup2 which already exists for cgroup1 - if the patch is agreed upon I will submit it to Linux kernel: [cgroup2-hierarchical_memory_limit.patch.txt](https://github.com/openjdk/jdk/files/14113452/cgroup2-hierarchical_memory_limit.patch.txt)

> Since it seems unlikely for cgroup interface file paths changing during runtime, walking the hierarchy for every look-up seems overkill. Note that the prime users of this feature are in-container use, where most runtimes have the limit settings at the leaf.

While I understand the most common change of the limits is at the leaf technically any parent group limit can change. Which is also what this patch is implementing. For the performance issue I have implemented the Linux kernel extension above.

> `N` files per metric to look at where it used to be one file (i.e. constant).

The problem is any of the files can change. But to fix the performance I have implemented the Linux kernel patch above.

> I wonder if it would be sufficient to "lock" into the cgroup path where the lowest  limits are set in the hierarchy and only use the interface files at that path of a specific controller.

That mostly disables the functionality of this patch.


>     * There seems to be an inconsistency between cgroups v1 (no recursive look-up) and cgroups v2  (recursive look-up). Why? I think we should employ the same semantics to both  versions (and drop the hierarchical work-around - [JDK-8217338](https://bugs.openjdk.java.net/browse/JDK-8217338) - for cg v1).

I did not much expect anyone still uses cgroup1. Plus customer was also interested in cgroup2. AFAIK a last OS using cgroup1 was RHEL-7 which will have EOL in a few months. But then I tried to implement cgroup1 but there it sort of already worked thanks for your [JDK-8217338](https://bugs.openjdk.java.net/browse/JDK-8217338). I just fixed there to prefer the hierarchical limit over the leaf limit (and the testcase does test it now) as that is what is used by Linux kernel.

And then my Linux kernel patch implements the same `memory.stat` way even for cgroup2.

>     * There also seems to be an inconsistency between metrics being iterated. `memory.max` and `memory.swap.max`  and `memory.swap.current` are being iterated, others, like CPU quotas (`cpu.max` or
>       `cpu.weight`) not. Why? Both limits could potentially be one path up the hierarchy, meaning that cpu limits imposed that way wouldn't be detected.

That is a correct comment. Customer asked only for `memory`. I can implement `cpu` but I would like to find out the approved approach first before implementing the same also for `cpu`.

>     * What testing have you done on this? Did this run through cgroups v1 and cgroups v2  testing? I see failures in `jdk/internal/platform/cgroup` tests (compilation fails).

Thanks for the catch, I have fixed the failure now. I rely on GHA (GitHub Actions) as running the testsuite on my local machine takes 24 hours with many fuzzy results so it is very time consuming to reliably test a patch.

> src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 287:
> 
>> 285:   _paths_size = 0;
>> 286:   for (const char *cs = _cgroup_path; (cs = strchr(cs, '/')); ++cs)
>> 287:     ++_paths_size;
> 
> What happens if we end up having multiple slashes? Like `/foo//bar/baz`? This logic would be a good candidate for having a gtest.

It should not happen as we get the cgroup directory path from Linux kernel. But I have added a gtest for it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17198#issuecomment-1919349149
PR Review Comment: https://git.openjdk.org/jdk/pull/17198#discussion_r1472971602


More information about the core-libs-dev mailing list