RFR: 8322420: [Linux] cgroup v2: Limits in parent nested control groups are not detected [v2]
Severin Gehwolf
sgehwolf at openjdk.org
Fri Jan 12 17:15:26 UTC 2024
On Thu, 28 Dec 2023 15:19:23 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:
>> The testcase requires root permissions.
>
> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix gtest testcases compilation errors
It looks like this patch needs a bit more discussion. Generally, it would be good to have the functionality, but I'm unsure about the proposed implementation.
Walking the directory hierarchy (**and** interface files) on every call of `physical_memory()` (or `active_processor_count()` for CPU) seems concerning. Since it seems unlikely for cgroup interface file paths changing during runtime, walking the hierarchy for every look-up seems overkill. Note that the prime users of this feature are in-container use, where most runtimes have the limit settings at the leaf. A few more comments below:
- After this patch, AFAIUI for a cgroup path like `/foo/bar/baz/memory.max` the following files are being looked at for the memory limit (for example for the interface file `memory.max`):
```
/foo/bar/baz/memory.max
/foo/bar/memory.max
/foo/memory.max
```
This used to be just one file at the leaf, `/foo/bar/baz/memory.max` (prior this patch). So this change has an effect on file IO. `N` files per metric to look at where it used to be one file (i.e. constant). Note that switches like `UseDynamicNumberOfCompilerThreads` make this particularly worrying. I wonder if it would be sufficient to "lock" into the cgroup path where the lowest limits are set in the hierarchy and only use the interface files at that path of a specific controller. This would mean adjusting `CgroupsV2Subsystem` similar to `CgroupsV1Subsystem` to have unique controller paths (i.e. `cpu` and `memory` controller paths could potentially be different). But the code would already be mostly there for this. The idea would be to do the initial walk of the tree at JVM startup, and from then on, only look at the path with a limit set (if any) for subsequent calls. That is `controller->subsystem_path()` would return the correct path at runtime when initialization is done. T
houghts?
- There seems to be an inconsistency between cgroups v1 (no recursive look-up) and cgroups v2 (recursive look-up). Why? I think we should employ the same semantics to both versions (and drop the hierarchical work-around - [JDK-8217338](https://bugs.openjdk.java.net/browse/JDK-8217338) - for cg v1).
- There also seems to be an inconsistency between metrics being iterated. `memory.max` and `memory.swap.max` and `memory.swap.current` are being iterated, others, like CPU quotas (`cpu.max` or
`cpu.weight`) not. Why? Both limits could potentially be one path up the hierarchy, meaning that cpu limits imposed that way wouldn't be detected.
- What testing have you done on this? Did this run through cgroups v1 and cgroups v2 testing? I see failures in `jdk/internal/platform/cgroup` tests (compilation fails).
src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 287:
> 285: _paths_size = 0;
> 286: for (const char *cs = _cgroup_path; (cs = strchr(cs, '/')); ++cs)
> 287: ++_paths_size;
What happens if we end up having multiple slashes? Like `/foo//bar/baz`? This logic would be a good candidate for having a gtest.
test/hotspot/jtreg/containers/cgroup/NestedCgroup.java line 30:
> 28: * @requires vm.flagless
> 29: * @library /testlibrary /test/lib
> 30: * @run driver jdk.test.lib.helpers.ClassFileInstaller
This ClassFileInstaller seems not needed.
test/hotspot/jtreg/containers/cgroup/NestedCgroup.java line 94:
> 92: cgdelete.add("-g");
> 93: cgdelete.add(CONTROLLERS_PATH_OUTER);
> 94: pSystem(cgdelete);
This deletes a control group which might not have been created. On my system that test fails due to that.
test/hotspot/jtreg/containers/cgroup/NestedCgroup.java line 97:
> 95:
> 96: List<String> cgcreate = new ArrayList<>();
> 97: cgcreate.add("cgcreate");
This test relies on `cgcreate` being present on the test system. I wonder if we could create a similar test with using `systemd-slices` only. Either way, we need to have a corresponding check that this test dependency is there a. la. `@requires docker`.
test/hotspot/jtreg/containers/cgroup/NestedCgroup.java line 112:
> 110: if (!matcher.find()) {
> 111: System.err.println(mountInfo);
> 112: throw new SkippedException("cgroup2 filesystem mount point not found");
Why is this check there? It should be the same for hierachical cgroup v1 systems (most of them), no? My testing of the `cgcreate` flow suggests it's the same on `v1`. It's just not as visible since there are per controller paths in `/proc/self/cgroup` on `v1` and we have the hierarchical memory limit work-around employed on v1.
-------------
PR Review: https://git.openjdk.org/jdk/pull/17198#pullrequestreview-1816085160
PR Review Comment: https://git.openjdk.org/jdk/pull/17198#discussion_r1449164637
PR Review Comment: https://git.openjdk.org/jdk/pull/17198#discussion_r1450686310
PR Review Comment: https://git.openjdk.org/jdk/pull/17198#discussion_r1450465850
PR Review Comment: https://git.openjdk.org/jdk/pull/17198#discussion_r1450673701
PR Review Comment: https://git.openjdk.org/jdk/pull/17198#discussion_r1450540176
More information about the core-libs-dev
mailing list