RFR: 8322420: [Linux] cgroup v2: Limits in parent nested control groups are not detected [v8]

Severin Gehwolf sgehwolf at openjdk.org
Thu Apr 25 14:00:36 UTC 2024


On Sun, 10 Mar 2024 14:40:09 GMT, Jan Kratochvil <jkratochvil at openjdk.org> wrote:

>> The testcase requires root permissions.
>> 
>> Designed by  Severin Gehwolf, implemented by Jan Kratochvil.
>
> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
> 
>  - Fix whitespace
>  - Merge branch 'master' into master-cgroup
>    
>    Conflicts:
>    	test/hotspot/gtest/os/linux/test_cgroupSubsystem_linux.cpp
>  - Fix gtest
>  - Update the Java part
>  - Fix cgroup1 backward compatibility message
>  - Merge remote-tracking branch 'centos79/master-cgroup' into master-cgroup
>  - Disable cgroup.subtree_control testcase on cgroup1
>  - Fix gtest
>  - Merge branch 'master' into master-cgroup
>  - Merge remote-tracking branch 'f38crac/master-cgroup' into master-cgroup
>  - ... and 25 more: https://git.openjdk.org/jdk/compare/243cb098...39c90162

Thanks for the updates. I like that we have consistency between cgv1 and cgv2 in the latest version in terms of hierarchical limit. What would be even better is to get consistency between CPU and memory lookup (if the restriction is enforced higher up the hierarchy). That is, it would be ideal to make `initialize_hierarchy()` controller specific.

Meanwhile I've been working on [some refactoring](https://github.com/jerboaa/jdk/commit/92aaa6fd7e3ff8b64de064fecfcd725a157cb5bb) which builds on top of [JDK-8302744](https://bugs.openjdk.org/browse/JDK-8302744) so as to make the code a bit nicer once this integrates. Then, the idea would be to use scratch controllers (`CgroupCpuController` and `CgroupMemoryController`) to determine whether or not there is a limit and to figure out the actual path in a controller-specific way (using `CgroupMemoryController->read_memory_limit_in_bytes(phys_mem)` and `CgroupUtil::processor_count(CgroupCpuController* cpu_ctrl, int host_cpus)` in the process). Does that make sense?

A few other observations:

- The common case is when the JVM runs in a container. Then, the cgroup path is `/` on cgv2 and `root_mount == cgroup_path` on cgv1. We don't need to do the extra processing on those systems as the limit will be at the leaf.
- The (fairly) uncommon case is the host case where the cgroup limit is applied elsewhere (e.g. systemd slice). This is where we'd need the hierarchy walk.
- When we need to walk the hierarchy, we start at the longest path and only traverse if there is _NO_ limit. A system which sets a higher limit (that isn't `max`) seems ill-defined and I've not come across one.
   As soon as we've found a lower value than unlimited (`-1`), we stop.
   Since cgv2 is hierarchical, the lowest limit will affect the entire tree (corollary: higher values further down from that point won't have an effect):
   ```
   /a/b --> memory.max 300
      `- /c --> memory.max max (this wouldn't have any effect, therefore can be ignored).
   ```
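
To make the walk described above concrete, here is a minimal sketch. It is not the PR's code: `LimitTable`, `read_limit`, `parent_of` and `find_effective_limit` are hypothetical names, and an in-memory map stands in for reading `memory.max` from the cgroup filesystem. It assumes `-1` means unlimited, as above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the cgroup filesystem: cgroup path -> memory.max.
// A path missing from the table is treated as "max" (unlimited).
using LimitTable = std::map<std::string, int64_t>;

constexpr int64_t kUnlimited = -1;

// Assumed helper: what a per-controller limit read would return for one path.
int64_t read_limit(const LimitTable& limits, const std::string& path) {
  auto it = limits.find(path);
  return it == limits.end() ? kUnlimited : it->second;
}

// "/a/b/c" -> "/a/b", "/a" -> "/", "/" -> "" (nothing above the root).
std::string parent_of(const std::string& path) {
  if (path == "/") return "";
  std::size_t pos = path.rfind('/');
  if (pos == std::string::npos) return "";  // malformed path: stop walking
  return pos == 0 ? "/" : path.substr(0, pos);
}

struct EffectiveLimit {
  std::string path;  // cgroup path at which the limit was found
  int64_t limit;     // kUnlimited if no limit exists anywhere up the tree
};

// Start at the longest (leaf) path and move up only while no limit is found.
// The first value lower than unlimited ends the walk.
EffectiveLimit find_effective_limit(const LimitTable& limits,
                                    const std::string& leaf) {
  for (std::string p = leaf; !p.empty(); p = parent_of(p)) {
    int64_t limit = read_limit(limits, p);
    if (limit != kUnlimited) {
      return {p, limit};
    }
  }
  return {leaf, kUnlimited};
}
```

In the container case the leaf is `/`, so the loop reads exactly one path and no traversal happens. For the `/a/b/c` example above, the walk would stop at `/a/b` with a limit of 300.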

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17198#issuecomment-2077263573

