RFR: 8322420: [Linux] cgroup v2: Limits in parent nested control groups are not detected [v4]

Zdenek Zambersky zzambers at openjdk.org
Wed Aug 28 14:09:20 UTC 2024


On Wed, 28 Aug 2024 09:00:33 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this Linux container detection improvement, which allows limits to be detected even if they are not exposed at the leaf nodes. So far this is only observable on systemd slices on cgroup v2. For cgroup v1 this has been addressed with [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) in a version-specific way. This patch proposes to address the problem differently. Instead of only looking at the determined cgroup path for the interface files, we iterate up the hierarchy to its root and stop as soon as we have determined any limit at a given path, since it's best practice not to set a higher limit lower down the hierarchy (except for the default of unset/max).
>> 
>> Consider this subsystem path:
>> 
>> 
>> /sys/fs/cgroup/memory/user.slice/user-cg.slice/user-cg-cpu.slice/run-r634adce2617145ea9660623c335cb3db.scope
>> 
>> 
>> with a root of `/sys/fs/cgroup/memory` and a cgroup path of `/user.slice/user-cg.slice/user-cg-cpu.slice/run-r634adce2617145ea9660623c335cb3db.scope`. Prior to this patch, we only looked at `/sys/fs/cgroup/memory/user.slice/user-cg.slice/user-cg-cpu.slice/run-r634adce2617145ea9660623c335cb3db.scope/memory.max` for the limit on cgroup v2 systems, and at `/sys/fs/cgroup/memory/user.slice/user-cg.slice/user-cg-cpu.slice/run-r634adce2617145ea9660623c335cb3db.scope/memory.limit_in_bytes` on cgroup v1 systems. On cgroup v1, if the original look-up in `memory.limit_in_bytes` returned no limit, we also looked for the `hierarchical_memory_limit` key in `/sys/fs/cgroup/memory/user.slice/user-cg.slice/user-cg-cpu.slice/run-r634adce2617145ea9660623c335cb3db.scope/memory.stat`. However, `hierarchical_memory_limit` is cgroup v1 specific and is not present in cg v2's `memory.stat` files.
>> 
>> This patch addresses this problem in a uniform way by walking the cgroup path up to the root and looking up any limit. This solves the problem that JDK-8217338 addressed in a version-specific way at the time, and it also addresses the problem on cgroup v2. As soon as any limit is found, that path is used for the specific controller. That is, on cg v1 the following paths are examined in this order (provided no limit is set, so processing doesn't stop early):
>> 
>> 
>> /sys/fs/cgroup/memory/user.slice/user-cg.slice/user-cg-cpu.slice/run-r634adce2617145ea9660623c335cb3db.scope/memory.limit_in_bytes
>> /sys/fs/cgroup/memory/user.slice/user-cg.slice/user-cg-cpu.slice/memory.limit_in_bytes
>> /sy...
>
> Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Set _path to nullptr (cgv1)
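The hierarchy walk described in the quoted PR text can be sketched as follows. This is a minimal, hypothetical Python illustration and not the HotSpot C++ code from the patch. The names `find_limit`, `candidate_dirs`, `parse_v2_limit` and `read_sysfs` are made up. The sketch assumes a cgroup v2 mount at `/sys/fs/cgroup` and understands only the v2 convention where `max` means unlimited. A cgroup v1 `memory.limit_in_bytes` file would need its own parser.

```python
import posixpath
from typing import Callable, Iterator, Optional, Tuple


def parse_v2_limit(text: str) -> Optional[int]:
    """Parse a cgroup v2 limit file such as memory.max; "max" means unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


def candidate_dirs(cgroup_path: str) -> Iterator[str]:
    """Yield the cgroup path, then each ancestor, ending with the root "/"."""
    path = cgroup_path.rstrip("/") or "/"
    while True:
        yield path
        if path == "/":
            return
        path = posixpath.dirname(path)


def read_sysfs(path: str) -> Optional[str]:
    """Read an interface file; return None if it does not exist at this level."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None


def find_limit(
    mount_root: str,
    cgroup_path: str,
    filename: str,
    read_file: Callable[[str], Optional[str]] = read_sysfs,
    parse: Callable[[str], Optional[int]] = parse_v2_limit,
) -> Tuple[Optional[int], Optional[str]]:
    """Walk from the leaf cgroup towards the root and stop at the first limit.

    Returns (limit, cgroup directory where it was found), or (None, None)
    if no level of the hierarchy sets a limit.
    """
    for rel in candidate_dirs(cgroup_path):
        full = posixpath.join(mount_root, rel.lstrip("/"), filename)
        content = read_file(full)
        if content is None:
            continue  # e.g. the cgroup v2 root has no memory.max file
        limit = parse(content)
        if limit is not None:
            return limit, rel  # first (nearest) level with a limit wins
    return None, None


if __name__ == "__main__":
    # Hypothetical files: only user.slice sets a 1 GiB memory limit.
    fake_files = {
        "/sys/fs/cgroup/user.slice/user-cg.slice/run-x.scope/memory.max": "max\n",
        "/sys/fs/cgroup/user.slice/user-cg.slice/memory.max": "max\n",
        "/sys/fs/cgroup/user.slice/memory.max": "1073741824\n",
    }
    print(find_limit("/sys/fs/cgroup", "/user.slice/user-cg.slice/run-x.scope",
                     "memory.max", fake_files.get))
    # prints (1073741824, '/user.slice')
```

Passing a dictionary's `get` method as `read_file` exercises the walk without a real cgroup hierarchy. On a live system, the default `read_sysfs` reads the actual interface files. Stopping at the nearest level that reports a limit relies on the same assumption stated in the PR: a lower level should not set a higher limit than its ancestors.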

Btw, we have an [old reproducer](https://github.com/rh-openjdk/jtreg-buffer/blob/7acd6f59c374bb81c2fc3e2483a182cf47a00a9d/test/reproducers/1463098/cgroup-memory-limit-respected-systemd.sh) that was affected by the issue addressed by this PR. It is currently excluded on cgroup2 because of the current JDK limitation.

I ran this test (re-enabled on cgroup2) on a JDK build with this changeset:
RHEL-9 (cgroup2): OK
RHEL-8 (cgroup1): OK

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20646#issuecomment-2315423621


More information about the hotspot-runtime-dev mailing list