RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups
Severin Gehwolf
sgehwolf at openjdk.org
Thu Feb 27 18:00:01 UTC 2025
On Wed, 26 Feb 2025 21:03:58 GMT, Thomas Fitzsimmons <duke at openjdk.org> wrote:
> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.
>
> I tested it with:
>
>
> java -Xlog:os+container=trace -version
>
> on:
>
> `Red Hat Enterprise Linux 8 (cgroups v1 only)`:
> _No change in behaviour_
>
> `Fedora 41 (cgroups v2)`:
> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_
>
> --- tt-old-f41.txt 2025-02-26 15:37:56.310738515 -0500
> +++ tt-new-f41.txt 2025-02-26 15:37:56.601739407 -0500
> @@ -1,7 +1,12 @@
> [trace][os,container] OSContainer::init: Initializing Container Support
> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups
> -[debug][os,container] controller cpuset is not enabled
> - ]
> +[debug][os,container] v2 controller cpuset is enabled and relevant
> +[debug][os,container] v2 controller cpu is enabled and required
> +[debug][os,container] v2 controller io is enabled but not relevant
> +[debug][os,container] v2 controller memory is enabled and required
> +[debug][os,container] v2 controller hugetlb is enabled but not relevant
> +[debug][os,container] v2 controller pids is enabled and relevant
> +[debug][os,container] v2 controller rdma is enabled but not relevant
> +[debug][os,container] v2 controller misc is enabled but not relevant
> [debug][os,container] Detected cgroups v2 unified hierarchy
> [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope
> [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max
>
>
> `Fedora 41 (custom kernel with cgroups v1 disabled)`:
> _Fixes `cgroups v2` detection:_
>
> --- tt-old-f41-custom-kernel.txt 2025-02-26 15:37:58.197744304 -0500
> +++ tt-new-f41-custom-kernel.txt 2025-02-26 15:37:59.380747933 -0500
> @@ -1,7 +1,63 @@
> [trace][os,container] OSContainer::init: Initializing Container Support
> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups
> -[debug][os,container] controller cpuset is not enabled
> - ]
> -[debug][os,container] controller memory is not enabled
> - ]
> -[debug][os,container] One or more required controllers disabled at kernel level.
> +[debug][os,container] v2 controller cpuset is enabled and relevant
> +[debug][os,container] v2 contro...
Thanks for this. It's getting there...
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 42:
> 40: // Inlined from <linux/magic.h> for portability.
> 41: #define CGROUP2_SUPER_MAGIC 0x63677270
> 42:
We may want to surround this with:
#ifndef CGROUP2_SUPER_MAGIC
...
#endif
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 81:
> 79: bool cgroups_v2_enabled = false;
> 80:
> 81: if (statfs(sys_fs_cgroup, &fsstat) != -1) {
This probably deserves a comment:
// Assume cgroups v2 iff /sys/fs/cgroup has the cgroup v2 file system magic.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 263:
> 261: char buf[MAXPATHLEN+1];
> 262: char *p;
> 263: bool is_cgroupsV2 = true;
For all intents and purposes we can remove `is_cgroupsV2` here and use `cgroups_v2_enabled` instead.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 285:
> 283: if ((p = fgets(buf, MAXPATHLEN, controllers)) != nullptr) {
> 284: char* controller = nullptr;
> 285: char* buf_ptr = buf;
Suggestion:
char* buf_ptr = p;
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 287:
> 285: char* buf_ptr = buf;
> 286: int i;
> 287: while ((controller = strsep(&buf_ptr, " \n\t\r\f\v")) != nullptr) {
Consider defining the separators as `#define IS_SPACE_CHARS " \n\t\r\f\v"` or some such.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 288:
> 286: int i;
> 287: while ((controller = strsep(&buf_ptr, " \n\t\r\f\v")) != nullptr) {
> 288: // Skip empty string due to line ending in delimiter, '\n'.
Suggestion:
// Skip empty controllers. Be lean about the cgroups.controllers file,
// though we probably don't have to be.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 299:
> 297: } else {
> 298: log_debug(os, container)("v2 controller %s is enabled but not relevant", controller);
> 299: }
Do we really need this verbose logging? If you really think we need it, then please bump it to `trace` level. We'd be bailing out anyway if we are missing a required controller with a log.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 321:
> 319: } else {
> 320: /*
> 321: * cgroups v2 is not enabled. Read /proc/cgroups; for cgroups v1 hierarchy (hybrid or
Suggestion:
* The /sys/fs/cgroup filesystem magic hint suggests we have cg v1. Read /proc/cgroups; for cgroups v1 hierarchy (hybrid or
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 339:
> 337: cg_infos[MEMORY_IDX]._enabled = (enabled == 1);
> 338: } else if (strcmp(name, "cpuset") == 0) {
> 339: log_debug(os, container)("Detected optional cpuset controller entry in %s", controllers_file);
In https://bugs.openjdk.org/browse/JDK-8347129 we decided to keep the `cpuset` optionality alone for cg v1. I'd prefer if we kept it that way as it's becoming increasingly difficult to find those systems (or change them). Thus, hard to know if this would break something.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 360:
> 358: for (int i = 0; i < CG_INFO_LENGTH; i++) {
> 359: // pids and cpuset controllers are optional. All other controllers are required
> 360: if (i != PIDS_IDX && i != CPUSET_IDX) {
Same comment for `cpuset` controller. Keep it required for cg v1.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 361:
> 359: // pids and cpuset controllers are optional. All other controllers are required
> 360: if (i != PIDS_IDX && i != CPUSET_IDX) {
> 361: is_cgroupsV2 = is_cgroupsV2 && cg_infos[i]._hierarchy_id == 0;
Fundamentally, we are changing the "hint" as to what constitutes cg v2. So this line needs to be removed. We've already determined at this point that we have cgroup v2 (via the magic check) and we need to use that here. https://github.com/jerboaa/jdk/commit/9958173dc03b66ae96227fb3579bc053cb911f06 would do that and fix a test-consistency issue.
-------------
PR Review: https://git.openjdk.org/jdk/pull/23811#pullrequestreview-2647973128
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973695585
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974085054
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973890517
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973712472
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973705509
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974072923
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974083186
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973722049
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973728648
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973730310
PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1973877201
More information about the hotspot-dev
mailing list