RFR: 8349988: Change cgroup version detection logic to not depend on /proc/cgroups

Thomas Fitzsimmons duke at openjdk.org
Fri Feb 28 07:45:53 UTC 2025


On Thu, 27 Feb 2025 14:48:16 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> This pull request fixes https://bugs.openjdk.org/browse/JDK-8349988 and https://bugs.openjdk.org/browse/JDK-8347811.
>> 
>> I tested it with:
>> 
>> 
>> java -Xlog:os+container=trace -version
>> 
>> on:
>> 
>> `Red Hat Enterprise Linux 8 (cgroups v1 only)`:
>> _No change in behaviour_
>> 
>> `Fedora 41 (cgroups v2)`:
>> _More verbose output due to `/sys/fs/cgroup/cgroup.controllers` parsing:_
>> 
>> --- tt-old-f41.txt	2025-02-26 15:37:56.310738515 -0500
>> +++ tt-new-f41.txt	2025-02-26 15:37:56.601739407 -0500
>> @@ -1,7 +1,12 @@
>>  [trace][os,container] OSContainer::init: Initializing Container Support
>> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups
>> -[debug][os,container] controller cpuset is not enabled
>> -                    ] 
>> +[debug][os,container] v2 controller cpuset is enabled and relevant
>> +[debug][os,container] v2 controller cpu is enabled and required
>> +[debug][os,container] v2 controller io is enabled but not relevant
>> +[debug][os,container] v2 controller memory is enabled and required
>> +[debug][os,container] v2 controller hugetlb is enabled but not relevant
>> +[debug][os,container] v2 controller pids is enabled and relevant
>> +[debug][os,container] v2 controller rdma is enabled but not relevant
>> +[debug][os,container] v2 controller misc is enabled but not relevant
>>  [debug][os,container] Detected cgroups v2 unified hierarchy
>>  [trace][os,container] Adjusting controller path for memory: /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope
>>  [trace][os,container] Path to /memory.max is /sys/fs/cgroup/user.slice/user-4215196.slice/user at 4215196.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135086d6-2de4-4f2e-ad94-899b5eecaf83.scope/memory.max
>> 
>> 
>> `Fedora 41 (custom kernel with cgroups v1 disabled)`:
>> _Fixes `cgroups v2` detection:_
>> 
>> --- tt-old-f41-custom-kernel.txt	2025-02-26 15:37:58.197744304 -0500
>> +++ tt-new-f41-custom-kernel.txt	2025-02-26 15:37:59.380747933 -0500
>> @@ -1,7 +1,63 @@
>>  [trace][os,container] OSContainer::init: Initializing Container Support
>> -[debug][os,container] Detected optional pids controller entry in /proc/cgroups
>> -[debug][os,container] controller cpuset is not enabled
>> -                    ] 
>> -[debug][os,container] controller memory is not enabled
>> -                    ] 
>> -[debug][os,container] One or more required controllers disabled at kernel level.
>> +[...
>
> src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 360:
> 
>> 358:     for (int i = 0; i < CG_INFO_LENGTH; i++) {
>> 359:       // pids and cpuset controllers are optional. All other controllers are required
>> 360:       if (i != PIDS_IDX && i != CPUSET_IDX) {
> 
> Same comment for `cpuset` controller. Keep it required for cg v1.

Hmm, this logic was there before my changes; I am pretty sure I did keep `cpuset` required for `cgroups v1`.

To attempt to confirm my patch had not changed the `cgroups v1` `cpuset` optionality logic, I wrote a test case for "cgroups v1, cpuset not present in /proc/cgroups".

(To avoid assertion failures I created fake `/proc/self/cgroup` and `/proc/self/mountinfo` files that also do not have `cpuset` lines.)

I adapted the test case to before my patch:

https://github.com/fitzsim/jdk/commit/cab0000598b522d38865452575f95ed368b5935b

and after my patch:

https://github.com/fitzsim/jdk/commit/d8537baed5d56b5f36ce628b8c180ce18ce6f6e6

and it passes in both cases.  i.e., for `cgroups v1`, `cpuset` is required, so `determine_type` produces `INVALID_CGROUPS_V1 (4)`, from this logic:


  if (!cg_infos[CPUSET_IDX]._data_complete) {
    log_debug(os, container)("Required cgroup v1 cpuset subsystem not found");
    cleanup(cg_infos);
    *flags = INVALID_CGROUPS_V1;
    return false;
  }


The test fails in both cases if I remove ` && i != CPUSET_IDX`, with  `determine_type` returning `INVALID_CGROUPS_GENERIC (6)`:


Required cpuset controller missing in /proc/cgroups. Invalid. expected: 4 but was: 6


due to this block (which my patch left alone):


  if (!all_required_controllers_enabled) {
    // one or more required controllers disabled, disable container support
    log_debug(os, container)("One or more required controllers disabled at kernel level.");
    cleanup(cg_infos);
    *flags = INVALID_CGROUPS_GENERIC;
    return false;
  }


No existing test checks this logic as far as I can tell, so if you think the above makes sense, I can add this test to this pull request.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23811#discussion_r1974950233


More information about the hotspot-dev mailing list