RFR: 8333446: Add tests for hierarchical container support [v9]
Zdenek Zambersky
zzambers at openjdk.org
Tue Sep 10 12:15:16 UTC 2024
On Mon, 9 Sep 2024 17:28:16 GMT, Zdenek Zambersky <zzambers at openjdk.org> wrote:
>> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
>>
>> - Adapt JDK-8339148
>> - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>> - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>> - Fix comment of WB::host_cpus()
>> - Handle non-root + CGv2
>> - Add nested hierarchy to test framework
>> - Revert "Add root check for SystemdMemoryAwarenessTest.java"
>>
>> This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5.
>> - Add root check for SystemdMemoryAwarenessTest.java
>> - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>> - Merge branch 'master' into jdk-8333446-systemd-slice-tests
>> - ... and 7 more: https://git.openjdk.org/jdk/compare/dd1b7120...30f32d22
>
> I have done some testing on RHELs (build with changes from this PR + other 2 container PRs applied):
> **RHEL-8** (cgroup1/non-root)
> - test was skipped correctly
>
> **RHEL-9** (cgroup2/non-root)
> - I saw failure of `active_processor_count` check.
> - after investigation, I have found, that `cpu` cgroup controller is not delegated to `user@1000.service` (and children) on rhel-9 (unlike in e.g. fedora) it only had `memory pids` (btw. available controllers at given "level" are listed in `cgroup.controllers` file in cgroups v2)
> - when I modified `user@.service` to also delegate cpu controller, test passed
>
> Apart from issue with check for `active_processor_count` on RHEL-9/non-root, it looks good. However I don't know how to easily fix issue with `active_processor_count` check. Maybe check could be skipped for non-root. (Work-around is to modify system configuration.)
> @zzambers Thanks for taking a look.
>
> > I have done some testing on RHELs (build with changes from this PR + other 2 container PRs applied): **RHEL-8** (cgroup1/non-root)
> > ```
> > * test was skipped correctly
> > ```
> >
> > **RHEL-9** (cgroup2/non-root)
> > ```
> > * I saw failure of `active_processor_count` check.
> >
> > * after investigation, I have found, that `cpu` cgroup controller is not delegated to `user@1000.service` (and children) on rhel-9 (unlike in e.g. fedora) it only had `memory pids` (btw. available controllers at given "level" are listed in `cgroup.controllers` file in cgroups v2)
> >
> > * when I modified `user@.service` to also delegate cpu controller, test passed
> > ```
>
> Could it be that the setup you've done to employ delegation is similar to this one? https://github.com/jerboaa/openjdk-cgroupv2-setup/blob/97690683af17b303276ea473fe44b3dde7ead327/config_cgroupv2.yml#L24-L32
I just added `cpu` to the `Delegate` list of `user@.service`, which looks similar to what is done there. I see `Delegate=yes` used in your link, which probably delegates all controllers.
Thanks for the link.
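
For anyone who wants to check which controllers are delegated at a given level, here is a minimal sketch in Java. It only parses the space-separated content of a cgroups v2 `cgroup.controllers` file; the class and method names are hypothetical and are not part of the PR or the JDK test library.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the PR.
class DelegatedControllers {

    // Parses the content of a cgroups v2 cgroup.controllers file,
    // which is a single line of space-separated names, e.g. "memory pids".
    static Set<String> parse(String content) {
        Set<String> controllers = new LinkedHashSet<>();
        for (String name : content.trim().split("\\s+")) {
            if (!name.isEmpty()) {
                controllers.add(name);
            }
        }
        return controllers;
    }

    // Returns true if the given controller is listed in the file content.
    static boolean isDelegated(String content, String controller) {
        return parse(content).contains(controller);
    }

    public static void main(String[] args) {
        // Content as seen for the user service on the RHEL-9 system above.
        System.out.println(isDelegated("memory pids", "cpu"));
        // Content after also delegating the cpu controller.
        System.out.println(isDelegated("cpu memory pids", "cpu"));
    }
}
```

With `memory pids` as input the `cpu` lookup is false, which matches the state in which the `active_processor_count` check failed.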
>
> > Apart from issue with check for `active_processor_count` on RHEL-9/non-root, it looks good. However I don't know how to easily fix issue with `active_processor_count` check. Maybe check could be skipped for non-root. (Work-around is to modify system configuration.)
>
> Do existing podman container tests pass on that system? It seems fair to assume that that's the baseline config for container tests in general: systemd ones or podman/docker. I know that on cg v2 not all container tests pass out-of-the-box. In particular certain CPU awareness tests. Keeping that basic idea in terms of required config for those tests consistent with other container tests seems adequate to me.
You are right. I have run the container tests in my VM and indeed hit an issue with a missing cpuset controller.
(Yesterday I forgot to set the required properties, so most of them got skipped.)
It is interesting that we have not hit this issue in our testing (container tests are passing). However, that is probably because we run container tests in a different way (we don't use VMs for them, but rather run them in Beaker). I would need to investigate.
Anyway, it is good to know that there can be this issue with cgroup controllers.
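
To see which `cgroup.controllers` file applies to a process, one can look up its own cgroup first. Below is a rough sketch, assuming cgroups v2 mounted at `/sys/fs/cgroup` (the usual location, but an assumption here) and the `0::<path>` entry format of `/proc/self/cgroup`. The class and method names are hypothetical and not taken from the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the PR.
class OwnCgroup {

    // On cgroups v2, /proc/self/cgroup contains one entry of the
    // form "0::<path>"; return <path> if such an entry exists.
    static Optional<String> unifiedPath(List<String> procSelfCgroupLines) {
        for (String line : procSelfCgroupLines) {
            if (line.startsWith("0::")) {
                return Optional.of(line.substring(3));
            }
        }
        return Optional.empty();
    }

    // Builds the path of cgroup.controllers for the given cgroup path.
    static Path controllersFile(String mountPoint, String cgroupPath) {
        // Drop the leading slash so the path resolves below the mount point.
        String relative = cgroupPath.startsWith("/")
                ? cgroupPath.substring(1) : cgroupPath;
        return Path.of(mountPoint).resolve(relative).resolve("cgroup.controllers");
    }

    public static void main(String[] args) throws IOException {
        Path procFile = Path.of("/proc/self/cgroup");
        if (!Files.isReadable(procFile)) {
            System.out.println("no /proc/self/cgroup on this system");
            return;
        }
        Optional<String> path = unifiedPath(Files.readAllLines(procFile));
        if (path.isEmpty()) {
            System.out.println("no cgroups v2 entry found");
            return;
        }
        // Assumed mount point; adjust if the hierarchy is mounted elsewhere.
        Path file = controllersFile("/sys/fs/cgroup", path.get());
        System.out.println(file + ": " + Files.readString(file).trim());
    }
}
```

If `cpu` or `cpuset` is missing from the printed list, the corresponding CPU awareness checks cannot be expected to pass without changing the delegation config.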
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2340526280