[PING] RFR: 8231111: Cgroups v2: Rework Metrics in java.base so as to recognize unified hierarchy
Bob Vandette
bob.vandette at oracle.com
Wed Feb 12 15:13:09 UTC 2020
> On Feb 12, 2020, at 6:22 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>
> Hi Bob,
>
> On Tue, 2020-02-11 at 16:59 -0500, Bob Vandette wrote:
>> I applied your patch to the latest JDK 15 sources and ran the container
>> tests on Oracle Linux 8.1 with podman/cgroupv2 enabled. There were some issues.
>> I’m not sure if its my setup or not.
>
> Looking over them, they appear to be setup issues.
>
> Getting the setup working right was tricky for me as well.
>
> Aside:
> podman has a --runtime switch which you could point to a cgroups v2
> capable crun binary for example.
> https://bugs.openjdk.java.net/browse/JDK-8230305 has some examples of
> how to use it.
>
> I also needed some extra setup to get the controllers' delegation to
> work:
> https://urldefense.com/v3/__https://scrivano.org/2019/02/26/resources-management-with-rootless-containers/__;!!GqivPVa7Brio!Ot7tv-QJgGntRyaimzIRrRh7rqvnzMu9AoOaBWINWNnnpR9LUVNNvGUe9GXp0Ri0Wg$
>
>> I also ran the same build on Ubuntu with docker/cgroupv1. I didn't see any failures on
>> the cgroupv1 system.
>
> Great, thanks!
>
>> Here are some notes:
>>
>> The podman version on OL 8.1 doesn't yes support cgroupv2. I
>> built the latest from sources for this test.
>>
>> The docker tests take a very long time under podman! Longer than
>> the cgroupv1 run.
>
> Possibly. I haven't done any fair comparison of the two.
It’s possible that it’s trying to access several container hubs. I may be getting some timeouts
since I haven’t figured out how to specify proxies for podman. I’ve set a local http_proxy env variable
but maybe that’s not working.
>
>> cpusets and cpusets.mems are blank on host and if none are specified
>> on podman/docker run command. On cgroupv1 they are host values if not
>> specified for container. Effective cpusets and cpusets.mems are set properly
>> in a container.
>
> Yes, there seems to be slight differences to cgroup v1. You can only
> set cpusets on containers if the host has the controller enabled,
> though:
>
> # cat /sys/fs/cgroup/cgroup.subtree_control
> cpuset cpu io memory pids
> # podman run --memory=300M --cpuset-cpus=0,1 -ti -v `pwd`:/mnt fedora:30 /bin/bash
> [root at 5d4a4e593a24 /]# cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cpuset.cpus
> 0-1
> [root at 5d4a4e593a24 mnt]# ./cgroupsv2-jdk/bin/java -XshowSettings:system -version
> Operating System Metrics:
> Provider: cgroupv2
> Effective CPU Count: 2
> CPU Period: 100000us
> CPU Quota: -1
> CPU Shares: -1
> List of Processors, 2 total:
> 0 1
> List of Effective Processors, 2 total:
> 0 1
> List of Memory Nodes: N/A
> List of Available Memory Nodes, 1 total:
> 0
> Memory Limit: 300.00M
> Memory Soft Limit: Unlimited
> Memory & Swap Limit: 600.00M
>
> openjdk version "15-internal" 2020-09-15
> OpenJDK Runtime Environment (build 15-internal+0-adhoc.sgehwolf.openjdk-head-2)
> OpenJDK 64-Bit Server VM (build 15-internal+0-adhoc.sgehwolf.openjdk-head-2, mixed mode, sharing)
>
In my setup, your command above works correctly. The only thing that doesn’t work is if I run the -XshowSettings
directly on the host OR I run a container without specifying —cpuset-cpus. The default setup seems wrong.
>
>> HOST OUTPUT:
>> ./java -XshowSettings:system -version
>> Operating System Metrics:
>> Provider: cgroupv2
>> Effective CPU Count: 32
>> CPU Period: -1
>> CPU Quota: -1
>> CPU Shares: -1
>> List of Processors: N/A
>> List of Effective Processors: N/A
>> List of Memory Nodes: N/A
>> List of Available Memory Nodes: N/A
>> Memory Limit: Unlimited
>> Memory Soft Limit: Unlimited
>> Memory & Swap Limit: Unlimited
>>
>> CONTAINER OUTPUT:
>> # podman run -it -v `pwd`:/mnt ubuntu bash
>> root at 3c6654a3b834:/mnt/jdk/bin# ./java -XshowSettings:system -version
>> Operating System Metrics:
>> Provider: cgroupv2
>> Effective CPU Count: 32
>> CPU Period: 100000us
>> CPU Quota: -1
>> CPU Shares: -1
>> List of Processors: N/A
>> List of Effective Processors, 32 total:
>> 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
>> List of Memory Nodes: N/A
>> List of Available Memory Nodes, 1 total:
>> 0
>> Memory Limit: Unlimited
>> Memory Soft Limit: Unlimited
>> Memory & Swap Limit: Unlimited
>
> Yes, host and container output differ depending on the configured test
> system. I remember that I had cpusets working on the host system too at
> some point, but I forgot what the magic was to get this properly
> delegated via systemd.
>
>> Docker tests fail if /bin/docker is not available in podman setup. We
>> probably should enhance the docker check to also look for podman.
>
> Could you be more specific about this? How do you run docker tests? I
> use:
>
> -e PATH -verbose:summary -Djdk.test.container.command=podman -Djdk.test.docker.image.name=fedora -Djdk.test.docker.image.version=30
>
> with -Djdk.test.container.command=podman it shouldn't need docker. Did
> you specify that property?
I thought that most podman setups try to provide docker compatibility. I did not try using the properties.
In order to make our automation work cleaner, I was hoping that we could always just execute docker.
>
>> Two container tests failed:
>>
>> FAILED: containers/cgroup/PlainRead.java failed Memory Limit is: -2 instead
>> of unlimited or -1. This is because memory.max is not foumd.
>
> This doesn't fail for me, because I've got memory.max present on host:
>
> # cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cgroup.controllers
> cpu io memory pids
> [root at f31 sgehwolf]# cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/memory.max
> max
>
> Note, howevre, this is a hotspot test. So support for it for cgroup v2
> came with JDK-8230305. It seems we should return -1 if memory.max is
> not found (over -2, not supported). Could you file a bug for this? It's
> unrelated to this change. It should be a simple fix.
Here are my controllers:
cpuset cpu io memory pids rdma
This seems like another case where this file doesn’t exist in the path we form on the host.
[0.035s][debug][os,container] /sys/fs/cgroup/user.slice/user-23603.slice/session-14.scope/cpu.max failed, No such file or directory
It does exist here: /sys/fs/cgroup/user.slice so maybe it’s a delegation issue.
>
>> FAILED: jdk/internal/platform/cgroup/TestCgroupMetrics.java
>> This fails because nr_periods line doesn't always exist. I think you’ve got
>> to enable a quota for this to appear (not sure).
>
> Passes for me, but it needs the cpu controller enabled on the test
> system.
>
> # cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cgroup.controllers
> cpu io memory pids
> # cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cpu.stat
> usage_usec 1157537
> user_usec 606832
> system_usec 550705
> nr_periods 0
> nr_throttled 0
> throttled_usec 0
>
>> Here’s the contents:
>> % more cpu.stat
>> usage_usec 23974562755
>> user_usec 22257183568
>> system_usec 1717379186
>
> It suggests you've got the cpu controller disabled:
> https://urldefense.com/v3/__https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html*cpu__;Iw!!GqivPVa7Brio!Ot7tv-QJgGntRyaimzIRrRh7rqvnzMu9AoOaBWINWNnnpR9LUVNNvGUe9GWZKkNZMA$
>
> """
> and the following three when the controller is enabled:
>
>
> nr_periods
> nr_throttled
> throttled_usec
> “""
I do have the cpu controller enabled. I’ll look into your fixes for delegation issues and try again.
Bob.
>
>> CGROUPV2 results on Oracle Linux 8.1
>> ---------
>>
>> testing container APIs
>>
>> FAILED: containers/cgroup/PlainRead.java
>> Passed: containers/docker/DockerBasicTest.java
>> Passed: containers/docker/TestCPUAwareness.java
>> Passed: containers/docker/TestCPUSets.java
>> Passed: containers/docker/TestJcmdWithSideCar.java
>> Passed: containers/docker/TestJFREvents.java
>> Passed: containers/docker/TestJFRNetworkEvents.java
>> Passed: containers/docker/TestMemoryAwareness.java
>> Passed: containers/docker/TestMisc.java
>> Test results: passed: 8; failed: 1
>> Results written to /export/users/bobv/jdk15/build/jtreg/JTwork
>> Error: Some tests failed or other problems occurred.
>
> These are hotspot tests. Covered by JDK-8230305 (hotspot changes). The
> plain read test passes on a properly configured host system with
> controller delegation.
>
>> testing jdk.internal.platform APIs
>>
>> FAILED: jdk/internal/platform/cgroup/TestCgroupMetrics.java
>> Passed: jdk/internal/platform/cgroup/TestCgroupSubsystemController.java
>> Passed: jdk/internal/platform/docker/TestDockerCpuMetrics.java
>> Passed: jdk/internal/platform/docker/TestDockerMemoryMetrics.java
>> Passed: jdk/internal/platform/docker/TestSystemMetrics.java
>> Test results: passed: 4; failed: 1
>> Results written to /export/users/bobv/jdk15/build/jtreg/JTwork
>> Error: Some tests failed or other problems occurred.
>
> I believe cgroup/TestCgroupMetrics.java fails due to bad host config.
> It passes here:
>
> [root at f31 jdk-jdk]# rm -rf JTwork/ JTreport && /media/disk/jtreg/bin/jtreg -timeout:4 -jdk:../cgroupsv2-jdk/ -e PATH -verbose:summary -Djdk.test.container.command=podman -Djdk.test.docker.image.name=fedora -Djdk.test.docker.image.version=30 test/jdk/jdk/internal/platform
> Directory "JTwork" not found: creating
> Directory "JTreport" not found: creating
> Passed: jdk/internal/platform/cgroup/TestCgroupMetrics.java
> Passed: jdk/internal/platform/cgroup/TestCgroupSubsystemController.java
> Passed: jdk/internal/platform/docker/TestDockerCpuMetrics.java
> Passed: jdk/internal/platform/docker/TestDockerMemoryMetrics.java
> Passed: jdk/internal/platform/docker/TestSystemMetrics.java
> Test results: passed: 5
> Report written to /home/sgehwolf/jdk-jdk/JTreport/html/report.html
> Results written to /home/sgehwolf/jdk-jdk/JTwork
>
> JTR files available here:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/jtreg-results/
>
>> testing -XshowSettings:system launcher option
>>
>> Passed: tools/launcher/Settings.java
>> Test results: passed: 1
>
> Thanks for running the tests!
>
> FWIW, jdk-submit came back clean. If we could get the initial support
> of this pushed soon it would be great. I'd be happy to fix any follow-
> up issues.
>
> Thanks,
> Severin
>
>> Bob.
>>
>>> On Feb 11, 2020, at 1:04 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>>>
>>> Hi Mandy, Bob,
>>>
>>> Thanks again for the reviews and patience on this. Sorry it took me so
>>> long to get back to this :-/
>>>
>>> Updated webrev:
>>> Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/webrev/
>>> incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/incremental/webrev/
>>>
>>> I've tested this with docker tests on cgroup v1 and via podman on a
>>> cgroup v2 system. They pass. I'll be running this through jdk-submit as
>>> well.
>>>
>>> More below.
>>>
>>> On Tue, 2020-01-21 at 16:09 -0800, Mandy Chung wrote:
>>>> Hi Severin,
>>>>
>>>> Thanks for the update.
>>>>
>>>> On 1/21/20 11:30 AM, Severin Gehwolf wrote:
>>>>> Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/webrev/
>>>>> incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/incremental/webrev/
>>>>>
>>>>
>>>> I have answered my own question. Most of the metrics used to return 0 if unavailable due to IOException reading a file or malformed content and now are changed to return -2 due to error fetching the metric.
>>>>
>>>> The following are about limits which used to return -1 if unlimited or no limit set.
>>>> public long getCpuQuota();
>>>> public long getCpuShares();
>>>> public long getMemoryLimit();
>>>> public long getMemoryAndSwapLimit();
>>>> public long getMemorySoftLimit();
>>>>
>>>> With this patch, only getMemoryLimit and getMemoryAndSwapLimit specify to return -1 if unlimited or no limit set. However the implementation does return -1. All of the above specify to return -2 if unavailable due to error fetching the metric.
>>>>
>>>> I found the implementation quite hard to follow. I spent some time reviewing the code to see if the implementation matches the spec but I can't easily tell yet. For example,
>>>> CgroupSubsystemController::getLongValueMatchingLine returns -1 when IOException occurs.
>>>> CgroupSubsystemController::getLongEntry returns 0L if IOException occurs.
>>>>
>>>> CgroupV1SubsystemController::convertStringToLong returns Long.MAX_VALUE
>>>> when value overflows
>>>
>>> This one is intentional. It's mapped back to unlimited via
>>> longValOrUnlimited(). The reason for this is that cgroup v1 doesn't
>>> have a concept of "unlimited". Unlimited values will be a very large
>>> numbers in cgroup v1 files.
>>>
>>>> CgroupV2SubsystemController::convertStringToLong returns -1 when IOException occurs
>>>>
>>>> CgroupV1Subsystem::getCpuShares return -1 if cpu.shares == 0 or 1024
>>>> CgroupV2Subsystem::getCpuShares returns -1 if cpu.weight == 100 or 0
>>>
>>> These two are special cases too. See the implementation note of
>>> Metrics.getCpuShares(). In the cgroup v2 case the default value is 100
>>> (over 1024 in cgroup v1). That's why unlimited is being returned for
>>> those values.
>>>
>>>> CgroupV2Subsystem::getFromCpuMax returns -1 if error in reading cpu.max or malformed content
>>>> CgroupV2Subsystem::sumTokensIOStat returns -2 if IOException error
>>>> This is called by getBlkIOServiceCount and getBlkIOServiced
>>>>
>>>> I think this can be improved and add the documentation to describe
>>>> what the methods do. Since Metrics APIs consistently return -2 if
>>>> unavailable due to error in fetching the metric, why some utility
>>>> methods in *Subsystem and *SubsystemController return -1 upon error
>>>> and 0 when unlimited?
>>>>
>>>> I suspect if the getXXXValue and other methods are clearly documented
>>>> with the error cases (possibly renaming the method name if appropriate)
>>>> CgroupV1Subsystem and CgroupV2SubSystem will become very explicit
>>>> to understand.
>>>
>>> This should be fixed now.
>>>
>>> I've gone through the API doc of Metrics.java and have updated it. In
>>> general, I've updated it to return -1 if metric is unavailable (due to
>>> error in reading some files or content being empty), and -2 if not
>>> supported. No method returns -2 currently, but it might change and it's
>>> good to have some way of saying "not implementable" for this subsystem
>>> in the spec. That's my take on it anyway.
>>>
>>> There is also a new unit test for shared controller logic:
>>> TestCgroupSubsystemController.java
>>>
>>> It execises various cases of error/success.
>>>
>>> That is to ensure proper symmetry across the various cases (including
>>> IOException). I've also documented static methods in
>>> CgroupSubsystemController. Overall, all methods now return the same
>>> values for cgroup v1 and cgroup v2 (given the impl nuances) for the
>>> various cases.
>>>
>>>> CgroupSubsystem.java
>>>>
>>>> 44 public static final double DOUBLE_RETVAL_NOT_SUPPORTED = LONG_RETVAL_NOT_SUPPORTED;
>>>> 49 public static final Boolean BOOL_RETVAL_NOT_SUPPORTED = null;
>>>>
>>>> They are no longer needed, right?
>>>
>>> Removed.
>>>
>>>> CgroupSubsystemFactory.java
>>>>
>>>> 89 System.err.println("Warning: Mixed cgroupv1 and cgroupv2 not supported. Metrics disabled.");
>>>>
>>>>
>>>> I expect this be a System.Logger log
>>>
>>> Updated.
>>>
>>>> 114 if (!Integer.valueOf(0).toString().equals(tokens[0])) {
>>>>
>>>> This can be simplified to if (!"0".equals(tokens[0]))
>>>
>>> Done, thanks!
>>>
>>>> LauncherHelper.java
>>>>
>>>> 407 // Extended cgroupv1 specific metrics
>>>> 408 if (c instanceof MetricsCgroupV1) {
>>>> 409 MetricsCgroupV1 cgroupV1 = (MetricsCgroupV1)c;
>>>> 410 limit = cgroupV1.getKernelMemoryLimit();
>>>> 411 ostream.println(formatLimitString(limit, INDENT + "Kernel Memory Limit: "));
>>>> 412 limit = cgroupV1.getTcpMemoryLimit();
>>>> 413 ostream.println(formatLimitString(limit, INDENT + "TCP Memory Limit: "));
>>>> 414 Boolean value = cgroupV1.isMemoryOOMKillEnabled();
>>>> 415 ostream.println(formatBoolean(value, INDENT + "Out Of Memory Killer Enabled: "));
>>>> 416 value = cgroupV1.isCpuSetMemoryPressureEnabled();
>>>> 417 ostream.println(formatBoolean(value, INDENT + "CPUSet Memory Pressure Enabled: "));
>>>> 418 }
>>>>
>>>> MetricsCgroupV1 is linux-only. It will fail the compilation when
>>>> building on non-linux. One option is to move this code to
>>>> src/java.base/linux/share/sun/launcher/CgroupMetrics.java
>>>>
>>>> Are they continued to be interesting metrics to be output from
>>>> -XshowSetting? I wonder if they can simply be dropped from the output.
>>>> Bob will have an opinion.
>>>
>>> I've removed those extra cgroup v1 specific metrics printed via
>>> -XshowSettings:system. Not sure what to do with MetricsCgroupV1. It's
>>> only used in tests in webrev 10. On the other hand the idea would be
>>> for consumers to downcast it to MetricsCgroupV1 if they needed those
>>> extra metrics.
>>>
>>> Thanks,
>>> Severin
>>>
>
More information about the core-libs-dev
mailing list