[PING] RFR: 8231111: Cgroups v2: Rework Metrics in java.base so as to recognize unified hierarchy

Bob Vandette bob.vandette at oracle.com
Wed Feb 12 17:29:19 UTC 2020


I applied the delegation change that you recommended and now all container tests pass.

Here’s the change that I applied.

echo +cpu +cpuset +io +memory +pids > /sys/fs/cgroup/cgroup.subtree_control
echo +cpu +cpuset +io +memory +pids > /sys/fs/cgroup/user.slice/cgroup.subtree_control
echo +cpu +cpuset +io +memory +pids > /sys/fs/cgroup/user.slice/user-23603.slice/cgroup.subtree_control

Although all the tests now pass, the values for cpusets and cpusets.mems still do not show up on the host
or container unless limits are specified.  That’s the only remaining minor difference from cgroupv1 that I can see.

testing container APIs
Directory "JTwork" not found: creating
Passed: containers/cgroup/PlainRead.java
Passed: containers/docker/DockerBasicTest.java
Passed: containers/docker/TestCPUAwareness.java
Passed: containers/docker/TestCPUSets.java
Passed: containers/docker/TestJcmdWithSideCar.java
Passed: containers/docker/TestJFREvents.java
Passed: containers/docker/TestJFRNetworkEvents.java
Passed: containers/docker/TestMemoryAwareness.java
Passed: containers/docker/TestMisc.java
Test results: passed: 9
Results written to /export/users/bobv/jdk15/build/jtreg/JTwork
testing jdk.internal.platform APIs
Passed: jdk/internal/platform/cgroup/TestCgroupMetrics.java
Passed: jdk/internal/platform/cgroup/TestCgroupSubsystemController.java
Passed: jdk/internal/platform/docker/TestDockerCpuMetrics.java
Passed: jdk/internal/platform/docker/TestDockerMemoryMetrics.java
Passed: jdk/internal/platform/docker/TestSystemMetrics.java
Test results: passed: 5
Results written to /export/users/bobv/jdk15/build/jtreg/JTwork
testing -XshowSettings:system launcher option
Passed: tools/launcher/Settings.java
Test results: passed: 1
Results written to /export/users/bobv/jdk15/build/jtreg/JTwork

Bob.

> On Feb 12, 2020, at 10:13 AM, Bob Vandette <bob.vandette at oracle.com> wrote:
> 
> 
>> On Feb 12, 2020, at 6:22 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>> 
>> Hi Bob,
>> 
>> On Tue, 2020-02-11 at 16:59 -0500, Bob Vandette wrote:
>>> I applied your patch to the latest JDK 15 sources and ran the container
>>> tests on Oracle Linux 8.1 with podman/cgroupv2 enabled.  There were some issues.
>>> I’m not sure if its my setup or not.  
>> 
>> Looking over them, they appear to be setup issues.
>> 
>> Getting the setup working right was tricky for me as well.
>> 
>> Aside:
>> podman has a --runtime switch which you could point to a cgroups v2
>> capable crun binary for example.
>> https://bugs.openjdk.java.net/browse/JDK-8230305 has some examples of
>> how to use it.
>> 
>> I also needed some extra setup to get the controllers' delegation to
>> work:
>> https://urldefense.com/v3/__https://scrivano.org/2019/02/26/resources-management-with-rootless-containers/__;!!GqivPVa7Brio!Ot7tv-QJgGntRyaimzIRrRh7rqvnzMu9AoOaBWINWNnnpR9LUVNNvGUe9GXp0Ri0Wg$ 
>> 
>>> I also ran the same build on Ubuntu with docker/cgroupv1.  I didn't see any failures on
>>> the cgroupv1 system.
>> 
>> Great, thanks!
>> 
>>> Here are some notes:
>>> 
>>> The podman version on OL 8.1 doesn't yes support cgroupv2.  I
>>> built the latest from sources for this test.
>>> 
>>> The docker tests take a very long time under podman!  Longer than
>>> the cgroupv1 run.
>> 
>> Possibly. I haven't done any fair comparison of the two.
> 
> It’s possible that it’s trying to access several container hubs.  I may be getting some timeouts
> since I haven’t figured out how to specify proxies for podman.  I’ve set a local http_proxy env variable
> but maybe that’s not working.
> 
>> 
>>> cpusets and cpusets.mems are blank on host and if none are specified
>>> on podman/docker run command.  On cgroupv1 they are host values if not
>>> specified for container.  Effective cpusets and cpusets.mems are set properly
>>> in a container.
>> 
>> Yes, there seems to be slight differences to cgroup v1. You can only
>> set cpusets on containers if the host has the controller enabled,
>> though:
>> 
>> # cat /sys/fs/cgroup/cgroup.subtree_control
>> cpuset cpu io memory pids
>> # podman run --memory=300M --cpuset-cpus=0,1 -ti -v `pwd`:/mnt fedora:30 /bin/bash
>> [root at 5d4a4e593a24 /]# cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cpuset.cpus
>> 0-1
>> [root at 5d4a4e593a24 mnt]# ./cgroupsv2-jdk/bin/java -XshowSettings:system -version
>> Operating System Metrics:
>>   Provider: cgroupv2
>>   Effective CPU Count: 2
>>   CPU Period: 100000us
>>   CPU Quota: -1
>>   CPU Shares: -1
>>   List of Processors, 2 total: 
>>   0 1 
>>   List of Effective Processors, 2 total: 
>>   0 1 
>>   List of Memory Nodes: N/A
>>   List of Available Memory Nodes, 1 total: 
>>   0 
>>   Memory Limit: 300.00M
>>   Memory Soft Limit: Unlimited
>>   Memory & Swap Limit: 600.00M
>> 
>> openjdk version "15-internal" 2020-09-15
>> OpenJDK Runtime Environment (build 15-internal+0-adhoc.sgehwolf.openjdk-head-2)
>> OpenJDK 64-Bit Server VM (build 15-internal+0-adhoc.sgehwolf.openjdk-head-2, mixed mode, sharing)
>> 
> 
> In my setup, your command above works correctly.  The only thing that doesn’t work is if I run the -XshowSettings
> directly on the host OR I run a container without specifying —cpuset-cpus.  The default setup seems wrong.
> 
>> 
>>> HOST OUTPUT:
>>> ./java -XshowSettings:system -version
>>> Operating System Metrics:
>>>   Provider: cgroupv2
>>>   Effective CPU Count: 32
>>>   CPU Period: -1
>>>   CPU Quota: -1
>>>   CPU Shares: -1
>>>   List of Processors: N/A
>>>   List of Effective Processors: N/A
>>>   List of Memory Nodes: N/A
>>>   List of Available Memory Nodes: N/A
>>>   Memory Limit: Unlimited
>>>   Memory Soft Limit: Unlimited
>>>   Memory & Swap Limit: Unlimited
>>> 
>>> CONTAINER OUTPUT:
>>> # podman run -it -v `pwd`:/mnt ubuntu bash
>>> root at 3c6654a3b834:/mnt/jdk/bin# ./java -XshowSettings:system -version
>>> Operating System Metrics:
>>>   Provider: cgroupv2
>>>   Effective CPU Count: 32
>>>   CPU Period: 100000us
>>>   CPU Quota: -1
>>>   CPU Shares: -1
>>>   List of Processors: N/A
>>>   List of Effective Processors, 32 total: 
>>>   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
>>>   List of Memory Nodes: N/A
>>>   List of Available Memory Nodes, 1 total: 
>>>   0 
>>>   Memory Limit: Unlimited
>>>   Memory Soft Limit: Unlimited
>>>   Memory & Swap Limit: Unlimited
>> 
>> Yes, host and container output differ depending on the configured test
>> system. I remember that I had cpusets working on the host system too at
>> some point, but I forgot what the magic was to get this properly
>> delegated via systemd.
>> 
>>> Docker tests fail if /bin/docker is not available in podman setup.  We
>>> probably should enhance the docker check to also look for podman.
>> 
>> Could you be more specific about this? How do you run docker tests? I
>> use:
>> 
>> -e PATH -verbose:summary -Djdk.test.container.command=podman -Djdk.test.docker.image.name=fedora -Djdk.test.docker.image.version=30
>> 
>> with -Djdk.test.container.command=podman it shouldn't need docker. Did
>> you specify that property?
> 
> I thought that most podman setups try to provide docker compatibility.  I did not try using the properties.
> In order to make our automation work cleaner, I was hoping that we could always just execute docker.
> 
>> 
>>> Two container tests failed:
>>> 
>>> FAILED: containers/cgroup/PlainRead.java failed Memory Limit is: -2 instead
>>> of unlimited or -1.  This is because memory.max is not foumd.
>> 
>> This doesn't fail for me, because I've got memory.max present on host:
>> 
>> # cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cgroup.controllers
>> cpu io memory pids
>> [root at f31 sgehwolf]# cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/memory.max
>> max
>> 
>> Note, howevre, this is a hotspot test. So support for it for cgroup v2
>> came with JDK-8230305. It seems we should return -1 if memory.max is
>> not found (over -2, not supported). Could you file a bug for this? It's
>> unrelated to this change. It should be a simple fix.
> 
> Here are my controllers:
> cpuset cpu io memory pids rdma
> 
> This seems like another case where this file doesn’t exist in the path we form on the host.
> 
> [0.035s][debug][os,container] /sys/fs/cgroup/user.slice/user-23603.slice/session-14.scope/cpu.max failed, No such file or directory
> 
> It does exist here:  /sys/fs/cgroup/user.slice so maybe it’s a delegation issue.
> 
>> 
>>> FAILED: jdk/internal/platform/cgroup/TestCgroupMetrics.java
>>> This fails because nr_periods line doesn't always exist.  I think you’ve got
>>> to enable a quota for this to appear (not sure).  
>> 
>> Passes for me, but it needs the cpu controller enabled on the test
>> system.
>> 
>> # cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cgroup.controllers
>> cpu io memory pids
>> # cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cpu.stat
>> usage_usec 1157537
>> user_usec 606832
>> system_usec 550705
>> nr_periods 0
>> nr_throttled 0
>> throttled_usec 0
>> 
>>> Here’s the contents:
>>> % more cpu.stat
>>> usage_usec 23974562755
>>> user_usec 22257183568
>>> system_usec 1717379186
>> 
>> It suggests you've got the cpu controller disabled:
>> https://urldefense.com/v3/__https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html*cpu__;Iw!!GqivPVa7Brio!Ot7tv-QJgGntRyaimzIRrRh7rqvnzMu9AoOaBWINWNnnpR9LUVNNvGUe9GWZKkNZMA$ 
>> 
>> """
>> and the following three when the controller is enabled:
>> 
>> 
>> nr_periods
>> nr_throttled
>> throttled_usec
>> “""
> 
> I do have the cpu controller enabled.  I’ll look into your fixes for delegation issues and try again.
> 
> Bob.
> 
>> 
>>> CGROUPV2 results on Oracle Linux 8.1
>>> ---------
>>> 
>>> testing container APIs
>>> 
>>> FAILED: containers/cgroup/PlainRead.java
>>> Passed: containers/docker/DockerBasicTest.java
>>> Passed: containers/docker/TestCPUAwareness.java
>>> Passed: containers/docker/TestCPUSets.java
>>> Passed: containers/docker/TestJcmdWithSideCar.java
>>> Passed: containers/docker/TestJFREvents.java
>>> Passed: containers/docker/TestJFRNetworkEvents.java
>>> Passed: containers/docker/TestMemoryAwareness.java
>>> Passed: containers/docker/TestMisc.java
>>> Test results: passed: 8; failed: 1
>>> Results written to /export/users/bobv/jdk15/build/jtreg/JTwork
>>> Error: Some tests failed or other problems occurred.
>> 
>> These are hotspot tests. Covered by JDK-8230305 (hotspot changes). The
>> plain read test passes on a properly configured host system with
>> controller delegation.
>> 
>>> testing jdk.internal.platform APIs
>>> 
>>> FAILED: jdk/internal/platform/cgroup/TestCgroupMetrics.java
>>> Passed: jdk/internal/platform/cgroup/TestCgroupSubsystemController.java
>>> Passed: jdk/internal/platform/docker/TestDockerCpuMetrics.java
>>> Passed: jdk/internal/platform/docker/TestDockerMemoryMetrics.java
>>> Passed: jdk/internal/platform/docker/TestSystemMetrics.java
>>> Test results: passed: 4; failed: 1
>>> Results written to /export/users/bobv/jdk15/build/jtreg/JTwork
>>> Error: Some tests failed or other problems occurred.
>> 
>> I believe cgroup/TestCgroupMetrics.java fails due to bad host config.
>> It passes here:
>> 
>> [root at f31 jdk-jdk]# rm -rf JTwork/ JTreport && /media/disk/jtreg/bin/jtreg -timeout:4 -jdk:../cgroupsv2-jdk/ -e PATH -verbose:summary -Djdk.test.container.command=podman -Djdk.test.docker.image.name=fedora -Djdk.test.docker.image.version=30 test/jdk/jdk/internal/platform
>> Directory "JTwork" not found: creating
>> Directory "JTreport" not found: creating
>> Passed: jdk/internal/platform/cgroup/TestCgroupMetrics.java
>> Passed: jdk/internal/platform/cgroup/TestCgroupSubsystemController.java
>> Passed: jdk/internal/platform/docker/TestDockerCpuMetrics.java
>> Passed: jdk/internal/platform/docker/TestDockerMemoryMetrics.java
>> Passed: jdk/internal/platform/docker/TestSystemMetrics.java
>> Test results: passed: 5
>> Report written to /home/sgehwolf/jdk-jdk/JTreport/html/report.html
>> Results written to /home/sgehwolf/jdk-jdk/JTwork
>> 
>> JTR files available here:
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/jtreg-results/
>> 
>>> testing -XshowSettings:system launcher option
>>> 
>>> Passed: tools/launcher/Settings.java
>>> Test results: passed: 1
>> 
>> Thanks for running the tests!
>> 
>> FWIW, jdk-submit came back clean. If we could get the initial support
>> of this pushed soon it would be great. I'd be happy to fix any follow-
>> up issues.
>> 
>> Thanks,
>> Severin
>> 
>>> Bob.
>>> 
>>>> On Feb 11, 2020, at 1:04 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>>>> 
>>>> Hi Mandy, Bob,
>>>> 
>>>> Thanks again for the reviews and patience on this. Sorry it took me so
>>>> long to get back to this :-/
>>>> 
>>>> Updated webrev:
>>>> Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/webrev/
>>>> incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/incremental/webrev/
>>>> 
>>>> I've tested this with docker tests on cgroup v1 and via podman on a
>>>> cgroup v2 system. They pass. I'll be running this through jdk-submit as
>>>> well.
>>>> 
>>>> More below.
>>>> 
>>>> On Tue, 2020-01-21 at 16:09 -0800, Mandy Chung wrote:
>>>>> Hi Severin,
>>>>> 
>>>>> Thanks for the update.
>>>>> 
>>>>> On 1/21/20 11:30 AM, Severin Gehwolf wrote:
>>>>>> Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/webrev/
>>>>>> incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/incremental/webrev/
>>>>>> 
>>>>> 
>>>>> I have answered my own question.   Most of the metrics used to return 0 if unavailable due to IOException reading a file or malformed content and now are changed to return -2 due to error fetching the metric.
>>>>> 
>>>>> The following are about limits which used to return -1 if unlimited or no limit set.
>>>>>  public long getCpuQuota();
>>>>>  public long getCpuShares();
>>>>>  public long getMemoryLimit();
>>>>>  public long getMemoryAndSwapLimit();
>>>>>  public long getMemorySoftLimit();
>>>>> 
>>>>> With this patch, only getMemoryLimit and getMemoryAndSwapLimit specify to return -1 if unlimited or no limit set.  However the implementation does return -1.  All of the above specify to return -2 if unavailable due to error fetching the metric. 
>>>>> 
>>>>> I found the implementation quite hard to follow.  I spent some time reviewing the code to see if the implementation matches the spec  but I can't easily tell yet.  For example,
>>>>> CgroupSubsystemController::getLongValueMatchingLine returns -1 when IOException occurs.
>>>>> CgroupSubsystemController::getLongEntry returns 0L if IOException occurs.
>>>>> 
>>>>> CgroupV1SubsystemController::convertStringToLong returns Long.MAX_VALUE
>>>>> when value overflows
>>>> 
>>>> This one is intentional. It's mapped back to unlimited via
>>>> longValOrUnlimited(). The reason for this is that cgroup v1 doesn't
>>>> have a concept of "unlimited". Unlimited values will be a very large
>>>> numbers in cgroup v1 files.
>>>> 
>>>>> CgroupV2SubsystemController::convertStringToLong returns -1 when IOException occurs
>>>>> 
>>>>> CgroupV1Subsystem::getCpuShares return -1 if cpu.shares == 0 or 1024
>>>>> CgroupV2Subsystem::getCpuShares returns -1 if cpu.weight == 100 or 0
>>>> 
>>>> These two are special cases too. See the implementation note of
>>>> Metrics.getCpuShares(). In the cgroup v2 case the default value is 100
>>>> (over 1024 in cgroup v1). That's why unlimited is being returned for
>>>> those values.
>>>> 
>>>>> CgroupV2Subsystem::getFromCpuMax returns -1 if error in reading cpu.max or malformed content
>>>>> CgroupV2Subsystem::sumTokensIOStat returns -2 if IOException error
>>>>> This is called by getBlkIOServiceCount and getBlkIOServiced
>>>>> 
>>>>> I think this can be improved and add the documentation to describe
>>>>> what the methods do.  Since Metrics APIs consistently return -2 if 
>>>>> unavailable due to error in fetching the metric, why some utility
>>>>> methods in *Subsystem and *SubsystemController return -1 upon error
>>>>> and 0 when unlimited?
>>>>> 
>>>>> I suspect if the getXXXValue and other methods are clearly documented
>>>>> with the error cases (possibly renaming the method name if appropriate)
>>>>> CgroupV1Subsystem and CgroupV2SubSystem will become very explicit 
>>>>> to understand.
>>>> 
>>>> This should be fixed now.
>>>> 
>>>> I've gone through the API doc of Metrics.java and have updated it. In
>>>> general, I've updated it to return -1 if metric is unavailable (due to
>>>> error in reading some files or content being empty), and -2 if not
>>>> supported. No method returns -2 currently, but it might change and it's
>>>> good to have some way of saying "not implementable" for this subsystem
>>>> in the spec. That's my take on it anyway.
>>>> 
>>>> There is also a new unit test for shared controller logic:
>>>> TestCgroupSubsystemController.java
>>>> 
>>>> It execises various cases of error/success.
>>>> 
>>>> That is to ensure proper symmetry across the various cases (including
>>>> IOException). I've also documented static methods in
>>>> CgroupSubsystemController. Overall, all methods now return the same
>>>> values for cgroup v1 and cgroup v2 (given the impl nuances) for the
>>>> various cases.
>>>> 
>>>>> CgroupSubsystem.java
>>>>> 
>>>>> 44     public static final double DOUBLE_RETVAL_NOT_SUPPORTED = LONG_RETVAL_NOT_SUPPORTED;
>>>>> 49     public static final Boolean BOOL_RETVAL_NOT_SUPPORTED = null;
>>>>> 
>>>>> They are no longer needed, right?
>>>> 
>>>> Removed.
>>>> 
>>>>> CgroupSubsystemFactory.java
>>>>> 
>>>>> 89             System.err.println("Warning: Mixed cgroupv1 and cgroupv2 not supported. Metrics disabled.");
>>>>> 
>>>>> 
>>>>> I expect this be a System.Logger log 
>>>> 
>>>> Updated.
>>>> 
>>>>> 114                     if (!Integer.valueOf(0).toString().equals(tokens[0])) {
>>>>> 
>>>>> This can be simplified to if (!"0".equals(tokens[0]))
>>>> 
>>>> Done, thanks!
>>>> 
>>>>> LauncherHelper.java
>>>>> 
>>>>> 407         // Extended cgroupv1 specific metrics
>>>>> 408         if (c instanceof MetricsCgroupV1) {
>>>>> 409             MetricsCgroupV1 cgroupV1 = (MetricsCgroupV1)c;
>>>>> 410             limit = cgroupV1.getKernelMemoryLimit();
>>>>> 411             ostream.println(formatLimitString(limit, INDENT + "Kernel Memory Limit: "));
>>>>> 412             limit = cgroupV1.getTcpMemoryLimit();
>>>>> 413             ostream.println(formatLimitString(limit, INDENT + "TCP Memory Limit: "));
>>>>> 414             Boolean value = cgroupV1.isMemoryOOMKillEnabled();
>>>>> 415             ostream.println(formatBoolean(value, INDENT + "Out Of Memory Killer Enabled: "));
>>>>> 416             value = cgroupV1.isCpuSetMemoryPressureEnabled();
>>>>> 417             ostream.println(formatBoolean(value, INDENT + "CPUSet Memory Pressure Enabled: "));
>>>>> 418         }
>>>>> 
>>>>> MetricsCgroupV1 is linux-only.  It will fail the compilation when
>>>>> building on non-linux.  One option is to move this code to 
>>>>> src/java.base/linux/share/sun/launcher/CgroupMetrics.java
>>>>> 
>>>>> Are they continued to be interesting metrics to be output from
>>>>> -XshowSetting?  I wonder if they can simply be dropped from the output.
>>>>> Bob will have an opinion.
>>>> 
>>>> I've removed those extra cgroup v1 specific metrics printed via
>>>> -XshowSettings:system. Not sure what to do with MetricsCgroupV1. It's
>>>> only used in tests in webrev 10. On the other hand the idea would be
>>>> for consumers to downcast it to MetricsCgroupV1 if they needed those
>>>> extra metrics.
>>>> 
>>>> Thanks,
>>>> Severin
>>>> 
>> 
> 



More information about the core-libs-dev mailing list