[PING] RFR: 8231111: Cgroups v2: Rework Metrics in java.base so as to recognize unified hierarchy

Bob Vandette bob.vandette at oracle.com
Fri Jan 10 16:50:25 UTC 2020


Severin,

The changes look ok to me.  I assume you’ve run the container tests on a cgroupv2 and cgroupv1 enabled system, right?

You’re going to have to find a “R” reviewer.

What’s the status of the hotspot cgroupv2 changes.  Have these been reviewed?

Bob.


> On Jan 9, 2020, at 2:51 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> 
> Hi,
> 
> On Fri, 2020-01-03 at 15:37 -0500, Bob Vandette wrote:
>> Here are a few comments from scanning the webrev.
>> 
>> 
>> It looks like you can remove line 151
>> 
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06-incremental/webrev/src/java.base/linux/classes/jdk/internal/platform/CgroupSubsystemController.java.sdiff.html
>> 
>> 151         int[] ints = new int[0];
>> 
>> —————
>> 
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06-incremental/webrev/src/java.base/share/classes/jdk/internal/platform/Metrics.java.sdiff.html
>> 
>> There are several comments stating that -2 == unlimited.  This is not
>> the case.
>> 
>> @return count of elapsed periods or -2 if the quota is unlimited.
>> 
>> —————
>> 
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06/webrev/test/jdk/jdk/internal/platform/docker/TestDockerMemoryMetrics.java.sdiff.html
>> 
>> Shouldn’t you just check for cgroupv2 before trying to run the
>> testKernelMemoryLimit and testOomKillFlag tests?
>> 
>> —————
>> 
>> There are a few places where getPerCpuUsage is returning “new
>> long[0]” but you changed the interface to expect null
>> for not supported.
>> 
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06/webrev/src/java.base/linux/classes/jdk/internal/platform/cgroupv2/CgroupV2Subsystem.java.html
>> 
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06/webrev/src/java.base/linux/classes/jdk/internal/platform/cgroupv1/CgroupV1Subsystem.java.html
>> 
>> You probably need to check all users of the APIs which used to return
>> array[0] which now return null to make sure you
>> don’t get null pointer exceptions.
>> 
>> One example is testCpuConsumption.
>> 
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06/webrev/test/lib/jdk/test/lib/containers/cgroup/MetricsTesterCgroupV1.java.html
>> 
>> Also, 
>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06/webrev/test/lib/jdk/test/lib/containers/cgroup/MetricsTesterCgroupV2.java.html:167
>> 
>> 
>> You’ll also have to update copyrights to 2020.
> 
> Thanks for the review! Should all be fixed now. Updated webrev:
> 
> incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/07/incremental/webrev/
> full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/07/webrev/
> 
> Note: I'll go through touched files and update copyright dates to 2020.
> Not all have been updated in the full webrev. It'll be done though.
> 
> Thanks,
> Severin
> 
>> Bob.
>> 
>> 
>>> On Dec 20, 2019, at 9:50 AM, Severin Gehwolf <sgehwolf at redhat.com>
>>> wrote:
>>> 
>>> Hi Bob,
>>> 
>>> Sorry for the delay to get this updated.
>>> 
>>> Changes done in this latest webrev:
>>> 
>>>  1. Rebased on top of 8226575: OperatingSystemMXBean should be
>>> made
>>>     container aware. File read ops now use privileged code.
>>>  2. Warning for mixed cgroup controllers and returning null for
>>> metrics.
>>>  3. Added implementations for getBlkIOServiceCount() and
>>>     getBlkIOServiced() using sum of rios/wios and rbytes/wbytes
>>> across
>>>     devices. Added impl for getTcpMemoryUsage()
>>>  4. Returning -2 for not supported (if the metric would return
>>> long).
>>>     boolean metrics now return Boolean and null if not supported.
>>> Same
>>>     for array return types. null if not supported for symmetry.
>>>  5. -XshowSettings:system output has been updated to return "N/A"
>>> for
>>>     when a metric is not available. E.g. "Kernel OOM killer
>>> enabled"
>>>     Boolean.
>>>  6. Tests have been updated to reflect this.
>>> 
>>> webrev:
>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06/webrev/
>>> incremental webrev:
>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06-incremental/webrev/
>>> 
>>> Testing: Docker tests and podman testing on cgroupsv2. I'll run it
>>> through jdk/submit as well.
>>> 
>>> Hopefully this can get pushed with 8230305 early on in the jdk 15
>>> cycle
>>> :)
>>> 
>>> A couple of more inline comments below.
>>> 
>>> On Mon, 2019-12-02 at 12:13 -0500, Bob Vandette wrote:
>>>> Sorry for the delay in responding.  See comments below:
>>>> 
>>>>> On Nov 29, 2019, at 4:05 AM, Severin Gehwolf <
>>>>> sgehwolf at redhat.com> wrote:
>>>>> 
>>>>> On Fri, 2019-11-15 at 17:51 +0100, Severin Gehwolf wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Could I please get reviews of these core-libs, Linux-only,
>>>>>> changes to
>>>>>> the Metrics subsystem? This patch implements cgroupv2 support
>>>>>> for
>>>>>> Metrics which are currently cgroupv1-only. Fedora 31 switched
>>>>>> to
>>>>>> cgroupv2 by default so it's time to get OpenJDK recognize it.
>>>>>> 
>>>>>> Note that a couple of metrics are no longer supported with
>>>>>> cgroupv2.
>>>>>> Most notably (not an exhaustive list, though):
>>>>>> 
>>>>>> Metrics.getKernel*() family of methods.
>>>>>> Metrics.getTcp*() family of methods.
>>>>>> Metrics.getBlkIO*() family of methods.
>>>>>> Metrics.isMemoryOOMKillEnabled()
>>>>>> 
>>>>>> A couple of open questions with regards to that:
>>>>>> 
>>>>>> 1)
>>>>>> Most API docs of Metrics make no distiction between
>>>>>> "unlimited" and
>>>>>> "not supported", both returning -1 for longs, for example.
>>>>>> This is a
>>>>>> problem, because output of "java -XshowSettings:system
>>>>>> -version" will
>>>>>> not distinguish between unlimited and not supported metrics.
>>>>>> Would it
>>>>>> be acceptable to change the API to distinguish those cases so
>>>>>> that
>>>>>> LauncherHelper could display them appropriately?
>>>>>> 
>>>>>> 2)
>>>>>> How should we deal with "not supported" for booleans/arrays,
>>>>>> etc.?
>>>>>> Would it make sense to return record objects from the Metrics
>>>>>> API so
>>>>>> that this could be dealt with? E.g.
>>>>>> 
>>>>>> Metrics m = ...
>>>>>> MetricResult<int[]> result = m.getCpuSetCpus();
>>>>>> switch(result.getType()) {
>>>>>> case NOT_SUPPORTED: /* do something */; break;
>>>>>> case SUPPORTED: int[] val = result.get(); break;
>>>>>> ...         
>>>>>> }
>>>>>> 
>>>>>> I'm bringing this up, because the proposed patch doesn't deal
>>>>>> with the
>>>>>> above open questions as of yet. With that being said, it's
>>>>>> mostly
>>>>>> identical to the proposed hotspot changes in [1].
>>>> 
>>>> For issue 1 and 2 …
>>>> 
>>>> I wonder if we should change the API to throw
>>>> UnsupportedOperationException for those methods 
>>>> that are not available.  This exception is used quite a lot in
>>>> the java/nio and java/net packages
>>>> for dealing with functionality not available on a host platform.
>>>> 
>>>> As an alternate suggestion, we already return -2 for "not
>>>> supported" for most APIs in the hotspot
>>>> implementation.  We could use this for non boolean values in the
>>>> Metrics API.  For boolean
>>>> values, we could change the API to return “Boolean”.  A null
>>>> return would signify not
>>>> implemented.
>>> 
>>> This alternative has been implemented in the latest webrev.
>>> LauncherHelper has been updated to print "N/A" if some property
>>> being
>>> printed is not supported. Example output for cgroupsv2:
>>> 
>>> $ ./bin/java -XshowSettings:system -version
>>> Operating System Metrics:
>>>   Provider: cgroupv2
>>>   Effective CPU Count: 4
>>>   CPU Period: 100000us
>>>   CPU Quota: -1
>>>   CPU Shares: -1
>>>   List of Processors: N/A
>>>   List of Effective Processors: N/A
>>>   List of Memory Nodes: N/A
>>>   List of Available Memory Nodes: N/A
>>>   CPUSet Memory Pressure Enabled: N/A
>>>   Memory Limit: Unlimited
>>>   Memory Soft Limit: Unlimited
>>>   Memory & Swap Limit: Unlimited
>>>   Kernel Memory Limit: N/A
>>>   TCP Memory Limit: N/A
>>>   Out Of Memory Killer Enabled: N/A
>>> 
>>> openjdk version "15-internal" 2020-09-15
>>> OpenJDK Runtime Environment (build 15-internal+0-
>>> adhoc.sgehwolf.openjdk-head-2)
>>> OpenJDK 64-Bit Server VM (build 15-internal+0-
>>> adhoc.sgehwolf.openjdk-head-2, mixed mode, sharing)
>>> 
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231111
>>>>>> webrev: 
>>>>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/04/webrev/
>>>>>> 
>>>> 
>>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/04/webrev/src/java.base/linux/classes/jdk/internal/platform/CgroupSubsystemFactory.java.html
>>>> 
>>>> Should we check to make sure that there are no mixed cgroupv1 and
>>>> cgroupv2 mounted subsystems since
>>>> you are not supporting mixing.
>>> 
>>> Done. null is returned for metrics and a warning printed to stderr.
>>> 
>>>> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/04/webrev/src/java.base/linux/classes/jdk/internal/platform/cgroupv2/CgroupV2Subsystem.java.html
>>>> 
>>>> It looks looks like there may be alternate ways of reporting some
>>>> of the metrics that you have classified as RETVAL_NOT_SUPPORTED.
>>>> 
>>>> BlkIOServicedCount (io.stat)
>>>> KernelMemory (memory.stat)
>>>> TcpMemory (memory.stat)
>>> 
>>> The latest webrev has getBlkIOService* and getTcpMemoryUsage impls.
>>> I've left out kernel memory metrics as it wasn't clear what this
>>> would
>>> be. We can add that in a later patch. The size of this patch is
>>> already
>>> rather big.
>>> 
>>>> You could try the same trick for getMemoryAndSwapMaxUsage which
>>>> keeps track of the highest returned value.
>>> 
>>> We've decided to not do this. getMemoryAndSwapMaxUsage and
>>> getMemoryMaxUsage is being returned as not supported in cgroups v2.
>>> 
>>> Thanks,
>>> Severin
>>> 
>>>> for the benefit of other reviewers, you should provide links to
>>>> the cgroupv1 and v2 docs.
>>>> 
>>>> https://www.kernel.org/doc/Documentation/cgroup-v2.txt
>>>> 
>>>>>> Testing: jdk/submit and platform docker tests on Linux x86_64
>>>>>> (with
>>>>>> hybrid hierarchy, docker/podman) and on Linux x86_64 with
>>>>>> unified
>>>>>> hierarchy (podman only).
>>>>>> 
>>>>>> Thoughts? Suggestions?
>>>> 
>>>> Do you think we should check the docker version being used for
>>>> the tests to make sure it
>>>> supports cgroupv2?  I assume a fairly recent version is required
>>>> to work with cgroupv2.
>>>> 
>>>> Bob.
>>>> 
>>>> 
>>>> 
>>>>> Ping?
>>>>> 
>>>>>> Thanks,
>>>>>> Severin
>>>>>> 
>>>>>> [1] 
>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039909.html
> 



More information about the core-libs-dev mailing list