[PING] RFR: 8231111: Cgroups v2: Rework Metrics in java.base so as to recognize unified hierarchy

Severin Gehwolf sgehwolf at redhat.com
Fri Dec 20 14:50:54 UTC 2019


Hi Bob,

Sorry for the delay to get this updated.

Changes done in this latest webrev:

   1. Rebased on top of 8226575: OperatingSystemMXBean should be made
      container aware. File read ops now use privileged code.
   2. Warning for mixed cgroup controllers and returning null for metrics.
   3. Added implementations for getBlkIOServiceCount() and
      getBlkIOServiced() using sum of rios/wios and rbytes/wbytes across
      devices. Added impl for getTcpMemoryUsage()
   4. Returning -2 for not supported (if the metric would return long).
      boolean metrics now return Boolean and null if not supported. Same
      for array return types. null if not supported for symmetry.
   5. -XshowSettings:system output has been updated to return "N/A" for
      when a metric is not available. E.g. "Kernel OOM killer enabled"
      Boolean.
   6. Tests have been updated to reflect this.

webrev:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06/webrev/
incremental webrev:
http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/06-incremental/webrev/

Testing: Docker tests and podman testing on cgroupsv2. I'll run it
through jdk/submit as well.

Hopefully this can get pushed with 8230305 early on in the jdk 15 cycle
:)

A couple of more inline comments below.

On Mon, 2019-12-02 at 12:13 -0500, Bob Vandette wrote:
> Sorry for the delay in responding.  See comments below:
> 
> > On Nov 29, 2019, at 4:05 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> > 
> > On Fri, 2019-11-15 at 17:51 +0100, Severin Gehwolf wrote:
> > > Hi,
> > > 
> > > Could I please get reviews of these core-libs, Linux-only, changes to
> > > the Metrics subsystem? This patch implements cgroupv2 support for
> > > Metrics which are currently cgroupv1-only. Fedora 31 switched to
> > > cgroupv2 by default so it's time to get OpenJDK recognize it.
> > > 
> > > Note that a couple of metrics are no longer supported with cgroupv2.
> > > Most notably (not an exhaustive list, though):
> > > 
> > > Metrics.getKernel*() family of methods.
> > > Metrics.getTcp*() family of methods.
> > > Metrics.getBlkIO*() family of methods.
> > > Metrics.isMemoryOOMKillEnabled()
> > > 
> > > A couple of open questions with regards to that:
> > > 
> > > 1)
> > > Most API docs of Metrics make no distiction between "unlimited" and
> > > "not supported", both returning -1 for longs, for example. This is a
> > > problem, because output of "java -XshowSettings:system -version" will
> > > not distinguish between unlimited and not supported metrics. Would it
> > > be acceptable to change the API to distinguish those cases so that
> > > LauncherHelper could display them appropriately?
> > > 
> > > 2)
> > > How should we deal with "not supported" for booleans/arrays, etc.?
> > > Would it make sense to return record objects from the Metrics API so
> > > that this could be dealt with? E.g.
> > > 
> > > Metrics m = ...
> > > MetricResult<int[]> result = m.getCpuSetCpus();
> > > switch(result.getType()) {
> > >   case NOT_SUPPORTED: /* do something */; break;
> > >   case SUPPORTED: int[] val = result.get(); break;
> > >   ...         
> > > }
> > > 
> > > I'm bringing this up, because the proposed patch doesn't deal with the
> > > above open questions as of yet. With that being said, it's mostly
> > > identical to the proposed hotspot changes in [1].
> 
> For issue 1 and 2 …
> 
> I wonder if we should change the API to throw UnsupportedOperationException for those methods 
> that are not available.  This exception is used quite a lot in the java/nio and java/net packages
> for dealing with functionality not available on a host platform.
> 
> As an alternate suggestion, we already return -2 for "not supported" for most APIs in the hotspot
> implementation.  We could use this for non boolean values in the Metrics API.  For boolean
> values, we could change the API to return “Boolean”.  A null return would signify not
> implemented.

This alternative has been implemented in the latest webrev.
LauncherHelper has been updated to print "N/A" if some property being
printed is not supported. Example output for cgroupsv2:

$ ./bin/java -XshowSettings:system -version
Operating System Metrics:
    Provider: cgroupv2
    Effective CPU Count: 4
    CPU Period: 100000us
    CPU Quota: -1
    CPU Shares: -1
    List of Processors: N/A
    List of Effective Processors: N/A
    List of Memory Nodes: N/A
    List of Available Memory Nodes: N/A
    CPUSet Memory Pressure Enabled: N/A
    Memory Limit: Unlimited
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: Unlimited
    Kernel Memory Limit: N/A
    TCP Memory Limit: N/A
    Out Of Memory Killer Enabled: N/A

openjdk version "15-internal" 2020-09-15
OpenJDK Runtime Environment (build 15-internal+0-adhoc.sgehwolf.openjdk-head-2)
OpenJDK 64-Bit Server VM (build 15-internal+0-adhoc.sgehwolf.openjdk-head-2, mixed mode, sharing)

> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8231111
> > > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/04/webrev/
> > > 
> 
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/04/webrev/src/java.base/linux/classes/jdk/internal/platform/CgroupSubsystemFactory.java.html
> 
> Should we check to make sure that there are no mixed cgroupv1 and cgroupv2 mounted subsystems since
> you are not supporting mixing.

Done. null is returned for metrics and a warning printed to stderr.

> 
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/04/webrev/src/java.base/linux/classes/jdk/internal/platform/cgroupv2/CgroupV2Subsystem.java.html
> 
> It looks looks like there may be alternate ways of reporting some of the metrics that you have classified as RETVAL_NOT_SUPPORTED.
> 
> BlkIOServicedCount (io.stat)
> KernelMemory (memory.stat)
> TcpMemory (memory.stat)

The latest webrev has getBlkIOService* and getTcpMemoryUsage impls.
I've left out kernel memory metrics as it wasn't clear what this would
be. We can add that in a later patch. The size of this patch is already
rather big.

> You could try the same trick for getMemoryAndSwapMaxUsage which keeps track of the highest returned value.

We've decided to not do this. getMemoryAndSwapMaxUsage and
getMemoryMaxUsage is being returned as not supported in cgroups v2.

Thanks,
Severin

> for the benefit of other reviewers, you should provide links to the cgroupv1 and v2 docs.
> 
> https://www.kernel.org/doc/Documentation/cgroup-v2.txt
> 
> > > Testing: jdk/submit and platform docker tests on Linux x86_64 (with
> > > hybrid hierarchy, docker/podman) and on Linux x86_64 with unified
> > > hierarchy (podman only).
> > > 
> > > Thoughts? Suggestions?
> 
> Do you think we should check the docker version being used for the tests to make sure it
> supports cgroupv2?  I assume a fairly recent version is required to work with cgroupv2.
> 
> Bob.
> 
> 
> 
> > Ping?
> > 
> > > Thanks,
> > > Severin
> > > 
> > > [1] http://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039909.html



More information about the core-libs-dev mailing list