[PING] RFR: 8231111: Cgroups v2: Rework Metrics in java.base so as to recognize unified hierarchy

Severin Gehwolf sgehwolf at redhat.com
Tue Feb 11 18:04:23 UTC 2020


Hi Mandy, Bob,

Thanks again for the reviews and patience on this. Sorry it took me so
long to get back to this :-/

Updated webrev:
Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/webrev/
incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/incremental/webrev/

I've tested this with docker tests on cgroup v1 and via podman on a
cgroup v2 system. They pass. I'll be running this through jdk-submit as
well.

More below.

On Tue, 2020-01-21 at 16:09 -0800, Mandy Chung wrote:
> Hi Severin,
> 
> Thanks for the update.
> 
> On 1/21/20 11:30 AM, Severin Gehwolf wrote:
> > Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/webrev/
> > incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/incremental/webrev/
> > 
>  
> I have answered my own question.   Most of the metrics used to return 0 if unavailable due to IOException reading a file or malformed content and now are changed to return -2 due to error fetching the metric.
> 
> The following are about limits which used to return -1 if unlimited or no limit set.
>     public long getCpuQuota();
>     public long getCpuShares();
>     public long getMemoryLimit();
>     public long getMemoryAndSwapLimit();
>     public long getMemorySoftLimit();
> 
> With this patch, only getMemoryLimit and getMemoryAndSwapLimit specify to return -1 if unlimited or no limit set.  However the implementation does return -1.  All of the above specify to return -2 if unavailable due to error fetching the metric. 
> 
> I found the implementation quite hard to follow.  I spent some time reviewing the code to see if the implementation matches the spec  but I can't easily tell yet.  For example,
> CgroupSubsystemController::getLongValueMatchingLine returns -1 when IOException occurs.
> CgroupSubsystemController::getLongEntry returns 0L if IOException occurs.
> 
> CgroupV1SubsystemController::convertStringToLong returns Long.MAX_VALUE
> when value overflows

This one is intentional. It's mapped back to unlimited via
longValOrUnlimited(). The reason for this is that cgroup v1 doesn't
have a concept of "unlimited". Unlimited values will be a very large
numbers in cgroup v1 files.

> CgroupV2SubsystemController::convertStringToLong returns -1 when IOException occurs
> 
> CgroupV1Subsystem::getCpuShares return -1 if cpu.shares == 0 or 1024
> CgroupV2Subsystem::getCpuShares returns -1 if cpu.weight == 100 or 0

These two are special cases too. See the implementation note of
Metrics.getCpuShares(). In the cgroup v2 case the default value is 100
(over 1024 in cgroup v1). That's why unlimited is being returned for
those values.

> CgroupV2Subsystem::getFromCpuMax returns -1 if error in reading cpu.max or malformed content
> CgroupV2Subsystem::sumTokensIOStat returns -2 if IOException error
>    This is called by getBlkIOServiceCount and getBlkIOServiced
> 
> I think this can be improved and add the documentation to describe
> what the methods do.  Since Metrics APIs consistently return -2 if 
> unavailable due to error in fetching the metric, why some utility
> methods in *Subsystem and *SubsystemController return -1 upon error
> and 0 when unlimited?
> 
> I suspect if the getXXXValue and other methods are clearly documented
> with the error cases (possibly renaming the method name if appropriate)
> CgroupV1Subsystem and CgroupV2SubSystem will become very explicit 
> to understand.

This should be fixed now.

I've gone through the API doc of Metrics.java and have updated it. In
general, I've updated it to return -1 if metric is unavailable (due to
error in reading some files or content being empty), and -2 if not
supported. No method returns -2 currently, but it might change and it's
good to have some way of saying "not implementable" for this subsystem
in the spec. That's my take on it anyway.

There is also a new unit test for shared controller logic:
TestCgroupSubsystemController.java

It execises various cases of error/success.

That is to ensure proper symmetry across the various cases (including
IOException). I've also documented static methods in
CgroupSubsystemController. Overall, all methods now return the same
values for cgroup v1 and cgroup v2 (given the impl nuances) for the
various cases.

> 
> CgroupSubsystem.java
> 
>   44     public static final double DOUBLE_RETVAL_NOT_SUPPORTED = LONG_RETVAL_NOT_SUPPORTED;
>   49     public static final Boolean BOOL_RETVAL_NOT_SUPPORTED = null;
> 
> They are no longer needed, right?

Removed.

> 
> CgroupSubsystemFactory.java
> 
>   89             System.err.println("Warning: Mixed cgroupv1 and cgroupv2 not supported. Metrics disabled.");
> 
> 
> I expect this be a System.Logger log 

Updated.

>  114                     if (!Integer.valueOf(0).toString().equals(tokens[0])) {
> 
>  This can be simplified to if (!"0".equals(tokens[0]))

Done, thanks!

> LauncherHelper.java
> 
>  407         // Extended cgroupv1 specific metrics
>  408         if (c instanceof MetricsCgroupV1) {
>  409             MetricsCgroupV1 cgroupV1 = (MetricsCgroupV1)c;
>  410             limit = cgroupV1.getKernelMemoryLimit();
>  411             ostream.println(formatLimitString(limit, INDENT + "Kernel Memory Limit: "));
>  412             limit = cgroupV1.getTcpMemoryLimit();
>  413             ostream.println(formatLimitString(limit, INDENT + "TCP Memory Limit: "));
>  414             Boolean value = cgroupV1.isMemoryOOMKillEnabled();
>  415             ostream.println(formatBoolean(value, INDENT + "Out Of Memory Killer Enabled: "));
>  416             value = cgroupV1.isCpuSetMemoryPressureEnabled();
>  417             ostream.println(formatBoolean(value, INDENT + "CPUSet Memory Pressure Enabled: "));
>  418         }
> 
> MetricsCgroupV1 is linux-only.  It will fail the compilation when
> building on non-linux.  One option is to move this code to 
>    src/java.base/linux/share/sun/launcher/CgroupMetrics.java
> 
> Are they continued to be interesting metrics to be output from
> -XshowSetting?  I wonder if they can simply be dropped from the output.
> Bob will have an opinion.

I've removed those extra cgroup v1 specific metrics printed via
-XshowSettings:system. Not sure what to do with MetricsCgroupV1. It's
only used in tests in webrev 10. On the other hand the idea would be
for consumers to downcast it to MetricsCgroupV1 if they needed those
extra metrics.

Thanks,
Severin



More information about the core-libs-dev mailing list