[PING] RFR: 8231111: Cgroups v2: Rework Metrics in java.base so as to recognize unified hierarchy

Mandy Chung mandy.chung at oracle.com
Wed Feb 12 19:32:00 UTC 2020


I'll take a look next couple days.  I was out last few days and am 
catching up on other things.

Mandy

On 2/11/20 10:04 AM, Severin Gehwolf wrote:
> Hi Mandy, Bob,
>
> Thanks again for the reviews and patience on this. Sorry it took me so
> long to get back to this :-/
>
> Updated webrev:
> Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/webrev/
> incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/10/incremental/webrev/
>
> I've tested this with docker tests on cgroup v1 and via podman on a
> cgroup v2 system. They pass. I'll be running this through jdk-submit as
> well.
>
> More below.
>
> On Tue, 2020-01-21 at 16:09 -0800, Mandy Chung wrote:
>> Hi Severin,
>>
>> Thanks for the update.
>>
>> On 1/21/20 11:30 AM, Severin Gehwolf wrote:
>>> Full: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/webrev/
>>> incremental: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8231111/09/incremental/webrev/
>>>
>>   
>> I have answered my own question.   Most of the metrics used to return 0 if unavailable due to IOException reading a file or malformed content and now are changed to return -2 due to error fetching the metric.
>>
>> The following are about limits which used to return -1 if unlimited or no limit set.
>>      public long getCpuQuota();
>>      public long getCpuShares();
>>      public long getMemoryLimit();
>>      public long getMemoryAndSwapLimit();
>>      public long getMemorySoftLimit();
>>
>> With this patch, only getMemoryLimit and getMemoryAndSwapLimit specify to return -1 if unlimited or no limit set.  However the implementation does return -1.  All of the above specify to return -2 if unavailable due to error fetching the metric.
>>
>> I found the implementation quite hard to follow.  I spent some time reviewing the code to see if the implementation matches the spec  but I can't easily tell yet.  For example,
>> CgroupSubsystemController::getLongValueMatchingLine returns -1 when IOException occurs.
>> CgroupSubsystemController::getLongEntry returns 0L if IOException occurs.
>>
>> CgroupV1SubsystemController::convertStringToLong returns Long.MAX_VALUE
>> when value overflows
> This one is intentional. It's mapped back to unlimited via
> longValOrUnlimited(). The reason for this is that cgroup v1 doesn't
> have a concept of "unlimited". Unlimited values will be a very large
> numbers in cgroup v1 files.
>
>> CgroupV2SubsystemController::convertStringToLong returns -1 when IOException occurs
>>
>> CgroupV1Subsystem::getCpuShares return -1 if cpu.shares == 0 or 1024
>> CgroupV2Subsystem::getCpuShares returns -1 if cpu.weight == 100 or 0
> These two are special cases too. See the implementation note of
> Metrics.getCpuShares(). In the cgroup v2 case the default value is 100
> (over 1024 in cgroup v1). That's why unlimited is being returned for
> those values.
>
>> CgroupV2Subsystem::getFromCpuMax returns -1 if error in reading cpu.max or malformed content
>> CgroupV2Subsystem::sumTokensIOStat returns -2 if IOException error
>>     This is called by getBlkIOServiceCount and getBlkIOServiced
>>
>> I think this can be improved and add the documentation to describe
>> what the methods do.  Since Metrics APIs consistently return -2 if
>> unavailable due to error in fetching the metric, why some utility
>> methods in *Subsystem and *SubsystemController return -1 upon error
>> and 0 when unlimited?
>>
>> I suspect if the getXXXValue and other methods are clearly documented
>> with the error cases (possibly renaming the method name if appropriate)
>> CgroupV1Subsystem and CgroupV2SubSystem will become very explicit
>> to understand.
> This should be fixed now.
>
> I've gone through the API doc of Metrics.java and have updated it. In
> general, I've updated it to return -1 if metric is unavailable (due to
> error in reading some files or content being empty), and -2 if not
> supported. No method returns -2 currently, but it might change and it's
> good to have some way of saying "not implementable" for this subsystem
> in the spec. That's my take on it anyway.
>
> There is also a new unit test for shared controller logic:
> TestCgroupSubsystemController.java
>
> It execises various cases of error/success.
>
> That is to ensure proper symmetry across the various cases (including
> IOException). I've also documented static methods in
> CgroupSubsystemController. Overall, all methods now return the same
> values for cgroup v1 and cgroup v2 (given the impl nuances) for the
> various cases.
>
>> CgroupSubsystem.java
>>
>>    44     public static final double DOUBLE_RETVAL_NOT_SUPPORTED = LONG_RETVAL_NOT_SUPPORTED;
>>    49     public static final Boolean BOOL_RETVAL_NOT_SUPPORTED = null;
>>
>> They are no longer needed, right?
> Removed.
>
>> CgroupSubsystemFactory.java
>>
>>    89             System.err.println("Warning: Mixed cgroupv1 and cgroupv2 not supported. Metrics disabled.");
>>
>>
>> I expect this be a System.Logger log
> Updated.
>
>>   114                     if (!Integer.valueOf(0).toString().equals(tokens[0])) {
>>
>>   This can be simplified to if (!"0".equals(tokens[0]))
> Done, thanks!
>
>> LauncherHelper.java
>>
>>   407         // Extended cgroupv1 specific metrics
>>   408         if (c instanceof MetricsCgroupV1) {
>>   409             MetricsCgroupV1 cgroupV1 = (MetricsCgroupV1)c;
>>   410             limit = cgroupV1.getKernelMemoryLimit();
>>   411             ostream.println(formatLimitString(limit, INDENT + "Kernel Memory Limit: "));
>>   412             limit = cgroupV1.getTcpMemoryLimit();
>>   413             ostream.println(formatLimitString(limit, INDENT + "TCP Memory Limit: "));
>>   414             Boolean value = cgroupV1.isMemoryOOMKillEnabled();
>>   415             ostream.println(formatBoolean(value, INDENT + "Out Of Memory Killer Enabled: "));
>>   416             value = cgroupV1.isCpuSetMemoryPressureEnabled();
>>   417             ostream.println(formatBoolean(value, INDENT + "CPUSet Memory Pressure Enabled: "));
>>   418         }
>>
>> MetricsCgroupV1 is linux-only.  It will fail the compilation when
>> building on non-linux.  One option is to move this code to
>>     src/java.base/linux/share/sun/launcher/CgroupMetrics.java
>>
>> Are they continued to be interesting metrics to be output from
>> -XshowSetting?  I wonder if they can simply be dropped from the output.
>> Bob will have an opinion.
> I've removed those extra cgroup v1 specific metrics printed via
> -XshowSettings:system. Not sure what to do with MetricsCgroupV1. It's
> only used in tests in webrev 10. On the other hand the idea would be
> for consumers to downcast it to MetricsCgroupV1 if they needed those
> extra metrics.
>
> Thanks,
> Severin
>



More information about the core-libs-dev mailing list