RFR 8250984: Memory Docker tests fail on some Linux kernels w/o swap limit capabilities

Severin Gehwolf sgehwolf at redhat.com
Thu Sep 10 15:39:18 UTC 2020


On Thu, 2020-09-10 at 11:05 -0400, Bob Vandette wrote:
> Harold,
> 
> I prefer the second approach since it’s consistent with the original specification of the Metrics APIs.
> You should be able to check for the -2 case in OperatingSystemImpl.java after the limits are
> tested for >= 0 in order to avoid adding any extra overhead.
> 
>     public long getTotalSwapSpaceSize() {
>         if (containerMetrics != null) {
>             long limit = containerMetrics.getMemoryAndSwapLimit();
>
>             if (limit == CgroupSubsystem.LONG_RETVAL_NOT_SUPPORTED) { // not supported
>                 return CgroupSubsystem.LONG_RETVAL_NOT_SUPPORTED;
>             }
>
>             // The memory limit metrics is not available if JVM runs on Linux host (not in a docker container)
>             // or if a docker container was started without specifying a memory limit (without '--memory='
>             // Docker option). In latter case there is no limit on how much memory the container can use and
>             // it can use as much memory as the host's OS allows.
>             long memLimit = containerMetrics.getMemoryLimit();
>             if (limit >= 0 && memLimit >= 0) {
>                 return limit - memLimit; // might potentially be 0 for limit == memLimit
>             }
> [HERE]
>         }
>         return getTotalSwapSpaceSize0();
>     }
> 

I agree. Option 2 seems the preferred one for me too. One additional
consideration would be whether or not there are other cases where
cgroup files are missing. IIRC cases for their existence are different
between cgroup v1 and cgroup v2.
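
To make the ordering concrete, here is a minimal sketch of option 2 with the
-2 check placed where Bob marked [HERE], i.e. after the >= 0 test. It is not
the JDK code: the class SwapLimitSketch, the helper readLongEntry() and the
hostSwap parameter are made up for illustration, and mapping an unreadable or
unparsable file to -2 is an assumption. Only the -1/-2 meanings and the
"limit - memLimit" computation come from the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SwapLimitSketch {
    static final long UNLIMITED = -1;      // no limit set (per the Metrics javadoc)
    static final long NOT_SUPPORTED = -2;  // metric not supported (per the Metrics javadoc)

    // Hypothetical stand-in for a getLongEntry()-style reader.
    // Assumption: a missing or unreadable file means "not supported".
    static long readLongEntry(Path file) {
        try {
            String text = Files.readString(file).trim();
            if (text.equals("max")) {      // cgroup v2 spells "no limit" as "max"
                return UNLIMITED;
            }
            return Long.parseLong(text);
        } catch (IOException | NumberFormatException e) {
            return NOT_SUPPORTED;
        }
    }

    // Mirrors the shape of getTotalSwapSpaceSize(): the common case
    // (both limits >= 0) is tested first, so the -2 check costs nothing there.
    static long totalSwapSpaceSize(long memAndSwapLimit, long memLimit, long hostSwap) {
        if (memAndSwapLimit >= 0 && memLimit >= 0) {
            return memAndSwapLimit - memLimit; // may be 0 when the limits are equal
        }
        if (memAndSwapLimit == NOT_SUPPORTED) { // the [HERE] position
            return NOT_SUPPORTED;
        }
        return hostSwap; // stands in for getTotalSwapSpaceSize0()
    }
}
```

With this shape a test could compare the returned value against -2 instead of
parsing message text, provided the value is surfaced all the way to the test.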

Thanks,
Severin

> If you go down this path, please check through the other container & docker tests to see if there are other cases 
> where the specific message text is parsed.  I believe there was at least one other case of this.
> 
> Bob.
> 
> 
> > On Sep 10, 2020, at 10:52 AM, Harold Seigel <harold.seigel at oracle.com> wrote:
> > 
> > Hi Bob,
> > 
> > I came up with these ways to handle the test failures when swap limiting is disabled (JDK-8250984).  Please let me know if any of them sound viable.
> > 
> > One way is to add logging to CgroupSubsystemController when it fails to open a file such as .../memsw.limit_in_bytes.  The tests would enable logging and then look for these messages to determine if swap limiting was disabled.  This is yet another string for the tests to parse, but the JDK controls the contents of the strings, so there is less concern about them changing.  Here's a webrev showing this potential change:
> > 
> > http://cr.openjdk.java.net/~hseigel/bug_8250984.dkr.log/webrev/index.html
> > 
> > Another way is for methods such as CgroupSubsystemController.getLongEntry() to return a -2 status, indicating not-implemented, when it cannot access a file.  The callers of getLongEntry() could then decide whether or not to propagate that status back to their callers, or return some other default.  A partial webrev for that change is here:
> > 
> > http://cr.openjdk.java.net/~hseigel/bug_8250984.dkr.RetVal/webrev/index.html
> > 
> > We could also change methods such as CgroupV1Subsystem.getMemoryAndSwapLimit() to explicitly check for the existence of the files they want to read from and return -2 if the check fails.  This may have a performance impact?
> > 
> > Thanks, Harold
> > 
> > On 9/1/2020 12:04 PM, Bob Vandette wrote:
> > > I really dislike encoding all these strings in our tests that could possibly change.
> > > 
> > > I wish we did something like check for the existence of /sys/fs/cgroup/memory/memsw.limit_in_bytes
> > > assuming that this file is not present when swap limiting is disabled.  The problem with this approach
> > > and yours is that we need to make sure that these fixes can run on docker, podman, cgroupv1 and cgroupv2.
> > > 
> > > Others are struggling with these types of issues …
> > > 
> > > 
> > > https://github.com/containers/podman/issues/6365
> > > 
> > > 
> > > The Metrics API I added provides for the possibility that the call to getMemoryAndSwapLimit
> > > could fail.  Perhaps the test should be checking for "not supported", and we should fix the API
> > > implementation to report the correct error (if it doesn’t already).
> > > 
> > >     /**
> > >      * Returns the maximum amount of physical memory and swap space,
> > >      * in bytes, that can be allocated in the Isolation Group.
> > >      *
> > >      * @return The maximum amount of memory in bytes or -1 if
> > >      *         there is no limit set or -2 if this metric is not supported.
> > >      *
> > >      */
> > >     public long getMemoryAndSwapLimit();
> > > 
> > > My .02$
> > > 
> > > Bob.
> > > 
> > > 
> > > > On Sep 1, 2020, at 11:31 AM, Harold Seigel <harold.seigel at oracle.com> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > Please review this fix to enable docker tests TestMemoryAwareness.java and TestDockerMemoryMetrics.java to run on Linux kernels configured without swap limit capabilities.
> > > > 
> > > > Open Webrev: 
> > > > http://cr.openjdk.java.net/~hseigel/bug_8250984.dkr/webrev/index.html
> > > > 
> > > > 
> > > > JBS Bug: 
> > > > https://bugs.openjdk.java.net/browse/JDK-8250984
> > > > 
> > > > 
> > > > The modified tests were run on Linux kernels with and without swap limit capabilities.
> > > > 
> > > > Thanks, Harold
> > > > 
> > > > 



More information about the hotspot-runtime-dev mailing list