RFR: 8230848: OSContainer: Refactor container detection code

Tue Oct 1 18:36:39 UTC 2019

> On Oct 1, 2019, at 1:54 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> 
> Hi Bob,
> 
> Thanks for looking at this! I appreciate that.
> 
> On Tue, 2019-10-01 at 10:43 -0400, Bob Vandette wrote:
>> Severin,
>> 
>> I’d prefer if you just added cgroupv2 support with these types of changes all at the same time.
>> Having to look at two patches is just more work for nothing.
> 
> OK. I guess that's reviewer preference :) The reason why I did it that
> way is to keep the patch sizes smaller. Personally, I tend to find
> smaller, incremental patches easier to review than one gigantic patch.
> Do you want me to fold these three bugs/patches together and produce
> one big patch?

Others may disagree but I’d prefer one big patch since they are very tightly related.

> 
> no-op refactoring:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230848/03/webrev/
> 
> added cgroups v2:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230305/05/webrev/
> 
> add Metrics.java support for cgroups v2:
> <no patch as of yet>
> 
>> In terms of your factoring approach, I don’t see why you need to move so much logic out of os_linux.cpp.
> 
> I do find the code easier to follow if the exposed interfaces of
> os_container are minimal and kept more broad. To some extent the
> osContainer interface is a leaky abstraction. Example: after the
> refactoring we'd generally have (in pseudo code):
> 
> julong available_memory() {
>  if (isContainerized()) {
>    return container_available_memory();
>  }
>  return system_available_memory();
> }
> 
> over having the osContainer impl in osLinux directly:
> 
> julong available_memory() {
>  if (isContainerized()) {
>     mem_limit = container_mem_limit();
>     if (error) {
>        trace_container(...)
>     }
>     mem_usage = container_memory_usage();
>     if (error) {
>        trace_container(...)
>     }
>     if (mem_limit > 0 && mem_usage > 0) {
>        return mem_limit > mem_usage ? mem_limit - mem_usage : 0;
>     }
>  }
>  return system_available_memory();
> }
> 
> Perhaps that's just me. We should perhaps do a similar refactoring for
> physical memory…

I didn’t want to drag a lot of the logic from os_linux.cpp into the container implementation
and wanted to try to keep a separation of  pure os configuration and container configuration.
I also envisioned cases were there might be non cgroup containers that we’d need to support.
There’s already Kata containers but that one looks just like a pure os execution.

> 
>> The OSContainer interface should be used independent of which cgroup implementation is active.
> 
> Yes, agreed. I guess the disagreement is in the granularity of the
> interface functions.
> 
>> os_linux should use the existing OSContainer interface.
> 
> OK. If the OSContainer interface is set in stone I will change it back.
> 
> My concern was that it exposes too many functions for no good reason
> not needed by os_linux (the only consumer). Most functions are used by
> print_container_info() anyway, which could be kept more general.
> Perhaps I'm missing something?

The original intent was to provide a container metrics API that would be used by the VM startup,
JFR events and Management MBeans to provide more visibility into a container.
I’m still hopeful that we’ll eventually get the last two.  I’ve had discussions with the serviceability
team and there’s an interested party in doing the JFR events.

> 
>> OSContainer_linux should include the logic which selects cgroupv1 versus cgroupv2 and
>> calls through a common cgroupSubsystem_linux class.  This cgroupSubsystem_linux class
>> should provide common functions that both v1 and v2 use. 
> 
> Yes. If you look at the cgroups v2 patch, that's what is being
> proposed. CgroupSubsystemFactory::create() is being called from
> OsContainer_linux. The factory does the selection and returns the
> abstract cgroupSubsystem class (either v1 or v2):
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230305/05/webrev/
> 
>> I’d then expect to see cgroupV1Subsystem_linux.?pp and cgroupV2Subsystem_linux.?pp files which 
>> extends cgroupSubsystem and contain version specific functions.
> 
> Sure, I'd be happy to move things into version specific files if that's
> desired. Currently, I've kept them in cgroupSubsystem_linux.{c,h}pp. It
> shouldn't be hard to separeate them out. It makes sense to do it
> actually.
> 
> Thanks again for you input!

No, problem and thanks for doing this work.

BTW:  Can you tell me if there’s a .dockerenv file in the fedora/podman’s container / directory or is it a .podmanenv file?
I’m wondering if I can use the existence of these files as the triigger that we’re containerized.  This is for another bug.

Bob.

> 
> Cheers,
> Severin
> 
>> Bob.
>> 
>> 
>>> On Sep 19, 2019, at 10:20 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Please review this code refactoring which will help getting a cgroups
>>> v2 implementation for OSContainer implemented. The proposed changes are
>>> mentioned in the bug. In essence, after this patch only methods
>>> actually called in os_linux.cpp are exposed via OSContainer. Everything
>>> else remains an implementation detail. After this refactoring it should
>>> also be clearer what the actual bits implemented via cgroups v1 are in
>>> hotspot. Functionally there should be no difference before and after
>>> this patch.
>>> 
>>> If you are curious where I'm going with this, have a look at JDK-
>>> 8230305 which has a draft of an implementation of cgroups v2 building
>>> on top of this patch.
>>> 
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230848
>>> webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230848/03/webrev/
>>> 
>>> Testing: jdk-submit, tier1 tests on Linux x86_64, container tests
>>>        with docker and podman.
>>> 
>>> Thoughts?
>>> 
>>> Thanks,
>>> Severin
>>> 
>