RFR: 8230848: OSContainer: Refactor container detection code
Bob Vandette
bob.vandette at oracle.com
Tue Oct 1 18:36:39 UTC 2019
> On Oct 1, 2019, at 1:54 PM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>
> Hi Bob,
>
> Thanks for looking at this! I appreciate that.
>
> On Tue, 2019-10-01 at 10:43 -0400, Bob Vandette wrote:
>> Severin,
>>
>> I’d prefer if you just added cgroupv2 support with these types of changes all at the same time.
>> Having to look at two patches is just more work for nothing.
>
> OK. I guess that's reviewer preference :) The reason why I did it that
> way is to keep the patch sizes smaller. Personally, I tend to find
> smaller, incremental patches easier to review than one gigantic patch.
> Do you want me to fold these three bugs/patches together and produce
> one big patch?
Others may disagree but I’d prefer one big patch since they are very tightly related.
>
> no-op refactoring:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230848/03/webrev/
>
> added cgroups v2:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230305/05/webrev/
>
> add Metrics.java support for cgroups v2:
> <no patch as of yet>
>
>> In terms of your factoring approach, I don’t see why you need to move so much logic out of os_linux.cpp.
>
> I do find the code easier to follow if the exposed interfaces of
> os_container are minimal and kept more broad. To some extent the
> osContainer interface is a leaky abstraction. Example: after the
> refactoring we'd generally have (in pseudo code):
>
> julong available_memory() {
> if (isContainerized()) {
> return container_available_memory();
> }
> return system_available_memory();
> }
>
> over having the osContainer impl in osLinux directly:
>
> julong available_memory() {
> if (isContainerized()) {
> mem_limit = container_mem_limit();
> if (error) {
> trace_container(...)
> }
> mem_usage = container_memory_usage();
> if (error) {
> trace_container(...)
> }
> if (mem_limit > 0 && mem_usage > 0) {
> return mem_limit > mem_usage ? mem_limit - mem_usage : 0;
> }
> }
> return system_available_memory();
> }
>
> Perhaps that's just me. We should perhaps do a similar refactoring for
> physical memory…
I didn’t want to drag a lot of the logic from os_linux.cpp into the container implementation
and wanted to try to keep a separation of pure os configuration and container configuration.
I also envisioned cases were there might be non cgroup containers that we’d need to support.
There’s already Kata containers but that one looks just like a pure os execution.
>
>> The OSContainer interface should be used independent of which cgroup implementation is active.
>
> Yes, agreed. I guess the disagreement is in the granularity of the
> interface functions.
>
>> os_linux should use the existing OSContainer interface.
>
> OK. If the OSContainer interface is set in stone I will change it back.
>
> My concern was that it exposes too many functions for no good reason
> not needed by os_linux (the only consumer). Most functions are used by
> print_container_info() anyway, which could be kept more general.
> Perhaps I'm missing something?
The original intent was to provide a container metrics API that would be used by the VM startup,
JFR events and Management MBeans to provide more visibility into a container.
I’m still hopeful that we’ll eventually get the last two. I’ve had discussions with the serviceability
team and there’s an interested party in doing the JFR events.
>
>> OSContainer_linux should include the logic which selects cgroupv1 versus cgroupv2 and
>> calls through a common cgroupSubsystem_linux class. This cgroupSubsystem_linux class
>> should provide common functions that both v1 and v2 use.
>
> Yes. If you look at the cgroups v2 patch, that's what is being
> proposed. CgroupSubsystemFactory::create() is being called from
> OsContainer_linux. The factory does the selection and returns the
> abstract cgroupSubsystem class (either v1 or v2):
> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230305/05/webrev/
>
>> I’d then expect to see cgroupV1Subsystem_linux.?pp and cgroupV2Subsystem_linux.?pp files which
>> extends cgroupSubsystem and contain version specific functions.
>
> Sure, I'd be happy to move things into version specific files if that's
> desired. Currently, I've kept them in cgroupSubsystem_linux.{c,h}pp. It
> shouldn't be hard to separeate them out. It makes sense to do it
> actually.
>
> Thanks again for you input!
No, problem and thanks for doing this work.
BTW: Can you tell me if there’s a .dockerenv file in the fedora/podman’s container / directory or is it a .podmanenv file?
I’m wondering if I can use the existence of these files as the triigger that we’re containerized. This is for another bug.
Bob.
>
> Cheers,
> Severin
>
>> Bob.
>>
>>
>>> On Sep 19, 2019, at 10:20 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> Please review this code refactoring which will help getting a cgroups
>>> v2 implementation for OSContainer implemented. The proposed changes are
>>> mentioned in the bug. In essence, after this patch only methods
>>> actually called in os_linux.cpp are exposed via OSContainer. Everything
>>> else remains an implementation detail. After this refactoring it should
>>> also be clearer what the actual bits implemented via cgroups v1 are in
>>> hotspot. Functionally there should be no difference before and after
>>> this patch.
>>>
>>> If you are curious where I'm going with this, have a look at JDK-
>>> 8230305 which has a draft of an implementation of cgroups v2 building
>>> on top of this patch.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8230848
>>> webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8230848/03/webrev/
>>>
>>> Testing: jdk-submit, tier1 tests on Linux x86_64, container tests
>>> with docker and podman.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Severin
>>>
>
More information about the hotspot-dev
mailing list