RFR: 8322420: [Linux] cgroup v2: Limits in parent nested control groups are not detected [v2]
Severin Gehwolf
sgehwolf at openjdk.org
Wed Sep 4 10:18:22 UTC 2024
On Thu, 29 Aug 2024 16:20:21 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:
>>> I am not aware of actual scenarios like that, nor I see much reason to do something like that, even though possible. :)
>>
>> Having slept over this, I now feel we'd regress cgv1 if we don't do it. Given your suggested test, I'll play with covering this case and adding that to the testing framework in #19530. Then based on the additional complexity we can decide whether or not to cover it or not.
>
>> > I am not aware of actual scenarios like that, nor I see much reason to do something like that, even though possible. :)
>>
>> Having slept over this, I now feel we'd regress cgv1 if we don't do it. Given your suggested test, I'll play with covering this case and adding that to the testing framework in #19530. Then based on the additional complexity we can decide whether or not to cover it or not.
>
> https://github.com/openjdk/jdk/pull/20646/commits/0e17c9914f187d9ee1af32674f03b06e8173bc18 implements this. PTAL.
> @jerboaa IIUC this patch sets the memory subsystem path to correspond to the cgroup with the lowest memory limit in the hierarchy. One implication of this approach is that all the memory stats would now be retrieved from that cgroup. So now the swap limit will also be read from that cgroup even though the JVM's cgroup may have a lower value. Is this the desired behavior?
@ashu-mehra You mean some setup like this (assume cgroup v2)?
1. `foo.slice.d/memor-limit.conf` has `MemoryMax=200M`
2. `foo-mem.slice` has `MemoryMax=400M` *and* `MemorySwapMax=0`
That is, we have a memory controller path to `/sys/fs/cgroup/foo.slice` since the lowest memory limit is `200M`, yet `/sys/fs/cgroup/foo.slice/memory.swap.max` would be `max` and `/sys/fs/cgroup/foo.slice/foo-mem.slice/memory.swap.max` would be `0`. I.e. the kernel doesn't allow swap, while the JVM thinks it has unlimited swap. Yes, that's possible and a good thought experiment. So thanks!
With that said, this doesn't seem to be a problem for the common k8s container case where we have a much simpler setup. There, the path to the memory limit and swap limit is always the same (at least AFAIK). Also, as far as the JVM is concerned it uses `os::physical_memory()` for sizing the heap (it's not using swap metrics for anything significant). `os::physical_memory()` in-turn would use the cgroups code to figure out the memory limit and use that over the host physical memory. I.e. it would still be OK in such a case. The reporting for the swap limit would be incorrect, though.
Same for the CPU count code which is used by various bits by the JVM (GC, JIT, etc.). The symmetry there doesn't exist. At least I don't see any other - non-critical - metric we expose using the adjusted cpu path.
IMO the important bits are the memory limit and the cpu count. The rest isn't that critical. So should we do something about it? Possibly. Happy to file a bug with this limitation documented. We could then address it if it turns out to become a real problem. Right, now, it strikes me as unusual enough for it to be OK as it is. What do you think?
A possible fix would be to always walk the full hierarchy for any metric we look up, but I have the feeling this is You Ain't Gonna Need It (YAGNI) enough to not implement it right away. FWIW, I wasn't able to re-create such a config on cg v1 exposing this swap issue. Were you?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20646#issuecomment-2328467306
More information about the hotspot-runtime-dev
mailing list