RFR: 8331560: Refactor Hotspot container detection code so that subsystem delegates to controllers [v3]
Severin Gehwolf
sgehwolf at openjdk.org
Thu Jun 13 15:41:17 UTC 2024
On Wed, 29 May 2024 17:42:13 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:
>> Please review this preparatory PR which is an enabler for some bugfix/alignment work (e.g. [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420)). The idea is to delegate limit lookup to controllers (`memory`, `cpu`, etc.) from the overarching `CgroupSubsystem` class. This way we can - once the cg type has been detected - "adjust" a controller's path to the limit files when the controller is being created (on init) and then left alone. In this case, the two preparatory entry points are `CgroupUtil::processor_count()` taking a version-agnostic `CgroupCpuController` to do the actual look-up and `CgroupMemoryController::read_memory_limit_in_bytes()` for the same in terms of memory limits.
>>
>> This enables setting the contoller's path to the interface files (wherever it might be in the hierarchy of it's original cgroup path), look up the limit and "freeze" the path once it found a - lower - limit.
>>
>> It also ensures that both cgroup versions return `-1` or `-2` (`OSCONTAINER_ERROR`) - in both cases negative - for some notion of unlimited. I.e. it makes the upper bound by the host's physical memory apparent in the version agnostic classes.
>>
>> Testing:
>> - [x] GHA
>> - [x] gtest:cgroupTest tests
>> - [x] container tests on Linux cgv1 and cgv2 on x86_64
>>
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits:
>
> - Review feedback
> - Clean up
> - Appropriately handle version specific printing
> - Fix inheritance hierarchy
> - Fixup after merge
> - Merge branch 'master' into jdk-8331560-cgroup-controller-delegation
> - Resolve ambiguity with static_cast
> - Remove limit_from_str from util class
> - Merge branch 'jdk-8302744-cleanup-getcontainer-info' into jdk-8331560-cgroup-controller-delegation
> - Add proper comments for parsing utility functions
> - ... and 20 more: https://git.openjdk.org/jdk/compare/9b64ece5...e177a086
Thanks for the reviews!
> This code has (in general, not just this PR) a lot of unnecessary complexity. We can easily shave that complexity down, there's a lot of low hanging fruit here it seems. I'm seriously wondering if maybe we should get to these easier parts before we do another refactoring.
Please explain what exactly you mean by "unnecessary complexity". Happy to shave off some complexity if we *can*.
The point of this patch is that we want an API (version agnostic) so as to be able to determine the path to the interface files where the lowest limit is set (since cgroups is hierarchical). The path to the interface files might be different for the memory controller (as compared to the cpu controller, for example). So in order to be able to do that we need to have a way to adjust the `subsystem_path()`, determine the limit on that path and either continue or stop (if the limit is lower than the one previously seen). Then once we've found the correct paths for each controller use the code as we used to. [JDK-8333446](https://bugs.openjdk.org/browse/JDK-8333446) adds tests where this fails currently and shows the problem.
That's the point of this patch. It adds those API points for `cpu` (`CgroupUtil::processor_count()`) and `memory` (`CgroupMemoryController::read_memory_limit_in_bytes()`). Hope that makes it clearer.
Some of the other complexity is totally orthogonal to this patch. For example, caching a value for a short time rather than reading from the interface file. See [JDK-8232207](https://bugs.openjdk.org/browse/JDK-8232207) for details.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19085#issuecomment-2166033598
More information about the hotspot-runtime-dev
mailing list