RFR: 8370572: Cgroups hierarchical memory limit is not honored after JDK-8322420 [v2]

Severin Gehwolf sgehwolf at openjdk.org
Wed Oct 29 09:40:03 UTC 2025


On Mon, 27 Oct 2025 17:59:23 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> See the bug for more discussion. 
>> 
>> We are seeing customer regressions in 21.0.9, notably on ECS Fargate. We root-caused them to [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420). That patch removed the handling of `hierarchical_memory_limit`; see [this hunk](https://github.com/openjdk/jdk/commit/55a7cf14453b6cd1de91362927b2fa63cba400a1#diff-8910f554ed4a7bc465e01679328b3e9bd64ceaa6c85f00f0c575670e748ebba9L118-L131).
>> 
>> But cgroup v1, at least, still needs this handling in some configurations, notably on ECS. The problem can also be reproduced with local Docker. The key is to set up a host cgroup that is not visible to the container, so that the only way for the container to learn its memory limit is to read the `hierarchical_*` values that the kernel computes itself.
>> 
>> Unfortunately, it is not easy to revert the offending hunks in 21.0.9, as there were follow-up refactoring backports. So, to make it work, this PR reinstates the hunks using the new cgroups support code. It also makes the code (subjectively) easier to read, and is in the spirit of past refactorings.
>> 
>> We are planning to backport this patch to 21.0.9, at least into Corretto downstream, as soon as possible to unbreak users. Therefore, the patch is also kept as crisp as possible.
>> 
>> I tried to come up with a regression test for it, but could not: local reproducers require amending the _host_ configuration, which requires superuser privileges, among other hassles.
>> 
>> Additional testing:
>>  - [x] Reproducer with local Docker now passes
>>  - [x] Reproducer with ECS Fargate now passes
>>  - [x] Linux x86_64 server fastdebug, `containers/` passes on a cgroups v1 host
>>  - [x] Linux x86_64 server fastdebug, `containers/` passes on a cgroups v2 host
>
> Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Also no need to touch the other getter
>  - Whitespace

> I tried to come up with a regression test for it, but could not: local reproducers require amending the _host_ configuration, which requires superuser privileges, among other hassles.

Without a proper regression test, this is bound to fall through the cracks again. Are you sure this cannot be tested? It would be fine for the test to require root privileges (we could skip it when not running as root); that is still better than having no test at all.
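
For reference, here is a minimal C++ sketch of the cgroup v1 fallback the quoted description relies on. When the container's own `memory.limit_in_bytes` looks unlimited, it consults the kernel-computed `hierarchical_memory_limit` line in `memory.stat`, which also reflects limits set on ancestor cgroups the container cannot see. The helper names `read_stat_key` and `effective_memory_limit` are hypothetical, as is the rule that any value at or above host physical memory counts as "unlimited". This is an illustration under those assumptions, not the HotSpot code from the PR.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: scan "<name> <value>" lines (memory.stat format)
// and return the value for `key`, if present.
std::optional<uint64_t> read_stat_key(const std::string& stat, const std::string& key) {
  std::istringstream in(stat);
  std::string name;
  uint64_t value;
  while (in >> name >> value) {
    if (name == key) {
      return value;
    }
  }
  return std::nullopt;
}

// Hypothetical: returns the effective memory limit in bytes, or -1 if unlimited.
// Assumption: a limit at or above host physical memory is treated as "no limit".
int64_t effective_memory_limit(uint64_t limit_in_bytes,
                               const std::string& memory_stat,
                               uint64_t host_memory) {
  if (limit_in_bytes < host_memory) {
    // The container's own cgroup carries a real limit; use it directly.
    return static_cast<int64_t>(limit_in_bytes);
  }
  // Own limit looks unlimited: fall back to the kernel-computed hierarchical
  // limit, which accounts for limits on ancestor cgroups.
  std::optional<uint64_t> hier = read_stat_key(memory_stat, "hierarchical_memory_limit");
  if (hier && *hier < host_memory) {
    return static_cast<int64_t>(*hier);
  }
  return -1;
}
```

With a 16 GiB host, `limit_in_bytes` of 9223372036854771712 (the usual "unlimited" value on 4 KiB-page systems) and a `memory.stat` containing `hierarchical_memory_limit 536870912`, the sketch returns 536870912. Taking file contents as strings instead of reading `/sys/fs/cgroup` directly keeps this logic testable without host reconfiguration, although it does not replace an end-to-end container test.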

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28006#issuecomment-3460594125


More information about the hotspot-runtime-dev mailing list