RFR: 8300645: Revert julong changes from 8194232 after 8292083
Severin Gehwolf
sgehwolf at openjdk.org
Mon Feb 13 19:39:28 UTC 2023
On Mon, 13 Feb 2023 18:24:50 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:
>> I think the `assert(memswlimit > 0, "invariant");` doesn't give a clear explanation why this is the case. And seeing multiple of this kind of assert in the same file is problematic.
>>
>> Is it possible for `/memory.limit_in_bytes` to use a number that's larger than `max_jlong`? If so, `GET_CONTAINER_INFO(jlong, ... JLONG_FORMAT)` will fail and this function will return `OSCONTAINER_ERROR`. Previously, this function would return -1.
>>
>> (the fact that `OSCONTAINER_ERROR` (-2) can be returned by this function is also problematic).
>>
>> I actually think the old code is better. It can be further tightened by changing the blind (and potentially unsafe) cast
>>
>>
>> return (jlong)hier_memswlimit;
>>
>>
>> to
>>
>>
>> return as_positive_memory_limit(hier_memswlimit);
>>
>>
>> as_positive_memory_limit will assert that the value is not greater than `max_jlong`, which should be guaranteed by the above calculation. More explanations can be added as comments in `as_positive_memory_limit`.
>
>>-2 is being returned by the macro if the file doesn't exist or some other error occurs (basically this case).
>
> Yes, but as you said it'll trigger a short return and never set `memlimit` if it fails, so we'll never see `-2` in `memlimit` regardless.
>
> Now I understand your reasoning, and I get the idea. I do think that if you don't know the macro you'll be confused in one way with the assert, and if you do know the macro you'll be confused without the assert :-). The "real" answer is probably to fix these macros, I think we can do it better. Of course, future RFE.
>
> You read this code more often than I do, I defer to you on keeping or not keeping the `assert`s. Ping me when you've fixed the log issue please and I'll approve the PR.
> I think the `assert(memswlimit > 0, "invariant");` doesn't give a clear explanation why this is the case. And seeing multiple of this kind of assert in the same file is problematic.
OK, I'll remove the asserts.
> Is it possible for `/memory.limit_in_bytes` to use a number that's larger than `max_jlong`?
Theoretically yes. This code really is accounting for `not-a-fixed-unlimited` value being specified for cgroups v1. So I argue we don't care. `max_jlong` is `9,223,372,036,854,775,807`. `max_julong` is `18,446,744,073,709,551,615`. Both are beyond *exabyte* values and we do compare against the actual physical memory of the system (which I'd argue won't be beyond exabytes any time soon?) anyway. We are dealing with a memory limit here, which are supposed to be *smaller* than the physical host values (common case).
> If so, `GET_CONTAINER_INFO(jlong, ... JLONG_FORMAT)` will fail and this function will return `OSCONTAINER_ERROR`. Previously, this function would return -1.
>
> (the fact that `OSCONTAINER_ERROR` (-2) can be returned by this function is also problematic).
We don't care in this case, though. All negative values returned are being treated as "no limit". I.e. in both cases the container limit will be unlimited and there is no change from a users' perspective in that regard.
This is an attempt to fixing log traces, which the user actually *can* see and they don't really make sense, currently. For the theoretical case of the value exceeding `max_jlong` and then returning `OS_CONTAINER_ERROR` we are fine. The physical host memory value will be used instead. Even so, the trace would indicate `-2` in the logs indicating this. It's also not a question of correctness since we can assume that if the value is greater than `2^63-1` we have a unlimited config, unless there are really beyond exabytes memory containers out there.
> I actually think the old code is better.
The old code wanted to deal with "recognizing unlimited container memory values". This is solved - since [JDK-8292083](https://bugs.openjdk.org/browse/JDK-8292083), by using the physical machine's memory as an upper bound of that value. There is really no point to run containers with a memory limit *larger* than the physical host. So I have to disagree. The old code of [JDK-8194232](https://bugs.openjdk.org/browse/JDK-8194232) was a kludge. The new code (with the physical's machine memory as an upper bound) is more sound.
Cases beyond `max_jlong` values are theoretical at best, and won't make a difference in this code. Once we have a large positive number, we know what to do.
-------------
PR: https://git.openjdk.org/jdk/pull/12166
More information about the hotspot-runtime-dev
mailing list