RFR: 8365606: Container code should not be using jlong/julong
Andrew Haley
aph at openjdk.org
Fri Oct 24 09:39:02 UTC 2025
On Fri, 10 Oct 2025 13:09:48 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:
> Please review this revised version of getting rid of `jlong` and `julong` in internal HotSpot code. The single remaining usage is using `os::elapsed_counter()` which I think is still ok. This refactoring is for the container detection code to (mostly) do away with negative return values.
>
> It gets rid of the trifold use of the return value: 1) error, 2) unlimited values, 3) actual numbers/values/limits. Instead, all container related values are now being read from the interface files as `uint64_t` and afterwards interpreted in the way that makes sense for the API implementations. For example, `cpu` values will essentially be treated as `int`s as before, potentially returning a negative value `-1` for unlimited. For memory sizes the type `physical_memory_size_type` has been chosen. When there is no limit for a specific memory size a value `value_unlimited` is being returned.
>
> All error cases have been changed to returning `false` in the API functions (and no value is being set in the passed in reference for the value). The effect of this is that all container related functions now return a `bool` and require a reference to be passed in for the `value` that is being asked for.
>
> All usages of the API have been changed to use the revised API. There are no more usages of `OSCONTAINER_ERROR` (`-2`) in HotSpot code.
>
> While working on this, I've noticed that there are still some calls deep in the cgroup subsystem code to query "machine" info (e.g. `os::Linux::active_processor_count()`). I've filed [JDK-8369503](https://bugs.openjdk.org/browse/JDK-8369503) to get this cleaned-up as this patch was already getting large.
>
> Testing (looking good):
> - [x] GHA
> - [x] All container tests (including problem listed ones) on Linux x86_64 with cg v1 and cg v2. See [this comment](https://github.com/openjdk/jdk/pull/27743#issuecomment-3390060127) below.
> - [x] Some ad-hoc manual testing in containers using JFR (`jdk.SwapSpace` event) and `VM.info` diagnostic command.
>
> Thoughts? Opinions?
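
For readers skimming the thread, here is a minimal sketch of the API shape the quoted description proposes: a `bool` return for success or failure, the value delivered through a reference, and a sentinel for "no limit". It is not code from the patch. The function name `parse_memory_limit` is hypothetical, and the definitions of `physical_memory_size_type` and `value_unlimited` below are assumed stand-ins for the HotSpot names, which may be defined differently.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Assumption: stand-ins for the HotSpot names mentioned in the description;
// the real definitions may differ.
using physical_memory_size_type = std::uint64_t;
constexpr physical_memory_size_type value_unlimited =
    std::numeric_limits<physical_memory_size_type>::max();

// Hypothetical parser for an interface-file value such as cgroup v2
// memory.max, which holds either "max" or a decimal byte count.
// Returns false on error and leaves `value` untouched in that case.
bool parse_memory_limit(const std::string& text,
                        physical_memory_size_type& value) {
  if (text == "max") {
    value = value_unlimited;  // no limit configured
    return true;
  }
  if (text.empty()) {
    return false;
  }
  constexpr std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
  std::uint64_t parsed = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      return false;  // not a decimal number
    }
    const std::uint64_t digit = static_cast<std::uint64_t>(c - '0');
    if (parsed > (max - digit) / 10) {
      return false;  // would overflow uint64_t
    }
    parsed = parsed * 10 + digit;
  }
  value = parsed;
  return true;
}
```

With this shape the three former meanings of a single signed return value are separated: failure is the `false` return, "unlimited" is the sentinel, and everything else is a real number.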
I'm not surprised about the lack of reviews because it's long and the result is rather ugly. (Sorry, but I had to say it.)
This is less jarring to read:

    struct Result {
      const int _value; const bool _ok;
      Result(int value, bool ok) : _value(value), _ok(ok) {}
    };
    ...
    Result CgroupSubsystem::active_processor_count() {
      ...
      cpu_count = os::Linux::active_processor_count();
      if (!CgroupUtil::processor_count(contrl->controller(), cpu_count, value)) {
        return Result(value, false);
      }
      assert(value > 0 && value <= cpu_count, "must be");
      // Update cached metric to avoid re-reading container settings too often
      cpu_limit->set_value(value, OSCONTAINER_CACHE_TIMEOUT);
      return Result(value, true);
    }

called as:

    auto [result, ok] = cgroup_subsystem->active_processor_count();
    if (ok) ...
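
A self-contained version of the same idea, for anyone who wants to try it outside HotSpot. This is only a sketch: the free function `active_processor_count(host_cpus, quota_cpus)` and the helper `processors_or_default` are hypothetical stand-ins, with a non-positive quota modelling a failed read of the container limit. Structured bindings need C++17.

```cpp
#include <algorithm>

// Same shape as the Result struct above: a value plus a success flag.
struct Result {
  const int _value;
  const bool _ok;
  Result(int value, bool ok) : _value(value), _ok(ok) {}
};

// Hypothetical stand-in for CgroupSubsystem::active_processor_count().
// A quota_cpus <= 0 models a failed read of the container limit.
Result active_processor_count(int host_cpus, int quota_cpus) {
  if (quota_cpus <= 0) {
    return Result(host_cpus, false);
  }
  // The container limit can never exceed what the host offers.
  return Result(std::min(quota_cpus, host_cpus), true);
}

// Caller side: structured bindings unpack the public members in
// declaration order, so `count` is _value and `ok` is _ok.
int processors_or_default(int host_cpus, int quota_cpus) {
  auto [count, ok] = active_processor_count(host_cpus, quota_cpus);
  return ok ? count : host_cpus;
}
```

The value and the status travel together in the return value, so the caller needs neither an out-parameter nor a pre-declared variable.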
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27743#issuecomment-3442136772