RFR: 8365606: Container code should not be using jlong/julong [v2]
Casper Norrbin
cnorrbin at openjdk.org
Tue Oct 28 09:41:46 UTC 2025
On Fri, 24 Oct 2025 09:45:25 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:
>> Please review this revised version of getting rid of `jlong` and `julong` in internal HotSpot code. The single remaining usage is using `os::elapsed_counter()` which I think is still ok. This refactoring is for the container detection code to (mostly) do away with negative return values.
>>
>> It gets rid of the trifold-use of return value: 1.) error, 2) unlimited values 3) actual numbers/values/limits. Instead, all container related values are now being read from the interface files as `uint64_t` and afterwards interpreted in the way that make sense for the API implementations. For example, `cpu` values will essentially be treated as `int`s as before, potentially returning a negative value `-1` for unlimited. For memory sizes the type `physical_memory_size_type` has been chosen. When there is no limit for a specific memory size a value `value_unlimited` is being returned.
>>
>> All error cases have been changed to returning `false` in the API functions (and no value is being set in the passed in reference for the value). The effect of this is that all container related functions now return a `bool` and require a reference to be passed in for the `value` that is being asked for.
>>
>> All usages of the API have been changed to use the revised API. There is no more usages for `OSCONTAINER_ERROR` (`-2) in HotSpot code.
>>
>> While working on this, I've noticed that there are still some calls deep in the cgroup subsystem code to query "machine" info (e.g. `os::Linux::active_processor_count()`). I've filed [JDK-8369503](https://bugs.openjdk.org/browse/JDK-8369503) to get this cleaned-up as this patch was already getting large.
>>
>> Testing (looking good):
>> - [x] GHA
>> - [x] All container tests (including problem listed ones) on Linux x86_64 with cg v1 and cg v2. See [this comment](https://github.com/openjdk/jdk/pull/27743#issuecomment-3390060127) below.
>> - [x] Some ad-hoc manual testing in containers using JFR (`jdk.SwapSpace` event) and `VM.info` diagnostic command.
>>
>> Thoughts? Opinions?
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Merge branch 'master' into jdk-8365606-jlong-julong-refactor
> - Fix print_container_info output
> - whitespace clean-ups and other small fixes
> - Fix log format in container macro and scanf format
> - Fix duplicate include in osContainer_linux
> - 8365606: Container code should not be using jlong/julong
Thanks for taking this on! I think this is a solid improvement, and it's great to finally see the Java types phased out. I had a few thoughts/comments, mostly around the os-level code and the logging of constants, since those constants don't really have much meaning anymore. Otherwise great work!
src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 90:
> 88: if (!is_ok) { \
> 89: log_trace(os, container)(log_string " failed: -2"); \
> 90: return false; \
Do we need to keep the `-2` here? Or could we perhaps change to a better message?
src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 93:
> 91: } \
> 92: if (retval == value_unlimited) { \
> 93: log_trace(os, container)(log_string " is: -1"); \
Same here, could perhaps do `log_trace(os, container)(log_string " is: unlimited")`instead.
src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 465:
> 463: // negative value as a large unsiged int
> 464: if (!reader()->read_number("/cpu.cfs_quota_us", quota)) {
> 465: log_trace(os, container)("CPU Quota failed: -2");
Do we need to keep the `-2` here? Or could we perhaps change to a better message?
src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 61:
> 59: * true if the result reference got updated
> 60: * false if there was an error
> 61: */
We set result to `-1` and return true on a no share setup here, but return `false` and don't on cgroup v1. The comment is contradicting.
src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 115:
> 113: * true if the result reference has been set
> 114: * false on error
> 115: */
The beginning part of the comment isn't updated to mention the `result` reference, unlike the other comments.
src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 178:
> 176: bool is_ok = reader()->read_numerical_key_value("/cpu.stat", "usage_usec", value);
> 177: if (!is_ok) {
> 178: log_trace(os, container)("CPU Usage failed: -2");
Do we need to keep the `-2` here? Or could we perhaps change to a better message?
src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 237:
> 235: if (!reader()->read_number_handle_max("/memory.swap.max", swap_limit_val)) {
> 236: // Some container tests rely on this trace logging to happen.
> 237: log_trace(os, container)("Swap Limit failed: -2");
Do we need to keep the `-2` here? Or could we perhaps change to a better message?
src/hotspot/os/linux/cgroupV2Subsystem_linux.cpp line 379:
> 377: * Calculate the maximum number of tasks available to the process. Set the
> 378: * value in the passed in 'value' reference. The value might be -1 when
> 379: * there is no limit.
How can we get `-1`? Or do you mean `(uint64_t)-1`?
src/hotspot/os/linux/os_linux.cpp line 220:
> 218: if (OSContainer::is_containerized() && OSContainer::available_memory_in_bytes(avail_mem)) {
> 219: log_trace(os)("available container memory: " PHYS_MEM_TYPE_FORMAT, avail_mem);
> 220: value = avail_mem;
Should be able to pass in `value` directly instead of using `avail_mem`.
src/hotspot/os/linux/os_linux.cpp line 261:
> 259: if (OSContainer::is_containerized() && OSContainer::available_memory_in_bytes(free_mem)) {
> 260: log_trace(os)("free container memory: " PHYS_MEM_TYPE_FORMAT, free_mem);
> 261: value = free_mem;
Should be able to pass in `value` directly instead of using `free_mem`.
src/hotspot/os/linux/os_linux.cpp line 348:
> 346: return true;
> 347: }
> 348: }
This whole function is getting a bit too long in my opinion.
Maybe everything inside the `if OSContainer::is_containerized() {}` could be moved into a new function `OSContainer::available_swap_in_bytes`, similar to the already existing `OSContainer::available_memory_in_bytes`. That way, we could abstract away all the `OSContainer` calls.
The only consequence would be that the `log_trace` wouldn't work any more. I couldn't find any test that depends on the exact output, so it could perhaps be split up instead.
src/hotspot/os/linux/os_linux.cpp line 4863:
> 4861: if (OSContainer::is_containerized() && OSContainer::active_processor_count(active_cpus)) {
> 4862: log_trace(os)("active_processor_count: determined by OSContainer: %d",
> 4863: active_cpus);
When running containerized, we would now always fetch the os cpu count at least once.
`CgroupSubsystem::active_processor_count`, which this calls down to has the cache to actively avoid getting the cpu count too frequently, and only gets the number of cpus with `os::Linux::active_processor_count` when the cache expires.
I don't know if this is still an issue today, but since it's there I still think we should avoid getting the cpus if unnecessary.
src/hotspot/share/runtime/os.cpp line 2215:
> 2213: }
> 2214: value = mem_usage;
> 2215: return true;
Can we collapse this and just set the `value` reference directly instead in the container functions? Something like:
```c++
if (OSContainer::is_containerized()) {
return OSContainer::memory_usage_in_bytes(mem_usage);
}
-------------
PR Review: https://git.openjdk.org/jdk/pull/27743#pullrequestreview-3365531156
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2460075401
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2460081327
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2468689176
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2465321137
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2465329926
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2465347559
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2465361453
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2465754752
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2452045546
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2452052597
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2452159278
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2452239511
PR Review Comment: https://git.openjdk.org/jdk/pull/27743#discussion_r2451900698
More information about the hotspot-jfr-dev
mailing list