RFR: 8302744: Refactor Hotspot container detection code
Johan Sjölen
jsjolen at openjdk.org
Mon May 6 10:53:54 UTC 2024
On Thu, 2 May 2024 12:36:11 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:
> Please review this container detection code refactoring in hotspot. The main point of this is to
>
> - get rid of the `GET_CONTAINER_INFO` macros which hide too many things under the hood
> - prevent refactoring of the code (since `GET_CONTAINER_INFO` macros short-return and are therefore not portable; at least not without some risk)
> - make the code easier to understand
> - allow for better testing via `gtest`
> - separate multi-line parsing from single line parsing for clarity.
>
> Testing:
> - [x] GHA
> - [x] `gtest:cgroupTest` tests
> - [x] Container tests on Linux with cgroup v1 (legacy) and cgroup v2. All pass.
>
> Thoughts?
Hi Severin,
I've gone through a bit more of the code now.
I'm really wondering if simply doing the macro expansion is the right way to go here. So many of these do the same thing, isn't there a better pattern? I'm thinking something like this:
1. Change the calling convention to taking an out-parameter reference and return an int for error
```c++
int CgroupV1MemoryController::uses_mem_hierarchy(jlong& use_hierarchy) {
jlong temp;
int err = cg_file_contents_ctrl(this, "/memory.use_hierarchy", JLONG_FORMAT, &temp);
if (err != 0) {
log_trace(os, container)("Use Hierarchy is: %d", OSCONTAINER_ERROR);
return err;
}
use_hierarchy = temp;
log_trace(os, container)("Use Hierarchy is: " JLONG_FORMAT, use_hierarchy);
return 0;
}
2. Abstract out the message such that it is obvious when it failed and not:
```c++
int CgroupV1MemoryController::uses_mem_hierarchy(jlong& use_hierarchy) {
const char* str = "Use Hierarchy";
jlong temp;
int err = cg_file_contents_ctrl(this, "/memory.use_hierarchy", JLONG_FORMAT, &temp);
if (err != 0) {
log_trace(os, container)("%s failed: %d", str, OSCONTAINER_ERROR);
return err;
}
log_trace(os, container)("%s is: " JLONG_FORMAT, str, use_hierarchy);
return 0;
}
3. Port this pattern over to other similar methods and introduce a common macro for creating them. Here's some make believe code
```c++
#define V1_SIMPLE_ACCESSORS_LIST \
SIMPLE_ACCESSOR(julong, kernel_memory_usage_in_bytes, _memory->controller(), "/memory.kmem.limit_in_bytes", "Kernel Memory Limit", JULONG_FORMAT, kmem_limit) \
SIMPLE_ACCESSOR(jlong, uses_mem_hierarchy, this, "/memory.use_hierarchy", "Use Hierarchy", JLONG_FORMAT, use_hierarchy) \
SIMPLE_ACCESSOR(jlong, read_mem_swappiness, _memory->controller(), "/memory.swappiness", "Swappiness", JULONG_FORMAT, swappiness)
#define SIMPLE_ACCESSOR(ret_type, name, accessor, filename, logformat, logformat2, retname) \
int CgroupV1Subsystem::name(ret_type& retname) { \
ret_type temp; \
int err = cg_file_contents_ctrl(accessor, filename, logformat2, &temp); \
if (err != 0) { log_debug(os)("%s: failed %d" logformat, err); return err; } \
log_debug(os)("%s: failed " logformat2 logformat, temp); \
retname = temp; \
return 0; }
V1_SIMPLE_ACCESSORS_LIST
#undef SIMPLE_ACCESSOR
Which generates the following code (checked using `gcc -E` and formatted using `clang-format`)
```c++
int CgroupV1Subsystem::kernel_memory_usage_in_bytes(julong &kmem_limit) {
julong temp;
int err = cg_file_contents_ctrl(_memory->controller(),
"/memory.kmem.limit_in_bytes", JULONG_FORMAT,
&temp);
if (err != 0) {
log_debug(os)("%s: failed %d"
"Kernel Memory Limit",
err);
return err;
}
log_debug(os)("%s: failed " JULONG_FORMAT "Kernel Memory Limit", temp);
kmem_limit = temp;
return 0;
}
int CgroupV1Subsystem::uses_mem_hierarchy(jlong &use_hierarchy) {
jlong temp;
int err =
cg_file_contents_ctrl(this, "/memory.use_hierarchy", JLONG_FORMAT, &temp);
if (err != 0) {
log_debug(os)("%s: failed %d"
"Use Hierarchy",
err);
return err;
}
log_debug(os)("%s: failed " JLONG_FORMAT "Use Hierarchy", temp);
use_hierarchy = temp;
return 0;
}
int CgroupV1Subsystem::read_mem_swappiness(jlong &swappiness) {
jlong temp;
int err = cg_file_contents_ctrl(_memory->controller(), "/memory.swappiness",
JULONG_FORMAT, &temp);
if (err != 0) {
log_debug(os)("%s: failed %d"
"Swappiness",
err);
return err;
}
log_debug(os)("%s: failed " JULONG_FORMAT "Swappiness", temp);
swappiness = temp;
return 0;
}
Some of these methods do some more complex logic, I've noticed, so perhaps there isn't a silver bullet. Anyway, let me know what you think of this, it'd be good to simplify this code.
src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 79:
> 77: PRAGMA_DIAG_PUSH
> 78: PRAGMA_FORMAT_NONLITERAL_IGNORED
> 79: template <typename T> int __cg_file_contents_impl(const char *absolute_path,
Can we please have the `template <typename T>` separate from the rest of the function type? What's the significance of the double underscore prefix in the naming?
src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 80:
> 78: PRAGMA_FORMAT_NONLITERAL_IGNORED
> 79: template <typename T> int __cg_file_contents_impl(const char *absolute_path,
> 80: const char *scan_fmt,
This is pre-existing, but can we change this in the code you touch? Current Hotspot style is to have the `*` hugging the type and not the variable name, so `const char* absolute_path`.
src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 96:
> 94: return OSCONTAINER_ERROR;
> 95: }
> 96: fclose(fp);
We're done with `fp` regardless of branch, move the fclose up to after fgets.
src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 122:
> 120: return OSCONTAINER_ERROR;
> 121: }
> 122: if (scan_fmt == nullptr || returnval == nullptr) {
Can't these be `assert`s?
src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 148:
> 146: T returnval) {
> 147: FILE* fp = os::fopen(absolute_path, "r");
> 148: if (fp == nullptr) {
Pre-existing: This doesn't return after it turns out that opening the file failed. Perhaps this should be fixed?
src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 84:
> 82: return (jlong)OSCONTAINER_ERROR;
> 83: }
> 84: log_trace(os, container)("Use Hierarchy is: " JLONG_FORMAT, use_hierarchy);
A future RFE should change these to taking an out parameter and returning the int directly, no casting involved. Also, generally strange that we log the same thing regardless if it's an error or not.
-------------
PR Review: https://git.openjdk.org/jdk/pull/19060#pullrequestreview-2040360265
PR Review Comment: https://git.openjdk.org/jdk/pull/19060#discussion_r1590764203
PR Review Comment: https://git.openjdk.org/jdk/pull/19060#discussion_r1590760396
PR Review Comment: https://git.openjdk.org/jdk/pull/19060#discussion_r1590762667
PR Review Comment: https://git.openjdk.org/jdk/pull/19060#discussion_r1590766642
PR Review Comment: https://git.openjdk.org/jdk/pull/19060#discussion_r1590769036
PR Review Comment: https://git.openjdk.org/jdk/pull/19060#discussion_r1590776229
More information about the hotspot-runtime-dev
mailing list