RFR: 8302744: Refactor Hotspot container detection code

Mon May 6 17:14:54 UTC 2024

On Thu, 2 May 2024 12:36:11 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

> Please review this container detection code refactoring in hotspot. The main point of this is to
> 
> - get rid of the `GET_CONTAINER_INFO` macros which hide too many things under the hood
> - prevent refactoring of the code (since `GET_CONTAINER_INFO` macros short-return and are therefore not portable; at least not without some risk)
> - make the code easier to understand
> - allow for better testing via `gtest`
> - separate multi-line parsing from single line parsing for clarity.
> 
> Testing:
> - [x] GHA
> - [x] `gtest:cgroupTest` tests
> - [x] Container tests on Linux with cgroup v1 (legacy) and cgroup v2. All pass.
> 
> Thoughts?

Hi Severin,

One thing I always disliked is that we wrap the scanf in a wrapper. That does cost us safety. We don't pass in format string literals, therefore the compiler cannot check the format strings for us.

Your patch makes it slightly worse: the old version just happened to use the log_fmt string in UL log lines as part of the GET_CONTAINER_INFO macro. They expanded as literals at the call site. This caused format string errors at least for wrong log format strings. Since you now moved the log lines the templatized function (or, removed them?), we don't get these warnings anymore.

Apart from the format string problem:

We have a templatized function that abstracts away the return type of the scanf operation. Okay. But we still manually hand in the scanf format string. So, nothing is gained from the templatization (before your patch and now).

Looking closely, afaics we only have a few cases:
a) %1023s, return a string
b) %ul, return 64bit unsigned
c) %d, return signed int (only found cpu-weigth, and documentation did not indicate this could be negative)
d) cpu-max is the odd duck out, consisting of a string and a number (eg "max 100000"). We ignore the string, just return the number.

case (b) and (c) could be handled as one, since you can read an unsigned also with ulong format.

So, this means we only have two cases - reading a string (of 1K length) and an unsigned numeric. And then the weird cpu-max thing. So I wonder whether we really need a templatized scan function at all. Why not just:
- a function scanning a 64bit unsigned from a file (I think even have that already)
- a function scanning a string from a file
- and the cpu-max thing could be handled by reading the line as string, then parsing the string with sscanf.

That would cut down a lot of complexity. And we would get compiler warnings for the format strings, no need for the pragmas anymore. What do you think?

Cheers, Thomas

src/hotspot/os/linux/cgroupSubsystem_linux.hpp line 87:

> 85:     return OSCONTAINER_ERROR;
> 86:   }
> 87: 

We can simplify this coding by just using fscanf. No need to call fgets + sscanf.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19060#pullrequestreview-2041234608
PR Review Comment: https://git.openjdk.org/jdk/pull/19060#discussion_r1591291086