RFR: 8302744: Refactor Hotspot container detection code [v2]

Thomas Stuefe stuefe at openjdk.org
Wed May 22 14:24:13 UTC 2024


On Tue, 21 May 2024 13:58:22 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:

>> Please review this container detection code refactoring in hotspot. The main point of this is to
>> 
>> - get rid of the `GET_CONTAINER_INFO` macros which hide too many things under the hood
>> - prevent refactoring of the code (since `GET_CONTAINER_INFO` macros short-return and are therefore not portable; at least not without some risk)
>> - make the code easier to understand
>> - allow for better testing via `gtest`
>> - separate multi-line parsing from single line parsing for clarity.
>> 
>> Testing:
>> - [x] GHA
>> - [x] `gtest:cgroupTest` tests
>> - [x] Container tests on Linux with cgroup v1 (legacy) and cgroup v2. All pass.
>> 
>> Thoughts?
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision:
> 
>  - Fix TestMemoryAwareness for cgroup v2
>    
>    It only reads the swap value, thus doesn't print:
>    "Memory and Swap Limit is" but rather prints:
>    "Swap Limit is". This is fine, since for cg v2
>    the memory limit and the swap limits are in different
>    files.
>  - Fix whitespace
>  - Handle cpu quota for cgroups v1 specially
>  - Try a different approach.
>    
>    Hybrid MACRO usage and locked down reading primitives.
>  - Add key-value read function
>  - Add basic testing for read functions
>  - Add generic julong/char* read functions
>  - Merge branch 'master' into jdk-8302744-cleanup-getcontainer-info
>  - 8302744: Refactor Hotspot container detection code

@jerboaa 

A lot better! 

If you move the sscanf up to `read_(number|numerical_tuple_value)()`, things become even simpler and you can remove the templatization altogether.

Also, at the moment, when you call read_string, three different buffers are involved, and three buffer copies. And one of the buffers is malloc'ed, which is not ideal for a high frequency call. And it forces you to deal with freeing that buffer.

You have:

- `read_number` -> `read_from_file`
- `read_numerical_tuple_value` -> `read_from_file`
- `read_string` -> `read_from_file`

You can move the sscanf up to the callers:

- Let `read_from_file()` only read the first line from a file and return that. No sscanf. Let it take an output buffer (e.g. `char output[1024]`) as argument.
- Let `read_number()` call `read_from_file` with a local buffer, then call sscanf with its `JULONG_FORMAT`
- Let `read_numerical_tuple_value()` call `read_from_file` with a local buffer, then call sscanf with the TupleFormat
- Let `read_string` call `read_from_file` with a local buffer. No need for sscanf. It can just return the strduped buffer.

No need for templatization of `read_from_file` then.

An extension of that idea would be to not use strdup in `read_string` at all, but to require a caller-provided buffer. Then, we don't have to copy data around, and you could just remove `read_string` altogether, since it adds nothing of value. Callers can call `read_from_file` directly.

Removing the strdup from the read_string path would mean you can also remove all the `os::free` calls in limit_from_str. No need to think about who owns which buffers.

You would introduce local buffers in those functions that call read_string. In those that work on caller-provided buffers, return either the caller-provided buffer or NULL to indicate an error (which is the typical C-style pattern), e.g.:



char* CgroupV2Subsystem::pids_max_val(char buffer[1024]) {
  CONTAINER_READ_STRING_CHECKED(_unified, "/pids.max", "Maximum number of tasks", buffer);
  return buffer;
}

jlong CgroupV2Subsystem::pids_max() {
  char buffer[1024]
  char * pidsmax_str = pids_max_val(buffer);
  return limit_from_str(buffer);
}


Yet another way to avoid malloc would be using ResourceAreas. But that introduces dependencies to initialization, and I hesitate having cgroups handling depend on VM initialization.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19060#pullrequestreview-2071407841


More information about the hotspot-runtime-dev mailing list