RFR: 8286212: Cgroup v1 initialization causes NPE on some systems [v3]
Ioi Lam
iklam at openjdk.java.net
Thu May 19 06:21:58 UTC 2022
On Wed, 18 May 2022 18:14:52 GMT, Severin Gehwolf <sgehwolf at openjdk.org> wrote:
>> Please review this change to the cgroup v1 subsystem which makes it more resilient on some of the stranger systems. Unfortunately, I wasn't able to re-create a similar system as the reporter. The idea of using the longest substring match for the cgroupv1 file paths was based on the fact that on systemd systems processes run in separate scopes and the maven forked test runner might exhibit this property. For that it makes sense to use the common ancestor path. Nothing changes in the common cases where the `cgroup_path` matches `_root` and when the `_root` is `/` (container the former, host system the latter).
>>
>> In addition, the code paths which are susceptible to throw NPE have been hardened to catch those situations. Should it happen in the future it makes more sense (to me) to not have accurate container detection in favor of continuing to keep running.
>>
>> Finally, with the added unit-tests a bug was uncovered on the "substring" match case of cgroup paths in hotspot. `p` returned from `strstr` can never point to `_root` as it's used as the "needle" to find in "haystack" `cgroup_path` (not the other way round).
>>
>> Testing:
>> - [x] Added unit tests
>> - [x] GHA
>> - [x] Container tests on cgroups v1 Linux. Continue to pass
>
> Severin Gehwolf has updated the pull request incrementally with two additional commits since the last revision:
>
> - Refactor hotspot gtest
> - Separate into function. Fix comment.
Changes requested by iklam (Reviewer).
src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 60:
> 58: strncpy(buf, _mount_point, MAXPATHLEN);
> 59: buf[MAXPATHLEN-1] = '\0';
> 60: _path = os::strdup(buf);
Comment on the old code: why this cannot be simply
_path = os::strdup(_mount_point);
Also, all the strncat calls in this function seem problematic, and should be converted to stringStream for consistency.
src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 63:
> 61: } else {
> 62: char *p = strstr(cgroup_path, _root);
> 63: if (p != NULL && p == cgroup_path) {
What happens if cgroup_path is "/foo/bar" and _root is "/fo"?
Maybe this block should be combined with the new `else` block you're adding? However, the behavior seems opposite between these two blocks of code:
The upper block: _root is a prefix of cgroup_path, we take the **tail** of cgroup_path
TestCase substring_match = {
"/sys/fs/cgroup/memory", // mount_path
"/user.slice/user-1000.slice", // root_path
"/user.slice/user-1000.slice/user at 1001.service", // cgroup_path
"/sys/fs/cgroup/memory/user at 1001.service" // expected_path
};
The lower block: The beginning of _root is a prefix of cgroup_path, we take the **head** of cgroup_path
TestCase prefix_matched_cg = {
"/sys/fs/cgroup/memory", // mount_path
"/user.slice/user-1000.slice/session-50.scope", // root_path
"/user.slice/user-1000.slice/session-3.scope", // cgroup_path
"/sys/fs/cgroup/memory/user.slice/user-1000.slice" // expected_path
};
I find the behavior hard to understand. I think the rules (and reasons) should be added to the comment block above the function.
src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 86:
> 84: const char* cgroup_p = cgroup_path;
> 85: int last_slash = find_last_slash_pos(root_p, cgroup_p);
> 86: assert(last_slash >= 0, "not an absolute path?");
Are root_p and cgroup_p directly read from the /proc/xxx files. If so, do we validate the input to make sure they are absolute paths?
It seems like our code cannot handle trailing '/' in the input. If so, we should clear all trailing '/' from the input string. Then, in functions that process them, we should assert that they don't end with slash. See my comment in find_last_slash_pos().
src/hotspot/os/linux/cgroupV1Subsystem_linux.cpp line 102:
> 100: for (int i = 0; *s1 == *s2 && *s1 != 0; i++) {
> 101: if (*s1 == '/') {
> 102: last_matching_slash_pos = i;
I found the behavior a little hard to understand. Is it intentional?
"/cat/cow", "/cat/dog" -> "/cat/"
"/cat/", "/cat/dog" -> "/cat/"
"/cat", "/cat/dog" -> "/"
The name `find_last_slash_pos` doesn't properly describe the behavior. I thought about `find_common_path_prefix`, but that doesn't match the current behavior (for the 3rd case, the common path prefix is `"/cat"`).
-------------
PR: https://git.openjdk.java.net/jdk/pull/8629
More information about the serviceability-dev
mailing list