RFR: 8253435: Cgroup: 'stomping of _mount_path' crash if manually mounted cpusets exist [v2]
Severin Gehwolf
sgehwolf at openjdk.java.net
Thu Sep 24 17:23:04 UTC 2020
On Thu, 24 Sep 2020 09:02:03 GMT, Volker Simonis <simonis at openjdk.org> wrote:
>> Hi,
>>
>> can I please have a review (or an idea for a better fix) for this PR?
>>
>> If a tool like [cpuset](https://github.com/lpechacek/cpuset) is used to manually create and manage
>> [cpusets](https://man7.org/linux/man-pages/man7/cpuset.7.html) the cgroups detections will be confused and crash in a
>> debug build or behave unexpectedly in a product build. The problem is that the additionally mounted cpuset will be
>> interpreted as if it was belonging to Cgroup controller: $ grep cgroup /proc/self/mountinfo
>> 36 25 0:30 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
>> 49 36 0:43 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,memory
>> 50 36 0:44 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,rdma
>> ...
>> 43 36 0:37 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,cpuset
>> 121 32 0:37 / /cpusets rw,relatime shared:69 - cgroup none rw,cpuset
>> The current fix solves this problem for manually created cpusets which don't have a "mount source" but this is yet
>> another heuristic. I'm open to better solutions for detecting cpusets which don't don't belong to a Cgroup.
>> Thanks,
>> Volker
>
> Volker Simonis has refreshed the contents of this pull request, and previous commits have been removed. The incremental
> views will show differences compared to the previous content of the PR. The pull request contains one new commit since
> the last revision:
> 8253435: Cgroup: 'stomping of _mount_path' crash if manually mounted cpusets exist
I'm not sure this extra gymnastics will be worth the added complexity. This makes the code somewhat harder to follow
and, essentially, we end up checking whether or not the cgroup controller is being mounted under `/sys/fs/cgroup`. The
code tracks any interesting controller, like `memory`, `cpu`, `cpuset` and `cpuacct` and records the mount point. It's
going to be `/sys/fs/cgroup` for almost all cases, would it not?
The skip for non-/sys/fs/cgroup controllers ends up being either `/fo/bar/baz` or whatever the lead-up path to the
first seen "interesting" controller is or `/sys/fs/cgroup`. I'm not convinced it's really anything other than
`/sys/fs/cgroup`. How about a simpler solution like this?
https://github.com/jerboaa/jdk/commit/1323315036c0b7625cd1a05690b5944df8457bad
This seems to work for me.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 122:
> 120: if (*mount_path == NULL) {
> 121: *mount_path = os::strdup(tmpmount);
> 122: if (strrchr(*mount_path, '/')) {
I guess we could avoid calling `strrchr` twice here by using a local variable. How about this?
char* last_slash = strrchr(*mount_path, '/');
if (last_slash != NULL) {
*last_slash = '\0'; // truncate controller
}
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 318:
> 316: // Skip controllers created manually or by cset/cpuset (https://github.com/lpechacek/cpuset). E.g.:
> 317: // 121 32 0:37 / /cpusets rw,relatime shared:69 - cgroup none rw,cpuset
> 318: // Controllers beloning to a Cgroup are usually mounted under "/sys/fs/cgroup" while
s/beloning/belonging/
test/hotspot/jtreg/containers/cgroup/CgroupSubsystemFactory.java line 271:
> 269: test.testCgroupv2NoCgroup2Fs(wb);
> 270: test.testCgroupv1MultipleCpusetMounts(wb, test.cgroupv1MntInfoDoubleCpuset);
> 271: test.testCgroupv1MultipleCpusetMounts(wb, test.cgroupv1MntInfoDoubleCpuset2);
For this test to actually work properly we need an additional new line in the hybrid snippet. Sorry about this, was
probably my fault at the time. Like this:
+++ b/test/hotspot/jtreg/containers/cgroup/CgroupSubsystemFactory.java
@@ -104,7 +104,7 @@ public class CgroupSubsystemFactory {
"41 30 0:37 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:13 - cgroup none rw,seclabel,devices\n" +
"42 30 0:38 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:14 - cgroup none rw,seclabel,cpuset\n" +
"43 30 0:39 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:15 - cgroup none rw,seclabel,blkio\n" +
- "44 30 0:40 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:16 - cgroup none rw,seclabel,freezer";
+ "44 30 0:40 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:16 - cgroup none
rw,seclabel,freezer\n";
private String mntInfoHybridRest = cgroupv1MountInfoLineMemory + mntInfoHybridStub;
private String mntInfoHybridMissingMemory = mntInfoHybridStub;
private String mntInfoHybrid = cgroupV2LineHybrid + mntInfoHybridRest;
Without it the generated `cgroupv1MntInfoDoubleCpuset2` file ends up containing:
31 30 0:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:5 - cgroup2 none rw,seclabel,nsdelegate
35 30 0:31 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:7 - cgroup none rw,seclabel,memory
30 23 0:26 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:4 - tmpfs tmpfs ro,seclabel,mode=755
32 30 0:28 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup none
rw,seclabel,xattr,name=systemd 36 30 0:32 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:8 - cgroup none
rw,seclabel,pids 37 30 0:33 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:9 - cgroup none
rw,seclabel,perf_event 38 30 0:34 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:10 - cgroup
none rw,seclabel,net_cls,net_prio 39 30 0:35 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:11 -
cgroup none rw,seclabel,hugetlb 40 30 0:36 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 -
cgroup none rw,seclabel,cpu,cpuacct 41 30 0:37 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:13 -
cgroup none rw,seclabel,devices 42 30 0:38 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:14 - cgroup
none rw,seclabel,cpuset 43 30 0:39 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:15 - cgroup none
rw,seclabel,blkio 44 30 0:40 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:16 - cgroup none
rw,seclabel,freezer121 32 0:37 / /cpusets rw,relatime shared:69 - cgroup none rw,cpuset which doesn't trigger the bug.
src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 335:
> 333: cg_infos[CPUSET_IDX]._mount_path = os::strdup(tmpmount);
> 334: cg_infos[CPUSET_IDX]._root_mount_path = os::strdup(tmproot);
> 335: cg_infos[CPUSET_IDX]._data_complete = true;
It's not clear to me why `check_mount_path(&mount_path, tmpmount);` isn't called on this code path. Could you explain?
-------------
Changes requested by sgehwolf (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/295
More information about the hotspot-runtime-dev
mailing list