RFR: 8239559: Cgroups v2: Incorrect detection logic on some systems

Severin Gehwolf sgehwolf at redhat.com
Fri Feb 21 14:30:01 UTC 2020


Hi Bob,

On Fri, 2020-02-21 at 09:11 -0500, Bob Vandette wrote:
> Severin,
> 
> Don’t we need the contents of /proc/self/mountinfo in order to construct the path to the cgroup controllers?

There is only one for unified (cgroups v2), but yes, it's being used.
See CgroupV2Subsystem.initSubsystem() and
CgroupV1Subsystem.initSubsystem(). For affected systems, no controllers
are mounted, so the effect will be null Metrics, as before JDK-8231111. 
Maybe I didn't understand the question, sorry.

> On Thu, 2020-02-20 at 14:50 +0000, Baesken, Matthias wrote:
> > Hi Severin,
> > 
> > grep cgroup /proc/self/mountinfo
> > 
> > returns nothing.
> > 
> > Best Regards, Matthias
> > 
> 
> Assuming your fix is correct, don’t we also need to apply the same change to the hotspot source cgroupSubsystem_linux.cpp?

Yes, tracked with JDK-8239785.

Thanks,
Severin

> Bob
> 
> 
> > On Feb 21, 2020, at 8:32 AM, Severin Gehwolf <sgehwolf at redhat.com> wrote:
> > 
> > Hi,
> > 
> > Could I please get a review of this fix to the detection heuristic of
> > cgroup v1 vs cgroup v2? Matthias (in CC) discovered that on some old
> > systems the JDK Metrics code throws InternalError caused by wrong
> > detection logic when Metrics are being created on Linux.
> > 
> > The reason for this is that hierarchy IDs of 0 in /proc/cgroups are
> > being used as a heuristic to detect cgroups v2 systems. Apparently some
> > old systems like RHEL 6 and SLES 11 have no cgroups controllers
> > mounted, which triggers a false positive.
> > 
> > The fix is to also look at /proc/self/mountinfo and correct the
> > detection logic in this case.
> > 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8239559
> > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8239559/01/webrev/
> > 
> > Testing: docker/cgroups tests on hybrid (cgroups v1) and unified
> > hierarchy (cgroups v2). New regression test. Looks good here.
> > 
> > Unfortunately, I wasn't able to reproduce this on an actual affected
> > system. I could only approximate it with the new regression test,
> > which is derived from data supplied by the reporters. I'd appreciate
> > any testing on systems where this reproduces.
> > 
> > Thanks,
> > Severin
> > 
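
For readers following the thread, the heuristic described in the review
request can be sketched roughly as below. This is only an illustration
under stated assumptions, not the code from the webrev: the class name
CgroupDetectSketch, the Kind enum and the helper methods are hypothetical,
and the inputs are passed in as lists of lines rather than read from
/proc/cgroups and /proc/self/mountinfo. It assumes the hierarchy ID is the
second column of /proc/cgroups and that the filesystem type in a mountinfo
line is the first field after the " - " separator.

```java
import java.util.List;

// Hypothetical sketch; names and structure are NOT taken from the JDK sources.
final class CgroupDetectSketch {
    enum Kind { NONE, V1, V2 }

    // True if at least one controller is listed and every hierarchy ID is 0.
    static boolean allHierarchiesZero(List<String> procCgroups) {
        boolean sawController = false;
        for (String line : procCgroups) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip the header line and blank lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 2) {
                continue; // malformed line, ignore it in this sketch
            }
            sawController = true;
            if (!fields[1].equals("0")) {
                return false; // a non-zero hierarchy ID means a v1 hierarchy
            }
        }
        return sawController;
    }

    // Extracts the filesystem type: the first field after the " - " separator.
    static String fsType(String mountInfoLine) {
        int sep = mountInfoLine.indexOf(" - ");
        if (sep < 0) {
            return "";
        }
        String[] rest = mountInfoLine.substring(sep + 3).trim().split("\\s+");
        return rest[0];
    }

    static Kind detect(List<String> procCgroups, List<String> mountInfo) {
        boolean v1Mount = false;
        boolean v2Mount = false;
        for (String line : mountInfo) {
            String type = fsType(line);
            if (type.equals("cgroup")) {
                v1Mount = true;
            } else if (type.equals("cgroup2")) {
                v2Mount = true;
            }
        }
        if (!v1Mount && !v2Mount) {
            // Nothing mounted at all (the RHEL 6 / SLES 11 case): no metrics.
            return Kind.NONE;
        }
        // Zero hierarchy IDs alone are not enough; require a cgroup2 mount too.
        if (v2Mount && allHierarchiesZero(procCgroups)) {
            return Kind.V2;
        }
        return Kind.V1;
    }
}
```

With an all-zero /proc/cgroups and an empty mountinfo list, detect()
returns NONE instead of wrongly reporting v2, which corresponds to the
"null Metrics" outcome mentioned in the reply above.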



More information about the core-libs-dev mailing list