RFR: 8241423: NUMA APIs may fail to work in the docker due to operation not permitted

Bob Vandette bob.vandette at oracle.com
Fri Apr 10 14:27:52 UTC 2020


> On Apr 3, 2020, at 3:58 AM, jiefu(傅杰) <jiefu at tencent.com> wrote:
> 
> Hi Bob,
> 
> Thanks for your review and helpful comments. 
> 
> I'm not a Docker expert.
> Apart from the ZGC crash [1], we didn't come across other problems in Docker.
> 
> It seems that this bug has nothing to do with the resource limit.
> The root cause is that some NUMA-related syscalls are disabled in Docker for safety reasons.

I understand but it would be useful to ensure that all aspects of supporting NUMA in containers
work while addressing this issue.  What good is it to enable NUMA but have it not work properly?
I’m only asking for some validation since you appear to have a proper setup.
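
One possible validation, as a rough sketch only: count the memory nodes the
process is allowed to use and compare that with what was passed to
docker run --cpuset-mems. This is not the osContainer code. It assumes the
Mems_allowed_list field of /proc/self/status is present and uses the usual
kernel list format (for example "0-1,3"); both function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the nodes named in a kernel list string
 * such as "0-1,3". Returns -1 if the string is malformed. */
int count_nodes_in_list(const char *list) {
    int count = 0;
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p || lo < 0) return -1;
        long hi = lo;
        p = end;
        if (*p == '-') {                 /* a range such as "0-1" */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo) return -1;
            p = end;
        }
        count += (int)(hi - lo + 1);
        if (*p == ',') {
            p++;
        } else if (*p != '\0' && *p != '\n') {
            return -1;
        }
    }
    return count;
}

/* Hypothetical helper: number of memory nodes this process may allocate
 * from, read from /proc/self/status. Returns -1 if the field is missing. */
int allowed_memory_nodes(void) {
    static const char key[] = "Mems_allowed_list:";
    char line[512];
    int result = -1;
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, key, strlen(key)) == 0) {
            const char *p = line + strlen(key);
            while (*p == ' ' || *p == '\t') p++;
            result = count_nodes_in_list(p);
            break;
        }
    }
    fclose(f);
    return result;
}
```

If the container was started with --cpuset-mems=0, allowed_memory_nodes()
would be expected to report a single node even on a multi-node host.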
 
> 
> Please note that we already have numa_available() check here [2].
> But it failed to detect such cases.
Ah yes, I see.

> 
> What do you think?

It would be better if we could find a more supported way of determining if our process has
the required access rather than counting on a failing syscall.
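
For reference, the failing-syscall approach amounts to something like the
sketch below. It is not the code in the webrev; the function name is
hypothetical, and the arguments are simply the most harmless ones available
(ask only for the calling thread's policy mode, with no nodemask, address or
flags).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe: returns 0 if get_mempolicy succeeded, otherwise the
 * errno it failed with. EPERM is what a seccomp filter that rejects the
 * call typically produces; ENOSYS means the kernel does not provide it. */
int probe_get_mempolicy(void) {
    int mode = 0;
    /* Only the mode out-parameter is supplied, so nothing is modified. */
    long rc = syscall(SYS_get_mempolicy, &mode, NULL, 0UL, NULL, 0UL);
    return rc == 0 ? 0 : errno;
}
```

A caller would treat a non-zero result as "NUMA not usable". The weakness is
that the answer is only inferred from one syscall failing, and it says nothing
about mbind or set_mempolicy.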

What about using prctl?

http://man7.org/linux/man-pages/man2/prctl.2.html

This call can be used to query if a specific capability is enabled.
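
A minimal sketch of such a query, using PR_CAPBSET_READ to test the
capability bounding set. Which capability matters is an assumption here:
CAP_SYS_NICE is used because Docker's default seccomp profile ties the memory
policy syscalls to it, but that should be confirmed. The function name is
hypothetical. Note also that a seccomp filter is separate from capabilities,
so this is a hint rather than a definitive answer.

```c
#include <linux/capability.h>
#include <sys/prctl.h>

/* Hypothetical helper: returns 1 if the capability is in the calling
 * thread's bounding set, 0 if it is not, and -1 if prctl itself failed
 * (for example, an invalid capability number). */
int capability_in_bounding_set(int cap) {
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}
```

For example, capability_in_bounding_set(CAP_SYS_NICE) returning 0 would be a
reason to disable UseNUMA before any libnuma call is made.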

Bob.


> 
> Thanks a lot.
> Best regards,
> Jie
> 
> [1] https://bugs.openjdk.java.net/browse/JDK-8241354
> [2] http://hg.openjdk.java.net/jdk/jdk/file/f50a7df94744/src/hotspot/os/linux/os_linux.cpp#l3182
> 
> On 2020/4/3, 3:58 AM, "Bob Vandette" <bob.vandette at oracle.com> wrote:
> 
>    Jie,
> 
>    Before we discuss this specific fix, I’d like to know if you have confirmed that Hotspot’s NUMA
>    support actually functions properly when running in containers (with proper privs)?
> 
>    Also, do the libnuma functions work properly in response to cgroup limitations imposed by docker run --cpuset-mems?
> 
>    Some of the traditional kernel functions reporting resource limits only report host values and do not
>    correctly report limits specified for containers.   To resolve this issue I have added an osContainer
>    class to hotspot.  Included in this class is a function that reports the memory nodes available 
>    to hotspot when running in a container.   It might be necessary to query this function when
>    trying to configure the hotspot NUMA support.
> 
>    Back to your webrev, is it not possible to get the address for numa_available and
>    then try calling it in order to determine if NUMA can be used?
> 
>    If it is determined that you don’t have sufficient access, I would suggest disabling UseNUMA
>    altogether.
> 
>    Bob
> 
>> On Mar 23, 2020, at 11:58 AM, jiefu(傅杰) <jiefu at tencent.com> wrote:
>> 
>> Hi all,
>> 
>> JBS:    https://bugs.openjdk.java.net/browse/JDK-8241423
>> Webrev: http://cr.openjdk.java.net/~jiefu/8241423/webrev.00/
>> 
>> A VM fatal error may be observed if ZGC is used (see JDK-8241354).
>> The background is that some of our products run in Docker.
>> For safety reasons, SYS_get_mempolicy is not allowed there by default [1].
>> 
>> At first, we thought it was just a ZGC-only problem and filed JDK-8241354.
>> But Thomas reminded me that other collectors are also affected [2].
>> So it would be better to fix them together.
>> 
>> After more investigation, we found that NUMA APIs are actually dependent on several syscalls, such as get_mempolicy, mbind and set_mempolicy.
>> When the required syscalls are unavailable, NUMA APIs fail to work as expected.
>> 
>> The fix is to check whether the required syscalls are available.
>> In theory, all NUMA-related syscalls should be checked.
>> But it seems hard to do so because some of them would cause unexpected side effects.
>> To fix our issue, checking get_mempolicy is enough.
>> And, as Per suggested, we can refine this later if it turns out to be a problem [3].
>> 
>> Please review it and give me some advice.
>> 
>> Thanks a lot.
>> Best regards,
>> Jie
>> 
>> [1] https://docs.docker.com/engine/security/seccomp/
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028923.html
>> [3] https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028933.html
> 
> 
> 
> 


