8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted

jiefu(傅杰) jiefu at tencent.com
Mon Mar 23 09:06:05 UTC 2020


Hi StefanK,

Thanks for your review and very nice suggestions.

After more investigation, I found that several NUMA APIs don't work inside Docker, such as get_mempolicy, numa_tonode_memory, ...
So get_mempolicy isn't the only problematic call.

Thomas also reminded me that the other GCs are affected by this issue too.
So it would be better to fix them together.

What do you think of http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?

Thanks a lot.
Best regards,
Jie

On 2020/3/23, 4:43 PM, "Stefan Karlsson" <stefan.karlsson at oracle.com> wrote:

    Hi Jie,
    
    On 2020-03-22 14:35, jiefu(傅杰) wrote:
    > Hi Erik,
    >
    > Thanks for your review and valuable comments.
    >
    > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
    >
    > Please review it.
    
    Thanks for providing this patch.
    
    If it is only the get_mempolicy that is problematic, then I wonder if it 
    would be better to leave the UseNUMA flag untouched and only turn off 
    the ZGC specific NUMA parts. Maybe something like this:
    
    static bool check_get_mempolicy_support() {
       int dummy = 0;
       int mode = -1;
       // Check whether get_mempolicy is supported or not
       if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
                                   MPOL_F_NODE | MPOL_F_ADDR) == -1) {
         if (!FLAG_IS_DEFAULT(UseNUMA)) {
           warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
         }
         return false;
       }
    
       return true;
    }
    
    void ZNUMA::initialize_platform() {
       _enabled = UseNUMA && check_get_mempolicy_support();
    }
    
    An alternative would be to take this a step further (probably as a
    separate RFR) and report the NUMA state in a user-friendly way in our
    -Xlog:gc+init output:
    
    [0.015s][info][gc,init] Initializing The Z Garbage Collector
    [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank... (fastdebug)
    [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
    [0.015s][info][gc,init] CPUs: 32 total, 32 available
    [0.015s][info][gc,init] Memory: 128851M
    [0.015s][info][gc,init] Large Page Support: Disabled
    [0.015s][info][gc,init] Medium Page Size: 32M
    [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent
    
    Borrowing the structure from how UseLargePages is set up and printed:
    
    void ZLargePages::initialize_platform() {
       if (UseLargePages) {
         if (UseTransparentHugePages) {
           _state = Transparent;
         } else {
           _state = Explicit;
         }
       } else {
         _state = Disabled;
       }
    }
    
    const char* ZLargePages::to_string() {
       switch (_state) {
       case Explicit:
         return "Enabled (Explicit)";
    
       case Transparent:
         return "Enabled (Transparent)";
    
       default:
         return "Disabled";
       }
    }
    
    Thanks,
    StefanK
    
    >
    > Thanks a lot.
    > Best regards,
    > Jie
    >
    > On 2020/3/22, 4:26 PM, "Erik Österlund" <erik.osterlund at oracle.com> wrote:
    >
    >      Hi Jie,
    >      
    >      It seems to me that if the environment doesn’t supply the required NUMA APIs, then we really should disable UseNUMA instead. I propose we check the availability of the syscall during initialization instead, and switch off all NUMA functionality when appropriate. And we should only print a warning if the user explicitly supplied UseNUMA on the command line.
    >      
    >      Thanks,
    >      /Erik
    >      
    >      > On 20 Mar 2020, at 13:15, jiefu(傅杰) <jiefu at tencent.com> wrote:
    >      >
    >      > Hi all,
    >      >
    >      > JBS:    https://bugs.openjdk.java.net/browse/JDK-8241354
    >      > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
    >      >
    >      > A VM fatal error may be observed if ZGC is used.
    >      >
    >      > The background is that some of our products run inside Docker.
    >      > For security reasons, SYS_get_mempolicy is not allowed in the container.
    >      >
    >      > It might not be good practice to raise a fatal error when get_mempolicy fails.
    >      > What do you think?
    >      >
    >      > Thanks a lot.
    >      > Best regards,
    >      > Jie
    >      
    >      
    >      
    >
    
    
    



More information about the hotspot-gc-dev mailing list