8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted

jiefu(傅杰) jiefu at tencent.com
Mon Mar 23 13:16:55 UTC 2020


Thanks StefanK and Per for your review and nice suggestions.

I have filed a new JBS issue: https://bugs.openjdk.java.net/browse/JDK-8241423
I will send a new RFR to the hotspot-dev list later.

Thanks a lot.
Best regards,
Jie

On 2020/3/23, 6:06 PM, "Per Liden" <per.liden at oracle.com> wrote:

    Hi,
    
    On 3/23/20 10:41 AM, Stefan Karlsson wrote:
    > On 2020-03-23 10:06, jiefu(傅杰) wrote:
    >> Hi StefanK,
    >>
    >> Thanks for your review and very nice suggestions.
    >>
    >> After more investigation, I found that several NUMA APIs don't work
    >> inside Docker, such as get_mempolicy, numa_tonode_memory, ...
    >> So get_mempolicy isn't the only problematic call.
    >>
    >> Thomas also reminded me that the other GCs are affected by this
    >> issue too.
    >> So it would be better to fix them together.
    >>
    >> What do you think of 
    >> http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?
    > 
    > numa_available() is a HotSpot wrapper around the numa_available 
    > function. I don't think you should add this kind of logic inside that 
    > function. Could you move it up to libnuma_init instead?
    
    I agree, numa_available() doesn't look like the right place for this, 
    libnuma_init sounds better.
    
    We should also note that, in theory, some of the NUMA-related
    syscalls (mbind, get_mempolicy, move_pages, etc.) could be available but
    not others. I'm not sure such configurations ever actually appear in the
    wild, though, or whether we should care. I suspect checking for one of
    them is good enough for now, and we can refine this later if it turns out
    to be a problem.
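
A minimal sketch of such a single-syscall check, assuming Linux and probing only get_mempolicy as a stand-in for the whole NUMA syscall family. The names ProbeResult and probe_get_mempolicy are hypothetical and do not come from HotSpot or from any of the webrevs; the errno classification (EPERM for a filtered call, ENOSYS for a missing one) is an assumption about how such environments usually report the failure.

```cpp
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical names for illustration only.
enum class ProbeResult { Usable, Denied, Missing, OtherError };

// Issues one harmless get_mempolicy call and classifies the outcome.
static ProbeResult probe_get_mempolicy() {
#if defined(__linux__) && defined(SYS_get_mempolicy)
  int mode = -1;
  errno = 0;
  // flags == 0 with addr == NULL queries the calling thread's policy;
  // a NULL nodemask means only the mode is written back.
  long rc = syscall(SYS_get_mempolicy, &mode, nullptr, 0UL, nullptr, 0UL);
  if (rc == 0) return ProbeResult::Usable;
  if (errno == EPERM) return ProbeResult::Denied;    // assumed: blocked by a syscall filter
  if (errno == ENOSYS) return ProbeResult::Missing;  // assumed: kernel lacks the syscall
  return ProbeResult::OtherError;
#else
  return ProbeResult::Missing;  // not Linux, or the syscall number is unknown
#endif
}

static const char* to_string(ProbeResult r) {
  switch (r) {
  case ProbeResult::Usable:  return "usable";
  case ProbeResult::Denied:  return "denied";
  case ProbeResult::Missing: return "missing";
  default:                   return "other error";
  }
}
```

A caller would treat anything other than ProbeResult::Usable as "NUMA unavailable"; probing mbind or move_pages separately would be the later refinement mentioned above.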
    
    cheers,
    Per
    
    > 
    > If you intend this to be a generic (non-ZGC) change, then I think it 
    > would be good to create a new RFR and send it to hotspot-dev, so that 
    > the Runtime team and others also see it.
    > 
    > Thanks,
    > StefanK
    > 
    >>
    >> Thanks a lot.
    >> Best regards,
    >> Jie
    >>
    >> On 2020/3/23, 4:43 PM, "Stefan Karlsson" <stefan.karlsson at oracle.com> wrote:
    >>
    >>      Hi Jie,
    >>      On 2020-03-22 14:35, jiefu(傅杰) wrote:
    >>      > Hi Erik,
    >>      >
    >>      > Thanks for your review and valuable comments.
    >>      >
    >>      > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
    >>      >
    >>      > Please review it.
    >>      Thanks for providing this patch.
    >>      If it is only the get_mempolicy that is problematic, then I wonder
    >>      if it would be better to leave the UseNUMA flag untouched and only
    >>      turn off the ZGC-specific NUMA parts. Maybe something like this:
    >>      static bool check_get_mempolicy_support() {
    >>         int dummy = 0;
    >>         int mode = -1;
    >>         // Check whether get_mempolicy is supported or not
    >>         if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
    >>                                     MPOL_F_NODE | MPOL_F_ADDR) == -1) {
    >>           if (!FLAG_IS_DEFAULT(UseNUMA)) {
    >>             warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
    >>           }
    >>           return false;
    >>         }
    >>         return true;
    >>      }
    >>
    >>      void ZNUMA::initialize_platform() {
    >>         _enabled = UseNUMA && check_get_mempolicy_support();
    >>      }
    >>      An alternative would be to take this a step further (probably as a
    >>      separate RFR) and provide a user-friendly line in our -Xlog:gc+init
    >>      output:
    >>      [0.015s][info][gc,init] Initializing The Z Garbage Collector
    >>      [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank... (fastdebug)
    >>      [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
    >>      [0.015s][info][gc,init] CPUs: 32 total, 32 available
    >>      [0.015s][info][gc,init] Memory: 128851M
    >>      [0.015s][info][gc,init] Large Page Support: Disabled
    >>      [0.015s][info][gc,init] Medium Page Size: 32M
    >>      [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent
    >>      Borrowing the structure from how UseLargePages is set up and printed:
    >>      void ZLargePages::initialize_platform() {
    >>         if (UseLargePages) {
    >>           if (UseTransparentHugePages) {
    >>             _state = Transparent;
    >>           } else {
    >>             _state = Explicit;
    >>           }
    >>         } else {
    >>           _state = Disabled;
    >>         }
    >>      }
    >>      const char* ZLargePages::to_string() {
    >>         switch (_state) {
    >>         case Explicit:
    >>           return "Enabled (Explicit)";
    >>         case Transparent:
    >>           return "Enabled (Transparent)";
    >>         default:
    >>           return "Disabled";
    >>         }
    >>      }
    >>      Thanks,
    >>      StefanK
    >>      >
    >>      > Thanks a lot.
    >>      > Best regards,
    >>      > Jie
    >>      >
    >>      > On 2020/3/22, 4:26 PM, "Erik Österlund" <erik.osterlund at oracle.com> wrote:
    >>      >
    >>      >      Hi Jie,
    >>      >
    >>      >      It seems to me that if the environment doesn't supply the
    >>      >      required NUMA APIs, then we really should disable UseNUMA.
    >>      >      I propose we check the availability of the syscall during
    >>      >      initialization and switch off all NUMA functionality when
    >>      >      appropriate. We should only print a warning if the user
    >>      >      explicitly supplied UseNUMA on the command line.
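
The flag handling proposed here can be sketched in standalone C++ as follows. BoolFlag and apply_numa_ergonomics are hypothetical names, not HotSpot APIs: is_default models what FLAG_IS_DEFAULT(UseNUMA) reports, numa_syscalls_usable models the result of the initialization-time check, and the warning text is invented for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a VM flag: its value and whether the user set it.
struct BoolFlag {
  bool value;
  bool is_default;  // true when the user did not pass the flag explicitly
};

// Switches the flag off when the NUMA syscalls are unusable and returns
// the warnings that would be printed (none if the flag was left at default).
static std::vector<std::string> apply_numa_ergonomics(BoolFlag& use_numa,
                                                      bool numa_syscalls_usable) {
  std::vector<std::string> warnings;
  if (use_numa.value && !numa_syscalls_usable) {
    if (!use_numa.is_default) {
      warnings.push_back(
          "UseNUMA is disabled since the required NUMA syscalls are unavailable.");
    }
    use_numa.value = false;
  }
  return warnings;
}
```

With this shape, a container that blocks the syscalls silently runs without NUMA unless the user asked for it, in which case exactly one warning is produced.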
    >>      >
    >>      >      Thanks,
    >>      >      /Erik
    >>      >
    >>      >      > On 20 Mar 2020, at 13:15, jiefu(傅杰) <jiefu at tencent.com> wrote:
    >>      >      >
    >>      >      > Hi all,
    >>      >      >
    >>      >      > JBS:    https://bugs.openjdk.java.net/browse/JDK-8241354
    >>      >      > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
    >>      >      >
    >>      >      > A VM fatal error may be observed if ZGC is used.
    >>      >      >
    >>      >      > The background is that some of our products run inside
    >>      >      > Docker containers. For security reasons, SYS_get_mempolicy
    >>      >      > is not allowed there.
    >>      >      >
    >>      >      > It might not be good practice to raise a fatal error when
    >>      >      > get_mempolicy fails.
    >>      >      > What do you think?
    >>      >      >
    >>      >      > Thanks a lot.
    >>      >      > Best regards,
    >>      >      > Jie
    >>      >
    >>      >
    >>      >
    >>      >
    >>
    > 
    



More information about the hotspot-gc-dev mailing list