8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted

Per Liden per.liden at oracle.com
Mon Mar 23 10:05:15 UTC 2020


Hi,

On 3/23/20 10:41 AM, Stefan Karlsson wrote:
> On 2020-03-23 10:06, jiefu(傅杰) wrote:
>> Hi StefanK,
>>
>> Thanks for your review and very nice suggestions.
>>
>> After more investigation, I found that several NUMA APIs don't work
>> inside Docker, such as get_mempolicy, numa_tonode_memory, ...
>> So get_mempolicy isn't the only problematic one.
>>
>> Thomas also reminded me that the other GCs are affected by this
>> issue too.
>> So it would be better to fix them together.
>>
>> What do you think of 
>> http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?
> 
> numa_available() is a HotSpot wrapper around the numa_available 
> function. I don't think you should add this kind of logic inside that 
> function. Could you move it up to libnuma_init instead?

I agree, numa_available() doesn't look like the right place for this, 
libnuma_init sounds better.

We should also note that, in theory, some of the NUMA-related 
syscalls (mbind, get_mempolicy, move_pages, etc.) could be available but 
not others. I'm not sure such configurations ever actually appear in the 
wild though, or whether we should care. I suspect checking for one of them 
is good enough for now, and we can refine this later if it turns out to 
be a problem.
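
To make the "check one of them" idea concrete, here is a minimal standalone sketch (not HotSpot code) that probes get_mempolicy once and classifies the result. The names ProbeResult, classify_probe, probe_get_mempolicy and to_string are hypothetical, and treating EPERM and ENOSYS as "blocked" is an assumption about how a container's seccomp filter or a non-NUMA kernel would typically reject the call.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical outcome of probing a single NUMA-related syscall.
enum class ProbeResult { Available, Blocked, Inconclusive };

// Classify a (return value, errno) pair from the probe.
// Assumption: EPERM (filtered, e.g. by seccomp) and ENOSYS (not implemented)
// both mean the syscall cannot be used in this environment.
ProbeResult classify_probe(long rc, int err) {
  if (rc == 0) {
    return ProbeResult::Available;
  }
  if (err == EPERM || err == ENOSYS) {
    return ProbeResult::Blocked;
  }
  return ProbeResult::Inconclusive;
}

// Issue get_mempolicy with flags == 0, which asks for the calling thread's
// default policy and needs neither a nodemask nor an address.
ProbeResult probe_get_mempolicy() {
  int mode = -1;
  errno = 0;
  long rc = syscall(SYS_get_mempolicy, &mode, nullptr, 0UL, nullptr, 0UL);
  return classify_probe(rc, errno);
}

// Printable form, in the spirit of ZLargePages::to_string().
const char* to_string(ProbeResult result) {
  switch (result) {
  case ProbeResult::Available:    return "Available";
  case ProbeResult::Blocked:      return "Blocked";
  case ProbeResult::Inconclusive: return "Inconclusive";
  }
  assert(false && "unhandled ProbeResult");
  return "Unknown";
}
```

A caller would run probe_get_mempolicy() once during initialization and switch NUMA support off when the result is Blocked. If mixed configurations ever do show up, the same classify_probe() could be reused for mbind and move_pages probes.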

cheers,
Per

> 
> If you intend this to be a generic (non-ZGC) change, then I think it 
> would be good to create a new RFR and send it to hotspot-dev, so that 
> the Runtime team and others also see it.
> 
> Thanks,
> StefanK
> 
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2020/3/23, 4:43 PM, "Stefan Karlsson" <stefan.karlsson at oracle.com> wrote:
>>
>>      Hi Jie,
>>      On 2020-03-22 14:35, jiefu(傅杰) wrote:
>>      > Hi Erik,
>>      >
>>      > Thanks for your review and valuable comments.
>>      >
>>      > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
>>      >
>>      > Please review it.
>>      Thanks for providing this patch.
>>      If it is only the get_mempolicy that is problematic, then I wonder
>>      if it would be better to leave the UseNUMA flag untouched and only
>>      turn off the ZGC specific NUMA parts. Maybe something like this:
>>      static bool check_get_mempolicy_support() {
>>         int dummy = 0;
>>         int mode = -1;
>>         // Check whether get_mempolicy is supported or not
>>         if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
>>                                     MPOL_F_NODE | MPOL_F_ADDR) == -1) {
>>           if (!FLAG_IS_DEFAULT(UseNUMA)) {
>>             warning("ZGC NUMA support is disabled since get_mempolicy is "
>>                     "unsupported.");
>>           }
>>           return false;
>>         }
>>         return true;
>>      }
>>      void ZNUMA::initialize_platform() {
>>         _enabled = UseNUMA && check_get_mempolicy_support();
>>      }
>>      An alternative would be to take this a step further (probably as a
>>      separate RFR) and provide a user-friendly output in our
>>      -Xlog:gc+init output:
>>      [0.015s][info][gc,init] Initializing The Z Garbage Collector
>>      [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank... (fastdebug)
>>      [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
>>      [0.015s][info][gc,init] CPUs: 32 total, 32 available
>>      [0.015s][info][gc,init] Memory: 128851M
>>      [0.015s][info][gc,init] Large Page Support: Disabled
>>      [0.015s][info][gc,init] Medium Page Size: 32M
>>      [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent
>>      Borrowing the structure from how UseLargePages is set up and printed:
>>      void ZLargePages::initialize_platform() {
>>         if (UseLargePages) {
>>           if (UseTransparentHugePages) {
>>             _state = Transparent;
>>           } else {
>>             _state = Explicit;
>>           }
>>         } else {
>>           _state = Disabled;
>>         }
>>      }
>>      const char* ZLargePages::to_string() {
>>         switch (_state) {
>>         case Explicit:
>>           return "Enabled (Explicit)";
>>         case Transparent:
>>           return "Enabled (Transparent)";
>>         default:
>>           return "Disabled";
>>         }
>>      }
>>      Thanks,
>>      StefanK
>>      >
>>      > Thanks a lot.
>>      > Best regards,
>>      > Jie
>>      >
>>      > On 2020/3/22, 4:26 PM, "Erik Österlund" <erik.osterlund at oracle.com> wrote:
>>      >
>>      >      Hi Jie,
>>      >
>>      >      It seems to me that if the environment doesn’t supply the
>>      >      required NUMA APIs, then we really should disable UseNUMA
>>      >      instead. I propose we check the availability of the syscall
>>      >      during initialization, and switch off all NUMA functionality
>>      >      when appropriate. And we should only print a warning if the
>>      >      user explicitly supplied UseNUMA on the command line.
>>      >
>>      >      Thanks,
>>      >      /Erik
>>      >
>>      >      > On 20 Mar 2020, at 13:15, jiefu(傅杰) <jiefu at tencent.com> wrote:
>>      >      >
>>      >      > Hi all,
>>      >      >
>>      >      > JBS:    https://bugs.openjdk.java.net/browse/JDK-8241354
>>      >      > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
>>      >      >
>>      >      > A VM fatal error may be observed if ZGC is used.
>>      >      >
>>      >      > The background is that some of our products run in Docker.
>>      >      > For security reasons, SYS_get_mempolicy is not allowed in Docker.
>>      >      >
>>      >      > It might not be good practice to generate a fatal error when get_mempolicy fails.
>>      >      > What do you think?
>>      >      >
>>      >      > Thanks a lot.
>>      >      > Best regards,
>>      >      > Jie
>>      >
>>      >
>>      >
>>      >
>>
> 


