8241354: ZGC: fatal error: Failed to get NUMA id due to get_mempolicy operation not permitted
jiefu(傅杰)
jiefu at tencent.com
Mon Mar 23 13:16:55 UTC 2020
Thanks StefanK and Per for your review and nice suggestions.
I have filed a new JBS issue: https://bugs.openjdk.java.net/browse/JDK-8241423
and will send a new RFR to the hotspot-dev list later.
Thanks a lot.
Best regards,
Jie
On 2020/3/23, 6:06 PM, "Per Liden" <per.liden at oracle.com> wrote:
Hi,
On 3/23/20 10:41 AM, Stefan Karlsson wrote:
> On 2020-03-23 10:06, jiefu(傅杰) wrote:
>> Hi StefanK,
>>
>> Thanks for your review and very nice suggestions.
>>
>> After more investigation, I found that several NUMA APIs don't work
>> inside Docker containers, such as get_mempolicy, numa_tonode_memory, ...
>> So get_mempolicy is not the only problematic call.
>>
>> Thomas also reminded me that the other GCs are affected by this
>> issue too.
>> So it would be better to fix them together.
>>
>> What do you think of
>> http://cr.openjdk.java.net/~jiefu/8241354/webrev.02/ ?
>
> numa_available() is a HotSpot wrapper around the numa_available
> function. I don't think you should add this kind of logic inside that
> function. Could you move it up to libnuma_init instead?
I agree, numa_available() doesn't look like the right place for this,
libnuma_init sounds better.
We should also note that, in theory, some of the NUMA-related
syscalls (mbind, get_mempolicy, move_pages, etc.) could be available while
others are not. I'm not sure such configurations ever actually appear in the
wild, though, or whether we should care. I suspect checking for one of them
is good enough for now, and we can refine this later if it turns out to
be a problem.
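
A minimal sketch of such a single-syscall probe, assuming Linux and glibc's syscall() wrapper. The names NumaProbe, classify_probe and probe_get_mempolicy are hypothetical and are not HotSpot code. The two flag values are repeated from the kernel's <linux/mempolicy.h> so that the sketch does not need libnuma's headers. EPERM is what a seccomp filter typically returns for a blocked syscall, which matches the "operation not permitted" in the subject line.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Values of MPOL_F_NODE and MPOL_F_ADDR from <linux/mempolicy.h>.
static const unsigned long kMpolFNode = 1UL << 0;
static const unsigned long kMpolFAddr = 1UL << 1;

// Hypothetical result type for the probe.
enum class NumaProbe { Available, Blocked, Missing, OtherError };

// Pure classification of a syscall result, kept separate from the call
// itself so it can be checked without depending on the host's setup.
NumaProbe classify_probe(long rc, int err) {
  if (rc == 0) return NumaProbe::Available;
  if (err == EPERM) return NumaProbe::Blocked;   // e.g. filtered by seccomp
  if (err == ENOSYS) return NumaProbe::Missing;  // kernel without NUMA support
  return NumaProbe::OtherError;
}

// Ask the kernel which node backs a stack variable, the same query the
// check_get_mempolicy_support() sketch in this thread performs.
NumaProbe probe_get_mempolicy() {
#if defined(__linux__) && defined(SYS_get_mempolicy)
  int dummy = 0;
  int mode = -1;
  errno = 0;
  long rc = syscall(SYS_get_mempolicy, &mode, nullptr, 0UL, &dummy,
                    kMpolFNode | kMpolFAddr);
  assert(rc == 0 || rc == -1);  // syscall() reports failure as -1 plus errno
  return classify_probe(rc, errno);
#else
  return NumaProbe::Missing;  // assumption: treat other platforms as "no NUMA"
#endif
}
```

A caller would treat anything other than NumaProbe::Available as "do not use the NUMA paths". Probing only get_mempolicy follows the "one of them is good enough for now" suggestion; it says nothing about mbind or move_pages.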
cheers,
Per
>
> If you intend this to be a generic (non-ZGC) change, then I think it
> would be good to create a new RFR and send it to hotspot-dev, so that
> the Runtime team and others also see it.
>
> Thanks,
> StefanK
>
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> On 2020/3/23, 4:43 PM, "Stefan Karlsson" <stefan.karlsson at oracle.com> wrote:
>>
>> Hi Jie,
>> On 2020-03-22 14:35, jiefu(傅杰) wrote:
>> > Hi Erik,
>> >
>> > Thanks for your review and valuable comments.
>> >
>> > Updated: http://cr.openjdk.java.net/~jiefu/8241354/webrev.01/
>> >
>> > Please review it.
>> Thanks for providing this patch.
>> If it is only the get_mempolicy that is problematic, then I wonder if it
>> would be better to leave the UseNUMA flag untouched and only turn off
>> the ZGC specific NUMA parts. Maybe something like this:
>>   static bool check_get_mempolicy_support() {
>>     int dummy = 0;
>>     int mode = -1;
>>
>>     // Check whether get_mempolicy is supported or not
>>     if (ZSyscall::get_mempolicy(&mode, NULL, 0, (void*)&dummy,
>>                                 MPOL_F_NODE | MPOL_F_ADDR) == -1) {
>>       if (!FLAG_IS_DEFAULT(UseNUMA)) {
>>         warning("ZGC NUMA support is disabled since get_mempolicy is unsupported.");
>>       }
>>       return false;
>>     }
>>
>>     return true;
>>   }
>>
>>   void ZNUMA::initialize_platform() {
>>     _enabled = UseNUMA && check_get_mempolicy_support();
>>   }
>> An alternative would be to take this a step further (probably as a
>> separate RFR) and provide a user friendly output in our -Xlog:gc+init
>> output:
>> [0.015s][info][gc,init] Initializing The Z Garbage Collector
>> [0.015s][info][gc,init] Version: 15-internal+0-2020-03-04-0947497.stefank... (fastdebug)
>> [0.015s][info][gc,init] NUMA Support: Unsupported <== HERE
>> [0.015s][info][gc,init] CPUs: 32 total, 32 available
>> [0.015s][info][gc,init] Memory: 128851M
>> [0.015s][info][gc,init] Large Page Support: Disabled
>> [0.015s][info][gc,init] Medium Page Size: 32M
>> [0.015s][info][gc,init] Workers: 20 parallel, 4 concurrent
>> Borrowing the structure from how UseLargePages is set up and printed:
>>   void ZLargePages::initialize_platform() {
>>     if (UseLargePages) {
>>       if (UseTransparentHugePages) {
>>         _state = Transparent;
>>       } else {
>>         _state = Explicit;
>>       }
>>     } else {
>>       _state = Disabled;
>>     }
>>   }
>>
>>   const char* ZLargePages::to_string() {
>>     switch (_state) {
>>     case Explicit:
>>       return "Enabled (Explicit)";
>>     case Transparent:
>>       return "Enabled (Transparent)";
>>     default:
>>       return "Disabled";
>>     }
>>   }
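
Following the ZLargePages pattern quoted above, a NUMA counterpart could look roughly like the sketch below. It is standalone C++ rather than HotSpot code: the NumaState enum, the three state names and the helper functions are all hypothetical, and the mapping "UseNUMA off means Disabled, UseNUMA on but syscalls unusable means Unsupported" is an assumption made only to reproduce the "NUMA Support: Unsupported" line from the log example.

```cpp
#include <cassert>
#include <string>

// Hypothetical three-way state, mirroring the Explicit/Transparent/Disabled
// states used for large pages.
enum class NumaState { Enabled, Disabled, Unsupported };

// Decide the state once, at initialization time.
NumaState resolve_numa_state(bool use_numa, bool syscalls_usable) {
  if (!use_numa) {
    return NumaState::Disabled;
  }
  return syscalls_usable ? NumaState::Enabled : NumaState::Unsupported;
}

// Human-readable name for the state.
const char* numa_state_to_string(NumaState state) {
  switch (state) {
    case NumaState::Enabled:
      return "Enabled";
    case NumaState::Disabled:
      return "Disabled";
    case NumaState::Unsupported:
      return "Unsupported";
  }
  assert(false && "unhandled NumaState");
  return "Unknown";
}

// Build the text that would follow the [gc,init] tags in the log.
std::string numa_log_line(NumaState state) {
  return std::string("NUMA Support: ") + numa_state_to_string(state);
}
```

Resolving the state once and only formatting it later keeps the logging code free of any syscall probing, which is the same split the large-pages code uses.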
>> Thanks,
>> StefanK
>> >
>> > Thanks a lot.
>> > Best regards,
>> > Jie
>> >
>> > On 2020/3/22, 4:26 PM, "Erik Österlund" <erik.osterlund at oracle.com> wrote:
>> >
>> > Hi Jie,
>> >
>> > It seems to me that if the environment doesn’t supply the required
>> > NUMA APIs, then we really should disable UseNUMA instead. I propose
>> > we check the availability of the syscall during initialization
>> > instead, and switch off all NUMA functionality when appropriate. And
>> > we should only print a warning if the user explicitly supplied
>> > UseNUMA on the command line.
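
The policy proposed here (switch the flag off when the syscalls are unusable, warn only when the user asked for NUMA explicitly) can be sketched in standalone C++ as follows. BoolFlag, its fields, the function name and the warning text are hypothetical stand-ins for illustration; HotSpot's real flag machinery (UseNUMA, FLAG_IS_DEFAULT, warning()) is not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a boolean VM flag such as UseNUMA.
struct BoolFlag {
  bool value;                // current setting
  bool set_on_command_line;  // true if the user supplied it explicitly
};

// Disable the flag when the required syscalls cannot be used. A warning is
// recorded only if the user explicitly requested the feature.
void disable_numa_if_unsupported(BoolFlag& use_numa,
                                 bool syscalls_usable,
                                 std::vector<std::string>& warnings) {
  if (!use_numa.value || syscalls_usable) {
    return;  // nothing to turn off
  }
  if (use_numa.set_on_command_line) {
    warnings.push_back("NUMA support disabled: required syscalls are not permitted");
  }
  use_numa.value = false;
  assert(!use_numa.value);
}
```

Running this once during initialization means every later NUMA code path only has to consult the flag and never has to handle a failing syscall itself.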
>> >
>> > Thanks,
>> > /Erik
>> >
>> > > On 20 Mar 2020, at 13:15, jiefu(傅杰) <jiefu at tencent.com> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > JBS: https://bugs.openjdk.java.net/browse/JDK-8241354
>> > > Webrev: http://cr.openjdk.java.net/~jiefu/8241354/webrev.00/
>> > >
>> > > A VM fatal error may be observed if ZGC is used.
>> > >
>> > > The background is that some of our products run in Docker containers.
>> > > For security reasons, SYS_get_mempolicy is not allowed there.
>> > >
>> > > It might not be good practice to generate a fatal error when
>> > > get_mempolicy fails.
>> > > What do you think?
>> > >
>> > > Thanks a lot.
>> > > Best regards,
>> > > Jie
>> >
>> >
>> >
>> >
>>
>