UseNUMA membind Issue in OpenJDK

White, Derek Derek.White at cavium.com
Tue Jun 12 21:56:20 UTC 2018


Hi Swati, Gustavo,

I’m not the best qualified to review the change – I just reported the issue as a JDK bug!

I’d be happy to test a fix, but I’m having trouble following the patch. Did Gustavo post a patch on top of your patch, or is it a full, independent patch?

Also, if you or Gustavo have permission to post a webrev to http://cr.openjdk.java.net/, that would make reviewing a little easier. I’d be happy to post a webrev for you if not.

http://openjdk.java.net/guide/codeReview.html


  - Derek


From: Swati Sharma [mailto:swatibits14 at gmail.com]
Sent: Monday, June 11, 2018 6:01 AM
To: Gustavo Romero <gromero at linux.vnet.ibm.com>
Cc: White, Derek <Derek.White at cavium.com>; hotspot-dev at openjdk.java.net; zgu at redhat.com; David Holmes <david.holmes at oracle.com>; Prakash.Raghavendra at amd.com; Prasad.Vishwanath at amd.com
Subject: Re: UseNUMA membind Issue in OpenJDK

Hi Gustavo,

Maybe you can remove the method "numa_bitmask_nbytes" as it's not being used.
I am OK with the changes. If Derek confirms, we can go ahead.

My name is on the "Swati Sharma - OpenJDK" page; I have already signed the OCA on an individual basis.

Thanks,
Swati

On Sat, Jun 9, 2018 at 5:06 AM, Gustavo Romero <gromero at linux.vnet.ibm.com> wrote:
Hi Swati,

Sorry, as usual I had to reserve a machine before trying it.

I wanted to test it against a POWER9 with an NVIDIA Tesla V100 device attached.

On such machines numa nodes are quite sparse, so I thought it would not be bad
to check against them:

available: 8 nodes (0,8,250-255)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 261693 MB
node 0 free: 233982 MB
node 8 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 8 size: 261748 MB
node 8 free: 257078 MB
node 250 cpus:
node 250 size: 0 MB
node 250 free: 0 MB
node 251 cpus:
node 251 size: 0 MB
node 251 free: 0 MB
node 252 cpus:
node 252 size: 15360 MB
node 252 free: 15360 MB
node 253 cpus:
node 253 size: 0 MB
node 253 free: 0 MB
node 254 cpus:
node 254 size: 0 MB
node 254 free: 0 MB
node 255 cpus:
node 255 size: 15360 MB
node 255 free: 15360 MB
node distances:
node   0   8  250  251  252  253  254  255
  0:  10  40  80  80  80  80  80  80
  8:  40  10  80  80  80  80  80  80
 250:  80  80  10  80  80  80  80  80
 251:  80  80  80  10  80  80  80  80
 252:  80  80  80  80  10  80  80  80
 253:  80  80  80  80  80  10  80  80
 254:  80  80  80  80  80  80  10  80
 255:  80  80  80  80  80  80  80  10


Please, find my comments below, inlined.

On 06/01/2018 08:10 AM, Swati Sharma wrote:
I will fix the thread binding issue in a separate patch.

I would like to address it in this change. I think it's not good to leave such
"dangling" behavior for the CPUs once the memory bind issue is addressed.

I suggest the following simple check to fix it (in accordance with what we've
discussed previously, i.e. remap cpu/node considering configuration, bind, and
distance in rebuild_cpu_to_node_map()):

-    if (!isnode_in_configured_nodes(nindex_to_node()->at(i))) {
+    if (!isnode_in_configured_nodes(nindex_to_node()->at(i)) ||
+        !isnode_in_bound_nodes(nindex_to_node()->at(i))) {
       closest_distance = INT_MAX;
...
       for (size_t m = 0; m < node_num; m++) {
-        if (m != i && isnode_in_configured_nodes(nindex_to_node()->at(m))) {
+        if (m != i &&
+            isnode_in_configured_nodes(nindex_to_node()->at(m)) &&
+            isnode_in_bound_nodes(nindex_to_node()->at(m))) {
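
This assumes an isnode_in_bound_nodes() helper in os_linux.hpp next to its
friends; a minimal sketch, reusing the function pointers we already resolve
(same shape as the version suggested earlier in this thread, quoted below):

  // Returns true if the given node is in the set of nodes the process
  // memory is bound to (libnuma's membind mask), otherwise false.
  static bool isnode_in_bound_nodes(int node) {
    if (_numa_get_membind != NULL && _numa_bitmask_isbitset != NULL) {
      return _numa_bitmask_isbitset(_numa_get_membind(), node);
    } else {
      return false;
    }
  }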

I tested it against the aforementioned topology and against the following one:

available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 55685 MB
node 0 free: 53196 MB
node 1 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 1 size: 53961 MB
node 1 free: 49795 MB
node 2 cpus:
node 2 size: 21231 MB
node 2 free: 21171 MB
node 3 cpus:
node 3 size: 22492 MB
node 3 free: 22432 MB
node distances:
node   0   1   2   3
  0:  10  20  40  40
  1:  20  10  40  40
  2:  40  40  10  20
  3:  40  40  20  10


Updated the previous patch by removing the structure and using the methods
provided by the numa API. Here is the updated one with the changes (attached also).

Thanks.

========================PATCH=========================
diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp
--- a/src/hotspot/os/linux/os_linux.cpp
+++ b/src/hotspot/os/linux/os_linux.cpp

...
@@ -4962,8 +4972,9 @@
      if (!Linux::libnuma_init()) {
        UseNUMA = false;
      } else {
-      if ((Linux::numa_max_node() < 1)) {
-        // There's only one node(they start from 0), disable NUMA.
+      if ((Linux::numa_max_node() < 1) || Linux::isbound_to_single_node()) {
+        // If there's only one node(they start from 0) or if the process
                                      ^ let's fix this missing space

...

+  // Check if bound to only one numa node.
+  // Returns true if bound to a single numa node, otherwise returns false.
+  static bool isbound_to_single_node() {
+    int single_node = 0;
+    struct bitmask* bmp = NULL;
+    unsigned int node = 0;
+    unsigned int max_number_of_nodes = 0;
+    if (_numa_get_membind != NULL && _numa_bitmask_nbytes != NULL) {
+      bmp = _numa_get_membind();
+      max_number_of_nodes = _numa_bitmask_nbytes(bmp) * 8;
+    } else {
+      return false;
+    }
+    for (node = 0; node < max_number_of_nodes; node++) {
+       if (_numa_bitmask_isbitset(bmp, node)) {
+         single_node++;
+         if (single_node == 2) {
+           return false;
+         }
+       }
+    }
+    if (single_node == 1) {
+      return true;
+    } else {
+      return false;
+    }
+  }

Now that numa_bitmask_isbitset() is being used (instead of the previous version
that iterated through an array of longs), I suggest tweaking it a bit, removing
the if (single_node == 2) check.

I don't think removing it will hurt. In fact, numa_bitmask_nbytes() returns the
total number of bytes the bitmask can hold. However, the total number of nodes in
the system is usually much smaller than numa_bitmask_nbytes() * 8.
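
To make the two possible loop bounds concrete, a minimal sketch (assuming the
function pointers are resolved as in the patch):

  struct bitmask* bmp = _numa_get_membind();
  // Bound from the bitmask storage: fixed by libnuma at bitmask allocation,
  // typically much larger than the number of nodes actually present.
  unsigned int bits_in_mask = _numa_bitmask_nbytes(bmp) * 8;
  // Bound from the topology: the highest node number, so iterating
  // 0..highest_node inclusive visits every possible node.
  int highest_node = _numa_max_node();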

So for an x86_64 system like this one, with only 2 numa nodes:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 131018 MB
node 0 free: 101646 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 98304 MB
node 1 free: 91692 MB
node distances:
node   0   1
  0:  10  11
  1:  11  10

numa_bitmask_nbytes(): 64 => max_number_of_nodes = 512
numa_max_node(): 1 => 1 + 1 iterations

and the value returned by numa_bitmask_nbytes() does not change for different
bind configurations; it's fixed. Another example, on a Power system with 4 numa
nodes:

available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 130722 MB
node 0 free: 71930 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 0 MB
node 1 free: 0 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 130599 MB
node 16 free: 75934 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node   0   1  16  17
  0:  10  20  40  40
  1:  20  10  40  40
 16:  40  40  10  20
 17:  40  40  20  10

numa_bitmask_nbytes(): 32 => max_number_of_nodes = 256
numa_max_node(): 17 => 17 + 1 iterations

So I understand it's better to bound the iteration by numa_max_node() instead of
numa_bitmask_nbytes(), even more so for Intel (with contiguous nodes) than for
Power.

For the POWER9 with the NVIDIA Tesla attached it would be the worst case: only 8
numa nodes, but numa_max_node() is 255! But I understand it's a very rare case
and I'm fine with that.

So what about:

+    if (_numa_get_membind != NULL && _numa_max_node != NULL) {
+      bmp = _numa_get_membind();
+      highest_node_number = _numa_max_node();
+    } else {
+      return false;
+    }
+
+    for (node = 0; node <= highest_node_number; node++) {
+      if (_numa_bitmask_isbitset(bmp, node)) {
+        nodes++;
+      }
+    }
+
+    if (nodes == 1) {
+      return true;
+    } else {
+      return false;
+    }
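
If it helps with testing outside the JVM, here is a tiny standalone sketch
(hypothetical, not part of the patch) that mirrors the same check using the
public libnuma v2 API; build with g++ and link with -lnuma:

  #include <numa.h>
  #include <stdio.h>

  int main() {
    if (numa_available() == -1) {
      printf("NUMA not available\n");
      return 1;
    }
    // Count the nodes set in the process' membind mask, iterating only
    // up to the highest node number in the topology.
    struct bitmask* bmp = numa_get_membind();
    int nodes = 0;
    for (int node = 0; node <= numa_max_node(); node++) {
      if (numa_bitmask_isbitset(bmp, node)) {
        nodes++;
      }
    }
    printf("bound to %d node(s) => UseNUMA should be %s\n",
           nodes, nodes == 1 ? "disabled" : "kept");
    return 0;
  }

Running it under "numactl --membind=0 ./a.out" and then under
"numactl --membind=0,1 ./a.out" should show the flip.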

For convenience, I hosted a patch with all the changes above here:
http://cr.openjdk.java.net/~gromero/8189922/draft/usenuma_v4.patch

@Derek, could you please confirm that this change solves JDK-8189922?
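
For instance, with the patch applied, something like:

  numactl --membind=0 java -XX:+UseNUMA -XX:+PrintFlagsFinal -version | grep UseNUMA

should now report UseNUMA as false, since the process memory is bound to a
single node.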

Swati, if Derek confirms it solves JDK-8189922 and you confirm it's fine for
you, I'll consider it reviewed from my side and I can host that change for you
so you can start a formal request for approval (remember I'm not a Reviewer, so
you still need two additional reviews for the change).

Finally, as a heads up, I could not find you (or AMD?) among the OCA signatories:

http://www.oracle.com/technetwork/community/oca-486395.html#a

If I'm not mistaken, you (individually) or AMD must sign it before contributing
to OpenJDK.


Best regards,
Gustavo

=======================================================

Swati




On Tue, May 29, 2018 at 6:53 PM, Gustavo Romero <gromero at linux.vnet.ibm.com> wrote:
 >
 > Hi Swati,
 >
 > On 05/29/2018 06:12 AM, Swati Sharma wrote:
 >>
 >> I have incorporated some changes suggested by you.
 >>
 >> Using struct bitmask's maskp to check 64 bits in a single iteration is more
 >> optimized compared to numa_bitmask_isbitset(), as with the latter we need to
 >> check each bit individually: 1024 times (SUSE case) or 64 times (Ubuntu case).
 >> If it's fine to iterate at initialization time then I can change it.
 >
 >
 > Yes, I know, your version is more optimized. The libnuma API should provide a
 > ready-made solution for that... but that's another story. I'm curious to know
 > what the time difference is in the worst case for both ways, though. Anyway, I
 > just would like to point out that, regardless of performance, it's possible to
 > achieve the same result with the current libnuma API.
 >
 >
 >> To answer your question:
 >> If it picks up node 16, not so bad, but what if it picks up node 0 or 1?
 >> It can be checked based on numa_distance instead of picking up the lgrps randomly.
 >
 >
 > That seems a good solution. You can do the checking very early, so
 > lgrp_spaces()->find() does not even fail (return -1), i.e. by changing the CPU-to-
 > node mapping on initialization (avoiding changes to cas_allocate()). That check
 > would consider both the numa distance and whether the node is bound (or not)
 > when generating the map.
 >
 >
 > Best regards,
 > Gustavo
 >
 >> Thanks,
 >> Swati
 >>
 >>
 >>
 >> On Fri, May 25, 2018 at 4:54 AM, Gustavo Romero <gromero at linux.vnet.ibm.com> wrote:
 >>
 >>     Hi Swati,
 >>
 >>
 >>     Thanks for CC:ing me. Sorry for the delay in replying; I had to reserve a few
 >>     specific machines before trying your patch :-)
 >>
 >>     I think that UseNUMA's original task was to figure out the best binding
 >>     setup for the JVM automatically, but I understand that it also has to be aware
 >>     that sometimes, for some (new) particular reasons, its binding task is
 >>     "modulated" by other external agents. Thanks for proposing a fix.
 >>
 >>     I have just one question/concern about the proposal: how should the JVM behave
 >>     if CPUs are not bound in accordance with the bound memory nodes? For instance,
 >>     what happens if no '--cpunodebind' is passed and '--membind=0,1,16' is passed at
 >>     the same time on this numa topology:
 >>
 >>     brianh at p215n12:~$ numactl -H
 >>     available: 4 nodes (0-1,16-17)
 >>     node 0 cpus: 0 1 2 3 8 9 10 11 16 17 18 19 24 25 26 27 32 33 34 35
 >>     node 0 size: 65342 MB
 >>     node 0 free: 56902 MB
 >>     node 1 cpus: 40 41 42 43 48 49 50 51 56 57 58 59 64 65 66 67 72 73 74 75
 >>     node 1 size: 65447 MB
 >>     node 1 free: 58322 MB
 >>     node 16 cpus: 80 81 82 83 88 89 90 91 96 97 98 99 104 105 106 107 112 113 114 115
 >>     node 16 size: 65448 MB
 >>     node 16 free: 63096 MB
 >>     node 17 cpus: 120 121 122 123 128 129 130 131 136 137 138 139 144 145 146 147 152 153 154 155
 >>     node 17 size: 65175 MB
 >>     node 17 free: 61522 MB
 >>     node distances:
 >>     node   0   1  16  17
 >>        0:  10  20  40  40
 >>        1:  20  10  40  40
 >>       16:  40  40  10  20
 >>       17:  40  40  20  10
 >>
 >>
 >>     In that case the JVM will spawn threads that will run on all CPUs, including
 >>     those in numa node 17. Then, once in
 >>     src/hotspot/share/gc/parallel/mutableNUMASpace.cpp, in cas_allocate():
 >>
 >>       834 // This version is lock-free.
 >>       835 HeapWord* MutableNUMASpace::cas_allocate(size_t size) {
 >>       836   Thread* thr = Thread::current();
 >>       837   int lgrp_id = thr->lgrp_id();
 >>       838   if (lgrp_id == -1 || !os::numa_has_group_homing()) {
 >>       839     lgrp_id = os::numa_get_group_id();
 >>       840     thr->set_lgrp_id(lgrp_id);
 >>       841   }
 >>
 >>     a newly created thread will try to be mapped to a numa node given its CPU ID.
 >>     So if that CPU is in numa node 17, the thread will not find it in:
 >>
 >>       843   int i = lgrp_spaces()->find(&lgrp_id, LGRPSpace::equals);
 >>
 >>     and will fallback to a random map, picking up a random numa node among nodes
 >>     0, 1, and 16:
 >>
 >>       846   if (i == -1) {
 >>       847     i = os::random() % lgrp_spaces()->length();
 >>       848   }
 >>
 >>     If it picks up node 16, not so bad, but what if it picks up node 0 or 1?
 >>
 >>     I see that if one binds memory but leaves CPUs unbound, one has to know exactly
 >>     what she/he is doing, because it is likely suboptimal. On the other hand, letting
 >>     the node be picked randomly when memory nodes are bound but CPUs are not
 >>     seems even more suboptimal in some scenarios. Thus, should the JVM deal with it?
 >>
 >>     @Zhengyu, do you have any opinion on that?
 >>
 >>     Please find a few nits / comments inline.
 >>
 >>     Note that I'm not a (R)eviewer so you still need two official reviews.
 >>
 >>
 >>     Best regards,
 >>     Gustavo
 >>
 >>     On 05/21/2018 01:44 PM, Swati Sharma wrote:
 >>
 >>         ======================PATCH==============================
 >>         diff --git a/src/hotspot/os/linux/os_linux.cpp b/src/hotspot/os/linux/os_linux.cpp
 >>         --- a/src/hotspot/os/linux/os_linux.cpp
 >>         +++ b/src/hotspot/os/linux/os_linux.cpp
 >>         @@ -2832,14 +2832,42 @@
 >>              // Map all node ids in which is possible to allocate memory. Also nodes are
 >>              // not always consecutively available, i.e. available from 0 to the highest
 >>              // node number.
 >>         +  // If the nodes have been bound explicitly using numactl membind, then
 >>         +  // allocate memory from those nodes only.
 >>
 >>
 >>     I think ok to place that comment on the same existing line, like:
 >>
 >>     -  // node number.
 >>     +  // node number. If the nodes have been bound explicitly using numactl membind,
 >>     +  // then allocate memory from these nodes only.
 >>
 >>
 >>              for (size_t node = 0; node <= highest_node_number; node++) {
 >>         -    if (Linux::isnode_in_configured_nodes(node)) {
 >>         +    if (Linux::isnode_in_bounded_nodes(node)) {
 >>
 >>     ---------------------------------^ s/bounded/bound/
 >>
 >>
 >>                  ids[i++] = node;
 >>                }
 >>              }
 >>              return i;
 >>            }
 >>         +extern "C"  struct bitmask {
 >>         +  unsigned long size; /* number of bits in the map */
 >>         +  unsigned long *maskp;
 >>         +};
 >>
 >>
 >>     I think it's possible to move the function below to os_linux.hpp with its
 >>     friends and cope with the forward declaration of 'struct bitmask*' by using the
 >>     functions from the numa API, notably numa_bitmask_nbytes() and
 >>     numa_bitmask_isbitset() only, avoiding the member dereferencing issue and the
 >>     need to add the above struct explicitly.
 >>
 >>
 >>         +// Check if single memory node bound.
 >>         +// Returns true if single memory node bound.
 >>
 >>
 >>     I suggest a minuscule improvement, something like:
 >>
 >>     +// Check if bound to only one numa node.
 >>     +// Returns true if bound to a single numa node, otherwise returns false.
 >>
 >>
 >>         +bool os::Linux::issingle_node_bound() {
 >>
 >>
 >>     What about s/issingle_node_bound/isbound_to_single_node/ ?
 >>
 >>
 >>         +  struct bitmask* bmp = _numa_get_membind != NULL ? _numa_get_membind() : NULL;
 >>         +  if(!(bmp != NULL && bmp->maskp != NULL)) return false;
 >>
 >>                                -----^
 >>     Are you sure this check is necessary? I think if numa_get_membind succeeds,
 >>     bmp->maskp is always != NULL.
 >>
 >>     Indentation here is odd. No space before 'if' and return on the same line.
 >>
 >>     I would try to avoid lines over 80 chars.
 >>
 >>
 >>         +  int issingle = 0;
 >>         +  // System can have more than 64 nodes so check in all the elements of
 >>         +  // unsigned long array
 >>         +  for (unsigned long i = 0; i < (bmp->size / (8 * sizeof(unsigned long))); i++) {
 >>         +    if (bmp->maskp[i] == 0) {
 >>         +      continue;
 >>         +    } else if ((bmp->maskp[i] & (bmp->maskp[i] - 1)) == 0) {
 >>         +      issingle++;
 >>         +    } else {
 >>         +      return false;
 >>         +    }
 >>         +  }
 >>         +  if (issingle == 1)
 >>         +    return true;
 >>         +  return false;
 >>         +}
 >>         +
 >>
 >>
 >>     As I mentioned, I think it could be moved to os_linux.hpp instead. Also, it
 >>     could be something like:
 >>
 >>     +bool os::Linux::isbound_to_single_node(void) {
 >>     +  struct bitmask* bmp;
 >>     +  unsigned long mask; // a mask element in the mask array
 >>     +  unsigned long max_num_masks;
 >>     +  int single_node = 0;
 >>     +
 >>     +  if (_numa_get_membind != NULL) {
 >>     +    bmp = _numa_get_membind();
 >>     +  } else {
 >>     +    return false;
 >>     +  }
 >>     +
 >>     +  max_num_masks = bmp->size / (8 * sizeof(unsigned long));
 >>     +
 >>     +  for (mask = 0; mask < max_num_masks; mask++) {
 >>     +    if (bmp->maskp[mask] != 0) { // at least one numa node in the mask
 >>     +      if ((bmp->maskp[mask] & (bmp->maskp[mask] - 1)) == 0) { // power of two: one bit set
 >>     +        single_node++; // a single numa node in the mask
 >>     +      } else {
 >>     +        return false;
 >>     +      }
 >>     +    }
 >>     +  }
 >>     +
 >>     +  if (single_node == 1) {
 >>     +    return true; // only a single mask with a single numa node
 >>     +  } else {
 >>     +    return false;
 >>     +  }
 >>     +}
 >>
 >>
 >>            bool os::get_page_info(char *start, page_info* info) {
 >>              return false;
 >>            }
 >>         @@ -2930,6 +2958,10 @@
 >>                                                           libnuma_dlsym(handle, "numa_bitmask_isbitset")));
 >>                  set_numa_distance(CAST_TO_FN_PTR(numa_distance_func_t,
 >>                                                   libnuma_dlsym(handle, "numa_distance")));
 >>         +      set_numa_set_membind(CAST_TO_FN_PTR(numa_set_membind_func_t,
 >>         +                                          libnuma_dlsym(handle, "numa_set_membind")));
 >>         +      set_numa_get_membind(CAST_TO_FN_PTR(numa_get_membind_func_t,
 >>         +                                          libnuma_v2_dlsym(handle, "numa_get_membind")));
 >>                  if (numa_available() != -1) {
 >>                    set_numa_all_nodes((unsigned long*)libnuma_dlsym(handle, "numa_all_nodes"));
 >>         @@ -3054,6 +3086,8 @@
 >>            os::Linux::numa_set_bind_policy_func_t os::Linux::_numa_set_bind_policy;
 >>            os::Linux::numa_bitmask_isbitset_func_t os::Linux::_numa_bitmask_isbitset;
 >>            os::Linux::numa_distance_func_t os::Linux::_numa_distance;
 >>         +os::Linux::numa_set_membind_func_t os::Linux::_numa_set_membind;
 >>         +os::Linux::numa_get_membind_func_t os::Linux::_numa_get_membind;
 >>            unsigned long* os::Linux::_numa_all_nodes;
 >>            struct bitmask* os::Linux::_numa_all_nodes_ptr;
 >>            struct bitmask* os::Linux::_numa_nodes_ptr;
 >>         @@ -4962,8 +4996,9 @@
 >>                if (!Linux::libnuma_init()) {
 >>                  UseNUMA = false;
 >>                } else {
 >>         -      if ((Linux::numa_max_node() < 1)) {
 >>         -        // There's only one node(they start from 0), disable NUMA.
 >>         +      if ((Linux::numa_max_node() < 1) || Linux::issingle_node_bound()) {
 >>         +        // If there's only one node(they start from 0) or if the process
 >>         +        // is bound explicitly to a single node using membind, disable NUMA.
 >>                    UseNUMA = false;
 >>                  }
 >>                }
 >>         diff --git a/src/hotspot/os/linux/os_linux.hpp b/src/hotspot/os/linux/os_linux.hpp
 >>         --- a/src/hotspot/os/linux/os_linux.hpp
 >>         +++ b/src/hotspot/os/linux/os_linux.hpp
 >>         @@ -228,6 +228,8 @@
 >>              typedef int (*numa_tonode_memory_func_t)(void *start, size_t size, int node);
 >>              typedef void (*numa_interleave_memory_func_t)(void *start, size_t size, unsigned long *nodemask);
 >>              typedef void (*numa_interleave_memory_v2_func_t)(void *start, size_t size, struct bitmask* mask);
 >>         +  typedef void (*numa_set_membind_func_t)(struct bitmask *mask);
 >>         +  typedef struct bitmask* (*numa_get_membind_func_t)(void);
 >>              typedef void (*numa_set_bind_policy_func_t)(int policy);
 >>              typedef int (*numa_bitmask_isbitset_func_t)(struct bitmask *bmp, unsigned int n);
 >>         @@ -244,6 +246,8 @@
 >>              static numa_set_bind_policy_func_t _numa_set_bind_policy;
 >>              static numa_bitmask_isbitset_func_t _numa_bitmask_isbitset;
 >>              static numa_distance_func_t _numa_distance;
 >>         +  static numa_set_membind_func_t _numa_set_membind;
 >>         +  static numa_get_membind_func_t _numa_get_membind;
 >>              static unsigned long* _numa_all_nodes;
 >>              static struct bitmask* _numa_all_nodes_ptr;
 >>              static struct bitmask* _numa_nodes_ptr;
 >>         @@ -259,6 +263,8 @@
 >>              static void set_numa_set_bind_policy(numa_set_bind_policy_func_t func) { _numa_set_bind_policy = func; }
 >>              static void set_numa_bitmask_isbitset(numa_bitmask_isbitset_func_t func) { _numa_bitmask_isbitset = func; }
 >>              static void set_numa_distance(numa_distance_func_t func) { _numa_distance = func; }
 >>         +  static void set_numa_set_membind(numa_set_membind_func_t func) { _numa_set_membind = func; }
 >>         +  static void set_numa_get_membind(numa_get_membind_func_t func) { _numa_get_membind = func; }
 >>              static void set_numa_all_nodes(unsigned long* ptr) { _numa_all_nodes = ptr; }
 >>              static void set_numa_all_nodes_ptr(struct bitmask **ptr) { _numa_all_nodes_ptr = (ptr == NULL ? NULL : *ptr); }
 >>              static void set_numa_nodes_ptr(struct bitmask **ptr) { _numa_nodes_ptr = (ptr == NULL ? NULL : *ptr); }
 >>         @@ -320,6 +326,15 @@
 >>                } else
 >>                  return 0;
 >>              }
 >>         +  // Check if node in bounded nodes
 >>
 >>
 >>     +  // Check if node is in bound node set. Maybe?
 >>
 >>
 >>         +  static bool isnode_in_bounded_nodes(int node) {
 >>         +    struct bitmask* bmp = _numa_get_membind != NULL ? _numa_get_membind() : NULL;
 >>         +    if (bmp != NULL && _numa_bitmask_isbitset != NULL && _numa_bitmask_isbitset(bmp, node)) {
 >>         +      return true;
 >>         +    } else
 >>         +      return false;
 >>         +  }
 >>         +  static bool issingle_node_bound();
 >>
 >>
 >>     Looks like it can be re-written like:
 >>
 >>     +  static bool isnode_in_bound_nodes(int node) {
 >>     +    if (_numa_get_membind != NULL && _numa_bitmask_isbitset != NULL) {
 >>     +      return _numa_bitmask_isbitset(_numa_get_membind(), node);
 >>     +    } else {
 >>     +      return false;
 >>     +    }
 >>     +  }
 >>
 >>     ?
 >>
 >>
 >>            };
 >>            #endif // OS_LINUX_VM_OS_LINUX_HPP
 >>
 >>
 >>
 >



