RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)

Fri Oct 4 12:11:42 UTC 2019

Hi Sangheon,

   thanks for your  hard work on this!

On 01.10.19 18:43, sangheon.kim at oracle.com wrote:
> Hi Kim and others,
> 
> This webrev.2 simplified a bit more after changing 'heap expansion' 
> approach.
[...]
> webrev:
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/
> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc
> Testing: hs-tier1 ~ 5 +-UseNUMA
> 

Comments:

- os_solaris.cpp:2236: indentation addition

- os_windows.cpp: os::get_address_id(): the "return 0" is in the same 
line as the method declaration, while the change uses extra lines in 
os_bsd. Please make this uniform.

g1_globals.cpp: unnecessary whitespace  change

- G1Allocator::unsafe_max_alloc(): I need to think some more if this is 
correct - should that really be node specific? (and not the max of all 
nodes).

Otoh I think this is fine.

- g1Allocator::used_in_alloc_regions(): not sure why the assert has been 
removed

- os_linux.cpp: os::numa_get_address_id(): I think "id" should be an 
"int", not uint32_t according to 
http://man7.org/linux/man-pages/man2/get_mempolicy.2.html

And I think you can initialize it with os::InvalidId;

- in G1Allocator::current_node_index() retrieving the current node index 
is part of the G1MemoryNodeManager; I would really prefer if this were 
some property of a Thread. Not sure what others think.

That value could be put into the G1ThreadLocalData.

In any case, G1Allocator should probably cache the reference to the 
G1MemoryNodeManager for faster access.

- G1CollectedHeap::expand_single_region(): the log output in the first 
line looks more like some debug code than generally interesting information.

- G1CollectedHeap::expand_single_region(): pre-existing: it should add 
to the in-safepoint expansion time like expand(); okay to just file a CR.

- instead of the "late initialization method" set_page_size() I would 
prefer to have this value passed in the constructor. It is not required 
to me to have the create() call in the initialization list of 
g1CollectedHeap at all costs... it could be put right after we determine 
the page size in the body of the G1CollectedHeap constructor.

- g1CollectedHeap.hpp:940: no need to delete the newline.

- g1MemoryNodeManager.cpp:41: that comment does not add information imho

- G1NUMA::index_of_current_thread() needs a comment

- G1NUMA::index_of_num_id/is_valid_numa_id/ should be private

- not sure why G1NUMA::initialize()/set_numa() are needed. It's only 
call is right after instantiating a G1NUMA instance

- G1NUMA::request_memory_on_node needs a comment.

- I observed that a *lot* of G1NUMA methods are only used by 
G1MemoryNodeManager; and G1MemoryNodeManager just forwards to G1NUMA a 
lot. Maybe these two can be merged?

- G1NUMA::preferred_index_for_address/request_memory_on_node: I would 
prefer if these methods were not hardcoded with HeapRegion metrics as 
example.

I.e. for preferred_index_for_address(), instead of the address it is 
probably better to pass it the zero-based index directly, that is used 
for calculating the node index. I.e. all callers know the HeapRegion's 
index anyway *and* this would make the method independent of 
G1CollectedHeap.

I.e. something like preferred_node_index_for_index(<region-index>), 
because then the same method can be reused for other data structures 
than the heap/heap region.

G1NUMA::request_memory_on_node() could also be moved to 
G1PageBasedVirtualSpace, using the chunk sizes of page based virtual 
space instead of hardcoding HeapRegion::GrainBytes (i.e. hardcode the 
method to HeapRegion) - or pass in the "chunk size" calculated there 
from G1PageBasedVirtualSize.

I think this would increase the generality and usefulness of 
G1NUMA/G1MemoryNodeManager a lot without "passing in too many node 
indices everywhere".

- G1PageBasedVirtualSpace: the _numa member seems to be used exactly in 
one method where performance does not look critical. Maybe it is better 
to reference it directly there via G1CollectedHeap.

- heapRegionManager.cpp:verfiy_actual_node_index(): not sure if that 
should be debug level.

I would also kind of prefer a method that iterates over all regions and 
prints a summary status (and potentially drop this per-region checking 
at least when allocating a new free region). It is sufficient to print a 
summary of expected/actual values, at most a summary per node. I.e.

"NUMA Node index verification: Nodes: X_0/Y_0 X_1/Y_1 ... X_N/Y_N 
Unknown: Z Total: X/Y"

where X(_i) is the number of matching indexes (for node index i) and 
Y(_i) the number of expected (for node index i).

Also, the correct word to use here is "mismatch" not "different" (to what?)

- in some discussion we talked about the "node_index" lifecycle, and 
what I remember is the following:

   - initially, when we commit/make the region available, we set that 
HeapRegion's node_index to "Unknown" (with AlwaysPretouch on we can of 
course immediately set the correct one).
   - in HeapRegion::node_index() we do something like the following 
pseudo-code:

   {
     if (_node_index == Unknown) {
       // try to get actual node index from OS, and update _node_index 
if we could get the information
     }

     if (_node_index == Unknown) { // Still unknown
       // return _preferred_ node index *without* updating _node_index
     }
     return _node_index;
   }

   - now, during the "verification" pass, we use whether 
HeapRegion::node_index() == preferred_node_index to determine if the 
region is on the correct node.

The change only sets the node index during making the region available, 
and immediately to the preferred node index.

I.e. we eventually end up with the actual node index reported by the OS 
in HeapRegion::_node_index.

- for the expression "G1MemoryNodeManager::num_active_nodes() > 1" it 
would be nice to have an extra method in G1MemoryNodeManager instead of 
repeating it over and over.

- heapRegionManager.cpp:print_node_id_of_regions: that method will print 
a huge amount of lines. Better to print the summary I sketched out above.

- in FreeRegionList::remove_region_with_node_index(), the maximum search 
depth must take into account how many regions are there per page.

Consider 1GB pages, 32M region size, meaning that we get 32 consecutive 
regions/page.
Now with a node amount of 2, the maximum search depth will be 6 - which 
is too low :)
The intention is probably 3  * MAX(page_size / region size, 1) * 
numa->num_active_numa_ids().

I think it is useful to put that expresssion into 
G1NUMA/G1NodeMemoryManager (or somewhere else appropriate - 
HeapRegionManager?) to avoid that part having too much info about page size.

- os.hpp: the new Enum values might or might need some description.

Btw, there is no regression in performance from the .0/.1 versions of 
this code in our benchmarks.

Thanks,
   Thomas