RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)

Tue Oct 8 04:13:00 UTC 2019

Hi Thomas,

Many thanks for this thorough review!

On 10/4/19 5:11 AM, Thomas Schatzl wrote:
> Hi Sangheon,
>
>   thanks for your  hard work on this!
>
> On 01.10.19 18:43, sangheon.kim at oracle.com wrote:
>> Hi Kim and others,
>>
>> This webrev.2 simplified a bit more after changing 'heap expansion' 
>> approach.
> [...]
>> webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc
>> Testing: hs-tier1 ~ 5 +-UseNUMA
>>
>
> Comments:
>
> - os_solaris.cpp:2236: indentation addition
I don't see any changes at line 2236?

>
> - os_windows.cpp: os::get_address_id(): the "return 0" is in the same 
> line as the method declaration, while the change uses extra lines in 
> os_bsd. Please make this uniform.
Done.

>
> g1_globals.cpp: unnecessary whitespace  change
Done.

>
> - G1Allocator::unsafe_max_alloc(): I need to think some more if this 
> is correct - should that really be node specific? (and not the max of 
> all nodes).
>
> Otoh I think this is fine.
Yes, according to the comment at the first line at the method:
   // Return the remaining space in the cur alloc region, but not less than
   // the min TLAB size.

'cur alloc region' differs per numa node, so it reflects node index.

>
> - g1Allocator::used_in_alloc_regions(): not sure why the assert has 
> been removed
Probably removed during rebasing the patch.

>
> - os_linux.cpp: os::numa_get_address_id(): I think "id" should be an 
> "int", not uint32_t according to 
> http://man7.org/linux/man-pages/man2/get_mempolicy.2.html
>
> And I think you can initialize it with os::InvalidId;
All done, change to 'int' and 'InvalidId'.

>
> - in G1Allocator::current_node_index() retrieving the current node 
> index is part of the G1MemoryNodeManager; I would really prefer if 
> this were some property of a Thread. Not sure what others think.
>
> That value could be put into the G1ThreadLocalData.
>
> In any case, G1Allocator should probably cache the reference to the 
> G1MemoryNodeManager for faster access.
Added new member of G1MemoryNodeManager* at G1Allocator.

>
> - G1CollectedHeap::expand_single_region(): the log output in the first 
> line looks more like some debug code than generally interesting 
> information.
Removed the first line log.

>
> - G1CollectedHeap::expand_single_region(): pre-existing: it should add 
> to the in-safepoint expansion time like expand(); okay to just file a CR.
Filed
>
> - instead of the "late initialization method" set_page_size() I would 
> prefer to have this value passed in the constructor. It is not 
> required to me to have the create() call in the initialization list of 
> g1CollectedHeap at all costs... it could be put right after we 
> determine the page size in the body of the G1CollectedHeap constructor.
We do need G1MemoryNodeManager instance to get the number of active numa 
nodes when we construct G1Allocator. i.e. we create per numa node 
G1AllocRegion at G1Allocator.
And we also need page size at G1MemoryNodeManager after G1CollectedHeap 
is initialized.

I tried to add comment at G1NUMA::initialize().

>
> - g1CollectedHeap.hpp:940: no need to delete the newline.
Reverted the newline.

>
> - g1MemoryNodeManager.cpp:41: that comment does not add information imho
:)
Removed the comment.

>
> - G1NUMA::index_of_current_thread() needs a comment
Added:
// Returns numa index of current calling thread.

Do you have any suggestions?

I was thinking the method name is more than enough to explain itself. :)

>
> - G1NUMA::index_of_num_id/is_valid_numa_id/ should be private
Done.
Actually got same comment from Kim as well during private discussion.

>
> - not sure why G1NUMA::initialize()/set_numa() are needed. It's only 
> call is right after instantiating a G1NUMA instance
Probably you are pointing G1NUMA::initialize() and set_page_size()?
I tried to explain above why we need 2 calls.

>
> - G1NUMA::request_memory_on_node needs a comment.
Added:
   // Request the given range of memory to be located at a specific numa 
node.
   // But OS doesn't guarantee to reside on the node.
   // The numa node is decided by preferred_index_for_address().

>
> - I observed that a *lot* of G1NUMA methods are only used by 
> G1MemoryNodeManager; and G1MemoryNodeManager just forwards to G1NUMA a 
> lot. Maybe these two can be merged?
Done.
I agree since G1NUMA is getting smaller and smaller.
Since this is relatively large change, I had to revisit all addressed 
comments above. :)

>
> - G1NUMA::preferred_index_for_address/request_memory_on_node: I would 
> prefer if these methods were not hardcoded with HeapRegion metrics as 
> example.
>
> I.e. for preferred_index_for_address(), instead of the address it is 
> probably better to pass it the zero-based index directly, that is used 
> for calculating the node index. I.e. all callers know the HeapRegion's 
> index anyway *and* this would make the method independent of 
> G1CollectedHeap.
>
> I.e. something like preferred_node_index_for_index(<region-index>), 
> because then the same method can be reused for other data structures 
> than the heap/heap region.
Done. Removed all dependency with G1CollectedHeap and HeapRegion at 
G1MemoryNodeManager(previously G1NUMA).
Added 'size_t _region_size' at G1NUMA and then 
G1NUMA::preferred_node_index_for_index(uint heap_region_index).

> G1NUMA::request_memory_on_node() could also be moved to 
> G1PageBasedVirtualSpace, using the chunk sizes of page based virtual 
> space instead of hardcoding HeapRegion::GrainBytes (i.e. hardcode the 
> method to HeapRegion) - or pass in the "chunk size" calculated there 
> from G1PageBasedVirtualSize.
>
> I think this would increase the generality and usefulness of 
> G1NUMA/G1MemoryNodeManager a lot without "passing in too many node 
> indices everywhere".
>
> - G1PageBasedVirtualSpace: the _numa member seems to be used exactly 
> in one method where performance does not look critical. Maybe it is 
> better to reference it directly there via G1CollectedHeap.
With the above comment of 'G1NUMA::request_memory_on_node() to be moved 
to G1PageBasedVirtualSpace, I tried to change a bit.
_numa is removed from G1PageBasedVS.
The intent of new implementation is to avoid HeapRegion dependency as 
you mentioned.

>
> - heapRegionManager.cpp:verfiy_actual_node_index(): not sure if that 
> should be debug level.
Are you suggesting 'trace' level'?

>
> I would also kind of prefer a method that iterates over all regions 
> and prints a summary status (and potentially drop this per-region 
> checking at least when allocating a new free region). 
Probably I'm missing something but what I tried to mention during 
internal discussion was this. But someone told me adding different log 
level should be sufficient and I agree on that.

I don't have strong opinion verifying at HRM::allocate_free_region() so 
I will remove if there's no other opinion on this. Currently we can 
check node index when the region is being used on specific log level/tag.

Stefan (and probably you as well) mentioned verifying at safepoint, but 
it would be better to address at JDK-8220312 (3/3 which is part of this 
JEP) since I would like to go forward. :)

> It is sufficient to print a summary of expected/actual values, at most 
> a summary per node. I.e.
>
> "NUMA Node index verification: Nodes: X_0/Y_0 X_1/Y_1 ... X_N/Y_N 
> Unknown: Z Total: X/Y"
>
> where X(_i) is the number of matching indexes (for node index i) and 
> Y(_i) the number of expected (for node index i).
I'm okay with adding additional log which is simpler version than 
existing one.
I would say the new one at Debug level while existing one remains Trace 
level.
The benefit of printing current way is that we can see how heap region 
(and page) is consisted. So current jtreg test is utilizing it.

Do you have any recommendation for print timing for the new log?

>
> Also, the correct word to use here is "mismatch" not "different" (to 
> what?)
Changed to 'mismatch'.
The logging was printing both actual value and preferred value, so I'm 
thinking those 2 values are different. :)

>
> - in some discussion we talked about the "node_index" lifecycle, and 
> what I remember is the following:
>
>   - initially, when we commit/make the region available, we set that 
> HeapRegion's node_index to "Unknown" (with AlwaysPretouch on we can of 
> course immediately set the correct one).
>   - in HeapRegion::node_index() we do something like the following 
> pseudo-code:
>
>   {
>     if (_node_index == Unknown) {
>       // try to get actual node index from OS, and update _node_index 
> if we could get the information
>     }
>
>     if (_node_index == Unknown) { // Still unknown
>       // return _preferred_ node index *without* updating _node_index
>     }
>     return _node_index;
>   }
>
>   - now, during the "verification" pass, we use whether 
> HeapRegion::node_index() == preferred_node_index to determine if the 
> region is on the correct node.
>
> The change only sets the node index during making the region 
> available, and immediately to the preferred node index.
>
> I.e. we eventually end up with the actual node index reported by the 
> OS in HeapRegion::_node_index.
Above is what I tried to implement what we discussed internally. :)

I was aware about this bug that set_heapregion_node_index() is using 
verify_actual_node_index(). But forget fixing it before posting the 
webrev. :(

Changed like below:
static void set_heapregion_node_index(HeapRegion* hr) {
   uint node_index;
   if(AlwaysPreTouch) {
     // If we already pretouched, we can check actual node index here.
     node_index = 
G1MemoryNodeManager::mgr()->index_of_address(hr->bottom());
   } else {
     node_index = 
G1MemoryNodeManager::mgr()->preferred_node_index_for_index(hr->hrm_index());
   }
   hr->set_node_index(node_index);
}

BTW, setting node index at make_regions_available() is intentional since 
it is the only place HeapRegion is initialized so HeapRegion change is 
limited. Am I missing something?
I would prefer to remain HeapRegion::node_index() simple getter.
Another idea that I didn't choose is let HeapRegion::initialize() do the 
setting node index work and change HeapRegion::set_node_index() to 
clear_node_index(which sets to InvalidIndex).

>
>
> - for the expression "G1MemoryNodeManager::num_active_nodes() > 1" it 
> would be nice to have an extra method in G1MemoryNodeManager instead 
> of repeating it over and over.
Done.
FYI, currently there are 2 locations but following patches may use more.

>
> - heapRegionManager.cpp:print_node_id_of_regions: that method will 
> print a huge amount of lines. Better to print the summary I sketched 
> out above.
This is why I added at 'trace' level and it is okay to me.

>
> - in FreeRegionList::remove_region_with_node_index(), the maximum 
> search depth must take into account how many regions are there per page.
>
> Consider 1GB pages, 32M region size, meaning that we get 32 
> consecutive regions/page.
> Now with a node amount of 2, the maximum search depth will be 6 - 
> which is too low :)
> The intention is probably 3  * MAX(page_size / region size, 1) * 
> numa->num_active_numa_ids().
>
> I think it is useful to put that expresssion into 
> G1NUMA/G1NodeMemoryManager (or somewhere else appropriate - 
> HeapRegionManager?) to avoid that part having too much info about page 
> size.
Nice catch!
Added G1MemoryNodeManager::max_search_depth() which addresses your comment.

>
> - os.hpp: the new Enum values might or might need some description.
enum will be replaced with static const int as AnyId will be removed.

>
> Btw, there is no regression in performance from the .0/.1 versions of 
> this code in our benchmarks.
Great!
Many thanks for doing benchmark tests, Thomas!

Let me post next webrev after addressing all Stefan and Kim's comments 
as well.

Thanks,
Sangheon

>
> Thanks,
>   Thomas