RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)

sangheon.kim at oracle.com sangheon.kim at oracle.com
Sat Nov 2 06:08:15 UTC 2019


Hi Stefan,

On 10/31/19 8:31 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> On 2019-10-23 08:39, sangheon.kim at oracle.com wrote:
>> Hi Thomas,
>>
>> I am posting the next webrev as Kim is waiting it.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.3
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.3.inc
>
> Here are my comments:
> src/hotspot/share/gc/g1/g1CollectedHeap.hpp
> ---
> 2397     st->print("  remaining free region(s) from each node id: ");
>
> What do you think about changing this to "... region(s) on each NUMA 
> node: "? I think we should be clear about the logging being for NUMA.
Changed as you suggested.

> ---
>
> src/hotspot/share/gc/g1/g1EdenRegions.hpp
> ---
> 33 class G1EdenRegions : public G1RegionCounts {
>
> I don’t think G1EdenRegions is a G1RegionCounts but rather it should 
> have one. So instead of using inheritance here I think G1EdenRegions 
> should have a G1RegionsCount. Instead of overloading length I would 
> then suggest adding a region_count(uint node_index) to get the count.
>
> Same goes for G1SurvivorRegions.
Changed as you suggested for both G1EdenRegions and G1SurvivorRegions.

> ---
>
> src/hotspot/share/gc/g1/g1NUMA.cpp
> ---
>  279 bool NodeIndexCheckClosure::do_heap_region(HeapRegion* hr) {
>  280   uint preferred_node_index = 
> _numa->preferred_node_index_for_index(hr->hrm_index());
>  281   uint active_node_index = _numa->index_of_address(hr->bottom());
>  282
>  283   if (preferred_node_index == active_node_index) {
>  284     _matched[preferred_node_index]++;
>  285   } else if (active_node_index == G1NUMA::UnknownNodeIndex) {
>  286     _unknown++;
>  287   }
>  288   _total++;
>  289
>  290   return false;
>  291 }
>
> As we discussed offline, I would like to know the mismatches as well, 
> I think the easiest approach would be to make the total count per node 
> as well and that way we can see if there were any regions that didn't 
> match. What do you think about printing the info like this:
> [3,009s][trace][gc,heap,numa ] GC(6) NUMA region verification 
> (actual/expected): 0: 1024/1024, 1: 270/1024, Unknown: 0
Changed as you suggested to have per node but deleted unknown as unknown 
is 'total - matched'.

>
> When testing this I also realized this output is problematic in the 
> case where we have committed regions that have not yet been used. 
> Reading the manual for get_mempolicy (the way we get the numa id for 
> the address) say:
> "If no page has yet been allocated for the specified address, 
> get_mempolicy() will allocate a page as if the thread had performed a 
> read (load) access to that address, and return the ID of the node 
> where that page was allocated."
Nice catch.

Shortly saying get_mempolicy() doesn't honor os::numa_make_local() call 
that we previously requested. And this problem occurs when 
AlwaysPreTouch is disabled.

However, my initial implementation which uses 'numa_move_pages()' 
doesn't have this problem. So one fundamental solution would be 
replacing linux implementation.
In current scope of 3 patches, there will be no problem if we add 
'hr->free && !AlwaysPreTouch' condition check however 
os::numa_get_group_id_for_address() will still have such limitation.

What do you think about changing the Linux implementation?
(webrev link is added at the end)

>
> Doing a read access seem to always get a page on NUMA node 0, so the 
> accounting will not be correct in this case.
Right, to be clear the calling process's NUMA id.

>
> One way to fix this would be to only do accounting for regions 
> currently used (!hr->is_free()) but I'm not sure that is exactly what 
> we want, at least not if we only do this after the GC, then only the 
> survivors and old will be checked. We could solve this by also do 
> verification before the GC. I think this might be the way to go, what 
> do you think? If my proposal was hard to follow, here's a patch:
> http://cr.openjdk.java.net/~sjohanss/numa/verify-alternative/
>
> The output from this patch would be:
> 9,233s][trace][gc,heap,numa   ] GC(18) GC Start: NUMA region 
> verification (actual/expected): 0: 358/358, 1: 361/361, Unknown: 0
> [9,306s][trace][gc,heap,numa   ] GC(18) GC End: NUMA region 
> verification (actual/expected): 0: 348/348, 1: 347/347, Unknown: 0
>
> One can also see that this verification takes some time, so maybe it 
> would make sense to have this logging under gc+numa+verify.
I think if we avoid calling G1NUMAA:index_of_address() if 'hr->free() && 
!AlwaysPreTouch'  but count total, we will be fine.
Basically I imported your patch but the verification is only happening 
at the beginning of GC.

Or with 'numa_move_pages()', we don't need such condition check.

> ---
>
>  234   uint converted_req_index = requested_node_index;
>  235   if(converted_req_index == AnyNodeIndex) {
>  236     converted_req_index = _num_active_node_ids;
>  237   }
>  238   if (converted_req_index <= _num_active_node_ids) {
>  239     _times->update(phase, converted_req_index, 
> allocated_node_index);
>  240   }
>
> I had to read this more than once to understand what it really did and 
> I think we can simplify it a bit, by just doing an if-else that checks 
> for AnyNodeIndex and if so passes in _num_active_node_ids to update(). 
> This should be ok since requested_node_index never can be larger than 
> _num_active_node_ids.
Tried to reflect your comment. :)

> ---
>
> src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
> ---
> I would prefer if we hide all the accounting in helper functions, but 
> it might be good to declare them to be inlined.
>
>   85   if (_numa->is_enabled()) {
>   86     LogTarget(Info, gc, heap, numa) lt;
>   87
>   88     if (lt.is_enabled()) {
>   89       uint num_nodes = _numa->num_active_nodes();
>   90       // Record only if there are multiple active nodes.
>   91       _obj_alloc_stat = NEW_C_HEAP_ARRAY(size_t, num_nodes, mtGC);
>   92       memset((void*)_obj_alloc_stat, 0, sizeof(size_t) * num_nodes);
>   93     }
>   94   }
>
> Move to something like initialize_numa_stats().
>
>  108   if (_obj_alloc_stat != NULL) {
>  109     uint node_index = _numa->index_of_current_thread();
>  110 _numa->copy_statistics(G1NodeTimes::LocalObjProcessAtCopyToSurv, 
> node_index, _obj_alloc_stat);
>  111   }
>
> This could be called flush_numa_stats().
>
>  268     if (_obj_alloc_stat != NULL) {
>  269       _obj_alloc_stat[node_index]++;
>  270     }
>
> And this something like update_numa_stats(uint).
All changed with suggested helper functions.

> -- 
>
> heapRegionSet.hpp
> ---
> 159   inline void update_length(HeapRegion* hr, bool increase);
> 254   inline void update_length(HeapRegion* hr, bool increase);
>
> Is there any reason for having update_length that takes a bool rather 
> than having one function for increments and one for decrements? To me 
> it looks like all uses are pretty well defined and it would make the 
> code easier to read. I also think we could pass in the node index 
> rather than the HeapRegion since the getter lenght() does this.
Done.

> ---
>
> src/hotspot/share/gc/g1/g1NodeTimes.cpp
> ---
> First, a question about the names, G1NodeTimes signals that it has to 
> do with timing, but currently we don't really record any timings. Same 
> thing with NodeStatPhases, not really the same type of phases that we 
> have for the rest of the GC logging. What do you think about renaming 
> the class to G1NUMAStats and the enum to NodeDataItems?
Changed both class and enum names as you suggested.

>
>  166 void G1NodeTimes::print_phase_info(G1NodeTimes::NodeStatPhases 
> phase) {
>  167   LogTarget(Info, gc, heap, numa) lt;
>
> I think this should be on debug level, but if you don't agree leave it 
> as is.
I feel Info seems okay, so let me leave as is.

> ---
>
>  191 void G1NodeTimes::print_mutator_alloc_stat_debug() {
>  192   LogTarget(Debug, gc, heap, numa) lt;
>
> And if you agree on moving the above to debug I think this should be 
> on trace level.
As is, please.

Here's the webrev:
http://cr.openjdk.java.net/~sangheki/8220312/webrev.4
http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc
http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc.numa_move_pages

Testing: hs-tier 1 ~ 5 with / without UseNUMA.

Thanks,
Sangheon


> ---
>
> This is it for now. Thanks,
> Stefan
>
>
>> Testing: hs-tier 1 ~ 4 with/without UseNUMA. hs-tier5 is almost 
>> finished without new failures.
>>
>> Thanks,
>> Sangheon
>>
>>




More information about the hotspot-gc-dev mailing list