RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)
sangheon.kim at oracle.com
Sat Nov 2 06:08:15 UTC 2019
Hi Stefan,
On 10/31/19 8:31 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> On 2019-10-23 08:39, sangheon.kim at oracle.com wrote:
>> Hi Thomas,
>>
>> I am posting the next webrev as Kim is waiting for it.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.3
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.3.inc
>
> Here are my comments:
> src/hotspot/share/gc/g1/g1CollectedHeap.hpp
> ---
> 2397 st->print(" remaining free region(s) from each node id: ");
>
> What do you think about changing this to "... region(s) on each NUMA
> node: "? I think we should be clear about the logging being for NUMA.
Changed as you suggested.
> ---
>
> src/hotspot/share/gc/g1/g1EdenRegions.hpp
> ---
> 33 class G1EdenRegions : public G1RegionCounts {
>
> I don’t think G1EdenRegions is a G1RegionCounts but rather it should
> have one. So instead of using inheritance here I think G1EdenRegions
> should have a G1RegionCounts. Instead of overloading length I would
> then suggest adding a region_count(uint node_index) to get the count.
>
> Same goes for G1SurvivorRegions.
Changed as you suggested for both G1EdenRegions and G1SurvivorRegions.
> ---
>
> src/hotspot/share/gc/g1/g1NUMA.cpp
> ---
> 279 bool NodeIndexCheckClosure::do_heap_region(HeapRegion* hr) {
> 280 uint preferred_node_index =
> _numa->preferred_node_index_for_index(hr->hrm_index());
> 281 uint active_node_index = _numa->index_of_address(hr->bottom());
> 282
> 283 if (preferred_node_index == active_node_index) {
> 284 _matched[preferred_node_index]++;
> 285 } else if (active_node_index == G1NUMA::UnknownNodeIndex) {
> 286 _unknown++;
> 287 }
> 288 _total++;
> 289
> 290 return false;
> 291 }
>
> As we discussed offline, I would like to know the mismatches as well,
> I think the easiest approach would be to make the total count per node
> as well and that way we can see if there were any regions that didn't
> match. What do you think about printing the info like this:
> [3,009s][trace][gc,heap,numa ] GC(6) NUMA region verification
> (actual/expected): 0: 1024/1024, 1: 270/1024, Unknown: 0
Changed as you suggested to report per-node counts, but I dropped the
'unknown' count since it is simply 'total - matched'.
>
> When testing this I also realized this output is problematic in the
> case where we have committed regions that have not yet been used.
> Reading the manual for get_mempolicy (the way we get the numa id for
> the address) say:
> "If no page has yet been allocated for the specified address,
> get_mempolicy() will allocate a page as if the thread had performed a
> read (load) access to that address, and return the ID of the node
> where that page was allocated."
Nice catch.
In short, get_mempolicy() doesn't honor the os::numa_make_local() call
that we previously made, and this problem occurs when AlwaysPreTouch is
disabled.
However, my initial implementation, which used 'numa_move_pages()',
doesn't have this problem. So one fundamental solution would be
replacing the Linux implementation.
Within the current scope of the 3 patches, there will be no problem if
we add a 'hr->is_free() && !AlwaysPreTouch' condition check, but
os::numa_get_group_id_for_address() will still have this limitation.
What do you think about changing the Linux implementation?
(the webrev link is added at the end)
>
> Doing a read access seem to always get a page on NUMA node 0, so the
> accounting will not be correct in this case.
Right; to be clear, it is allocated on the calling process's NUMA node.
>
> One way to fix this would be to only do accounting for regions
> currently used (!hr->is_free()) but I'm not sure that is exactly what
> we want, at least not if we only do this after the GC, then only the
> survivors and old will be checked. We could solve this by also do
> verification before the GC. I think this might be the way to go, what
> do you think? If my proposal was hard to follow, here's a patch:
> http://cr.openjdk.java.net/~sjohanss/numa/verify-alternative/
>
> The output from this patch would be:
> 9,233s][trace][gc,heap,numa ] GC(18) GC Start: NUMA region
> verification (actual/expected): 0: 358/358, 1: 361/361, Unknown: 0
> [9,306s][trace][gc,heap,numa ] GC(18) GC End: NUMA region
> verification (actual/expected): 0: 348/348, 1: 347/347, Unknown: 0
>
> One can also see that this verification takes some time, so maybe it
> would make sense to have this logging under gc+numa+verify.
I think we will be fine if we avoid calling G1NUMA::index_of_address()
when 'hr->is_free() && !AlwaysPreTouch' but still count the region in
the total.
Basically I imported your patch, but the verification only happens at
the beginning of a GC.
Alternatively, with 'numa_move_pages()' we would not need such a
condition check.
> ---
>
> 234 uint converted_req_index = requested_node_index;
> 235 if(converted_req_index == AnyNodeIndex) {
> 236 converted_req_index = _num_active_node_ids;
> 237 }
> 238 if (converted_req_index <= _num_active_node_ids) {
> 239 _times->update(phase, converted_req_index,
> allocated_node_index);
> 240 }
>
> I had to read this more than once to understand what it really did and
> I think we can simplify it a bit, by just doing an if-else that checks
> for AnyNodeIndex and if so passes in _num_active_node_ids to update().
> This should be ok since requested_node_index never can be larger than
> _num_active_node_ids.
Tried to reflect your comment. :)
> ---
>
> src/hotspot/share/gc/g1/g1ParScanThreadState.cpp
> ---
> I would prefer if we hide all the accounting in helper functions, but
> it might be good to declare them to be inlined.
>
> 85 if (_numa->is_enabled()) {
> 86 LogTarget(Info, gc, heap, numa) lt;
> 87
> 88 if (lt.is_enabled()) {
> 89 uint num_nodes = _numa->num_active_nodes();
> 90 // Record only if there are multiple active nodes.
> 91 _obj_alloc_stat = NEW_C_HEAP_ARRAY(size_t, num_nodes, mtGC);
> 92 memset((void*)_obj_alloc_stat, 0, sizeof(size_t) * num_nodes);
> 93 }
> 94 }
>
> Move to something like initialize_numa_stats().
>
> 108 if (_obj_alloc_stat != NULL) {
> 109 uint node_index = _numa->index_of_current_thread();
> 110 _numa->copy_statistics(G1NodeTimes::LocalObjProcessAtCopyToSurv,
> node_index, _obj_alloc_stat);
> 111 }
>
> This could be called flush_numa_stats().
>
> 268 if (_obj_alloc_stat != NULL) {
> 269 _obj_alloc_stat[node_index]++;
> 270 }
>
> And this something like update_numa_stats(uint).
All changed with suggested helper functions.
> --
>
> heapRegionSet.hpp
> ---
> 159 inline void update_length(HeapRegion* hr, bool increase);
> 254 inline void update_length(HeapRegion* hr, bool increase);
>
> Is there any reason for having update_length that takes a bool rather
> than having one function for increments and one for decrements? To me
> it looks like all uses are pretty well defined and it would make the
> code easier to read. I also think we could pass in the node index
> rather than the HeapRegion since the getter length() does this.
Done.
> ---
>
> src/hotspot/share/gc/g1/g1NodeTimes.cpp
> ---
> First, a question about the names, G1NodeTimes signals that it has to
> do with timing, but currently we don't really record any timings. Same
> thing with NodeStatPhases, not really the same type of phases that we
> have for the rest of the GC logging. What do you think about renaming
> the class to G1NUMAStats and the enum to NodeDataItems?
Changed both class and enum names as you suggested.
>
> 166 void G1NodeTimes::print_phase_info(G1NodeTimes::NodeStatPhases
> phase) {
> 167 LogTarget(Info, gc, heap, numa) lt;
>
> I think this should be on debug level, but if you don't agree leave it
> as is.
I feel Info is okay, so let me leave it as is.
> ---
>
> 191 void G1NodeTimes::print_mutator_alloc_stat_debug() {
> 192 LogTarget(Debug, gc, heap, numa) lt;
>
> And if you agree on moving the above to debug I think this should be
> on trace level.
As is, please.
Here's the webrev:
http://cr.openjdk.java.net/~sangheki/8220312/webrev.4
http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc
http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc.numa_move_pages
Testing: hs-tier 1 ~ 5 with / without UseNUMA.
Thanks,
Sangheon
> ---
>
> This is it for now. Thanks,
> Stefan
>
>
>> Testing: hs-tier 1 ~ 4 with/without UseNUMA. hs-tier5 is almost
>> finished without new failures.
>>
>> Thanks,
>> Sangheon
>>
>>