RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)

Tue Nov 5 06:22:19 UTC 2019

Hi Stefan,

On 11/4/19 8:20 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> Thanks for addressing my comments, some comments inline and additional 
> comments below.
>
> On 2019-11-02 07:08, sangheon.kim at oracle.com wrote:
>>> ...
>>> As we discussed offline, I would like to know the mismatches as 
>>> well, I think the easiest approach would be to make the total count 
>>> per node as well and that way we can see if there were any regions 
>>> that didn't match. What do you think about printing the info like this:
>>> [3,009s][trace][gc,heap,numa ] GC(6) NUMA region verification 
>>> (actual/expected): 0: 1024/1024, 1: 270/1024, Unknown: 0
>> Changed as you suggested to have per node but deleted unknown as 
>> unknown is 'total - matched'.
> I'm fine with skipping 'Unknown', but not all regions in (total - 
> matched) must be 'Unknown' since there is the possibility of a real 
> mismatch as well.
Basically I agree with you that 'total - matched' remains some other 
cases. i.e. 'mismatch' and 'unknown (mostly not yet touched)'. 
Previously I just wanted to highlight the most interesting part which I 
think is 'matched and total'.

New patch counts, matched / mismatched / total.

>>
>>>
>>> When testing this I also realized this output is problematic in the 
>>> case where we have committed regions that have not yet been used. 
>>> Reading the manual for get_mempolicy (the way we get the numa id for 
>>> the address) say:
>>> "If no page has yet been allocated for the specified address, 
>>> get_mempolicy() will allocate a page as if the thread had performed 
>>> a read (load) access to that address, and return the ID of the node 
>>> where that page was allocated."
>> Nice catch.
>>
>> Shortly saying get_mempolicy() doesn't honor os::numa_make_local() 
>> call that we previously requested. And this problem occurs when 
>> AlwaysPreTouch is disabled.
>>
>> However, my initial implementation which uses 'numa_move_pages()' 
>> doesn't have this problem. So one fundamental solution would be 
>> replacing linux implementation.
>> In current scope of 3 patches, there will be no problem if we add 
>> 'hr->free && !AlwaysPreTouch' condition check however 
>> os::numa_get_group_id_for_address() will still have such limitation.
>>
>> What do you think about changing the Linux implementation?
>> (webrev link is added at the end)
>>
> I agree that we should use numa_move_pages(), but I think we might 
> want to change things a bit more because of this. I will add those 
> comments below the patch to have all code comments in one place.
>
okay

>>> ... >> One can also see that this verification takes some time, so 
>>> maybe it
>>> would make sense to have this logging under gc+numa+verify.
>> I think if we avoid calling G1NUMAA:index_of_address() if 'hr->free() 
>> && !AlwaysPreTouch'  but count total, we will be fine.
>> Basically I imported your patch but the verification is only 
>> happening at the beginning of GC.
>
> Any reason for not including it at the end of GC as well? I guess that 
> could be good to be able to diagnose if there was an increase 
> imbalance due to the collection.
Added after gc as well.
I just wanted to avoid logging to much. But as you said, node 
information after gc seems helpful too.

>
>>> ...
>>>  166 void G1NodeTimes::print_phase_info(G1NodeTimes::NodeStatPhases 
>>> phase) {
>>>  167   LogTarget(Info, gc, heap, numa) lt;
>>>
>>> I think this should be on debug level, but if you don't agree leave 
>>> it as is.
>> I feel Info seems okay, so let me leave as is.
>>
>>> ---
>>>
>>>  191 void G1NodeTimes::print_mutator_alloc_stat_debug() {
>>>  192   LogTarget(Debug, gc, heap, numa) lt;
>>>
>>> And if you agree on moving the above to debug I think this should be 
>>> on trace level.
>> As is, please.
> I'm okay with that, I just wanted to point out that compared to the 
> other G1 logging on Info/Debug level these seemed to be one off.
OK

>
>>
>> Here's the webrev:
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4
>
> Some additional comments:
>
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
> ---
> 2598 void G1CollectedHeap::verify_numa_regions() {
> 2599   LogTarget(Trace, gc, heap, numa) lt;
>
> I realize I didn't include that in my example patch, but did you have 
> anything against changing the tags here to 'gc, heap, verify'?
I'm with gc+heap+verify so update it.

> ---
>
> src/hotspot/share/gc/g1/g1NUMA.cpp
> ---
>  281     _ls->print("%d: %u / %u, ", numa_ids[i], _matched[i], 
> _total[i]);
>
> This will leave a trailing ',' now when we don't have the 'Unknown' 
> output. I think just having a space as the separator between the nodes 
> could be fine to avoid this.
Right, just a space.

> ---
>
>  292   if (hr->is_free() && !AlwaysPreTouch) {
>  293     active_node_index = G1NUMA::UnknownNodeIndex;
>  294   } else {
>  295     active_node_index = _numa->index_of_address(hr->bottom());
>  296   }
>  297
>  298   if (preferred_node_index == active_node_index) {
>  299     _matched[preferred_node_index]++;
>  300   }
>  301   _total[preferred_node_index]++;
>
> As I touched upon above, I think we still lack some important 
> information here. If we are to look at all committed regions, which 
> probably is good, we need to keep a mismatch count as well. Otherwise 
> we can't know if the diff between matched and total is just unused 
> regions or actual mismatches without relying on other logs and 
> analysis. Or am I missing something?
Added 'mismatch' which doesn't count active_node_index == UnknownNodeIndex.
So now we are printing 'matched / mismatched / total'. We are not 
printing unknown but 'total = matched + mismatched + unknown', so we can 
calculate it.

One may want to explicitly print 'unknown' additionally or print 
'unknown' instead of 'total'.
Or we can split into 2 levels too.
But I think matched / mismatched / total seems okay for now.

> -- 
>
> src/hotspot/share/gc/g1/g1SurvivorRegions.hpp
> ---
>   30 class G1RegionCounts;
>
> Just having a forward declaration here is not enough, so please add 
> back the include.
Oops, reverted.

> ---
>
> src/hotspot/share/gc/g1/g1NUMAStats.hpp
> ---
> 99   void print_phase_info(G1NUMAStats::NodeDataItems phase);
>
> Sorry for being so picky on the naming, but since we changed the class 
> and enum I think this method, the comments and the parameters should 
> change as well. For example:
> void print_info(G1NUMAStats::NodeDataItems type);
:) Okay

>
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc.numa_move_pages 
>>
> As I said above, I think we should go with this implementation of 
> getting the node given an address, but I have a comment:
> src/hotspot/os/linux/os_linux.cpp
> ---
> 3018   if (os::Linux::numa_move_pages(0, 1, pages, NULL, &id, 0) == -1) {
> 3019     return -1;
> 3020   }
> 3021   if (id < 0) {
> 3022     return -1;
> 3023   }
>
> I think we should differ between the case where the call fails (return 
> -1) and the id returned is negative. From reading the man_pages for 
> 'move_pages()' (used by numa_move_pages()) it seems we can expect to 
> either get -EFAULT (for normal pages) or -ENOENT (for large pages) 
> when a region has not yet been used. I'm not completely sure how much 
> logic we want to add to this method, so maybe creating a RFE to look 
> at this in more detail later is good. But for this first version I 
> think your proposal is good.
I understand your comment to distinguish 'not yet faulted in (not yet 
touched)' case which seems nice to have. But I don't have good usage for 
it on top of my head. This may be used when record / print at 
NodeIndexCheckClosure::do_heap_region().

Filed a RFE, https://bugs.openjdk.java.net/browse/JDK-8233535

>
> One thought I just got, with this implementation is more like 
> 'Unused', rather then 'Unknown'. But such a notion change could also 
> be looked at in the RFE.
I'm not sure I fully agree / understand with you. :)

I think when we combine 'unused or unknown' with 'node index / id', 
unknown seems better fit.
'unknown node index' which means the value is unknown for some 
reason(mostly not yet faulted in) seems better than 'unused node index' .
To me 'unused' seems better when combined with 'page or address'. e..g 
unused page or unused address etc.

If you still think further notion change is necessary in separate CR, I 
can file for you. :)

webrev:
http://cr.openjdk.java.net/~sangheki/8220312/webrev.5
http://cr.openjdk.java.net/~sangheki/8220312/webrev.5.inc

- All comments from Kim and Stefan.
- Includes class / methods name changes which Kim suggested to me directly.

Testing: hs-tier 1 ~ 5, with/without UseNUMA.

Thanks,
Sangheon

> ---
>
> Thanks,
> Stefan
>
>>
>> Testing: hs-tier 1 ~ 5 with / without UseNUMA.
>>
>> Thanks,
>> Sangheon
>>
>>
>>> ---
>>>
>>> This is it for now. Thanks,
>>> Stefan
>>>
>>>
>>>> Testing: hs-tier 1 ~ 4 with/without UseNUMA. hs-tier5 is almost 
>>>> finished without new failures.
>>>>
>>>> Thanks,
>>>> Sangheon
>>>>
>>>>
>>