RFR(L): 8220312: Implementation: NUMA-Aware Memory Allocation for G1, Logging (3/3)
sangheon.kim at oracle.com
sangheon.kim at oracle.com
Tue Nov 5 06:22:19 UTC 2019
Hi Stefan,
On 11/4/19 8:20 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> Thanks for addressing my comments, some comments inline and additional
> comments below.
>
> On 2019-11-02 07:08, sangheon.kim at oracle.com wrote:
>>> ...
>>> As we discussed offline, I would like to know the mismatches as
>>> well, I think the easiest approach would be to make the total count
>>> per node as well and that way we can see if there were any regions
>>> that didn't match. What do you think about printing the info like this:
>>> [3,009s][trace][gc,heap,numa ] GC(6) NUMA region verification
>>> (actual/expected): 0: 1024/1024, 1: 270/1024, Unknown: 0
>> Changed as you suggested to have per node but deleted unknown as
>> unknown is 'total - matched'.
> I'm fine with skipping 'Unknown', but not all regions in (total -
> matched) must be 'Unknown' since there is the possibility of a real
> mismatch as well.
Basically I agree with you that 'total - matched' remains some other
cases. i.e. 'mismatch' and 'unknown (mostly not yet touched)'.
Previously I just wanted to highlight the most interesting part which I
think is 'matched and total'.
New patch counts, matched / mismatched / total.
>>
>>>
>>> When testing this I also realized this output is problematic in the
>>> case where we have committed regions that have not yet been used.
>>> Reading the manual for get_mempolicy (the way we get the numa id for
>>> the address) say:
>>> "If no page has yet been allocated for the specified address,
>>> get_mempolicy() will allocate a page as if the thread had performed
>>> a read (load) access to that address, and return the ID of the node
>>> where that page was allocated."
>> Nice catch.
>>
>> Shortly saying get_mempolicy() doesn't honor os::numa_make_local()
>> call that we previously requested. And this problem occurs when
>> AlwaysPreTouch is disabled.
>>
>> However, my initial implementation which uses 'numa_move_pages()'
>> doesn't have this problem. So one fundamental solution would be
>> replacing linux implementation.
>> In current scope of 3 patches, there will be no problem if we add
>> 'hr->free && !AlwaysPreTouch' condition check however
>> os::numa_get_group_id_for_address() will still have such limitation.
>>
>> What do you think about changing the Linux implementation?
>> (webrev link is added at the end)
>>
> I agree that we should use numa_move_pages(), but I think we might
> want to change things a bit more because of this. I will add those
> comments below the patch to have all code comments in one place.
>
okay
>>> ... >> One can also see that this verification takes some time, so
>>> maybe it
>>> would make sense to have this logging under gc+numa+verify.
>> I think if we avoid calling G1NUMAA:index_of_address() if 'hr->free()
>> && !AlwaysPreTouch' but count total, we will be fine.
>> Basically I imported your patch but the verification is only
>> happening at the beginning of GC.
>
> Any reason for not including it at the end of GC as well? I guess that
> could be good to be able to diagnose if there was an increase
> imbalance due to the collection.
Added after gc as well.
I just wanted to avoid logging to much. But as you said, node
information after gc seems helpful too.
>
>>> ...
>>> 166 void G1NodeTimes::print_phase_info(G1NodeTimes::NodeStatPhases
>>> phase) {
>>> 167 LogTarget(Info, gc, heap, numa) lt;
>>>
>>> I think this should be on debug level, but if you don't agree leave
>>> it as is.
>> I feel Info seems okay, so let me leave as is.
>>
>>> ---
>>>
>>> 191 void G1NodeTimes::print_mutator_alloc_stat_debug() {
>>> 192 LogTarget(Debug, gc, heap, numa) lt;
>>>
>>> And if you agree on moving the above to debug I think this should be
>>> on trace level.
>> As is, please.
> I'm okay with that, I just wanted to point out that compared to the
> other G1 logging on Info/Debug level these seemed to be one off.
OK
>
>>
>> Here's the webrev:
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4
>
> Some additional comments:
>
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
> ---
> 2598 void G1CollectedHeap::verify_numa_regions() {
> 2599 LogTarget(Trace, gc, heap, numa) lt;
>
> I realize I didn't include that in my example patch, but did you have
> anything against changing the tags here to 'gc, heap, verify'?
I'm with gc+heap+verify so update it.
> ---
>
> src/hotspot/share/gc/g1/g1NUMA.cpp
> ---
> 281 _ls->print("%d: %u / %u, ", numa_ids[i], _matched[i],
> _total[i]);
>
> This will leave a trailing ',' now when we don't have the 'Unknown'
> output. I think just having a space as the separator between the nodes
> could be fine to avoid this.
Right, just a space.
> ---
>
> 292 if (hr->is_free() && !AlwaysPreTouch) {
> 293 active_node_index = G1NUMA::UnknownNodeIndex;
> 294 } else {
> 295 active_node_index = _numa->index_of_address(hr->bottom());
> 296 }
> 297
> 298 if (preferred_node_index == active_node_index) {
> 299 _matched[preferred_node_index]++;
> 300 }
> 301 _total[preferred_node_index]++;
>
> As I touched upon above, I think we still lack some important
> information here. If we are to look at all committed regions, which
> probably is good, we need to keep a mismatch count as well. Otherwise
> we can't know if the diff between matched and total is just unused
> regions or actual mismatches without relying on other logs and
> analysis. Or am I missing something?
Added 'mismatch' which doesn't count active_node_index == UnknownNodeIndex.
So now we are printing 'matched / mismatched / total'. We are not
printing unknown but 'total = matched + mismatched + unknown', so we can
calculate it.
One may want to explicitly print 'unknown' additionally or print
'unknown' instead of 'total'.
Or we can split into 2 levels too.
But I think matched / mismatched / total seems okay for now.
> --
>
> src/hotspot/share/gc/g1/g1SurvivorRegions.hpp
> ---
> 30 class G1RegionCounts;
>
> Just having a forward declaration here is not enough, so please add
> back the include.
Oops, reverted.
> ---
>
> src/hotspot/share/gc/g1/g1NUMAStats.hpp
> ---
> 99 void print_phase_info(G1NUMAStats::NodeDataItems phase);
>
> Sorry for being so picky on the naming, but since we changed the class
> and enum I think this method, the comments and the parameters should
> change as well. For example:
> void print_info(G1NUMAStats::NodeDataItems type);
:) Okay
>
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc
>> http://cr.openjdk.java.net/~sangheki/8220312/webrev.4.inc.numa_move_pages
>>
> As I said above, I think we should go with this implementation of
> getting the node given an address, but I have a comment:
> src/hotspot/os/linux/os_linux.cpp
> ---
> 3018 if (os::Linux::numa_move_pages(0, 1, pages, NULL, &id, 0) == -1) {
> 3019 return -1;
> 3020 }
> 3021 if (id < 0) {
> 3022 return -1;
> 3023 }
>
> I think we should differ between the case where the call fails (return
> -1) and the id returned is negative. From reading the man_pages for
> 'move_pages()' (used by numa_move_pages()) it seems we can expect to
> either get -EFAULT (for normal pages) or -ENOENT (for large pages)
> when a region has not yet been used. I'm not completely sure how much
> logic we want to add to this method, so maybe creating a RFE to look
> at this in more detail later is good. But for this first version I
> think your proposal is good.
I understand your comment to distinguish 'not yet faulted in (not yet
touched)' case which seems nice to have. But I don't have good usage for
it on top of my head. This may be used when record / print at
NodeIndexCheckClosure::do_heap_region().
Filed a RFE, https://bugs.openjdk.java.net/browse/JDK-8233535
>
> One thought I just got, with this implementation is more like
> 'Unused', rather then 'Unknown'. But such a notion change could also
> be looked at in the RFE.
I'm not sure I fully agree / understand with you. :)
I think when we combine 'unused or unknown' with 'node index / id',
unknown seems better fit.
'unknown node index' which means the value is unknown for some
reason(mostly not yet faulted in) seems better than 'unused node index' .
To me 'unused' seems better when combined with 'page or address'. e..g
unused page or unused address etc.
If you still think further notion change is necessary in separate CR, I
can file for you. :)
webrev:
http://cr.openjdk.java.net/~sangheki/8220312/webrev.5
http://cr.openjdk.java.net/~sangheki/8220312/webrev.5.inc
- All comments from Kim and Stefan.
- Includes class / methods name changes which Kim suggested to me directly.
Testing: hs-tier 1 ~ 5, with/without UseNUMA.
Thanks,
Sangheon
> ---
>
> Thanks,
> Stefan
>
>>
>> Testing: hs-tier 1 ~ 5 with / without UseNUMA.
>>
>> Thanks,
>> Sangheon
>>
>>
>>> ---
>>>
>>> This is it for now. Thanks,
>>> Stefan
>>>
>>>
>>>> Testing: hs-tier 1 ~ 4 with/without UseNUMA. hs-tier5 is almost
>>>> finished without new failures.
>>>>
>>>> Thanks,
>>>> Sangheon
>>>>
>>>>
>>
More information about the hotspot-gc-dev
mailing list