RFR: 8298482: Implement ParallelGC NUMAStats for Linux
Nick Gasson
ngasson at openjdk.org
Tue Jan 3 10:41:49 UTC 2023
On Mon, 2 Jan 2023 14:44:01 GMT, Albert Mingkun Yang <ayang at openjdk.org> wrote:
>> ParallelGC has a seemingly useful option -XX:+NUMAStats that prints detailed information in GC.heap_info about which NUMA node pages in the eden space are bound to. However, as far as I can tell, this only ever worked on Solaris and is not implemented on any of the systems we currently support. This patch implements it on Linux using the `move_pages` system call.
>>
>> The function os::get_page_info() and the accompanying struct page_info were just a thin wrapper around the Solaris meminfo(2) syscall and were never ported to other systems, so I've just removed them rather than try to emulate their interface.
>>
>> There's also a method MutableNUMASpace::LGRPSpace::scan_pages() which attempts to find pages on the wrong NUMA node and free them, so that they have another chance to be allocated on the correct node by the first-touching thread. I think this has always been a no-op on non-Solaris systems, so perhaps it should also be removed. On Linux it shouldn't be necessary, as you can bind pages to the desired node directly.
>>
>> I don't know what the performance of this option was like on Solaris, but on Linux the `move_pages` call can be quite slow: I measured about 25ms/GB on my system. At the moment we call LGRPSpace::accumulate_statistics() twice per GC cycle; I removed the second call as it's likely to see a lot of uncommitted pages if the spaces were just resized. MutableNUMASpace::print_on() also calls accumulate_statistics() directly, and since that's the only place this data is used, maybe we can drop the call from MutableNUMASpace::accumulate_statistics() as well?
>>
>> Example output:
>>
>>
>> PSYoungGen total 4290560K, used 835628K [0x00000006aac00000, 0x0000000800000000, 0x0000000800000000)
>> eden space 3096576K, 1% used [0x00000006aac00000,0x00000007176a9f48,0x0000000767c00000)
>> lgrp 0 space 1761280K, 2% used [0x00000006aac00000,0x00000006acfc4980,0x0000000716400000)
>> local/remote/unbiased/uncommitted: 1671168K/0K/0K/90112K, large/small pages: 0/440320
>> lgrp 1 space 1335296K, 46% used [0x0000000716400000,0x000000073c2abb18,0x0000000767c00000)
>> local/remote/unbiased/uncommitted: 1335296K/0K/0K/0K, large/small pages: 0/333824
>> from space 1193984K, 65% used [0x00000007b7200000,0x00000007e6b9c778,0x0000000800000000)
>> to space 1247232K, 0% used [0x0000000767c00000,0x0000000767c00000,0x00000007b3e00000)
>>
>>
>> After testing this with SPECjbb for a while I noticed that some pages always end up bound to the wrong node. I think this is a regression caused by JDK-8283935, but I'll raise a separate ticket for that.
>
> src/hotspot/share/gc/parallel/mutableNUMASpace.cpp line 896:
>
>> 894: size_t npages = 0;
>> 895: for (; npages < PagesPerIteration && p < end; p += os::vm_page_size())
>> 896: pages[npages++] = p;
>
> This means each page is `vm_page_size()` aligned. Would it be problematic if large pages are in use?
I don't think so. For HugeTLBFS it makes some redundant calls, but in my testing it always returns the node ID of the containing huge page even if the given address is not huge-page aligned. For THP you don't know whether a particular address is backed by a huge page, so you have to step with the smallest granularity anyway. With large pages enabled the "large/small pages" count is misleading, but I don't know of any way to get this information on Linux other than by parsing `/proc/self/smaps`, which we probably don't want to do.
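
For reference, here is a minimal standalone sketch of the query side. This is not the patch itself: the helper name, the PagesPerIteration value and the use of sysconf() instead of os::vm_page_size() are only for illustration. The point it shows is that passing a null `nodes` array turns `move_pages` into a pure query: nothing is moved, and `status` comes back with each page's node ID, or a negative errno (e.g. -ENOENT for uncommitted pages).

    // Hypothetical standalone sketch, not HotSpot code: count how many small
    // pages of [start, end) are currently bound to each NUMA node, stepping
    // by the system page size as discussed above.
    #include <numaif.h>   // move_pages(2); build with -lnuma
    #include <unistd.h>   // sysconf
    #include <cstdio>
    #include <map>
    #include <vector>

    static void print_numa_page_stats(char* start, char* end) {
      const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
      const size_t PagesPerIteration = 128;       // bound the work per syscall

      std::vector<void*> pages(PagesPerIteration);
      std::vector<int>   status(PagesPerIteration);
      std::map<int, size_t> node_pages;           // node id -> page count
      size_t uncommitted = 0;

      for (char* p = start; p < end; ) {
        size_t npages = 0;
        for (; npages < PagesPerIteration && p < end; p += page_size) {
          pages[npages++] = p;
        }
        if (move_pages(0 /* this process */, npages, pages.data(),
                       nullptr /* query only, move nothing */,
                       status.data(), 0) != 0) {
          perror("move_pages");
          return;
        }
        for (size_t i = 0; i < npages; i++) {
          if (status[i] >= 0) {
            node_pages[status[i]]++;              // bound to node status[i]
          } else {
            uncommitted++;                        // -ENOENT etc.
          }
        }
      }

      for (const auto& e : node_pages) {
        printf("node %d: %zu pages\n", e.first, e.second);
      }
      printf("uncommitted/other: %zu pages\n", uncommitted);
    }

    int main() {
      const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
      const size_t len = 16 * 1024 * 1024;
      char* buf = new char[len];
      for (size_t i = 0; i < len; i += page_size) {
        buf[i] = 1;                               // first-touch commits the page
      }
      print_numa_page_stats(buf, buf + len);
      delete[] buf;
      return 0;
    }

Batching the addresses bounds the work done per syscall, which matters given the roughly 25ms/GB cost mentioned above.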
-------------
PR: https://git.openjdk.org/jdk/pull/11635