RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)

sangheon.kim at oracle.com
Tue Oct 8 05:44:02 UTC 2019


Hi Stefan,

On 10/4/19 5:23 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> First of all, thanks for this updated version incorporating a lot of 
> our comments. I think we are getting closer to the goal, but I still 
> have some more comments :)
Thanks for the nice suggestions!

>
> On 2019-10-01 18:43, sangheon.kim at oracle.com wrote:
>> Hi Kim and others,
>>
>> This webrev.2 is simplified a bit more after changing the 'heap 
>> expansion' approach.
>> Previously the heap could expand on a preferred numa id, which meant 
>> contiguous heap regions with the same numa id could exist. The current 
>> version instead assumes the heap regions are split evenly across 
>> nodes, i.e. on a 4 numa node system the heap regions will be laid out 
>> as 012301230123, so given an address or a heap region index we can 
>> derive the preferred numa id.
>>
>> Much of the code supporting the previous expansion style has been 
>> removed.
>>
>> ...
>>
>> webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc
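For reference, the mapping described above boils down to something like 
this (a sketch with assumed names, not the actual webrev code):

  // Regions are laid out round-robin across the active nodes,
  // 0123 0123 ..., so the preferred node index follows directly
  // from the region index.
  uint preferred_node_index_for_region(uint region_index) const {
    return region_index % _num_active_node_ids; // hypothetical field
  }

An address is handled the same way after converting it to a region index.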
>
> src/hotspot/share/gc/g1/g1Allocator.cpp
> ---
>   31 #include "gc/g1/g1NUMA.hpp"
> I don't see why this include is needed, but you might want to include 
> gc/g1/g1MemoryNodeManager.hpp instead.
You're right.
Done.

> ---
>
> hotspot/share/gc/g1/g1CollectedHeap.cpp
> ---
> 1518   _mem_node_mgr(G1MemoryNodeManager::create()),
>
> I saw your response to Kim regarding G1Allocator needing it to be 
> initialized, and I get that, but have you looked at moving the creation 
> of G1Allocator to initialize() as well? I think its first use is 
> actually below:
> 1802   _mem_node_mgr->set_page_size(page_size);
> here:
> 1851   _allocator->init_mutator_alloc_regions();
>
> I might be missing some other place where it gets called, but I think 
> it should be safe to create both the node manager and the allocator 
> early in initialize().
Yeah, we can consider this as well. But there are some other follow-up 
enhancements which may affect this initialization order, so I would like 
to leave it as is and then file a separate CR.
One example is separating the free list, where HeapRegionManager also 
needs the G1MemoryNodeManager instance to initialize the free list.

> ---
>
> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.hpp
> ---
> 28 #include "gc/g1/g1MemoryNodeManager.hpp"
>
> Remove this include.
Done.

> ---
>
> src/hotspot/share/gc/g1/g1_globals.hpp
> ---
> 326                range(0, 100)
>
> Remove the backslash and add back the removed line to leave the file 
> unchanged.
Done.

> ---
>
> src/hotspot/share/gc/g1/heapRegionManager.cpp
> ---
>  142   if (hr != NULL) {
>  143     assert(hr->next() == NULL, "Single region should not have 
> next");
>  144     assert(is_available(hr->hrm_index()), "Must be committed");
>  145
>  146     verify_actual_node_index(hr->bottom(), hr->node_index());
>  147   }
>
> I don't think this is a good place to do the verification; we allocate 
> the free region while holding a lock, and I think we should avoid doing 
> a system call there. I would rather see this done during a safepoint, 
> having a closure that iterates the heap and verifies all regions.
I tried to point this out during the discussion, but probably not 
clearly enough. :(
My understanding was that the overhead is acceptable because the logs 
are guarded by log level+tag. But as I replied to Thomas, I will remove 
the verification in HRM::allocate_free_region() if there are no more 
opinions.

Any opinions? Thomas or Kim?
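For what it's worth, a safepoint-time check could be a simple heap 
region closure along these lines (a sketch with assumed accessor names, 
not actual webrev code):

  class G1NodeIndexCheckClosure : public HeapRegionClosure {
  public:
    bool do_heap_region(HeapRegion* hr) {
      uint expected = hr->node_index();
      // Assumed accessor that queries the OS for the actual placement.
      uint actual = G1MemoryNodeManager::mgr()->index_of_address(hr->bottom());
      if (expected != actual) {
        log_debug(gc, heap, numa, verification)
                 ("Region %u: expected node index %u but found %u",
                  hr->hrm_index(), expected, actual);
      }
      return false; // keep iterating over all regions
    }
  };

A per-node expected/actual summary (as suggested below) could be 
accumulated in the same pass.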

>
> I also think it would be nice to have two levels of output: the one 
> line for each region on trace level, and on debug level a summary, 
> something like:
> NUMA Node 1: expected=25, actual=23
> NUMA Node 2: expected=25, actual=27
>
> What do you (and others) think about that?
Having two log levels seems good to me.
Your suggestion is similar to Thomas', so I would like to address it in 
the later patch #3 (JDK-8220312, which is also part of the JEP).

> ---
>  216 static void print_node_id_of_regions(uint start, uint num_regions){
>  217   LogTarget(Trace, gc, heap, numa) lt;
>
> I understand that it might make the test a bit more complicated, but 
> have you thought about instead adding the node index to the heap 
> printing done when <gc, heap, region> is enabled on trace level?
So you are suggesting changing the log tag from gc+heap+numa to 
gc+heap+region?

> ---
>  235 static void set_heapregion_node_index(HeapRegion* hr) {
>
> I don't think we should special-case when AlwaysPreTouch is on; instead 
> we should always just call hr->set_node_index(preferred_index) directly 
> in make_regions_available. The reason is that I think it will make the 
> NUMA support harder to understand and explain, and it can potentially 
> also hide problems with a system's configuration. It might also 
> actually be worse than using the preferred id, because the OS might 
> decide to move the pages back to the preferred node right after we 
> checked this (not sure it will happen, but in theory).
I have a different opinion, sorry.
I do believe that when AlwaysPreTouch is enabled we should check the 
actual node and then use it, because:
1. If we don't check the actual node id when AlwaysPreTouch is enabled, 
we lose a chance of an improvement whenever the actual node differs from 
the preferred node. (I know this will not happen frequently, but in 
theory it can.)
2. I don't think acting differently with AlwaysPreTouch is a problem. On 
the contrary, it is a good chance to analyze the behavior of the VM 
earlier. By earlier I mean that we are planning to add verification code 
at a safepoint (not yet decided when, so please give me a good 
suggestion), which runs later than make_regions_available(). In 
addition, the default value of AlwaysPreTouch is false, so if it is 
enabled the user explicitly requested the pages to be faulted in.
3. We are already assuming that we cannot immediately react when the OS 
migrates memory. So if the OS migrates pages after we check, we are 
still consistent with that assumption.
A minimal sketch of what I mean follows this list.
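Roughly (helper names are hypothetical, not the actual webrev code):

  static void set_heapregion_node_index(HeapRegion* hr, uint preferred_index) {
    uint index = preferred_index;
    if (AlwaysPreTouch) {
      // Pages are already faulted in, so the OS can report where they
      // actually landed (e.g. via get_mempolicy() on Linux).
      index = actual_node_index_of(hr->bottom()); // hypothetical helper
    }
    hr->set_node_index(index);
  }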

>
> Another problem with this code is the call to:
> verify_actual_node_index(hr->bottom(), node_index)
>
> This function will only return the "actual" node index if logging for 
> <gc, heap, numa, verification> is enabled on debug level.
Yes, I'm aware of this problem and planned to fix it before posting the 
webrev, but completely forgot about it. My bad.
Replaced it so it works as expected.

> ---
>
>  346 bool HeapRegionManager::is_on_preferred_index(uint region_index, uint preferred_node_index) {
>  347   uint region_node_index = G1MemoryNodeManager::mgr()->preferred_index_for_address(
>  348       G1CollectedHeap::heap()->bottom_addr_for_region(region_index));
>  349   return region_node_index == preferred_node_index ||
>  350          preferred_node_index == G1MemoryNodeManager::AnyNodeIndex;
>
> I guess adding the AnyNodeIndex case here is because in this patch 
> nobody is expanding on a preferred node, right? To me this is just 
> another argument to not do any changes to the expand code in this 
> patch. I know I suggested adding expand_on_preferred_node(), but I 
> should have been clearer about when I think we should add it.
Got it.
Removed AnyNodeIndex.

> ---
>
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
> ---
>   56   // Returns memory node ids
>   57   virtual const int* node_ids() const;
>
> Doesn't seem to be used, remove.
It will be used in patch 3/3, JDK-8220312.

> ---
>
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
> ---
>  67   LINUX_ONLY(if (UseNUMA) {
> ...
>  79     delete numa;
>  80   })
>
> A bit confusing with a multi-line LINUX_ONLY, I would prefer to hide 
> this in a private helper, something like:
>   if (UseNUMA) {
>      LINUX_ONLY(create_numa_manager());
>   }
>
>   if (_inst == NULL) {
>     _inst = new G1MemoryNodeManager();
>   }
>
> Not really happy about this either, but we can look at simplifying the 
> NUMA initialization as a follow-up.
Changed it as Kim suggested; I hope you are okay with this.
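Roughly, the create() path now has this shape (a sketch with a 
hypothetical constructor and initializer; the exact code is in the 
webrev):

  #ifdef LINUX
    if (UseNUMA) {
      G1NUMA* numa = new G1NUMA();
      if (numa->initialize()) {            // assumed to probe libnuma
        _inst = new G1MemoryNodeManager(numa);
      } else {
        delete numa;
      }
    }
  #endif

    if (_inst == NULL) {
      _inst = new G1MemoryNodeManager();   // single-node fallback
    }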


> ---
>
> src/hotspot/share/gc/g1/g1NUMA.hpp
> ---
>   87   // Returns numa id of the given numa index.
>   88   inline int numa_id_of_index(uint numa_index) const;
>
> Currently unused, either remove or make use of it when calling 
> numa_make_local.
Done.

> ---
>   94   // Returns current active numa ids.
>   95   const int* numa_ids() const { return _numa_ids; }
>
> Only used by memory manager above, which in turn is unused, remove.
It will be used in patch 3/3, JDK-8220312.

> ---
>
> src/hotspot/share/gc/g1/g1NUMA.hpp
> ---
>   55 // Request the given memory to locate on preferred node.
>   56 // There are 2 things to consider.
>   57 // First, size comparison for G1HeapRegionSize and page size.
>  ...
>   62 // Examples of 4 numa ids with non-preferred numa id.
>
> What do you think about this instead:
> // Request to spread the given memory evenly across the available NUMA
> // nodes. Which node to request for a given address is given by the
> // region size and the page size. Below are two examples:
>
> I would also like a "NUMA node" row for each example showing which 
> numa node the pages and regions end up on.
Changed/added as you suggested; the new comment looks roughly like the 
sketch below.
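For reference, an illustrative version of those examples (hypothetical 
layout, 4 nodes; see the webrev for the actual comment text):

  // Example 1: page size <= region size. Each region can be placed on
  // its preferred node.
  //   Region:    |  0  |  1  |  2  |  3  |  4  |  5  | ...
  //   NUMA node: |  0  |  1  |  2  |  3  |  0  |  1  | ...
  //
  // Example 2: page size > region size (two regions per page). Regions
  // sharing a page necessarily end up on that page's node.
  //   Region:    |  0  |  1  |  2  |  3  |  4  |  5  | ...
  //   Page:      |   page 0  |   page 1  |   page 2  | ...
  //   NUMA node: |     0     |     1     |     2     | ...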

I will post webrev.3 after addressing Kim's comments and once testing 
has finished.

Thanks,
Sangheon


> ---
>
> Thanks,
> Stefan
>
>> Testing: hs-tier1 ~ 5 +-UseNUMA
>>
>> Thanks,
>> Sangheon
>>
>>



