RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)

Tue Oct 1 16:43:58 UTC 2019

Hi Kim and others,

This webrev.2 simplified a bit more after changing 'heap expansion' 
approach.
Previously heap may expand with preferred numa id which means contiguous 
same numa id heap regions may exist but current version is assuming to 
have evenly split heap regions. i.e. 4 numa node system, heap regions 
will be 012301230123, so if we know address or heap region index, we can 
know preferred numa id.

Many codes related to support previous style expansion were removed.

On 9/24/19 6:44 PM, Kim Barrett wrote:
>> On Sep 21, 2019, at 1:19 AM,sangheon.kim at oracle.com  wrote:
>>
>> webrev:
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1
>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1.inc  (this may not help much! :) )
>> Testing: hs-tier 1 ~ 5 (with/without UseNUMA)
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1AllocRegion.hpp
>    96   uint _node_index;
>
> Protected; should be private.
_node_index is used from derived classes.
Are you suggesting to add a getter?

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Allocator.cpp
>    42   _mutator_alloc_region(NULL),
>
> Should be _mutator_alloc_regions (plural), since it's now an array.
>
> Similarly, these should be pluralized:
>    67 void G1Allocator::init_mutator_alloc_region() {
>    74 void G1Allocator::release_mutator_alloc_region() {
>
> And this
>    48   // The number of MutatorAllocRegions used, one per memory node.
>    49   size_t _num_alloc_region;
Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Allocator.cpp
>    53 G1Allocator::~G1Allocator() {
>    54   for (uint i = 0; i < _num_alloc_region; i++) {
>    55     _mutator_alloc_region[i].~MutatorAllocRegion();
>    56   }
>    57   FREE_C_HEAP_ARRAY(MutatorAllocRegion, _mutator_alloc_region);
>    58 }
>
> --- should also be calling _mutator_alloc_region[i].release() ??
> --- or does destructor do that?
No, release() is never called.
release() is not actually releasing allocated resources but sets null to 
pointers and inc/dec some numbers such as used bytes. So I was thinking 
we don't need to call release().

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1Arguments.cpp
>   161   if (UseNUMA) {
>   162     if (FLAG_IS_DEFAULT(AlwaysPreTouch)) {
>   163       FLAG_SET_DEFAULT(AlwaysPreTouch, true);
>   164     }
>   165     if (!AlwaysPreTouch && FLAG_IS_CMDLINE(AlwaysPreTouch)) {
>   166       warning("Disabling AlwaysPreTouch is incompatible with UseNUMA. Disabling UseNUMA.");
>   167       FLAG_SET_ERGO(UseNUMA, false);
>   168     }
>   169   }
>
> Stefan asked about why AlwaysPreTouch is required when UseNUMA. I have
> a different question. Assuming UseNUMA does require AlwaysPreTouch,
> why is !AlwaysPreTouch winning here? Why not have UseNUMA win if they
> are conflicting?
As webrev.2 removes above code, we can skip this discussion?

> But see discussion below about
> G1RegionsSmallerThanCommitSizeMapper::commit_regions(), which
> suggested AlwaysPreTouch is required.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp
>    83 G1PageBasedVirtualSpace::~G1PageBasedVirtualSpace() {
> ...
>    92   _numa                   = NULL;
>    93 }
>
> [pre-existing] Destructors are for resource management. Nulling out /
> zeroing out members in a destructor generally isn't useful. This is
> really a comment on the existing code rather than a request to change
> anything. The addition of line 92 is okay in context, just the context
> is not good.
Agreed on pre-existing.
The intent here is to align with existing context, so leave as is?

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
>   108         _next_node_index(G1MemoryNodeManager::mgr()->num_active_nodes() - 1),
>   109         _max_node_index(G1MemoryNodeManager::mgr()->num_active_nodes()) {
>
> Consider reversing the order of these members and their initializers,
> so the _next_node_index can use _max_node_index rather than another
> call to num_active_nodes().
Good point!
However this newly added part is removed.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
>   113     uint next_node_index() const {
>   114       return _next_node_index;
>   115     }
>
> I think this is mis-named.  It's the current index for the
> distributor.  I think it should just be called "node_index".
Agree, but this line is also removed.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
>
> I'm confused by G1RegionsSmallerThanCommitSizeMapper::commit_regions().
>
> For the LargerThan mapper, we have a sequence of regions that
> completely covers a sequence of pages, and we commit all of the
> associated pages using the requested node_index.
>
> For the SmallerThan mapper, we have a sequence of regions split up
> into subsequences that are each contained in a single page.  The first
> such subsequence might be on an already committed page.  Similarly for
> the last subsequence.  Nothing is done to those pages.
>
> In between there may be a series of region sequences, with each region
> sequence on a single page.  If there are more than one of these region
> sequences then more than one page will need to be committed.
>
> As we step through the seuqnce of pages and commit them, we also step
> the numa index to use for each page.
>
> Stefan asked a question in this area about the mechanism by which the
> node stepping is provided, and you responded with what sounds like an
> improvement.  But I have a different question.
>
> Why are we committing different pages on different numa nodes? The
> caller requested these regions be on the requested node. Why are we
> not honoring that request (as much as possible within the constraints
> of possible leading and trailing regions being on already committed
> pages.) The comment for G1NodeDistributor discusses (at a high level)
> what it's doing (e.g. a short summary of the above description), but
> there is no discussion of why that distribution is needed or
> desirable.
If I understand your question correctly, we do honor 'requested node 
index' at G1RegionSmallerThan case.
Please look at 'G1NodeDistributor::next()'.

     void next() {
       if (_requested_node_index == G1MemoryNodeManager::AnyNodeIndex) {
         _node_index = (_node_index + 1) % _max_node_index;
       } else {
         _node_index = _requested_node_index;
       }

If _requested_node_index is AnyNodeIndex, we cycle through valid node 
indices.
This code is also removed.
So G1NUMA is responsible to decide preferred node and upper APIs only 
decide whether need to expand or not.

> There might be a good reason for this behavior, in which case your
> response with an improvement sounds good.  But if so, I'm guessing I
> won't be the only one who doesn't know what that reason might be, and
> it would be good to provide an explanatory comment.  And of course, if
> there isn't a good reason...
>
> I think there is also a problem here if AlwaysPreTouch is false. (As
> discussed earlier, maybe it isn't required to be true.) The node index
> for the committed regions gets set (in make_regions_available) via the
> result of the syscall, so we really need pretouch to have been done.
> The alternative would be to assume commit_regions used the requested
> numa node.  But with the request stepping that wouldn't hold.  Of
> course, it also doesn't hold for any leading or trailing regions that
> were covered by already committed pages.
>
> I think this is the basis of your argument that AlwaysPreTouch is
> required for UseNUMA, and I think I'm now agreeing.  Otherwise we may
> think the leading and trailing regions in the sequence are on a
> different node than they actually are, since the associated pages may
> have already been committed on a different node than requested, but
> not yet touched.
The leading regions are only committed, so other regions which belong to 
same page will not actually committed, so touching issue doesn't happen.
SmallerThan class is supposed to handle this situation.

> But I still don't know why we would want to cycle through nodes for
> the "middle" pages.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/heapRegionManager.cpp
>   127     if (G1VerifyNUMAIdOfHeapRegions) {
>   128       // Read actual node index via system call.
>   129       uint actual_node_index = mgr->index_of_address(hr->bottom());
>   130       if (hr->node_index() != actual_node_index) {
>
> Can we actually do this here?  I thought the system call only gave a
> useful answer if the addressed location has been paged in.  I'm not
> sure that's necessarily happened at this point.
At webrev.1:
We can do this here because webrev.1 assumes AlwaysPreTouch is enabled.
So at the time of commit, we pretouch as soon as commit is finished. And 
we can check actual node id here.

At webrev.2:
AlwaysPreTouch is NOT coupled with UseNUMA. We trust OS that the 
requested memory will be located on preferred node. i.e. we don't 
actually touch the memory.

> I think Stefan suggested the logging of mismatches between requested
> and actual numa node for a region should occur at region retirement.
> We could log mismatches there and correct the region's information.
>
> But see discussion above about
> G1RegionsSmallerThanCommitSizeMapper::commit_regions().  If
> AlwaysPreTouch is indeed required, then this code is okay.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
>   156   HeapRegion * cur;
>
> s/HeapRegion */HeapRegion*/
Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>    38   static const uint InvalidNodeIndex = (uint)os::InvalidId;
>    39   static const uint AnyNodeIndex = (uint)os::AnyId;
>
> I complained about these in my review of webrev.0. These are making
> very strong assumptions about the values of the os Id values, for no
> good reason that I can see. You responded
>
> "But the intend is to make same value after casting for same meaning
> constants instead of randomly chosen ones."
>
> I don't buy that. There aren't any casts that I can see between NUMA
> ids and indexes. Nor should there be any such casts. If there were,
> I'd strongly question them, as being units mismatches.
Okay,
Fixed similar to your previous comment.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>    54   virtual const int* node_ids() const { static int dummy_id = 0; return &dummy_id; }
>
> dummy_id should be const.
>
> I would probably put that definition in the .cpp file. I've run into
> multiple compiler bugs with function scoped static variables in inline
> functions. Not recently, but I'm paranoid.
Good to know.
Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>    34 class G1MemoryNodeManager : public CHeapObj<mtGC> {
> ...
>    43   static G1MemoryNodeManager* create();
>
> Given that we have a factory function that should be used for
> creation, the constructor ought to be non-public.  It needs to be
> protected so the derived G1MemoryNodeManager can refer to it.
Changed to protected.

> A different approach would have G1MemoryNodeManager be abstract (with
> all virtuals but the destructor being pure), with hidden (possibly
> private nested) classes for the single-node and multi-node cases.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
>    62   if (UseNUMA SOLARIS_ONLY(&& false)) {
>
> I thought we were only providing Linux support.  This seems like it
> would attempt to work (and probably fail somewhere later) on other
> platforms (anything not Linux or Solaris).
You are right.
Changed to use LINUX_ONLY() macro.
Please correct me if I misunderstood your point. :)

All platforms are allowed to set +UseNUMA and eventually get some 
benefit from UseNUMAInterleaving.
Treating Windows and Mac are easy because those have only one active 
node. However, Solaris may have multiple active nodes, so above line is 
added. I believe that is one of the simplest way to filter out Solaris 
case on top of existing filtering logic(active node check).

But as you pointed out, previous one is not that clear so changed to use 
LINUX_ONLY().

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
>    64     G1NUMA* numa = new G1NUMA();
>    65
>    66     if (numa != NULL) {
>
> numa cannot be NULL here.
Done

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
>    86 G1MemoryMultiNodeManager::~G1MemoryMultiNodeManager() {
>    87   delete _numa;
>    88 }
>
> This is leaving a stale pointer to the G1NUMA object in wherever
> G1NUMA::set_numa stashed it.
G1NUMA::_inst = NULL
is added at the dtor of G1NUMA because G1NUMA::set_numa() sets '_inst'.
Correct me if I misunderstood.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
> 1500   _mem_node_mgr(G1MemoryNodeManager::create()),
>
> Maybe this manager should be created in G1CH::initialize() (and
> initialized here to NULL).
No.
We need the number of active node ids at G1Allocator::G1Allocator.
This is the reason why G1MemoryNodeManager is created earlier. 
Previously HeapRegionManager also had dependency but it is removed now.

> Then the page_size could be passed to create, and there wouldn't be a
> need to later set the page size of the manager and pass that along to
> the G1NUMA, instead both getting it as a constructor argument.  Then
> the associated setters go away too.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.hpp
>    95   // Gets a next valid numa id.
>    96   inline int next_numa_id();
>
> Appears to be unused.
Removed.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1MemoryNodeManager.cpp
>   102 uint G1MemoryMultiNodeManager::index_of_current_thread() const {
>   103   int node_id = os::numa_get_group_id();
>   104   return _numa->index_of_numa_id(node_id);
>   105 }
>
> Other than here, os::numa_xxx usage is encapsulated in G1NUMA, with
> the manager forwarding to the G1NUMA object as needed.  I suggest
> doing that here too.  (Note that this file doesn't #include os.hpp.)
> I think doing so eliminates the need for G1NUMA::index_of_numa_id(),
> which also seems like a good thing.
Done.
Added G1NUMA::index_of_current_thread() to remove os call.
However still we need G1NUMA::index_of_numa_id() which is used at 
G1NUMA::index_of_address(HeapWord*).

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.cpp
>    92 void G1NUMA::touch_memory(address aligned_address, size_t size_in_bytes, uint numa_index) {
>
> Assert aligned_address is page aligned?
> Assert size_in_bytes is a page aligned?
Added 2 assertions as you commented.
The first one 'aligned_address' means page size aligned though.

> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1NUMA.cpp
>    42   memset(_numa_id_to_index_map,
>    43          G1MemoryNodeManager::InvalidNodeIndex,
>    44          sizeof(uint) * _len_numa_id_to_index_map);
>
> memset only works here because all bytes of InvalidNodeIndex happen to
> have the same value.  I would prefer an explicit fill loop rather than
> memset here.  Or a static assert on the value, but that's probably
> more code.
Changed to fill during loop.
I'm aware of this and the only reason of changing InvalidNodeIndex from 
0xfffe to 0xffff was to use memset here.
I was thinking you are okay with memset as you commented to use memset 
from your previous email. :)

webrev:
http://cr.openjdk.java.net/~sangheki/8220310/webrev.2/
http://cr.openjdk.java.net/~sangheki/8220310/webrev.2.inc
Testing: hs-tier1 ~ 5 +-UseNUMA

Thanks,
Sangheon

> ------------------------------------------------------------------------------
>