RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
Stefan Johansson
stefan.johansson at oracle.com
Tue Sep 24 09:02:19 UTC 2019
Hi Sangheon,
On 2019-09-24 00:33, sangheon.kim at oracle.com wrote:
> Hi Stefan,
>
> Many thanks for reviewing and discussing this!
> And please allow me to post next webrev with more comments.
Yes, no rush.
>
> On 9/23/19 8:14 AM, Stefan Johansson wrote:
>> Hi Sangheon,
>>
>> Thanks for the updated webrev, I like where we are heading with this.
>> I still have some comments but most of them are fairly small. The only
>> big thing is that I still think we could simplify things a lot by not
>> passing down node_index to the lower level commit functions and
>> instead just focus on keeping a strict manual interleaving of
>> regions/pages (whichever is largest). I know this wouldn't allow us to
>> expand by a single region on a specific node, but since this is a
>> fairly uncommon case I think we could live with that.
>>
>> src/hotspot/share/gc/g1/g1Allocator.*
>> -------------------------------------
>> There are a few calls like this:
>> uint node_index = _g1h->mem_node_mgr()->index_of_current_thread();
>> I would like a private helper to wrap this, something like
>> current_node_index().
> Will do.
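> Something like this, I assume (just a sketch, exact name and placement
> up to you):
>
>   // Returns the NUMA node index of the current (mutator) thread.
>   uint G1Allocator::current_node_index() const {
>     return _g1h->mem_node_mgr()->index_of_current_thread();
>   }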
>
>> ---
>>
>> hotspot/share/gc/g1/g1Arguments.cpp
>> -----------------------------------
>> 161 if (UseNUMA) {
>> 162 if (FLAG_IS_DEFAULT(AlwaysPreTouch)) {
>> 163 FLAG_SET_DEFAULT(AlwaysPreTouch, true);
>> 164 }
>> 165 if (!AlwaysPreTouch && FLAG_IS_CMDLINE(AlwaysPreTouch)) {
>> 166 warning("Disabling AlwaysPreTouch is incompatible with
>> UseNUMA. Disabling UseNUMA.");
>> 167 FLAG_SET_ERGO(UseNUMA, false);
>> 168 }
>> 169 }
>>
>> I'm not sure I understand why we have to enforce AlwaysPreTouch when
>> NUMA is enabled. This will improve performance, but I think the NUMA
>> support should work even without it. Or am I missing something?
> When we allocate a new region, we have to know the node index of the
> region that came from the free list.
> However, without pretouch, we can't know whether the region is properly
> located or not. i.e. when checking the node id via
> os::numa_get_address_id(), we will get an invalid id even though we
> called numa_make_local(), because the page (at the bottom of the heap
> region) is not yet faulted in.
>
> We could set HeapRegion::_node_index to the preferred index after calling
> numa_make_local() (without actually checking the node id), but I'm not
> sure whether that is the way to go.
> If you are suggesting this, maybe we need more discussion and testing. So
> unless others have a strong opinion on this, I would like to address it
> in a separate CR.
>
I think we should go with trusting the OS and setting the preferred index
on the HeapRegion, and then we can have our verification/check of what
actual node we ended up on during a safepoint. But we can discuss this
more. I'm not sure how much more testing this requires; in the end we
cannot know what the OS will do underneath.
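Roughly what I have in mind (only a sketch; the helper names are made up
and it assumes the gc+numa+region log tags suggested further down):

  // At allocation time, trust the OS and record the preferred index.
  hr->set_node_index(preferred_node_index);

  // At a safepoint we can then verify where the region actually ended up.
  uint actual_index = mem_node_mgr()->node_index_of_address(hr->bottom());
  if (actual_index != hr->node_index()) {
    log_debug(gc, numa, region)("Region %u requested on node index %u but is on %u",
                                hr->hrm_index(), hr->node_index(), actual_index);
  }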
>> ---
>>
>> src/hotspot/share/gc/g1/g1CollectedHeap.*
>> -----------------------------------------
>> As we discussed offline I'm not sure we need pass the node_index down
>> in the call to expand. Since we're not strict in giving out memory
>> from the correct node during the mutator phase, I think we could do
>> the same during the GC in the case we need to expand.
> Yeah, the only difference would be that, unlike the mutator case, we
> wouldn't even try to get the matching node and would expect to get a
> random one. I do understand this case would be rare, but I think doing
> our best to match is better, even if we have to sacrifice a little bit
> of code complexity.
>
I see your point, but I don't fully agree. If the case is rare, or we
can't show that handing out a region with the correct index gives a
benefit, I think we should leave this as a separate enhancement later on.
Doing so, we might even be able to design it in a smarter way.
>> ---
>>
>> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
>> -------------------------------------------------
>> 162 G1NodeDistributor itr(node_index);
>> 163
>> 164 for (uint i = start_idx; i < start_idx + num_regions; i++) {
>> 165 // If there are many pages to touch, different node ids
>> will be used.
>> 166 uint processed_node_index = itr.next_node_index();
>> ...
>> 179 zero_filled = _storage.commit(idx, 1, processed_node_index);
>> 180 itr.next();
>> 181 }
>>
>> Do we really need to do this? Can't we just rely on the code in
>> G1NUMA::touch_memory_roundrobin() to do the right thing? Or is this
>> just to allow committing a single region on the correct node?
> Yes, this is to commit a single region; touch_memory_roundrobin() is
> not called.
> i.e. when a HeapRegion is smaller than the page size, we commit one page
> at a time and skip the commit for the next region until the page is
> filled with regions. e.g. with a 2MB page size and 1MB heap regions, one
> page is committed for every 2 heap region commits.
> However, your comment reminded me to enhance this area. Instead of
> starting from the first node, we can just call G1NUMA::next_numa_index()
> in G1NodeDistributor::next().
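> e.g. something like this (rough sketch only, not the exact shape in the
> webrev; the fields are simplified):
>
>   void G1NodeDistributor::next() {
>     // _regions_per_page = max(page_size / region_size, 1), e.g. 2 for
>     // 2MB pages and 1MB regions.
>     if (--_regions_left_in_page == 0) {
>       // Advance to the next node only when the current page is fully
>       // covered by committed regions.
>       _node_index = _numa->next_numa_index();
>       _regions_left_in_page = _regions_per_page;
>     }
>   }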
>
Ok, I see. As long as we need to be able to commit to a specific node,
this is needed.
>> ---
>>
>> src/hotspot/share/gc/g1/g1_globals.hpp
>> --------------------------------------
>> Instead of adding those flags I think we should use log tags and guard
>> the logic by using log_is_enabled(), or LogTarget::is_enabled() as you
>> do in heapRegionManager.cpp. To allow these to be excluded with other
>> logging turned on, I suggest using specific log-tag-sets for each of
>> them, something like:
>> G1PrintNUMAIdOfHeapRegions -> gc, numa, region (debug or trace)
>> G1VerifyNUMAIdOfHeapRegions -> gc, numa, region, verification (debug
>> or trace)
> Will do as you suggested.
> Kim also suggested the same, but I wanted to hear more opinions. :)
>
> G1PrintNUMAIdOfHeapRegions -> gc, numa, region (debug)
> G1VerifyNUMAIdOfHeapRegions -> gc, numa, region, verification (debug)
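> So the logging guard would look something like this (sketch only,
> assuming the new tag sets are added):
>
>   // Skip the (possibly expensive) node id lookup unless logging is on.
>   if (log_is_enabled(Debug, gc, numa, region)) {
>     int node_id = os::numa_get_address_id(hr->bottom());
>     log_debug(gc, numa, region)("Region %u is on NUMA node %d",
>                                 hr->hrm_index(), node_id);
>   }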
>
>> ---
>>
>> src/hotspot/share/gc/g1/heapRegionManager.cpp
>> ---------------------------------------------
>> 112 if (mgr->num_active_nodes() > 1) {
>> 113 uint valid_node_index =
>> mgr->valid_node_index(requested_node_index);
>> 114 // Try to allocate with requested node index.
>> 115 hr = _free_list.remove_region_with_node_index(from_head,
>> valid_node_index, NULL);
>> 116 }
>>
>> I've seen some problems in my testing with this code;
>> valid_node_index() now calls _numa->next_numa_index() when the index
>> isn't valid. This means that both the lower level commit code and the
>> allocation path will call this function in some cases (for
>> example single region humongous). I'm not sure if this will lead to
>> any real problems, but it makes it harder to understand what is going
>> on underneath, and I've seen it lead to imbalance over the
>> nodes. To avoid this I think we could instead do:
>>
>> if (mgr->num_active_nodes() > 1 &&
>> mgr->is_valid_node_index(requested_node_index)) {
>> hr = _free_list.remove_region_with_node_index(from_head,
>> requested_node_index, NULL);
>> }
>>
>> This way we only remove with index if we have a valid index, otherwise
>> we get a HeapRegion from the code below, which should be more or less
>> the same as taking whatever index and removing with that:
>> 118 if (hr == NULL) {
>> ...
>> 121 hr = _free_list.remove_region(from_head);
>> 122 }
>>
>> If this is an ok approach I think valid_node_index() is no longer
>> used and can be removed.
> That looks like a nice suggestion.
> So instead of relying on G1NUMA::next_numa_index(), just using the first
> region would relieve the node imbalance. I like this, but let me come
> back with some testing results.
>
>> ---
>>
>> src/hotspot/share/gc/g1/g1NUMA.inline.hpp
>> -----------------------------------------
>> G1NUMA::next_numa_id() seems to be unused and could then be removed.
> Will remove.
>
>> ---
>>
>> I'll continue looking at the patch and taking it for some spins. I
>> might come back with additional feedback later on, but these are my
>> initial findings.
> Okay!
>
> Thanks,
> Sangheon
>
>
>>
>> Thanks,
>> Stefan
>>
>>
>> On 2019-09-21 07:19, sangheon.kim at oracle.com wrote:
>>> Hi Kim,
>>>
>>> Many thanks for this thorough review!
>>>
>>> On 9/16/19 4:12 PM, Kim Barrett wrote:
>>>>> On Sep 4, 2019, at 3:16 AM, sangheon.kim at oracle.com wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Please review this patch, making G1 NUMA aware.
>>>>> This is the first part of G1 NUMA implementation.
>>>>>
>>>>> - At G1 CollectedHeap initialization time, the given heap regions
>>>>> will be split and touched.
>>>>> - Internally node index(uint) is used instead of node id(int type)
>>>>> for easier access to arrays.
>>>>> - Only Linux is supported.
>>>>> - New logs are under gc+heap+numa
>>>>> - Current memory touch implementation will be replaced with
>>>>> WorkGang by JDK-8230411: NUMA aware work gang.
>>>>>
>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8220310
>>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8220310/webrev.0
>>>>> Testing: hs-tier 1 ~ 5 with +- UseNUMA.
>>>>>
>>>>> Thanks,
>>>>> Sangheon
>>> I had a long off-line discussion with Kim, Thomas and Stefan J.,
>>> and the 2 major changes are:
>>>
>>> 1) The memory touch thread is removed. It was a general way to
>>> implement the touching for both Linux and Solaris. However, Solaris is
>>> not supported anymore, so the memory touch thread is simply removed.
>>> This made the code much simpler.
>>>
>>> 2) G1NUMA doesn't manage the numa id of each page; we don't care about
>>> individual pages anymore, only about the HeapRegion. i.e. we assume the
>>> numa id of all pages between the bottom and end of a HeapRegion to be
>>> the same. This also made the related code much simpler.
>>>
>>> There are some other changes as well.
>>>
>>>
>>>> Here is an initial set of comments. Some of these might be nonsense;
>>>> I'm
>>>> not all that familiar with NUMA support.
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> Just in general, it's kind of disappointing that NUMA support for G1
>>>> apparently involves this much code, and is completely different from
>>>> ParallelGC NUMA support, and both are completely different from ZGC
>>>> NUMA support. (And Shenandoah is probably yet another still.)
>>>> Oh well...
>>> Yes, I agree on that.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> According to off-line discussion, the intent is that a HeapRegion only
>>>> contains memory from one NUMA node. That isn't obvious though, and
>>>> there
>>>> are API and data structure choices that obscure that. It seems like
>>>> some
>>>> things could be simplified with that constraint in mind. I think
>>>> some of
>>>> that is a result of trying to leave open support for NUMA aware
>>>> off-heap
>>>> data structures, but that isn't part of this change, and might be
>>>> better
>>>> kept separate from NUMA aware Java heap support.
>>>>
>>>> I'll try to point out places where I think there are opportunities for
>>>> simplification.
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> According to off-line discussion, the touching mechanism might not be
>>>> necessary. If that's true, that would be a significant
>>>> simplification. For
>>>> now I won't comment on the code for the touch threads, though I think
>>>> I found some problems there before the discussion suggesting we might
>>>> not need them happened.
>>> Thanks.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.hpp
>>>> 64 // Manages NUMA node related information and memory touch
>>>> threads.
>>>> ...
>>>>
>>>> I found the comment describing G1NUMA pretty hard to parse. Even
>>>> after looking at the code I'm having trouble figuring out what's being
>>>> said.
>>>>
>>>> This part I was able to figure out, but think it could be worded
>>>> better.
>>>> 70 // Also provides a way to convert NUMA id and NUMA
>>>> index(starting from zero and sequatial).
>>>> 71 // Actual NUMA ids would not be sequential nor not start from
>>>> zero, so a conversion is necessary
>>>> 72 // between id and index.
>>>>
>>>> Something like
>>>>
>>>> Also provides a mapping between NUMA indices (sequential, starting
>>>> from zero) and NUMA ids (which may not start at zero and may not be
>>>> sequential). Indices are generally used by clients, and mapped to
>>>> ids when dealing with the OS/hardware layer.
>>> DONE
>>>
>>>>
>>>> (I *think* that's true.)
>>>>
>>>> But then, why expose node_ids at all in G1NUMA? It seems like node_ids
>>>> could be completely relegated to an implementation detail that clients
>>>> never need to deal with.
>>> ...
>>>> As evidence for not needing to expose ids to clients, there are no
>>>> callers of numa_id_of_index in any of the series of 3 patches. And
>>>> G1NUMA::numa_ids() and G1MemoryMultNodeManager::node_ids() are only
>>>> used in a couple of places for printing/logging and by
>>>> WB_G1MemoryNodeIds. I'm not sure what the last is for, but the
>>>> printing and logging code could be isolated into G1NUMA.
>>> Basically I totally agree with you on not exposing the numa id.
>>> The current patches only use the node id for logging purposes at some
>>> locations, because users are only interested in the node id.
>>>
>>> Also, Thread::lgrp_id() is public as well (renaming lgrp id, numa id
>>> etc. is a different topic that we discussed before), so my feeling is
>>> that the current approach is enough. :)
>>>
>>> WB_G1MemoryNodeIds() is for jtreg testing purposes; it checks
>>> whether the given memory is on the node id we expected.
>>>
>>>>
>>>> Also, it looks like the index to node_id mapping provided here is
>>>> duplicating the information provided by os::Linux::nindex_to_node.
>>>> Not that this necessarily helps with anything in generic code...
>>> Yes, I'm aware of _nindex_to_node, but I think maintaining such a
>>> conversion in G1NUMA is okay. Other NUMA implementations such as
>>> Parallel GC rely on the node id, so completely removing the node id
>>> from general code seems like a different scope.
>>> Unless you have a strong opinion on addressing it in this patch, I
>>> would like to proceed as is.
>>> In that case, we can consider your suggestion as a separate
>>> enhancement.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 57 virtual int* node_ids() const { static int dummy_id = 0;
>>>> return &dummy_id; }
>>>>
>>>> Can the return type here be "const int*" ? That would be better if
>>>> possible.
>>>>
>>>> [This is probably moot if clients only deal with numa indexes and not
>>>> node_ids, as suggested earlier.]
>>> DONE
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 63 virtual uint guarantee_valid_node_index(uint node_index)
>>>> const { return 0; }
>>>>
>>>> Given naming conventions elsewhere, the name of this function makes me
>>>> expect it to be a verification function.
>>> Agree.
>>> Is valid_node_index() better? webrev.1 changes the name.
>>> Or are you suggesting to remove this and have the caller do something
>>> like the following? e.g.
>>>   void A(uint node_index) {
>>>     uint valid_node_index = node_index;
>>>     if (!is_valid_node_index(valid_node_index)) {
>>>       valid_node_index = random_numa_index();
>>>       ...
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.inline.hpp
>>>> 29 #include "gc/shared/taskqueue.inline.hpp"
>>>>
>>>> This #include seems to be just to obtain randomParkAndMiller. It
>>>> would be
>>>> better to refactor that function into its own file(s). Or make it
>>>> part of
>>>> some component providing various random number generators.
>>> Not needed because of the change below.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.inline.hpp
>>>> 45 int seed = randomParkAndMiller(_seed);
>>>> 46 Atomic::store(seed, &_seed);
>>>>
>>>> This sequence may lose updates. I don't know if that's important.
>>>> And why isn't this just using os::random(), or maybe refactoring that
>>>> slightly to gain access to the external seed version. I don't want
>>>> this
>>>> change to be made:
>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.0/src/hotspot/share/gc/shared/taskqueue.inline.hpp.udiff.html
>>>>
>>> Thomas and Stefan J. also complained about this.
>>> Changed to get the next numa index in a round-robin manner, i.e. it
>>> doesn't use random() at all.
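>>> Roughly (sketch only; the counter field and num_active_numa_ids() are
>>> made-up names):
>>>
>>>   uint G1NUMA::next_numa_index() {
>>>     // Plain round-robin over the active node indices, no randomness.
>>>     uint next = Atomic::add(1u, &_next_index_counter) - 1;
>>>     return next % num_active_numa_ids();
>>>   }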
>>>
>>>> And do we *really* need randomness here at all? Is there a reason not
>>>> to simply cycle through the nodes?
>>> Not really.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3016 size_t num_pages_left = size_in_bytes / page_size;
>>>>
>>>> Is size_in_bytes required to be a multiple of page_size? I believe
>>>> it is,
>>>> and an assert of that would be useful.
>>> This code is removed. And yes, to answer your question.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3016 size_t num_pages_left = size_in_bytes / page_size;
>>>> 3017 const size_t num_pages_total = num_pages_left;
>>>> 3018 size_t num_buffer_entries = MIN2(num_pages_left, max_pages);
>>>>
>>>> I think clearer would be
>>>>
>>>> 3017 const size_t num_pages_total = size_in_bytes / page_size;
>>>> 3018 size_t num_buffer_entries = MIN2(num_pages_total, max_pages);
>>>>
>>>> and move this down to the loop setup:
>>>> 3016 size_t num_pages_left = num_pages_total;
>>> This code is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3020 void** pages = (void**)alloca(sizeof(void*) *
>>>> num_buffer_entries);
>>>> 3021 int* status = (int*)alloca(sizeof(int) * num_buffer_entries);
>>>>
>>>> I think a ResourceMark and NEW_RESOURCE_ARRAYs would be preferable to
>>>> using alloca() here. alloca() has some dangers and I think is
>>>> currently
>>>> only used in some very specialized places where it really is needed.
>>> This code is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3061 }
>>>> 3062 // Moves current thread on a specific node and it will not
>>>> migrate to
>>>>
>>>> Missing blank line between functions.
>>> This code is removed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3031 pages[i] = (void*)base;
>>>>
>>>> Unnecessary cast.
>>> This code is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3072 case AFFINITY_NONE:
>>>> 3073 // Do nothing.
>>>> 3074 return true;
>>>> 3075 break;
>>>>
>>>> Shouldn't this be calling numa_run_on_node with -1 node id? The
>>>> documentation for numa_run_on_node() says "Passing -1 permits the
>>>> kernel to schedule on all nodes again."
>>> My understanding of that description is that it lets the kernel decide
>>> the numa node, which is not actually what we want here. We want to move
>>> the current thread to the preferred numa node.
>>>
>>> Currently we don't need this API (numa_set_affinity()), so it is
>>> removed in webrev.1.
>>> However, a following patch, which is not in the scope of this JEP, will
>>> use it.
>>> I will add your suggestion below when we need it.
>>>
>>>>
>>>> But see also next comment about numa_affinity_set.
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/runtime/os.hpp
>>>> 401 // Available affinities.
>>>> 402 enum NumaAffinity {
>>>> 403 AFFINITY_STRONG,
>>>> 404 AFFINITY_NONE
>>>> 405 };
>>>> 406 static bool numa_affinity_set(Thread* thread, int numa_id,
>>>> NumaAffinity affinity);
>>>>
>>>> Could there ever be other affinities? This API doesn't seem right.
>>>> It seems to me it should be something like
>>>>
>>>> static bool numa_set_affinity(Thread* thread, int numa_id);
>>>> static bool numa_clear_affinity(Thread* thread);
>>>>
>>>> No need for NumaAffinity enum.
>>> The enum is also kind of a remnant from the removed Solaris
>>> implementation, which had more types.
>>> But I agree with you: as there are only 2 types now, it is simpler to
>>> have separate APIs without the enum.
>>>
>>>>
>>>> (I think numa_set/clear_affinity is better than
>>>> numa_affinity_set/clear;
>>>> numa is a "namespace" and we usually set/clear/get_something rather
>>>> than something_set/clear/get.)
>>>>
>>>> Or maybe this should just be
>>>>
>>>> static bool numa_set_affinity(Thread* thread, int numa_id);
>>>>
>>>> with "clearing" provided by passing AnyId as the numa_id? Still don't
>>>> need the NumaAffinity enum for this.
>>> Agree on this naming stuff.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 124 guarantee(rs.size() % page_size == 0, "Reserved space size
>>>> (" SIZE_FORMAT ") should be aligned "
>>>> 141 guarantee(size_in_bytes % page_size == 0, "The given range
>>>> (" SIZE_FORMAT ") should be aligned "
>>>> 209 guarantee((uintptr_t)G1HeapRegionSize % page_size == 0,
>>>> "G1HeapRegionSize (" SIZE_FORMAT ") and page size ("
>>>> 234 guarantee((uintptr_t)_page_size % _heap_region_size == 0,
>>>> "G1HeapRegionSize (" SIZE_FORMAT ") and page size ("
>>>>
>>>> Use is_aligned. I think casts shouldn't be needed.
>>> These lines are removed after major refactoring.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 43 _numa_id_to_index_map = NEW_C_HEAP_ARRAY_RETURN_NULL(uint,
>>>> _len_numa_id_to_index_map, mtGC);
>>>> 44 memset(_numa_id_to_index_map, 0, sizeof(uint) *
>>>> _len_numa_id_to_index_map);
>>>>
>>>> Using NEW_xxx_RETURN_NULL and then ignoring the possibility of a NULL
>>>> result. It would be better to use the non-_RETURN_NULL variant and
>>>> get a decent error message rather than the segfault from attempting to
>>>> write to NULL.
>>> Changed to use non-_RETURN_NULL version.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 34 void G1NUMA::init_numa_id_to_index_map(int* numa_ids, uint
>>>> num_numa_ids) {
>>>>
>>>> int* numa_ids => const int* numa_ids.
>>> DONE
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 34 void G1NUMA::init_numa_id_to_index_map(int* numa_ids, uint
>>>> num_numa_ids) {
>>>> ...
>>>> 44 memset(_numa_id_to_index_map, 0, sizeof(uint) *
>>>> _len_numa_id_to_index_map);
>>>> Rather than this memset and then writing to all of them later, instead
>>>> initialize all entries to InvalidNodeIndex and then assign from
>>>> numa_ids.
>>> DONE
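>>> i.e. (sketch):
>>>
>>>   for (uint i = 0; i < _len_numa_id_to_index_map; i++) {
>>>     _numa_id_to_index_map[i] = InvalidNodeIndex;
>>>   }
>>>   for (uint i = 0; i < num_numa_ids; i++) {
>>>     _numa_id_to_index_map[numa_ids[i]] = i;
>>>   }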
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryTouchThread.cpp
>>>> 261 _wakeup_monitor = new Monitor(Mutex::nonleaf, "Wakeup
>>>> Monitor for Touch Control", true, Monitor::_safepoint_check_never);
>>>> 262 _sync_monitor = new Monitor(Mutex::nonleaf, "Sync Monitor
>>>> for Touch Control", true, Monitor::_safepoint_check_never);
>>>>
>>>> For singleton mutex/monitor, we generally prefer to put them in
>>>> mutexLocker.
>>>> Probably passed as constructor arguments to G1NUMAThreadsControl. And
>>>> I know there are counter-examples.
>>> Not necessary as memory touch thread is gone.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 30 extern const uint AnyNodeIndex;
>>>> 31 extern const uint InvalidNodeIndex;
>>>>
>>>> Why are these at namespace scope. And they seem misplaced here.
>>>> Seems like they should belong to the G1NUMA class, which is
>>>> responsible for indexing nodes and mapping between indexes and ids.
>>> Changed to G1MemoryNodeManager::AnyNodeIndex.
>>> I wanted to hide, or at least minimize exposing, G1NUMA in general.
>>> The memory node manager also has index related APIs, so letting the
>>> manager own such constants seems okay.
>>>
>>>>
>>>> 30 const uint AnyNodeIndex = (uint)os::AnyId;
>>>> 31 const uint InvalidNodeIndex = (uint)os::InvalidId;
>>>> And why these values? That seems like a units mismatch. Just use
>>>> UINT_MAX and UINT_MAX - 1.
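>>>> i.e. something like (sketch):
>>>>
>>>>   // In G1NUMA (or the memory node manager):
>>>>   static const uint AnyNodeIndex     = UINT_MAX - 1;
>>>>   static const uint InvalidNodeIndex = UINT_MAX;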
>>> I don't have a strong opinion on this.
>>> But the intent is that constants with the same meaning have the same
>>> value after casting, instead of randomly chosen ones.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.hpp
>>>> 80 // For invalid numa id, return InvalidNodeIndex. So the
>>>> caller need exception handling.
>>>>
>>>> "need exception handling"?
>>> Thomas also commented on it, so it is reworded. But in the end the
>>> class is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionManager.cpp
>>>> 114 uint allocated_node_index;
>>>>
>>>> This variable gets assigned via an out parameter here:
>>>> 116 hr = _free_list.remove_region_with_node_index(from_head,
>>>> valid_node_index, &allocated_node_index);
>>>>
>>>> and by assignment here:
>>>> 122 allocated_node_index =
>>>> mgr->node_index_of_address(hr->bottom());
>>>>
>>>> but is never used. If the lack of use is not itself a mistake, then
>>>> the out
>>>> parameter of remove_region_with_node_index appears to be unnecessary.
>>> The variable will be used in patch 3 (JDK-8220312), but it is slightly
>>> changed to allow NULL for the allocated_node_index parameter of
>>> remove_region_with_node_index().
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
>>>> 151 inline HeapRegion*
>>>> FreeRegionList::remove_region_with_node_index(bool from_head,
>>>>
>>>> As written, I think this might always return NULL if
>>>> requested_node_index is
>>>> not a valid index. I think callers ensure that it's valid, but an
>>>> assert of
>>>> the argument would help.
>>> Done
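>>> i.e. (sketch):
>>>
>>>   assert(is_valid_node_index(requested_node_index),
>>>          "Requested node index %u is not valid", requested_node_index);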
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
>>>> 175 if (cur == NULL || cur_depth >= max_search_depth) { // not
>>>> found
>>>>
>>>> There's an off-by-one issue here. The max_search_depth time through the
>>>> while-control-expression may have found a matching region, checked
>>>> via the
>>>> numa_index_of_address lookup.
>>>>
>>>> Either the test should be "current_numa_index !=
>>>> requested_numa_index" or
>>>> the loop should be changed.
>>> This code is removed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
>>>> 151 inline HeapRegion*
>>>> FreeRegionList::remove_region_with_node_index(bool from_head,
>>>>
>>>> I think the list manipulation could be simplified quite a bit. I
>>>> think we
>>>> don't need the "last" variable. And the splicing out of cur can be
>>>> unified
>>>> between the from_head and !from_head cases, e.g. something like this:
>>>>
>>>>     // Find the region to use, searching from _head or _tail as requested.
>>>>     if (from_head) {
>>>>       for (cur = _head;
>>>>            cur != NULL && cur_depth < max_search_depth;
>>>>            cur = cur->next(), ++cur_depth) {
>>>>         if (current_numa_index ==
>>>>             numa->numa_index_of_address(cur->bottom())) {
>>>>           break;
>>>>         }
>>>>       }
>>>>     } else {
>>>>       for (cur = _tail;
>>>>            cur != NULL && cur_depth < max_search_depth;
>>>>            cur = cur->prev(), ++cur_depth) {
>>>>         if (current_numa_index ==
>>>>             numa->numa_index_of_address(cur->bottom())) {
>>>>           break;
>>>>         }
>>>>       }
>>>>     }
>>>>     if (cur == NULL || cur_depth >= max_search_depth) {
>>>>       return NULL; // Didn't find a region to use.
>>>>     }
>>>>     // Splice the region out of the list.
>>>>     HeapRegion* prev = cur->prev();
>>>>     HeapRegion* next = cur->next();
>>>>     if (prev == NULL) {
>>>>       _head = next;
>>>>     } else {
>>>>       prev->set_next(next);
>>>>     }
>>>>     if (next == NULL) {
>>>>       _tail = prev;
>>>>     } else {
>>>>       next->set_prev(prev);
>>>>     }
>>>>     cur->set_prev(NULL);
>>>>     cur->set_next(NULL);
>>>>     ...
>>> Nice suggestion!
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 166 class DumpIterator : public StackObj {
>>>>
>>>> Polluting global namespace. Suggest making DumpIterator and the
>>>> classes
>>>> derived from it (G1HeapRegion{Larger,Smaller}Iterator) private nested
>>>> classes of G1NUMA, making the names unimportant outside of that
>>>> context.
>>> Not needed anymore.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heterogeneousHeapRegionManager.hpp
>>>> 122 virtual HeapRegion* allocate_free_region(HeapRegionType type);
>>>> 123 virtual HeapRegion* allocate_free_region(HeapRegionType
>>>> type, uint node_index);
>>>>
>>>> The base class (HeapRegionManager) no longer has the one-arg
>>>> function. So
>>>> this one-arg function is no longer overriding the base class
>>>> definition.
>>>> (Too bad it wasn't declared as C++11 "override".)
>>>>
>>>> I don't see any one-arg callers, so seems like it can just be removed.
>>> DONE
>>> Nice finding! :)
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1PageBasedNUMAVirtualSpace.cpp
>>>>
>>>> I dislike the new structure here, with G1PageBasedVirtualSpace now
>>>> being
>>>> both a base class and a concrete class (which is often a problematic
>>>> design). And the new derived class doesn't really do all that much.
>>>>
>>>> It seems like it would be much simpler to just have
>>>> G1PageBasedVirtualSpace
>>>> check UseNUMA to conditionalize its commit and uncommit functions.
>>>> Especially since its commit already needs to have some knowledge of
>>>> NUMA
>>>> because of the new numa_index argument.
>>> DONE
>>> Removed the NUMA version, and the related changes that allowed
>>> inheritance are reverted.
>>> Now, G1PageBasedVirtualSpace does the additional work after checking
>>> for NUMA.
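>>> i.e. in G1PageBasedVirtualSpace::commit(), roughly (sketch only; the
>>> exact call and its arguments are assumptions):
>>>
>>>   if (UseNUMA) {
>>>     // Request that the newly committed pages end up on the given node,
>>>     // e.g. via the numa_make_local() path mentioned earlier.
>>>     numa_make_local(page_start(start_page), size_in_pages * _page_size,
>>>                     node_index);
>>>   }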
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1PageBasedNUMAVirtualSpace.cpp
>>>> 52 void G1PageBasedNUMAVirtualSpace::uncommit(size_t start_page,
>>>> size_t size_in_pages) {
>>>>
>>>> Do we actually need to do anything here? We're resetting the
>>>> G1NUMA::_numa_id_of_pages_table (to AnyId), but it's not clear we
>>>> really
>>>> need to do that. When we later recommit any of those pages, that
>>>> table will
>>>> be updated appropriately.
>>>>
>>>> Maybe rather than having G1NUMA expose quite so much detail it should
>>>> instead have post_commit and a pre_uncommit functions.
>>> I think clearing the node index may help with debugging. Unlike
>>> webrev.0, the numa id/index information is located in the HeapRegion,
>>> which is much simpler. i.e. HeapRegionManager::uncommit_regions()
>>> resets the node index to InvalidNodeIndex.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.hpp
>>>> 75 G1NUMAPageIdTable _numa_id_of_pages_table;
>>>>
>>>> I was surprised by the need for this, but I can't find anything in
>>>> the NUMA
>>>> API that provides this information. That seems strange, because one
>>>> would
>>>> think that information must exist somewhere in the NUMA implementation.
>>> Removed.
>>> And this made the patch much simpler!
>>>
>>> And here are your comments from the other email.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 64 // Returns node index of the given address.
>>>> 65 // If the node id of the address is invalid return randomly
>>>> selected node index.
>>>> 66 virtual uint valid_index_of_address(HeapWord* o) const {
>>>> return 0; }
>>>>
>>>> This function is unused.
>>> Removed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 70 virtual uint node_index_of_address(HeapWord* o) const {
>>>> return 0; }
>>>>
>>>> The only caller of this function is
>>>> HeapRegionManager::allocate_free_region.
>>>> As noted in an earlier comment, the result of that call is unused.
>>> I removed it as you commented, but I had to revert that as a part of
>>> the _numa_id_of_pages_table removal.
>>> Previously we retrieved the page info from _numa_id_of_pages_table,
>>> but the new version reads it directly via a system call.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> As pointed out to me offline, there *are* ways to query the address ->
>>>> associated numa node mapping. So I'm going back and questioning the
>>>> need
>>>> for _numa_id_of_pages_table and the code for managing it.
>>> As we discussed offline, I removed this table. So webrev.1 checks the
>>> bottom of the HeapRegion and then saves the result in
>>> HeapRegion::_node_index.
>>> However, we do need a way to check the node index of a given address,
>>> so I added it back.
>>>
>>> From my testing, checking the numa id of one page seems okay to do at
>>> runtime. Previously I checked all pages of the given memory, so it was
>>> a bit slow.
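>>> i.e. only the bottom page of a region is queried (sketch only; the
>>> index lookup helper is a made-up name):
>>>
>>>   // Assumes all pages of the region are on the same node.
>>>   int node_id = os::numa_get_address_id(hr->bottom());
>>>   hr->set_node_index(index_of_node_id(node_id));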
>>>
>>> One thing not included in webrev.1 is combining this with the existing
>>> code.
>>> The newly introduced os::numa_get_address_id() is a copy from
>>> zNUMA_linux.cpp. It could be unified, but I haven't tried that yet.
>>>
>>> webrev:
>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1
>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1.inc (this may
>>> not help much! :) )
>>> Testing: hs-tier 1 ~ 5 (with/without UseNUMA)
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>
>