RFR(XL): 8220310: Implementation: NUMA-Aware Memory Allocation for G1, Mutator (1/3)
Stefan Johansson
stefan.johansson at oracle.com
Tue Sep 24 09:02:19 UTC 2019
Hi Sangheon,
On 2019-09-24 00:33, sangheon.kim at oracle.com wrote:
> Hi Stefan,
>
> Many thanks for reviewing and discussing this!
> And please allow me to post next webrev with more comments.
Yes, no rush.
>
> On 9/23/19 8:14 AM, Stefan Johansson wrote:
>> Hi Sangheon,
>>
>> Thanks for the updated webrev, I like where we are heading with this.
>> I still have some comments but most of them are fairly small. The only
>> big thing is that I still think we could simplify things a lot by not
>> passing down node_index to the lower level commit functions and
>> instead just focus on keeping a strict manual interleaving of
>> regions/pages (whichever is largest). I know this wouldn't allow us to
>> expand by a single region on a specific node, but since this is a
>> fairly uncommon case I think we could live with that.
>>
>> src/hotspot/share/gc/g1/g1Allocator.*
>> -------------------------------------
>> There are a few calls like this:
>> uint node_index = _g1h->mem_node_mgr()->index_of_current_thread();
>> I would like a private helper to wrap this, something like
>> current_node_index().
> Will do.
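> Something like this, I assume (just a sketch, exact name and placement
> up to you):
>
>   // Returns the NUMA node index of the current (mutator) thread.
>   uint G1Allocator::current_node_index() const {
>     return _g1h->mem_node_mgr()->index_of_current_thread();
>   }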
>
>> ---
>>
>> hotspot/share/gc/g1/g1Arguments.cpp
>> -----------------------------------
>> 161 if (UseNUMA) {
>> 162 if (FLAG_IS_DEFAULT(AlwaysPreTouch)) {
>> 163 FLAG_SET_DEFAULT(AlwaysPreTouch, true);
>> 164 }
>> 165 if (!AlwaysPreTouch && FLAG_IS_CMDLINE(AlwaysPreTouch)) {
>> 166 warning("Disabling AlwaysPreTouch is incompatible with
>> UseNUMA. Disabling UseNUMA.");
>> 167 FLAG_SET_ERGO(UseNUMA, false);
>> 168 }
>> 169 }
>>
>> I'm not sure I understand why we have to enforce AlwaysPreTouch when
>> NUMA is enabled. This will improve performance, but I think the NUMA
>> support should work even without it. Or am I missing something?
> When we allocate a new region, we have to know the node index of the
> region that came from the free list.
> However, without pretouch, we can't know whether the region is properly
> located or not. i.e. when checking the node id via
> os::numa_get_address_id(), we will get an invalid id even though we
> called numa_make_local(), because the page (at the bottom of the heap
> region) is not yet faulted in.
>
> We could set HeapRegion::_node_index to the preferred index after calling
> numa_make_local() (without actually checking the node id), but I'm not
> sure whether that is the way to go.
> If you are suggesting this, maybe we need more discussion and testing. So
> unless others have a strong opinion on this, I would like to address it
> in a separate CR.
>
I think we should go with trusting the OS and setting the preferred index
on the HeapRegion, and then we can have our verification/check of what
actual node we ended up on during a safepoint. But we can discuss this
more. I'm not sure how much more testing this requires; in the end we
cannot know what the OS will do underneath.
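Roughly what I have in mind (only a sketch; the helper names are made up
and it assumes the gc+numa+region log tags suggested further down):

  // At allocation time, trust the OS and record the preferred index.
  hr->set_node_index(preferred_node_index);

  // At a safepoint we can then verify where the region actually ended up.
  uint actual_index = mem_node_mgr()->node_index_of_address(hr->bottom());
  if (actual_index != hr->node_index()) {
    log_debug(gc, numa, region)("Region %u requested on node index %u but is on %u",
                                hr->hrm_index(), hr->node_index(), actual_index);
  }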
>> ---
>>
>> src/hotspot/share/gc/g1/g1CollectedHeap.*
>> -----------------------------------------
>> As we discussed offline I'm not sure we need pass the node_index down
>> in the call to expand. Since we're not strict in giving out memory
>> from the correct node during the mutator phase, I think we could do
>> the same during the GC in the case we need to expand.
> Yeah, the only difference would be that, unlike the mutator case, we
> wouldn't even try to get the matching node and would expect to get a
> random one. I do understand this case would be rare, but I think doing
> our best to match is better, even if we have to sacrifice a little bit
> of code complexity.
>
I see your point, but I don't fully agree. If the case is rare, or we
can't show that handing out a region with the correct index gives a
benefit, I think we should leave this as a separate enhancement later on.
Doing so, we might even be able to design it in a smarter way.
>> ---
>>
>> src/hotspot/share/gc/g1/g1RegionToSpaceMapper.cpp
>> -------------------------------------------------
>> 162 G1NodeDistributor itr(node_index);
>> 163
>> 164 for (uint i = start_idx; i < start_idx + num_regions; i++) {
>> 165 // If there are many pages to touch, different node ids
>> will be used.
>> 166 uint processed_node_index = itr.next_node_index();
>> ...
>> 179 zero_filled = _storage.commit(idx, 1, processed_node_index);
>> 180 itr.next();
>> 181 }
>>
>> Do we really need to do this? Can't we just rely on the code in
>> G1NUMA::touch_memory_roundrobin() to do the right thing? Or is this
>> just to allow committing a single region on the correct node?
> Yes, this is to commit a single region; touch_memory_roundrobin() is
> not called.
> i.e. when a HeapRegion is smaller than the page size, we commit one page
> at a time and skip the commit for the next region until the page is
> filled with regions. e.g. with a 2MB page size and 1MB heap regions, one
> page is committed for every 2 heap region commits.
> However, your comment reminded me to enhance this area. Instead of
> starting from the first node, we can just call G1NUMA::next_numa_index()
> in G1NodeDistributor::next().
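> e.g. something like this (rough sketch only, not the exact shape in the
> webrev; the fields are simplified):
>
>   void G1NodeDistributor::next() {
>     // _regions_per_page = max(page_size / region_size, 1), e.g. 2 for
>     // 2MB pages and 1MB regions.
>     if (--_regions_left_in_page == 0) {
>       // Advance to the next node only when the current page is fully
>       // covered by committed regions.
>       _node_index = _numa->next_numa_index();
>       _regions_left_in_page = _regions_per_page;
>     }
>   }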
>
Ok, I see. As long as we need to be able to commit to a specific node,
this is needed.
>> ---
>>
>> src/hotspot/share/gc/g1/g1_globals.hpp
>> --------------------------------------
>> Instead of adding those flags I think we should use log tags and guard
>> the logic by using log_is_enabled(), or LogTarget::is_enabled() as you
>> do in heapRegionManager.cpp. To allow these to be excluded with other
>> logging turned on, I suggest using specific log-tag-sets for each of
>> them, something like:
>> G1PrintNUMAIdOfHeapRegions -> gc, numa, region (debug or trace)
>> G1VerifyNUMAIdOfHeapRegions -> gc, numa, region, verification (debug
>> or trace)
> Will do as you suggested.
> Kim also suggested the same, but I wanted to hear more opinions. :)
>
> G1PrintNUMAIdOfHeapRegions -> gc, numa, region (debug)
> G1VerifyNUMAIdOfHeapRegions -> gc, numa, region, verification (debug)
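> So the logging guard would look something like this (sketch only,
> assuming the new tag sets are added):
>
>   // Skip the (possibly expensive) node id lookup unless logging is on.
>   if (log_is_enabled(Debug, gc, numa, region)) {
>     int node_id = os::numa_get_address_id(hr->bottom());
>     log_debug(gc, numa, region)("Region %u is on NUMA node %d",
>                                 hr->hrm_index(), node_id);
>   }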
>
>> ---
>>
>> src/hotspot/share/gc/g1/heapRegionManager.cpp
>> ---------------------------------------------
>> 112 if (mgr->num_active_nodes() > 1) {
>> 113 uint valid_node_index =
>> mgr->valid_node_index(requested_node_index);
>> 114 // Try to allocate with requested node index.
>> 115 hr = _free_list.remove_region_with_node_index(from_head,
>> valid_node_index, NULL);
>> 116 }
>>
>> I've seen some problems in my testing with this code;
>> valid_node_index() now calls _numa->next_numa_index() when the index
>> isn't valid. This means that both the lower level commit code and the
>> allocation path will call this function in some cases (for
>> example single region humongous). I'm not sure if this will lead to
>> any real problems, but it makes it harder to understand what is going
>> on underneath, and I've seen it lead to imbalance over the
>> nodes. To avoid this I think we could instead do:
>>
>> if (mgr->num_active_nodes() > 1 &&
>> mgr->is_valid_node_index(requested_node_index)) {
>> hr = _free_list.remove_region_with_node_index(from_head,
>> requested_node_index, NULL);
>> }
>>
>> This way we only remove with index if we have a valid index, otherwise
>> we get a HeapRegion from the code below, which should be more or less
>> the same as taking whatever index and removing with that:
>> 118 if (hr == NULL) {
>> ...
>> 121 hr = _free_list.remove_region(from_head);
>> 122 }
>>
>> If this is an ok approach I think valid_node_index() is no longer
>> used and can be removed.
> That looks like a nice suggestion.
> So instead of relying on G1NUMA::next_numa_index(), just using the first
> region would relieve the node imbalance. I like this, but let me come
> back with some testing results.
>
>> ---
>>
>> src/hotspot/share/gc/g1/g1NUMA.inline.hpp
>> -----------------------------------------
>> G1NUMA::next_numa_id() seems to be unused and could then be removed.
> Will remove.
>
>> ---
>>
>> I'll continue looking at the patch and taking it for some spins. I
>> might come back with additional feedback later on, but these are my
>> initial findings.
> Okay!
>
> Thanks,
> Sangheon
>
>
>>
>> Thanks,
>> Stefan
>>
>>
>> On 2019-09-21 07:19, sangheon.kim at oracle.com wrote:
>>> Hi Kim,
>>>
>>> Many thanks for this thorough review!
>>>
>>> On 9/16/19 4:12 PM, Kim Barrett wrote:
>>>>> On Sep 4, 2019, at 3:16 AM, sangheon.kim at oracle.com wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Please review this patch, making G1 NUMA aware.
>>>>> This is the first part of G1 NUMA implementation.
>>>>>
>>>>> - At G1 CollectedHeap initialization time, the given heap regions
>>>>> will be split and touched.
>>>>> - Internally node index(uint) is used instead of node id(int type)
>>>>> for easier access to arrays.
>>>>> - Only Linux is supported.
>>>>> - New logs are under gc+heap+numa
>>>>> - Current memory touch implementation will be replaced with
>>>>> WorkGang by JDK-8230411: NUMA aware work gang.
>>>>>
>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8220310
>>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8220310/webrev.0
>>>>> Testing: hs-tier 1 ~ 5 with +- UseNUMA.
>>>>>
>>>>> Thanks,
>>>>> Sangheon
>>> I had a long off-line discussion with Kim, Thomas and Stefan J.,
>>> and the 2 major changes are:
>>>
>>> 1) The memory touch thread is removed. It was a general way to
>>> implement the touching for both Linux and Solaris. However, Solaris is
>>> not supported anymore, so the memory touch thread is simply removed.
>>> This made the code much simpler.
>>>
>>> 2) G1NUMA doesn't manage the numa id of each page; we don't care about
>>> individual pages anymore, only about the HeapRegion. i.e. we assume the
>>> numa id of all pages between the bottom and end of a HeapRegion to be
>>> the same. This also made the related code much simpler.
>>>
>>> There are some other changes as well.
>>>
>>>
>>>> Here is an initial set of comments. Some of these might be nonsense;
>>>> I'm
>>>> not all that familiar with NUMA support.
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> Just in general, it's kind of disappointing that NUMA support for G1
>>>> apparently involves this much code, and is completely different from
>>>> ParallelGC NUMA support, and both are completely different from ZGC
>>>> NUMA support. (And Shenandoah is probably yet another still.)
>>>> Oh well...
>>> Yes, I agree on that.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> According to off-line discussion, the intent is that a HeapRegion only
>>>> contains memory from one NUMA node. That isn't obvious though, and
>>>> there
>>>> are API and data structure choices that obscure that. It seems like
>>>> some
>>>> things could be simplified with that constraint in mind. I think
>>>> some of
>>>> that is a result of trying to leave open support for NUMA aware
>>>> off-heap
>>>> data structures, but that isn't part of this change, and might be
>>>> better
>>>> kept separate from NUMA aware Java heap support.
>>>>
>>>> I'll try to point out places where I think there are opportunities for
>>>> simplification.
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> According to off-line discussion, the touching mechanism might not be
>>>> necessary. If that's true, that would be a significant
>>>> simplification. For
>>>> now I won't comment on the code for the touch threads, though I think
>>>> I found some problems there before the discussion suggesting we might
>>>> not need them happened.
>>> Thanks.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.hpp
>>>> 64 // Manages NUMA node related information and memory touch
>>>> threads.
>>>> ...
>>>>
>>>> I found the comment describing G1NUMA pretty hard to parse. Even
>>>> after looking at the code I'm having trouble figuring out what's being
>>>> said.
>>>>
>>>> This part I was able to figure out, but think it could be worded
>>>> better.
>>>> 70 // Also provides a way to convert NUMA id and NUMA
>>>> index(starting from zero and sequatial).
>>>> 71 // Actual NUMA ids would not be sequential nor not start from
>>>> zero, so a conversion is necessary
>>>> 72 // between id and index.
>>>>
>>>> Something like
>>>>
>>>> Also provides a mapping between NUMA indices (sequential, starting
>>>> from zero) and NUMA ids (which may not start at zero and may not be
>>>> sequential). Indices are generally used by clients, and mapped to
>>>> ids when dealing with the OS/hardware layer.
>>> DONE
>>>
>>>>
>>>> (I *think* that's true.)
>>>>
>>>> But then, why expose node_ids at all in G1NUMA? It seems like node_ids
>>>> could be completely relegated to an implementation detail that clients
>>>> never need to deal with.
>>> ...
>>>> As evidence for not needing to expose ids to clients, there are no
>>>> callers of numa_id_of_index in any of the series of 3 patches. And
>>>> G1NUMA::numa_ids() and G1MemoryMultNodeManager::node_ids() are only
>>>> used in a couple of places for printing/logging and by
>>>> WB_G1MemoryNodeIds. I'm not sure what the last is for, but the
>>>> printing and logging code could be isolated into G1NUMA.
>>> Basically I totally agree with you on not exposing the numa id.
>>> The current patches only use the node id for logging purposes at some
>>> locations, because users are only interested in the node id.
>>>
>>> Also, Thread::lgrp_id() is public as well (renaming lgrp id, numa id
>>> etc. is a different topic that we discussed before), so my feeling is
>>> that the current approach is enough. :)
>>>
>>> WB_G1MemoryNodeIds() is for jtreg testing purposes; it checks
>>> whether the given memory is on the node id we expected.
>>>
>>>>
>>>> Also, it looks like the index to node_id mapping provided here is
>>>> duplicating the information provided by os::Linux::nindex_to_node.
>>>> Not that this necessarily helps with anything in generic code...
>>> Yes, I'm aware of _nindex_to_node, but I think maintaining such a
>>> conversion in G1NUMA is okay. Other NUMA implementations such as
>>> Parallel GC rely on the node id, so completely removing the node id
>>> from general code seems like a different scope.
>>> Unless you have a strong opinion on addressing it in this patch, I
>>> would like to proceed as is.
>>> In that case, we can consider your suggestion as a separate
>>> enhancement.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 57 virtual int* node_ids() const { static int dummy_id = 0;
>>>> return &dummy_id; }
>>>>
>>>> Can the return type here be "const int*" ? That would be better if
>>>> possible.
>>>>
>>>> [This is probably moot if clients only deal with numa indexes and not
>>>> node_ids, as suggested earlier.]
>>> DONE
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 63 virtual uint guarantee_valid_node_index(uint node_index)
>>>> const { return 0; }
>>>>
>>>> Given naming conventions elsewhere, the name of this function makes me
>>>> expect it to be a verification function.
>>> Agree.
>>> Is valid_node_index() better? webrev.1 changes the name.
>>> Or are you suggesting to remove this and have the caller do something
>>> like the following? e.g.
>>>   void A(uint node_index) {
>>>     uint valid_node_index = node_index;
>>>     if (!is_valid_node_index(valid_node_index)) {
>>>       valid_node_index = random_numa_index();
>>>       ...
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.inline.hpp
>>>> 29 #include "gc/shared/taskqueue.inline.hpp"
>>>>
>>>> This #include seems to be just to obtain randomParkAndMiller. It
>>>> would be
>>>> better to refactor that function into its own file(s). Or make it
>>>> part of
>>>> some component providing various random number generators.
>>> Not needed because of the change below.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.inline.hpp
>>>> 45 int seed = randomParkAndMiller(_seed);
>>>> 46 Atomic::store(seed, &_seed);
>>>>
>>>> This sequence may lose updates. I don't know if that's important.
>>>> And why isn't this just using os::random(), or maybe refactoring that
>>>> slightly to gain access to the external seed version. I don't want
>>>> this
>>>> change to be made:
>>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.0/src/hotspot/share/gc/shared/taskqueue.inline.hpp.udiff.html
>>>>
>>> Thomas and Stefan J. also complained about this.
>>> Changed to get the next numa index in a round-robin manner, i.e. it
>>> doesn't use random() at all.
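>>> Roughly (sketch only; the counter field and num_active_numa_ids() are
>>> made-up names):
>>>
>>>   uint G1NUMA::next_numa_index() {
>>>     // Plain round-robin over the active node indices, no randomness.
>>>     uint next = Atomic::add(1u, &_next_index_counter) - 1;
>>>     return next % num_active_numa_ids();
>>>   }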
>>>
>>>> And do we *really* need randomness here at all? Is there a reason not
>>>> to simply cycle through the nodes?
>>> Not really.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3016 size_t num_pages_left = size_in_bytes / page_size;
>>>>
>>>> Is size_in_bytes required to be a multiple of page_size? I believe
>>>> it is,
>>>> and an assert of that would be useful.
>>> This code is removed. And yes, to answer your question.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3016 size_t num_pages_left = size_in_bytes / page_size;
>>>> 3017 const size_t num_pages_total = num_pages_left;
>>>> 3018 size_t num_buffer_entries = MIN2(num_pages_left, max_pages);
>>>>
>>>> I think clearer would be
>>>>
>>>> 3017 const size_t num_pages_total = size_in_bytes / page_size;
>>>> 3018 size_t num_buffer_entries = MIN2(num_pages_total, max_pages);
>>>>
>>>> and move this down to the loop setup:
>>>> 3016 size_t num_pages_left = num_pages_total;
>>> This code is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3020 void** pages = (void**)alloca(sizeof(void*) *
>>>> num_buffer_entries);
>>>> 3021 int* status = (int*)alloca(sizeof(int) * num_buffer_entries);
>>>>
>>>> I think a ResourceMark and NEW_RESOURCE_ARRAYs would be preferable to
>>>> using alloca() here. alloca() has some dangers and I think is
>>>> currently
>>>> only used in some very specialized places where it really is needed.
>>> This code is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3061 }
>>>> 3062 // Moves current thread on a specific node and it will not
>>>> migrate to
>>>>
>>>> Missing blank line between functions.
>>> This code is removed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3031 pages[i] = (void*)base;
>>>>
>>>> Unnecessary cast.
>>> This code is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/os/linux/os_linux.cpp
>>>> 3072 case AFFINITY_NONE:
>>>> 3073 // Do nothing.
>>>> 3074 return true;
>>>> 3075 break;
>>>>
>>>> Shouldn't this be calling numa_run_on_node with -1 node id? The
>>>> documentation for numa_run_on_node() says "Passing -1 permits the
>>>> kernel to schedule on all nodes again."
>>> My understanding of that description is that it lets the kernel decide
>>> the numa node, which is not actually what we want here. We want to move
>>> the current thread to the preferred numa node.
>>>
>>> Currently we don't need this API (numa_set_affinity()), so it is
>>> removed in webrev.1.
>>> However, a following patch, which is not in the scope of this JEP, will
>>> use it.
>>> I will add your suggestion below when we need it.
>>>
>>>>
>>>> But see also next comment about numa_affinity_set.
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/runtime/os.hpp
>>>> 401 // Available affinities.
>>>> 402 enum NumaAffinity {
>>>> 403 AFFINITY_STRONG,
>>>> 404 AFFINITY_NONE
>>>> 405 };
>>>> 406 static bool numa_affinity_set(Thread* thread, int numa_id,
>>>> NumaAffinity affinity);
>>>>
>>>> Could there ever be other affinities? This API doesn't seem right.
>>>> It seems to me it should be something like
>>>>
>>>> static bool numa_set_affinity(Thread* thread, int numa_id);
>>>> static bool numa_clear_affinity(Thread* thread);
>>>>
>>>> No need for NumaAffinity enum.
>>> The enum is also kind of a remnant from the removed Solaris
>>> implementation, which had more types.
>>> But I agree with you: as there are only 2 types now, it is simpler to
>>> have separate APIs without the enum.
>>>
>>>>
>>>> (I think numa_set/clear_affinity is better than
>>>> numa_affinity_set/clear;
>>>> numa is a "namespace" and we usually set/clear/get_something rather
>>>> than something_set/clear/get.)
>>>>
>>>> Or maybe this should just be
>>>>
>>>> static bool numa_set_affinity(Thread* thread, int numa_id);
>>>>
>>>> with "clearing" provided by passing AnyId as the numa_id? Still don't
>>>> need the NumaAffinity enum for this.
>>> Agree on this naming stuff.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 124 guarantee(rs.size() % page_size == 0, "Reserved space size
>>>> (" SIZE_FORMAT ") should be aligned "
>>>> 141 guarantee(size_in_bytes % page_size == 0, "The given range
>>>> (" SIZE_FORMAT ") should be aligned "
>>>> 209 guarantee((uintptr_t)G1HeapRegionSize % page_size == 0,
>>>> "G1HeapRegionSize (" SIZE_FORMAT ") and page size ("
>>>> 234 guarantee((uintptr_t)_page_size % _heap_region_size == 0,
>>>> "G1HeapRegionSize (" SIZE_FORMAT ") and page size ("
>>>>
>>>> Use is_aligned. I think casts shouldn't be needed.
>>> These lines are removed after major refactoring.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 43 _numa_id_to_index_map = NEW_C_HEAP_ARRAY_RETURN_NULL(uint,
>>>> _len_numa_id_to_index_map, mtGC);
>>>> 44 memset(_numa_id_to_index_map, 0, sizeof(uint) *
>>>> _len_numa_id_to_index_map);
>>>>
>>>> Using NEW_xxx_RETURN_NULL and then ignoring the possibility of a NULL
>>>> result. It would be better to use the non-_RETURN_NULL variant and
>>>> get a decent error message rather than the segfault from attempting to
>>>> write to NULL.
>>> Changed to use non-_RETURN_NULL version.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 34 void G1NUMA::init_numa_id_to_index_map(int* numa_ids, uint
>>>> num_numa_ids) {
>>>>
>>>> int* numa_ids => const int* numa_ids.
>>> DONE
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 34 void G1NUMA::init_numa_id_to_index_map(int* numa_ids, uint
>>>> num_numa_ids) {
>>>> ...
>>>> 44 memset(_numa_id_to_index_map, 0, sizeof(uint) *
>>>> _len_numa_id_to_index_map);
>>>> Rather than this memset and then writing to all of them later, instead
>>>> initialize all entries to InvalidNodeIndex and then assign from
>>>> numa_ids.
>>> DONE
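>>> i.e. (sketch):
>>>
>>>   for (uint i = 0; i < _len_numa_id_to_index_map; i++) {
>>>     _numa_id_to_index_map[i] = InvalidNodeIndex;
>>>   }
>>>   for (uint i = 0; i < num_numa_ids; i++) {
>>>     _numa_id_to_index_map[numa_ids[i]] = i;
>>>   }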
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryTouchThread.cpp
>>>> 261 _wakeup_monitor = new Monitor(Mutex::nonleaf, "Wakeup
>>>> Monitor for Touch Control", true, Monitor::_safepoint_check_never);
>>>> 262 _sync_monitor = new Monitor(Mutex::nonleaf, "Sync Monitor
>>>> for Touch Control", true, Monitor::_safepoint_check_never);
>>>>
>>>> For singleton mutex/monitor, we generally prefer to put them in
>>>> mutexLocker.
>>>> Probably passed as constructor arguments to G1NUMAThreadsControl. And
>>>> I know there are counter-examples.
>>> Not necessary as memory touch thread is gone.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 30 extern const uint AnyNodeIndex;
>>>> 31 extern const uint InvalidNodeIndex;
>>>>
>>>> Why are these at namespace scope. And they seem misplaced here.
>>>> Seems like they should belong to the G1NUMA class, which is
>>>> responsible for indexing nodes and mapping between indexes and ids.
>>> Changed to G1MemoryNodeManager::AnyNodeIndex.
>>> I wanted to hide, or at least minimize exposing, G1NUMA in general.
>>> The memory node manager also has index related APIs, so letting the
>>> manager own such constants seems okay.
>>>
>>>>
>>>> 30 const uint AnyNodeIndex = (uint)os::AnyId;
>>>> 31 const uint InvalidNodeIndex = (uint)os::InvalidId;
>>>> And why these values? That seems like a units mismatch. Just use
>>>> UINT_MAX and UINT_MAX - 1.
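>>>> i.e. something like (sketch):
>>>>
>>>>   // In G1NUMA (or the memory node manager):
>>>>   static const uint AnyNodeIndex     = UINT_MAX - 1;
>>>>   static const uint InvalidNodeIndex = UINT_MAX;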
>>> I don't have a strong opinion on this.
>>> But the intent is that constants with the same meaning have the same
>>> value after casting, instead of randomly chosen ones.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.hpp
>>>> 80 // For invalid numa id, return InvalidNodeIndex. So the
>>>> caller need exception handling.
>>>>
>>>> "need exception handling"?
>>> Thomas also commented on it, so it is reworded. But in the end the
>>> class is removed.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionManager.cpp
>>>> 114 uint allocated_node_index;
>>>>
>>>> This variable gets assigned via an out parameter here:
>>>> 116 hr = _free_list.remove_region_with_node_index(from_head,
>>>> valid_node_index, &allocated_node_index);
>>>>
>>>> and by assignment here:
>>>> 122 allocated_node_index =
>>>> mgr->node_index_of_address(hr->bottom());
>>>>
>>>> but is never used. If the lack of use is not itself a mistake, then
>>>> the out
>>>> parameter of remove_region_with_node_index appears to be unnecessary.
>>> The variable will be used in patch 3 (JDK-8220312), but it is slightly
>>> changed to allow NULL for the allocated_node_index parameter of
>>> remove_region_with_node_index().
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
>>>> 151 inline HeapRegion*
>>>> FreeRegionList::remove_region_with_node_index(bool from_head,
>>>>
>>>> As written, I think this might always return NULL if
>>>> requested_node_index is
>>>> not a valid index. I think callers ensure that it's valid, but an
>>>> assert of
>>>> the argument would help.
>>> Done
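>>> i.e. (sketch):
>>>
>>>   assert(is_valid_node_index(requested_node_index),
>>>          "Requested node index %u is not valid", requested_node_index);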
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
>>>> 175 if (cur == NULL || cur_depth >= max_search_depth) { // not
>>>> found
>>>>
>>>> There's an off-by-one issue here. The max_search_depth time through the
>>>> while-control-expression may have found a matching region, checked
>>>> via the
>>>> numa_index_of_address lookup.
>>>>
>>>> Either the test should be "current_numa_index !=
>>>> requested_numa_index" or
>>>> the loop should be changed.
>>> This code is removed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heapRegionSet.inline.hpp
>>>> 151 inline HeapRegion*
>>>> FreeRegionList::remove_region_with_node_index(bool from_head,
>>>>
>>>> I think the list manipulation could be simplified quite a bit. I
>>>> think we
>>>> don't need the "last" variable. And the splicing out of cur can be
>>>> unified
>>>> between the from_head and !from_head cases, e.g. something like this:
>>>>
>>>>     // Find the region to use, searching from _head or _tail as requested.
>>>>     if (from_head) {
>>>>       for (cur = _head;
>>>>            cur != NULL && cur_depth < max_search_depth;
>>>>            cur = cur->next(), ++cur_depth) {
>>>>         if (current_numa_index ==
>>>>             numa->numa_index_of_address(cur->bottom())) {
>>>>           break;
>>>>         }
>>>>       }
>>>>     } else {
>>>>       for (cur = _tail;
>>>>            cur != NULL && cur_depth < max_search_depth;
>>>>            cur = cur->prev(), ++cur_depth) {
>>>>         if (current_numa_index ==
>>>>             numa->numa_index_of_address(cur->bottom())) {
>>>>           break;
>>>>         }
>>>>       }
>>>>     }
>>>>     if (cur == NULL || cur_depth >= max_search_depth) {
>>>>       return NULL; // Didn't find a region to use.
>>>>     }
>>>>     // Splice the region out of the list.
>>>>     HeapRegion* prev = cur->prev();
>>>>     HeapRegion* next = cur->next();
>>>>     if (prev == NULL) {
>>>>       _head = next;
>>>>     } else {
>>>>       prev->set_next(next);
>>>>     }
>>>>     if (next == NULL) {
>>>>       _tail = prev;
>>>>     } else {
>>>>       next->set_prev(prev);
>>>>     }
>>>>     cur->set_prev(NULL);
>>>>     cur->set_next(NULL);
>>>>     ...
>>> Nice suggestion!
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.cpp
>>>> 166 class DumpIterator : public StackObj {
>>>>
>>>> Polluting global namespace. Suggest making DumpIterator and the
>>>> classes
>>>> derived from it (G1HeapRegion{Larger,Smaller}Iterator) private nested
>>>> classes of G1NUMA, making the names unimportant outside of that
>>>> context.
>>> Not needed anymore.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/heterogeneousHeapRegionManager.hpp
>>>> 122 virtual HeapRegion* allocate_free_region(HeapRegionType type);
>>>> 123 virtual HeapRegion* allocate_free_region(HeapRegionType
>>>> type, uint node_index);
>>>>
>>>> The base class (HeapRegionManager) no longer has the one-arg
>>>> function. So
>>>> this one-arg function is no longer overriding the base class
>>>> definition.
>>>> (Too bad it wasn't declared as C++11 "override".)
>>>>
>>>> I don't see any one-arg callers, so seems like it can just be removed.
>>> DONE
>>> Nice finding! :)
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1PageBasedNUMAVirtualSpace.cpp
>>>>
>>>> I dislike the new structure here, with G1PageBasedVirtualSpace now
>>>> being
>>>> both a base class and a concrete class (which is often a problematic
>>>> design). And the new derived class doesn't really do all that much.
>>>>
>>>> It seems like it would be much simpler to just have
>>>> G1PageBasedVirtualSpace
>>>> check UseNUMA to conditionalize its commit and uncommit functions.
>>>> Especially since its commit already needs to have some knowledge of
>>>> NUMA
>>>> because of the new numa_index argument.
>>> DONE
>>> Removed the NUMA version, and the related changes that allowed
>>> inheritance are reverted.
>>> Now, G1PageBasedVirtualSpace does the additional work after checking
>>> for NUMA.
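>>> i.e. in G1PageBasedVirtualSpace::commit(), roughly (sketch only; the
>>> exact call and its arguments are assumptions):
>>>
>>>   if (UseNUMA) {
>>>     // Request that the newly committed pages end up on the given node,
>>>     // e.g. via the numa_make_local() path mentioned earlier.
>>>     numa_make_local(page_start(start_page), size_in_pages * _page_size,
>>>                     node_index);
>>>   }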
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1PageBasedNUMAVirtualSpace.cpp
>>>> 52 void G1PageBasedNUMAVirtualSpace::uncommit(size_t start_page,
>>>> size_t size_in_pages) {
>>>>
>>>> Do we actually need to do anything here? We're resetting the
>>>> G1NUMA::_numa_id_of_pages_table (to AnyId), but it's not clear we
>>>> really
>>>> need to do that. When we later recommit any of those pages, that
>>>> table will
>>>> be updated appropriately.
>>>>
>>>> Maybe rather than having G1NUMA expose quite so much detail it should
>>>> instead have post_commit and a pre_uncommit functions.
>>> I think clearing the node index may help with debugging. Unlike
>>> webrev.0, the numa id/index information is located in the HeapRegion,
>>> which is much simpler. i.e. HeapRegionManager::uncommit_regions()
>>> resets the node index to InvalidNodeIndex.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1NUMA.hpp
>>>> 75 G1NUMAPageIdTable _numa_id_of_pages_table;
>>>>
>>>> I was surprised by the need for this, but I can't find anything in
>>>> the NUMA
>>>> API that provides this information. That seems strange, because one
>>>> would
>>>> think that information must exist somewhere in the NUMA implementation.
>>> Removed.
>>> And this made the patch much simpler!
>>>
>>> And here are your comments from the other email.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 64 // Returns node index of the given address.
>>>> 65 // If the node id of the address is invalid return randomly
>>>> selected node index.
>>>> 66 virtual uint valid_index_of_address(HeapWord* o) const {
>>>> return 0; }
>>>>
>>>> This function is unused.
>>> Removed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/gc/g1/g1MemoryNodeManager.hpp
>>>> 70 virtual uint node_index_of_address(HeapWord* o) const {
>>>> return 0; }
>>>>
>>>> The only caller of this function is
>>>> HeapRegionManager::allocate_free_region.
>>>> As noted in an earlier comment, the result of that call is unused.
>>> I removed it as you commented, but I had to revert that as a part of
>>> the _numa_id_of_pages_table removal.
>>> Previously we retrieved the page info from _numa_id_of_pages_table,
>>> but the new version reads it directly via a system call.
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> As pointed out to me offline, there *are* ways to query the address ->
>>>> associated numa node mapping. So I'm going back and questioning the
>>>> need
>>>> for _numa_id_of_pages_table and the code for managing it.
>>> As we discussed offline, I removed this table. So webrev.1 checks the
>>> bottom of the HeapRegion and then saves the result in
>>> HeapRegion::_node_index.
>>> However, we do need a way to check the node index of a given address,
>>> so I added it back.
>>>
>>> From my testing, checking the numa id of one page seems okay to do at
>>> runtime. Previously I checked all pages of the given memory, so it was
>>> a bit slow.
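>>> i.e. only the bottom page of a region is queried (sketch only; the
>>> index lookup helper is a made-up name):
>>>
>>>   // Assumes all pages of the region are on the same node.
>>>   int node_id = os::numa_get_address_id(hr->bottom());
>>>   hr->set_node_index(index_of_node_id(node_id));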
>>>
>>> One thing not included in webrev.1 is combining this with the existing
>>> code.
>>> The newly introduced os::numa_get_address_id() is a copy from
>>> zNUMA_linux.cpp. It could be unified, but I haven't tried that yet.
>>>
>>> webrev:
>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1
>>> http://cr.openjdk.java.net/~sangheki/8220310/webrev.1.inc (this may
>>> not help much! :) )
>>> Testing: hs-tier 1 ~ 5 (with/without UseNUMA)
>>>
>>> Thanks,
>>> Sangheon
>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>
>