Review Request: UseNUMAInterleaving #6

Wed Aug 24 16:26:54 UTC 2011

I believe I have addressed ramki's comments with 
http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.06/

-- Tom

> -----Original Message-----
> From: Y. S. Ramakrishna [mailto:y.s.ramakrishna at oracle.com]
> Sent: Tuesday, August 23, 2011 6:51 PM
> To: Deneau, Tom
> Cc: hotspot-gc-dev at openjdk.java.net
> Subject: Re: Review Request: UseNUMAInterleaving #4
> 
> Hi Tom -- the perf improvement on windows is impressive.
> 
> The changes look good. Just a few very minor nits below:
> 
> globals.hpp: In the doc string field for NUMAInterleaveGranularity, you
> might state that this is a Windows only option. (although i recognize
> that this hasn't been done for some of the other windows options that
> i became aware of now as being used exclusively in windows before
> your changes, for instance: UseLargePagesIndividualAllocation
> and LargePagesIndividualAllocationInjectError).
> 
> arguments.cpp: you could get rid of the empty lines 1432-1433, and move
> the
> content of 1428-1430 into the if-scope of 1422-1426.
> 
> os_windows.cpp: you can probably get rid of the extra newline
> introduced at line 1967.
> 
> line 3018, typo: "NUMAInterleavaing"
> also at line 3033: "thNUMANodeListHolderat"
> The comment at lines 3030-3033 would also benefit
> from a few missing punctuation marks.
> 
> at lines 3040 and 3043, it might read better to place the returns
> on lines of their own.
> 
> If you run with +UseNUMAInterleaving and a commit failed,
> it would seem that the error message at line 2987 would be
> confusing and incorrect. Perhaps you want to suitably modify
> it or just suppress the additional text in that case.
> 
> os_solaris.cpp: 2780-2784, it might make sense to do the madvise
> global/many
> call only if the mmap_chunk() succeeds, rather than all the time as you
> are doing. May be something like:-
> 
> 2780   char *res = Solaris::mmap_chunk(addr, size, MAP_PRIVATE|MAP_FIXED,
> prot);
> 2781   if (res != NULL) {
>           if (UseNUMAInterleaving) {
> 2782       numa_make_global(res, size);
>           }
>           return true;
> 2783   }
> 2784   return false;
> 
> At line 3444, would it make sense to use "size" instead of "bytes"
> (although
> size is just a copy of bytes -- i don't understand the reason for making
> the copy, so feel free to ignore if this is some recherche style issue;
> otherwise
> it might make sense to get rid of the copy and just use the formal
> parameter as is
> the case for the Linux code; although this is really not code that you
> introduced,
> but just because you happen to be touching code in the vicinity... your
> choice.)
> 
> In the same vein, i'd make the Linux code similar in shape to
> the solaris code for the two hunks changed in os_linux.cpp.
> 
> rest looks good.
> -- ramki
> 
> On 08/23/11 12:59, Deneau, Tom wrote:
> > OK, http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.05/
> > should address the concerns listed below...
> >
> > -- Tom
> >
> >
> >> -----Original Message-----
> >> From: Igor Veresov [mailto:igor.veresov at oracle.com]
> >> Sent: Tuesday, August 23, 2011 1:53 PM
> >> To: Deneau, Tom
> >> Cc: hotspot-gc-dev at openjdk.java.net
> >> Subject: Re: Review Request: UseNUMAInterleaving #4
> >>
> >>  Tom,
> >>
> >> This looks good to me, except three minor things:
> >>
> >> os_windows.cpp:
> >>
> >> - you should check for null here:
> >> 2630 ~NUMANodeListHolder() {
> >>> if (_numa_used_node_list != NULL) {
> >> 2631 FREE_C_HEAP_ARRAY(int, _numa_used_node_list);
> >>> }
> >> 2632 }
> >>
> >> - if NUMANodeListHolder::build() will be called multiple times, you'll
> >> leak memory. I guess you should check if _numa_used_node_list is NULL
> and
> >> if not free it first.
> >>
> >> - you didn't modify os::numa_get_leaf_groups() to handle the situation
> >> when the value of argument "size" is bigger than
> >> NUMANodeListHolder::get_count(). You can use MIN2 to adjust the value.
> >> See my comment in the previous mail.
> >>
> >>
> >> igor
> >>
> >> On Tuesday, August 23, 2011 at 11:23 AM, Deneau, Tom wrote:
> >>
> >>> Please review this patch which adds a new flag called
> >>> UseNUMAInterleaving. This flag provides a subset of the functionality
> >>> provided by UseNUMA. In Hotspot UseNUMA terminology,
> >>> UseNUMAInterleaved makes all memory "numa_global" which is
> implemented
> >>> as interleaved. This patch's main purpose is to provide that subset
> >>> on OSes like Windows which do not support the full UseNUMA
> >>> functionality. However, a simple implementation of
> UseNUMAInterleaving
> >> is
> >>> also provided for other OSes
> >>>
> >>> The situations where this shows the biggest benefits would be:
> >>>  * Windows platforms with multiple numa nodes (eg, 4)
> >>>
> >>>  * The JVM process is run across all the nodes (not affinitized to
> >>>  one node).
> >>>
> >>>  * A workload that has enough threads so that it uses the majority
> >>>  of the cores in the machine, so that the heap is being accessed
> >>>  from many cores, including remote ones.
> >>>
> >>>  * Enough memory per node and a heap size such that the default heap
> >>>  placement policy on windows would end up with the heap (or
> >>>  nursery) placed on one node.
> >>>
> >>> jbb2005 and SPECPower_ssj2008 are examples of such workloads. In our
> >>> measurements, we have seen some cases where the performance with
> >>> UseNUMAInterleaving was 2.7x vs. the performance without. There were
> >>> gains of varying sizes across all systems.
> >>>
> >>> The webrev is at
> >>> http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.04/
> >>>
> >>> Summary of changes in webrev.04 from webrev.03:
> >>>
> >>>  * As suggested by Igor Veresov, UseNUMA can imply
> >>>  UseNUMAInterleaving on all platforms. This is in arguments.cpp
> >>>
> >>>  * In NUMANodeListHolder in os_windows.cpp, allocates the node_list
> >>>  dynamically rather than assuming a length of 64. The method
> >>>  NUMANodeListHolder::get_node_list_entry checks returns -1 for
> >>>  indexes that are out of bounds.
> >>>
> >>>  * Several code convention cleanups suggested by Igor.
> >>>
> >>>  * Merge with the new style system dll function resolutions from
> >>>  "7016797: Hotspot: securely/restrictive load dlls and new API for
> >>>  loading system dlls" Note: my new NUMA functions are outside the
> >> ifdefs.
> >>>
> >>> Summary of changes in webrev.03 from webrev.02:
> >>>
> >>>  * As suggested by Igor Veresov, reverts to using
> >>>  UseNUMAInterleaving as the enabling flag. This will make it
> >>>  easier in the future when there are GCs that enable fuller
> >>>  UseNUMA on Windows.
> >>>
> >>>  * Adds a simple implementation of UseNUMAInterleaving on Linux and
> >>>  Solaris, which just calls numa_make_global after commit_memory
> >>>  and reserve_memory_special
> >>>
> >>>  * Adds a flag NUMAInterleaveGranularity which allows setting the
> >>>  granularity with which we move to a different node in a memory
> >>>  allocation. The default is 2MB. This flag only applies to
> >>>  Windows for now.
> >>>
> >>>  * Several code cleanups in os_windows.cpp suggested by Igor.
> >>>
> >>>
> >>> Summary of overall changes in os_windows.cpp:
> >>>
> >>>  * Some static routines were added to set things up init time. These
> >>>  * check that the required APIs (VirtualAllocExNuma,
> >>>  GetNumaHighestNodeNumber, GetNumaNodeProcessorMask) exist in
> >>>  the OS
> >>>
> >>>  * build the list of numa nodes on which this process has affinity
> >>>
> >>>  * Changes to os::reserve_memory
> >>>  * There was already a routine that reserved pages one page at a
> >>>  time (used for Individual Large Page Allocation on WS2003).
> >>>  This was abstracted to a separate routine, called
> >>>  allocate_pages_individually. This gets called both for the
> >>>  Individual Large Page Allocation thing mentioned above and for
> >>>  UseNUMAInterleaving (for both small and large pages)
> >>>
> >>>  * When used for NUMA Interleaving this just goes thru the numa
> >>>  node list in a round-robin fashion, allocating chunks at the
> >>>  NUMAInterleaveGranularity using a different allocation for
> >>>  each chunk
> >>>
> >>>  * Whether we do just a reserve or a combined reserve/commit is
> >>>  determined by the caller of allocate_pages_individually
> >>>
> >>>  * When used with large pages, we do a Reserve and Commit at
> >>>  the same time which is the way it always worked and the way
> >>>  it has to work on windows.
> >>>
> >>>  * For small pages, only the reserve is done, the commit will
> >>>  come later. (which is the way it worked for
> >>>  non-interleaved)
> >>>
> >>>  * os::commit_memory changes
> >>>  * If UseNUMAIntereaving is true, os::commit_memory has to check
> >>>  whether it was being asked to commit memory that might have
> >>>  come from multiple Reserve allocations, if so, the commits
> >>>  must also be broken up. We don't keep any data structure to
> >>>  keep track of this, we just use VirtualQuery which queries the
> >>>  properties of a VA range and can tell us how much came from
> >>>  one VirtualAlloc call.
> >>>
> >>> I do not have a bug id for this.
> >>>
> >>> -- Tom Deneau, AMD
> >>
> >