Review Request: UseNUMAInterleaving #4

Tue Aug 23 19:35:22 UTC 2011

OK, that makes things clearer, thanks.

> -----Original Message-----
> From: Igor Veresov [mailto:igor.veresov at oracle.com]
> Sent: Tuesday, August 23, 2011 2:33 PM
> To: Deneau, Tom
> Cc: hotspot-gc-dev at openjdk.java.net
> Subject: Re: Review Request: UseNUMAInterleaving #4
> 
> Yeah, but the semantics of numa_get_leaf_groups is the following:
> - the size that it is supplied as an argument is the maximum amount of
> elements that would fit in ids. It's the upper limit.
> - the returned value is the real number of elements in ids that make
> sense.
> 
> Returning the size large than NUMANodeListHolder::get_count() and
> sticking -1 into entries higher than that is wrong.
> Sorry about not making it clear how this function should work.
> 
> igor
> 
> On Tuesday, August 23, 2011 at 12:17 PM, Deneau, Tom wrote:
> 
> > Igor --
> >
> > For your comment on numa_get_leaf_groups, the bounds checking is done
> in NUMANodeListHolder::get_node_list_entry. Although I wasn't sure what
> you were supposed
> > to return for indices that are out of bounds, so I returned -1.
> >
> > I'll take care of the other cleanups.
> >
> > -- Tom
> >
> >
> >
> > > -----Original Message-----
> > > From: Igor Veresov [mailto:igor.veresov at oracle.com]
> > > Sent: Tuesday, August 23, 2011 1:53 PM
> > > To: Deneau, Tom
> > > Cc: hotspot-gc-dev at openjdk.java.net (mailto:hotspot-gc-
> dev at openjdk.java.net)
> > > Subject: Re: Review Request: UseNUMAInterleaving #4
> > >
> > >  Tom,
> > >
> > > This looks good to me, except three minor things:
> > >
> > > os_windows.cpp:
> > >
> > > - you should check for null here:
> > > 2630 ~NUMANodeListHolder() {
> > > > if (_numa_used_node_list != NULL) {
> > > 2631 FREE_C_HEAP_ARRAY(int, _numa_used_node_list);
> > > > }
> > > 2632 }
> > >
> > > - if NUMANodeListHolder::build() will be called multiple times,
> you'll
> > > leak memory. I guess you should check if _numa_used_node_list is NULL
> and
> > > if not free it first.
> > >
> > > - you didn't modify os::numa_get_leaf_groups() to handle the
> situation
> > > when the value of argument "size" is bigger than
> > > NUMANodeListHolder::get_count(). You can use MIN2 to adjust the
> value.
> > > See my comment in the previous mail.
> > >
> > >
> > > igor
> > >
> > > On Tuesday, August 23, 2011 at 11:23 AM, Deneau, Tom wrote:
> > >
> > > > Please review this patch which adds a new flag called
> > > > UseNUMAInterleaving. This flag provides a subset of the
> functionality
> > > > provided by UseNUMA. In Hotspot UseNUMA terminology,
> > > > UseNUMAInterleaved makes all memory "numa_global" which is
> implemented
> > > > as interleaved. This patch's main purpose is to provide that subset
> > > > on OSes like Windows which do not support the full UseNUMA
> > > > functionality. However, a simple implementation of
> UseNUMAInterleaving
> > > is
> > > > also provided for other OSes
> > > >
> > > > The situations where this shows the biggest benefits would be:
> > > >  * Windows platforms with multiple numa nodes (eg, 4)
> > > >
> > > >  * The JVM process is run across all the nodes (not affinitized to
> > > >  one node).
> > > >
> > > >  * A workload that has enough threads so that it uses the majority
> > > >  of the cores in the machine, so that the heap is being accessed
> > > >  from many cores, including remote ones.
> > > >
> > > >  * Enough memory per node and a heap size such that the default
> heap
> > > >  placement policy on windows would end up with the heap (or
> > > >  nursery) placed on one node.
> > > >
> > > > jbb2005 and SPECPower_ssj2008 are examples of such workloads. In
> our
> > > > measurements, we have seen some cases where the performance with
> > > > UseNUMAInterleaving was 2.7x vs. the performance without. There
> were
> > > > gains of varying sizes across all systems.
> > > >
> > > > The webrev is at
> > > > http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.04/
> > > >
> > > > Summary of changes in webrev.04 from webrev.03:
> > > >
> > > >  * As suggested by Igor Veresov, UseNUMA can imply
> > > >  UseNUMAInterleaving on all platforms. This is in arguments.cpp
> > > >
> > > >  * In NUMANodeListHolder in os_windows.cpp, allocates the node_list
> > > >  dynamically rather than assuming a length of 64. The method
> > > >  NUMANodeListHolder::get_node_list_entry checks returns -1 for
> > > >  indexes that are out of bounds.
> > > >
> > > >  * Several code convention cleanups suggested by Igor.
> > > >
> > > >  * Merge with the new style system dll function resolutions from
> > > >  "7016797: Hotspot: securely/restrictive load dlls and new API for
> > > >  loading system dlls" Note: my new NUMA functions are outside the
> > > ifdefs.
> > > >
> > > >
> > > > Summary of changes in webrev.03 from webrev.02:
> > > >
> > > >  * As suggested by Igor Veresov, reverts to using
> > > >  UseNUMAInterleaving as the enabling flag. This will make it
> > > >  easier in the future when there are GCs that enable fuller
> > > >  UseNUMA on Windows.
> > > >
> > > >  * Adds a simple implementation of UseNUMAInterleaving on Linux and
> > > >  Solaris, which just calls numa_make_global after commit_memory
> > > >  and reserve_memory_special
> > > >
> > > >  * Adds a flag NUMAInterleaveGranularity which allows setting the
> > > >  granularity with which we move to a different node in a memory
> > > >  allocation. The default is 2MB. This flag only applies to
> > > >  Windows for now.
> > > >
> > > >  * Several code cleanups in os_windows.cpp suggested by Igor.
> > > >
> > > >
> > > > Summary of overall changes in os_windows.cpp:
> > > >
> > > >  * Some static routines were added to set things up init time.
> These
> > > >  * check that the required APIs (VirtualAllocExNuma,
> > > >  GetNumaHighestNodeNumber, GetNumaNodeProcessorMask) exist in
> > > >  the OS
> > > >
> > > >  * build the list of numa nodes on which this process has affinity
> > > >
> > > >  * Changes to os::reserve_memory
> > > >  * There was already a routine that reserved pages one page at a
> > > >  time (used for Individual Large Page Allocation on WS2003).
> > > >  This was abstracted to a separate routine, called
> > > >  allocate_pages_individually. This gets called both for the
> > > >  Individual Large Page Allocation thing mentioned above and for
> > > >  UseNUMAInterleaving (for both small and large pages)
> > > >
> > > >  * When used for NUMA Interleaving this just goes thru the numa
> > > >  node list in a round-robin fashion, allocating chunks at the
> > > >  NUMAInterleaveGranularity using a different allocation for
> > > >  each chunk
> > > >
> > > >  * Whether we do just a reserve or a combined reserve/commit is
> > > >  determined by the caller of allocate_pages_individually
> > > >
> > > >  * When used with large pages, we do a Reserve and Commit at
> > > >  the same time which is the way it always worked and the way
> > > >  it has to work on windows.
> > > >
> > > >  * For small pages, only the reserve is done, the commit will
> > > >  come later. (which is the way it worked for
> > > >  non-interleaved)
> > > >
> > > >  * os::commit_memory changes
> > > >  * If UseNUMAIntereaving is true, os::commit_memory has to check
> > > >  whether it was being asked to commit memory that might have
> > > >  come from multiple Reserve allocations, if so, the commits
> > > >  must also be broken up. We don't keep any data structure to
> > > >  keep track of this, we just use VirtualQuery which queries the
> > > >  properties of a VA range and can tell us how much came from
> > > >  one VirtualAlloc call.
> > > >
> > > > I do not have a bug id for this.
> > > >
> > > > -- Tom Deneau, AMD
> 
>