Review Request: UseNUMAInterleaving #4

Tue Aug 23 20:14:38 UTC 2011

John, it works precisely as you say. You just have to specify UseNUMA, and it will imply interleaving for the collectors that don't do any better. The extra flag is for the case when you don't want a full-blown NUMA allocator but just want the interleaving.
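
A minimal sketch of that implication, in case it helps. The struct and function names here are hypothetical stand-ins and not HotSpot's real flag machinery, and the "explicit user setting wins" rule is an assumption of the sketch, not something the patch is known to do:

```cpp
#include <cassert>

// Hypothetical stand-ins for the VM flags (not HotSpot's real flag macros).
struct NumaFlags {
  bool use_numa = false;                  // models -XX:+UseNUMA
  bool use_numa_interleaving = false;     // models -XX:+UseNUMAInterleaving
  bool interleaving_set_by_user = false;  // assumption: flag given explicitly
};

// UseNUMA implies interleaving. Assumption of this sketch: an explicit
// user setting of the interleaving flag is left untouched.
inline void apply_numa_implication(NumaFlags& f) {
  if (f.use_numa && !f.interleaving_set_by_user) {
    f.use_numa_interleaving = true;
  }
}
```

With only UseNUMA set, interleaving ends up on; with only UseNUMAInterleaving set, UseNUMA stays off, which is the "just the interleaving" case.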

igor

On Aug 23, 2011, at 12:45 PM, John Pampuch <john.pampuch at oracle.com> wrote:

> Could this be done without adding a new flag?  Eg, could we just detect that the
> platform doesn't support the full UseNUMA functionality, and only leverage what
> it does implement?
> 
> -John
> 
> On 8/23/11 11:23 AM, Deneau, Tom wrote:
>> 
>> Please review this patch which adds a new flag called
>> UseNUMAInterleaving.  This flag provides a subset of the functionality
>> provided by UseNUMA.  In Hotspot UseNUMA terminology,
>> UseNUMAInterleaving makes all memory "numa_global" which is implemented
>> as interleaved.  This patch's main purpose is to provide that subset
>> on OSes like Windows which do not support the full UseNUMA
>> functionality.  However, a simple implementation of UseNUMAInterleaving is
>> also provided for other OSes.
>> 
>> The situations where this shows the biggest benefits would be:
>>     * Windows platforms with multiple numa nodes (eg, 4)
>> 
>>     * The JVM process is run across all the nodes (not affinitized to
>>       one node).
>> 
>>     * A workload that has enough threads so that it uses the majority
>>       of the cores in the machine, so that the heap is being accessed
>>       from many cores, including remote ones.
>> 
>>     * Enough memory per node and a heap size such that the default heap
>>       placement policy on Windows would end up with the heap (or
>>       nursery) placed on one node.
>> 
>> jbb2005 and SPECPower_ssj2008 are examples of such workloads.  In our
>> measurements, we have seen some cases where the performance with
>> UseNUMAInterleaving was 2.7x vs. the performance without. There were
>> gains of varying sizes across all systems.
>> 
>> The webrev is at
>> http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.04/
>> 
>> Summary of changes in webrev.04 from webrev.03:
>> 
>>     * As suggested by Igor Veresov, UseNUMA can imply
>>       UseNUMAInterleaving on all platforms.  This is in arguments.cpp
>> 
>>     * In NUMANodeListHolder in os_windows.cpp, allocates the node_list
>>       dynamically rather than assuming a length of 64.  The method
>>       NUMANodeListHolder::get_node_list_entry returns -1 for
>>       indexes that are out of bounds.
>> 
>>     * Several code convention cleanups suggested by Igor.
>> 
>>     * Merge with the new style system dll function resolutions from
>>       "7016797: Hotspot: securely/restrictive load dlls and new API for
>>       loading system dlls".  Note: my new NUMA functions are outside the ifdefs.
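
A minimal sketch of the dynamically sized node list with the out-of-bounds check described above. The class and method names follow the description; build(), get_count(), and the mask inputs are hypothetical stand-ins for what the process affinity mask and the per-node processor masks would report on Windows, so this is an illustration and not the code in the webrev:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class NUMANodeListHolder {
 public:
  // node_masks[i] is the processor mask of node i (hypothetical input).
  // A node is kept only if the process has affinity to at least one of
  // that node's processors.
  void build(std::uint64_t process_affinity_mask,
             const std::vector<std::uint64_t>& node_masks) {
    node_list_.clear();
    for (std::size_t node = 0; node < node_masks.size(); ++node) {
      if ((node_masks[node] & process_affinity_mask) != 0) {
        node_list_.push_back(static_cast<int>(node));
      }
    }
  }

  int get_count() const { return static_cast<int>(node_list_.size()); }

  // Returns the n-th usable node, or -1 if n is out of bounds.
  int get_node_list_entry(int n) const {
    return (n >= 0 && n < get_count()) ? node_list_[n] : -1;
  }

 private:
  std::vector<int> node_list_;  // grows as needed, no fixed length of 64
};
```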
>> 
>> 
>> Summary of changes in webrev.03 from webrev.02:
>> 
>>     * As suggested by Igor Veresov, reverts to using
>>       UseNUMAInterleaving as the enabling flag.  This will make it
>>       easier in the future when there are GCs that enable fuller
>>       UseNUMA on Windows.
>> 
>>     * Adds a simple implementation of UseNUMAInterleaving on Linux and
>>       Solaris, which just calls numa_make_global after commit_memory
>>       and reserve_memory_special
>> 
>>     * Adds a flag NUMAInterleaveGranularity which allows setting the
>>       granularity with which we move to a different node in a memory
>>       allocation.  The default is 2MB.  This flag only applies to
>>       Windows for now.
>> 
>>     * Several code cleanups in os_windows.cpp suggested by Igor.
>> 
>> 
>> Summary of overall changes in os_windows.cpp:
>> 
>>     * Some static routines were added to set things up at init time.  These
>>        * check that the required APIs (VirtualAllocExNuma,
>>          GetNumaHighestNodeNumber, GetNumaNodeProcessorMask) exist in
>>          the OS
>> 
>>        * build the list of numa nodes on which this process has affinity
>> 
>>     * Changes to os::reserve_memory
>>        * There was already a routine that reserved pages one page at a
>>          time (used for Individual Large Page Allocation on WS2003).
>>          This was abstracted to a separate routine, called
>>          allocate_pages_individually.  This gets called both for the
>>          Individual Large Page Allocation thing mentioned above and for
>>          UseNUMAInterleaving (for both small and large pages)
>> 
>>        * When used for NUMA Interleaving this just goes thru the numa
>>          node list in a round-robin fashion, allocating chunks at the
>>          NUMAInterleaveGranularity using a different allocation for
>>          each chunk
>> 
>>        * Whether we do just a reserve or a combined reserve/commit is
>>          determined by the caller of allocate_pages_individually
>> 
>>           * When used with large pages, we do a Reserve and Commit at
>>             the same time which is the way it always worked and the way
>>             it has to work on Windows.
>> 
>>           * For small pages, only the reserve is done; the commit will
>>             come later (which is the way it worked for
>>             non-interleaved).
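
The round-robin placement described above can be sketched roughly as follows. This only computes which node each chunk would go to; the actual per-chunk allocation call is left out. The function and type names are hypothetical, and cutting chunks at multiples of the granularity (so a misaligned first chunk is shorter) is an assumption of the sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Chunk {
  std::uintptr_t start;  // start address of the chunk
  std::size_t size;      // chunk size in bytes
  int node;              // NUMA node the chunk would be allocated on
};

// Walks the range [base, base + bytes) and assigns each chunk to the next
// node in the list, wrapping around at the end of the list.
std::vector<Chunk> plan_interleaved_chunks(std::uintptr_t base,
                                           std::size_t bytes,
                                           std::size_t granularity,
                                           const std::vector<int>& nodes) {
  assert(granularity > 0 && !nodes.empty());
  std::vector<Chunk> plan;
  std::uintptr_t addr = base;
  std::size_t remaining = bytes;
  std::size_t next = 0;  // index of the next node to use
  while (remaining > 0) {
    // Assumption: chunks end at the next multiple of the granularity.
    std::size_t to_boundary =
        granularity - static_cast<std::size_t>(addr % granularity);
    std::size_t size = remaining < to_boundary ? remaining : to_boundary;
    plan.push_back({addr, size, nodes[next]});
    next = (next + 1) % nodes.size();
    addr += size;
    remaining -= size;
  }
  return plan;
}
```

For example, a 5MB request at an aligned base with a 2MB granularity and three usable nodes gives chunks of 2MB, 2MB, and 1MB, one on each node.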
>> 
>>     * os::commit_memory changes
>>        * If UseNUMAInterleaving is true, os::commit_memory has to check
>>          whether it was being asked to commit memory that might have
>>          come from multiple Reserve allocations; if so, the commits
>>          must also be broken up.  We don't keep any data structure to
>>          keep track of this, we just use VirtualQuery which queries the
>>          properties of a VA range and can tell us how much came from
>>          one VirtualAlloc call.
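
A rough sketch of breaking a commit up at reservation boundaries. To keep it portable, a sorted list of reservation end addresses stands in for what VirtualQuery would report about each region; that list and the function name are hypothetical and not part of the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Range {
  std::uintptr_t start;
  std::size_t size;
};

// Splits [addr, addr + bytes) so that no piece crosses a reservation end.
// reservation_ends holds the exclusive end address of each separately
// reserved region, in ascending order (stand-in for querying the OS).
std::vector<Range> split_commit_at_reservations(
    std::uintptr_t addr, std::size_t bytes,
    const std::vector<std::uintptr_t>& reservation_ends) {
  assert(std::is_sorted(reservation_ends.begin(), reservation_ends.end()));
  std::vector<Range> pieces;
  std::size_t remaining = bytes;
  while (remaining > 0) {
    // First reservation end strictly above the current address.
    auto it = std::upper_bound(reservation_ends.begin(),
                               reservation_ends.end(), addr);
    std::size_t piece = remaining;
    if (it != reservation_ends.end()) {
      std::size_t to_end = static_cast<std::size_t>(*it - addr);
      piece = std::min(piece, to_end);
    }
    pieces.push_back({addr, piece});  // one commit call per piece
    addr += piece;
    remaining -= piece;
  }
  return pieces;
}
```

Each returned piece lies inside a single reservation, so it could be committed with one call.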
>> 
>> I do not have a bug id for this.
>> 
>> -- Tom Deneau, AMD
>> 
> 