Review Request: UseNUMAInterleaving

Mon Aug 8 18:42:36 UTC 2011

Hi, Tom!

Sorry it took me so long to get to that.

1. I don't think the new version of flag usage is prudent. The reason I 
proposed introducing a new flag for interleaving is that it would make 
life easier in the future, when proper NUMA-aware implementations of the 
GCs are added (G1 would be the most probable candidate). I would propose 
that we still have the UseNUMAInterleaving flag.

The usage would be as follows:
- If UseNUMA is specified on Windows, that would turn on 
UseNUMAInterleaving (for the time being; that behavior would change in 
the future).
- If UseNUMAInterleaving is specified on the command line, you just do 
the interleaving. If you don't add this flag now, you'll have to do that 
anyway as soon as NUMA-aware GCs start supporting Windows.
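
To make the proposed flag semantics concrete, here is a minimal sketch. The NumaFlags struct and resolve_numa_flags() are hypothetical names used only for illustration; they are not the real HotSpot globals or argument-processing code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two command-line flags; these are not the
// real HotSpot declarations.
struct NumaFlags {
  bool use_numa;               // -XX:+UseNUMA
  bool use_numa_interleaving;  // -XX:+UseNUMAInterleaving
};

// Resolve the effective flags. os_has_numa_aware_gc is an assumed predicate
// that would be false on Windows for the time being.
inline NumaFlags resolve_numa_flags(NumaFlags requested,
                                    bool os_has_numa_aware_gc) {
  NumaFlags effective = requested;
  if (requested.use_numa && !os_has_numa_aware_gc) {
    // Without NUMA-aware GC support, UseNUMA falls back to interleaving.
    effective.use_numa_interleaving = true;
  }
  // An explicit UseNUMAInterleaving request is passed through unchanged.
  return effective;
}
```

With this shape, only the UseNUMA branch would need to change once a NUMA-aware GC supports Windows; the explicit flag would keep working as before.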

2. I believe the accepted coding convention in hotspot is that "else" 
goes on the same line as the preceding closing brace, i.e. "} else {".
2846     }
2847     else {
The same applies in all the other places.

3. Did you forget to remove that?
3149       // tty->print("VirtualQuery AllocBase=%p, RegionSize=%Id\n", allocInfo.AllocationBase, allocInfo.RegionSize);

4. Does it make sense to pass UseLargePages and UseNUMAInterleaving to 
allocate_pages_individually()? They are global variables anyway.

5. What is the typical allocation granularity on Windows? Wouldn't it 
be a problem if we tried to allocate a large heap with small interleaved 
pages? Have you tried using a larger interleaving granularity on modern 
Windows versions? Doing a syscall and creating a segment for every page, 
even a large one, seems a bit excessive. If you did try that, was there 
any difference?
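
To put rough numbers on this, here is a small sketch. It uses the 64K and 2M chunk sizes from your original message; the 4 GB heap is purely an assumed example, and chunk_count() is a hypothetical helper, not something in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Number of separate allocation calls needed to cover heap_bytes when each
// chunk of chunk_bytes is allocated individually (rounding up).
inline std::uint64_t chunk_count(std::uint64_t heap_bytes,
                                 std::uint64_t chunk_bytes) {
  return (heap_bytes + chunk_bytes - 1) / chunk_bytes;
}

// Sizes used for the example; the heap size is an assumption.
const std::uint64_t K = 1024;
const std::uint64_t M = 1024 * K;
const std::uint64_t G = 1024 * M;
```

For a 4 GB heap, chunk_count(4 * G, 64 * K) gives 65536 calls, while chunk_count(4 * G, 2 * M) gives 2048.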

6. The usage of "result" doesn't seem right here: it is a bool but is 
compared against NULL. Did you mean "if (!result) return false;"?
3129     bool result = VirtualAlloc(addr, bytes, MEM_COMMIT, PAGE_READWRITE) != 0;
3130     if (result == NULL) return false;

7. Wouldn't it be nicer instead of the idiom
          BOOL ok = SysCall();
          if (!ok) return false;
just to say
          if (!SysCall()) return false;
?

8. Instead of introducing a global variable numa_used_node_count, could 
you implement os::numa_get_groups_num(), which was intended to return 
this number?
Also, build_numa_used_node_list() seems to have the same functionality 
as os::numa_get_leaf_groups() was intended to have. Could you implement 
that and use it instead?
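
Roughly what I have in mind, as a sketch only. The sketch namespace and the used_nodes vector are hypothetical; the two function signatures are my recollection of the declarations in os.hpp, so please check them against the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical storage for the ids of the nodes this process has affinity
// to; it would be filled once at init time.
std::vector<int> used_nodes;

// Intended role of os::numa_get_groups_num(): number of usable nodes.
std::size_t numa_get_groups_num() {
  return used_nodes.size();
}

// Intended role of os::numa_get_leaf_groups(): copy up to `size` node ids
// into `ids` and return how many were written.
std::size_t numa_get_leaf_groups(int* ids, std::size_t size) {
  std::size_t n = used_nodes.size() < size ? used_nodes.size() : size;
  for (std::size_t i = 0; i < n; i++) {
    ids[i] = used_nodes[i];
  }
  return n;
}

}  // namespace sketch
```

The callers would then ask the os:: interface for the count and the ids instead of reading a separate global.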

Please name function parameters in lower case with words separated with 
underscores. I know that there are exceptions, especially in 
os_windows.cpp, but it's better if we stick to the general convention.

igor

On 5/26/11 4:37 PM, Deneau, Tom wrote:
> I have incorporated the change suggested by Paul Hohensee to just use the existing UseNUMA flag rather than introduce a new flag.  Please let me know when you think this will be able to be checked in...
>
> The new webrev is at
> http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.02/
>
> -- Tom Deneau, AMD
>
>
>
>> -----Original Message-----
>> From: Deneau, Tom
>> Sent: Monday, May 16, 2011 12:54 PM
>> To: 'hotspot-compiler-dev at openjdk.java.net'
>> Subject: Review Request: UseNUMAInterleaving
>>
>> Please review this patch which adds a new flag called
>> UseNUMAInterleaving.  This flag provides a subset of the functionality
>> provided by UseNUMA, and its main purpose is to provide that subset on
>> OSes like Windows which do not support the full UseNUMA functionality.
>> In UseNUMA terminology, UseNUMAInterleaving makes all memory
>> "numa_global" which is implemented as interleaved.
>>
>> The situations where this shows the biggest benefits would be:
>>     * Windows platforms with multiple numa nodes (eg, 4)
>>
>>     * The JVM process is run across all the nodes (not affinitized to one
>> node).
>>
>>     * A workload that uses the majority of the cores in the machine, so
>>       that the heap is being accessed from many cores, including remote
>>       ones.
>>
>>     * Enough memory per node and a heap size such that the default heap
>>       placement policy on windows would end up with the heap (or
>>       nursery) placed on one node.
>>
>> jbb2005 and SPECPower_ssj2008 are examples of such workloads.  In our
>> measurements, we have seen some cases where the performance with
>> UseNUMAInterleaving was 2.7x vs. the performance without. There were
>> gains of varying sizes across all systems.
>>
>> As currently implemented this flag is ignored on Linux and Solaris
>> since they already support the full UseNUMA flag.
>>
>> The webrev is at
>> http://cr.openjdk.java.net/~tdeneau/UseNUMAInterleaving/webrev.01/
>>
>> Summary of changes:
>>
>>     * Other than adding the new UseNUMAInterleaving global flag, all of
>>       the changes are in src/os/windows/vm/os_windows.cpp
>>
>>     * Some static routines were added to set things up at init time.  These
>>        * check that the required APIs (VirtualAllocExNuma,
>>          GetNumaHighestNodeNumber, GetNumaNodeProcessorMask) exist in
>>          the OS
>>
>>        * build the list of numa nodes on which this process has affinity
>>
>>     * Changes to os::reserve_memory
>>        * There was already a routine that reserved pages one page at a
>>          time (used for Individual Large Page Allocation on WS2003).
>>          This was abstracted to a separate routine, called
>>          allocate_pages_individually.  This gets called both for the
>>          Individual Large Page Allocation thing mentioned above and for
>>          UseNUMAInterleaving (for both small and large pages)
>>
>>        * When used for NUMA Interleaving this just goes thru the numa
>>          node list in a round-robin fashion, using a different one for
>>          each chunk (with 4K pages, the minimum allocation granularity
>>          is 64K, with 2M pages it is 1 Page)
>>
>>        * Whether we do just a reserve or a combined reserve/commit is
>>          determined by the caller of allocate_pages_individually
>>
>>           * When used with large pages, we do a Reserve and Commit at
>>             the same time which is the way it always worked and the way
>>             it has to work on windows.
>>
>>           * For small pages, only the reserve is done, the commit will
>>             come later. (which is the way it worked for
>>             non-interleaved)
>>
>>     * os::commit_memory changes
>>        * If UseNUMAInterleaving is true, os::commit_memory has to check
>>          whether it was being asked to commit memory that might have
>>          come from multiple Reserve allocations, if so, the commits
>>          must also be broken up.  We don't keep any data structure to
>>          keep track of this, we just use VirtualQuery which queries the
>>          properties of a VA range and can tell us how much came from
>>          one VirtualAlloc call.
>>
>> I do not have a bug id for this.
>>
>> -- Tom Deneau, AMD
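
For reference, the round-robin placement described in the summary of changes above can be sketched as follows. This is an illustration under assumptions, not the code from the webrev: ChunkPlan and plan_interleaved() are hypothetical names, and the sketch only computes which node each chunk would go to rather than calling VirtualAllocExNuma, so that it stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One planned allocation: the node a chunk at a given offset would use.
struct ChunkPlan {
  std::size_t offset;
  std::size_t bytes;
  int node;
};

// Split [0, total_bytes) into chunks of chunk_bytes and assign the nodes in
// round-robin order. Each entry would correspond to one per-chunk
// allocation call in a real implementation.
std::vector<ChunkPlan> plan_interleaved(std::size_t total_bytes,
                                        std::size_t chunk_bytes,
                                        const std::vector<int>& nodes) {
  std::vector<ChunkPlan> plan;
  if (chunk_bytes == 0 || nodes.empty()) return plan;
  std::size_t next = 0;
  for (std::size_t off = 0; off < total_bytes; off += chunk_bytes) {
    std::size_t remaining = total_bytes - off;
    std::size_t len = remaining < chunk_bytes ? remaining : chunk_bytes;
    plan.push_back({off, len, nodes[next]});
    next = (next + 1) % nodes.size();  // wrap around the node list
  }
  return plan;
}
```

With a node list of {0, 1, 3} and five 64K chunks, the nodes would be used in the order 0, 1, 3, 0, 1.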