Turn on UseNUMA by default when prudent

Wed May 30 20:31:56 UTC 2012

I put a much simpler, still Linux-only rev at

  http://cr.openjdk.java.net/~ecaspole/numa_default_2/

Simply turning UseNUMA on by default might work, but there are so many
OS/platform combinations to consider that it is more than I can test.
Eric

On May 30, 2012, at 4:14 PM, Jesper Wilhelmsson wrote:

> On 2012-05-30 20:41, Igor Veresov wrote:
>> Actually UseNUMA should already do what you want. Even if  
>> specified on the command line it will switch itself off if there's  
>> only one node present.
>
> So, will setting UseNUMA to true as default be a platform  
> independent way of solving this?
> /Jesper
>
>>
>> igor
>>
>> On May 30, 2012, at 12:27 AM, Thomas Schatzl wrote:
>>
>>> Hi all,
>>>
>>> On Tue, 2012-05-29 at 21:56 +0200, Jesper Wilhelmsson wrote:
>>>> Hi Eric,
>>>>
>>>> As long as this is based on actual data and not just a hunch, I  
>>>> personally
>>>> think it is a good idea. I don't know if we have any policies  
>>>> about platform
>>>> specific optimizations like this though.
>>>>
>>>> I have some comments on the code layout and there are a few  
>>>> typos, but I guess
>>>> this is still a draft so I won't pick on that right now.
>>>>
>>>> One thing I wonder though is in os_linux_x86.cpp:
>>>>
>>>> if (VM_Version::cpu_family() == 0x15 || VM_Version::cpu_family() == 0x10) {
>>>>
>>>> Is this the only way to identify the proper processor family? It  
>>>> doesn't seem
>>>> very future proof. How often would you have to change this code  
>>>> to keep it up
>>>> to date with new hardware?
>>> just a question: if this is implemented, wouldn't it be more prudent to
>>> actually check whether the VM process runs on a NUMA machine, and
>>> actually has its computing (or memory) resources distributed across
>>> several nodes instead of a check for some arbitrary processors and
>>> processor identifiers?
>>>
>>> This would, given that the OS typically provides this information
>>> anyway, also immediately support e.g. sparc setups. It also avoids
>>> distributing memory when the user explicitly assigned the VM to a  
>>> single
>>> node...
>>>
>>> From memory, on Solaris the above-mentioned detection works
>>> approximately as follows:
>>>
>>>   - detect the total number of leaf locality groups (= nodes on
>>>     Solaris) in the system, e.g. via lgrp_nlgrps()
>>>   - from the root node (retrieved via lgrp_root()), iterate over its
>>>     children and leaf lgroups via lgrp_children()
>>>     - for each of the leaf lgroups found, check whether there is an
>>>       active cpu for this process in it using lgrp_cpus(); if so,
>>>       increment a counter
>>>
>>> Maybe there is a better way to do that though.
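
The counting steps described above can be sketched in portable C++. This is only a model of the traversal, not Solaris code: LocalityGroup and count_active_leaf_groups() are hypothetical names, and the data that would come from lgrp_children() and lgrp_cpus() on Solaris is passed in as plain vectors.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for one locality group. On Solaris this data would
// come from lgrp_children() and lgrp_cpus(); here it is supplied by the caller.
struct LocalityGroup {
  std::vector<int> children;  // indices of child groups; empty for a leaf
  std::vector<int> cpus;      // CPU ids directly contained in this group
};

// Walks the hierarchy from `root` and counts the leaf groups that contain at
// least one CPU the process is allowed to run on.
int count_active_leaf_groups(const std::vector<LocalityGroup>& groups,
                             int root,
                             const std::set<int>& process_cpus) {
  int active = 0;
  std::vector<int> pending{root};
  while (!pending.empty()) {
    int id = pending.back();
    pending.pop_back();
    const LocalityGroup& group = groups[id];
    if (!group.children.empty()) {
      // Interior group: descend into its children.
      pending.insert(pending.end(), group.children.begin(),
                     group.children.end());
      continue;
    }
    // Leaf group: one usable CPU is enough to count it as active.
    for (int cpu : group.cpus) {
      if (process_cpus.count(cpu) != 0) {
        ++active;
        break;
      }
    }
  }
  return active;
}
```

With a root that has two leaf children, a process allowed on CPUs from both leaves counts as two active nodes, while a process bound to CPUs of one leaf counts as one.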
>>>
>>> On Linux, numa_get_run_node_mask() may provide the same information
>>> when called during initialization.
>>> On Windows, it seems that a combination of GetProcessAffinityMask()
>>> and GetNumaProcessorNode() may be useful.
>>> (From a cursory web search for the latter two; not sure about other
>>> OSes, but you could simply provide a dummy for those.)
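
For the Linux case, a rough sketch of the same check without libnuma: instead of numa_get_run_node_mask() it combines sched_getaffinity() with the per-node cpulist files under /sys/devices/system/node. The function names and the kMaxNodes bound are assumptions made for this illustration, and the sysfs layout is assumed to be present; when it is not, the sketch reports a single node.

```cpp
#include <sched.h>

#include <cstdio>
#include <fstream>
#include <set>
#include <sstream>
#include <string>

// Parses a sysfs CPU list such as "0-3,8,10-11" into a set of CPU ids.
std::set<int> parse_cpu_list(const std::string& text) {
  std::set<int> cpus;
  std::stringstream in(text);
  std::string item;
  while (std::getline(in, item, ',')) {
    int first = 0;
    int last = 0;
    int matched = std::sscanf(item.c_str(), "%d-%d", &first, &last);
    if (matched == 1) last = first;  // single CPU, not a range
    if (matched >= 1) {
      for (int cpu = first; cpu <= last; ++cpu) cpus.insert(cpu);
    }
  }
  return cpus;
}

// Counts NUMA nodes that own at least one CPU in this process's affinity
// mask. Returns 1 if the mask or the node information cannot be read, or if
// no node matches.
int count_active_numa_nodes() {
  const int kMaxNodes = 1024;  // assumed upper bound on node ids to probe
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return 1;

  int active = 0;
  for (int node = 0; node < kMaxNodes; ++node) {
    std::ifstream file("/sys/devices/system/node/node" +
                       std::to_string(node) + "/cpulist");
    if (!file) continue;  // node ids are not guaranteed to be contiguous
    std::string line;
    std::getline(file, line);
    for (int cpu : parse_cpu_list(line)) {
      if (cpu >= 0 && cpu < CPU_SETSIZE && CPU_ISSET(cpu, &allowed)) {
        ++active;
        break;
      }
    }
  }
  return active > 0 ? active : 1;
}
```

Running the process under something like taskset or numactl with CPUs from a single node should make count_active_numa_nodes() return 1 even on a multi-node machine, which is the case described above where the user explicitly assigned the VM to one node.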
>>>
>>> I'd guess that some of the needed functionality to implement this is
>>> already provided by the current Hotspot code base.
>>>
>>>
>>> Ergonomics stuff is typically handled in runtime/arguments.?pp, so it
>>> might be a better place for updating globals than putting this
>>> detection in some os-specific initialization code.
>>>
>>> E.g.
>>>
>>>   if (FLAG_IS_DEFAULT(UseNUMA)) {
>>>     UseNUMA = /* maybe some other conditions && */
>>>               (os::get_num_active_numa_nodes() > 1);
>>>   }
>>>
>>> in e.g. Arguments::set_ergonomics_flags() or similar.
>>>
>>> Seems a lot nicer than an explicit check for some processor family.
>>> Maybe a little more work though.
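
The ergonomic rule suggested above can be modelled in a few lines of standalone C++. This is not HotSpot's flag machinery: FlagOrigin, BoolFlag and apply_numa_ergonomics() are hypothetical names, and the origin field only imitates what FLAG_IS_DEFAULT would report. The point is that a value given on the command line is never overridden.

```cpp
// Hypothetical model of where a flag value came from.
enum class FlagOrigin { Default, CommandLine, Ergonomic };

// Hypothetical model of a boolean VM flag such as UseNUMA.
struct BoolFlag {
  bool value;
  FlagOrigin origin;
};

// Enables the flag ergonomically only if the user did not set it and the
// process is spread over more than one NUMA node.
void apply_numa_ergonomics(BoolFlag& use_numa, int active_numa_nodes) {
  if (use_numa.origin != FlagOrigin::Default) {
    return;  // respect an explicit command-line setting
  }
  use_numa.value = active_numa_nodes > 1;
  use_numa.origin = FlagOrigin::Ergonomic;
}
```

With a defaulted flag and two active nodes the flag ends up true; with the flag explicitly switched off on the command line it stays off regardless of the node count.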
>>>
>>> Hth,
>>>   Thomas