Turn on UseNUMA by default when prudent
Eric Caspole
eric.caspole at amd.com
Fri Jun 8 19:49:22 UTC 2012
Hi everybody,
I made a similar change for Windows
http://cr.openjdk.java.net/~ecaspole/numa_default_win_1/
in addition to the Linux one:
http://cr.openjdk.java.net/~ecaspole/numa_default_3/
Doing this is even more effective on Windows, since Windows seems to
have an aggressive policy of allocating process memory on the "home
node" where the process first ran. In my worst case on Windows Server
2008 R2, a test ran up to 3x faster with +UseNUMA than with the
existing default when the application had at least as many threads as
cores.
Regards,
Eric
On Jun 1, 2012, at 11:27 AM, Vladimir Kozlov wrote:
> Can GC group sponsor this change? I think we also need to do the
> same for Solaris (the code is similar there).
>
> Thanks,
> Vladimir
>
> On 6/1/12 4:43 AM, Jesper Wilhelmsson wrote:
>> This looks OK to me.
>> /Jesper
>>
>>
>> On 2012-05-31 17:05, Eric Caspole wrote:
>>> OK, I removed the warning, see
>>>
>>> http://cr.openjdk.java.net/~ecaspole/numa_default_3/
>>>
>>> Thanks,
>>> Eric
>>>
>>>
>>> On May 30, 2012, at 4:58 PM, Vladimir Kozlov wrote:
>>>
>>>> We issue a warning only if something is not right, which is not the
>>>> case here:
>>>>
>>>> + warning("Turned on UseNUMA in os::init_2");
>>>>
>>>> otherwise looks good.
>>>>
>>>> Vladimir
>>>>
>>>> Eric Caspole wrote:
>>>>> I put a much simpler, still Linux-only rev at
>>>>> http://cr.openjdk.java.net/~ecaspole/numa_default_2/
>>>>> Simply turning UseNUMA on by default might work, but there are so many
>>>>> OS/platform combinations to consider that it's more than I can try to
>>>>> test.
>>>>> Eric
>>>>> On May 30, 2012, at 4:14 PM, Jesper Wilhelmsson wrote:
>>>>>> On 2012-05-30 20:41, Igor Veresov wrote:
>>>>>>> Actually UseNUMA should already do what you want. Even if specified
>>>>>>> on the command line, it will switch itself off if there's only one
>>>>>>> node present.
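>>>>>>>
>>>>>>> Roughly (paraphrasing from memory, not the exact code), the init
>>>>>>> code does something like:
>>>>>>>
>>>>>>>   if (UseNUMA) {
>>>>>>>     if (os::numa_get_groups_num() < 2) {
>>>>>>>       // Only one node available, so NUMA-aware allocation would not
>>>>>>>       // buy anything; quietly turn the flag back off.
>>>>>>>       UseNUMA = false;
>>>>>>>     }
>>>>>>>   }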
>>>>>>
>>>>>> So, will setting UseNUMA to true by default be a platform-independent
>>>>>> way of solving this?
>>>>>> /Jesper
>>>>>>
>>>>>>>
>>>>>>> igor
>>>>>>>
>>>>>>> On May 30, 2012, at 12:27 AM, Thomas Schatzl wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> On Tue, 2012-05-29 at 21:56 +0200, Jesper Wilhelmsson wrote:
>>>>>>>>> Hi Eric,
>>>>>>>>>
>>>>>>>>> As long as this is based on actual data and not just a hunch, I
>>>>>>>>> personally think it is a good idea. I don't know if we have any
>>>>>>>>> policies about platform-specific optimizations like this, though.
>>>>>>>>>
>>>>>>>>> I have some comments on the code layout and there are a few typos,
>>>>>>>>> but I guess this is still a draft so I won't pick on that right now.
>>>>>>>>>
>>>>>>>>> One thing I wonder though is in os_linux_x86.cpp:
>>>>>>>>>
>>>>>>>>> if (VM_Version::cpu_family() == 0x15 ||
>>>>>>>>>     VM_Version::cpu_family() == 0x10) {
>>>>>>>>>
>>>>>>>>> Is this the only way to identify the proper processor family? It
>>>>>>>>> doesn't seem very future-proof. How often would you have to change
>>>>>>>>> this code to keep it up to date with new hardware?
>>>>>>>> Just a question: if this is implemented, wouldn't it be more
>>>>>>>> prudent to actually check whether the VM process runs on a NUMA
>>>>>>>> machine, and actually has its computing (or memory) resources
>>>>>>>> distributed across several nodes, instead of checking for some
>>>>>>>> arbitrary processors and processor identifiers?
>>>>>>>>
>>>>>>>> This would, given that the OS typically provides this information
>>>>>>>> anyway, also immediately support e.g. SPARC setups. It also avoids
>>>>>>>> distributing memory when the user explicitly assigned the VM to a
>>>>>>>> single node...
>>>>>>>>
>>>>>>>> From memory, on Solaris the above-mentioned detection works
>>>>>>>> approximately as follows:
>>>>>>>>
>>>>>>>> - detect the total number of leaf locality groups (= nodes on
>>>>>>>>   Solaris) in the system, e.g. via lgrp_nlgrps()
>>>>>>>> - from the root node (retrieved via lgrp_root()), iterate over its
>>>>>>>>   children and leaf lgroups via lgrp_children().
>>>>>>>> - for each of the leaf lgroups found, check whether there is an
>>>>>>>>   active cpu for this process in it using lgrp_cpus(); if so,
>>>>>>>>   increment a counter
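>>>>>>>>
>>>>>>>> As a very rough sketch of that walk (untested, helper names made
>>>>>>>> up, error handling omitted):
>>>>>>>>
>>>>>>>>   #include <sys/lgrp_user.h>
>>>>>>>>
>>>>>>>>   // Count leaf lgroups containing at least one CPU this process
>>>>>>>>   // may run on; a result > 1 would justify enabling UseNUMA.
>>>>>>>>   static int count_active_leaves(lgrp_cookie_t cookie, lgrp_id_t id) {
>>>>>>>>     int nchildren = lgrp_children(cookie, id, NULL, 0);
>>>>>>>>     if (nchildren <= 0) {
>>>>>>>>       // Leaf lgroup: check for a CPU usable by the calling process.
>>>>>>>>       int ncpus = lgrp_cpus(cookie, id, NULL, 0, LGRP_CONTENT_DIRECT);
>>>>>>>>       return (ncpus > 0) ? 1 : 0;
>>>>>>>>     }
>>>>>>>>     lgrp_id_t* children = new lgrp_id_t[nchildren];
>>>>>>>>     lgrp_children(cookie, id, children, nchildren);
>>>>>>>>     int count = 0;
>>>>>>>>     for (int i = 0; i < nchildren; i++) {
>>>>>>>>       count += count_active_leaves(cookie, children[i]);
>>>>>>>>     }
>>>>>>>>     delete[] children;
>>>>>>>>     return count;
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   static int solaris_active_numa_nodes() {
>>>>>>>>     // LGRP_VIEW_CALLER limits the hierarchy to resources available
>>>>>>>>     // to the calling process.
>>>>>>>>     lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_CALLER);
>>>>>>>>     if (cookie == LGRP_COOKIE_NONE) return 1;
>>>>>>>>     int nodes = count_active_leaves(cookie, lgrp_root(cookie));
>>>>>>>>     lgrp_fini(cookie);
>>>>>>>>     return (nodes > 0) ? nodes : 1;
>>>>>>>>   }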
>>>>>>>>
>>>>>>>> Maybe there is a better way to do that though.
>>>>>>>>
>>>>>>>> On Linux, numa_get_run_node_mask() may provide the same information
>>>>>>>> when called during initialization.
>>>>>>>> On Windows, it seems that a combination of GetProcessAffinityMask()
>>>>>>>> and GetNUMAProcessorNode() may be useful.
>>>>>>>> (From a cursory web search for the latter two; not sure about other
>>>>>>>> OSes, but you could simply provide a dummy for those)
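>>>>>>>>
>>>>>>>> On Linux, a sketch against the libnuma 2.x API could look roughly
>>>>>>>> like the following (untested as well; Hotspot already loads libnuma
>>>>>>>> via dlopen on Linux, so real code would presumably go through those
>>>>>>>> function-pointer wrappers, and the helper name just follows the
>>>>>>>> suggestion below):
>>>>>>>>
>>>>>>>>   #include <numa.h>
>>>>>>>>
>>>>>>>>   static int get_num_active_numa_nodes() {
>>>>>>>>     if (numa_available() == -1) {
>>>>>>>>       return 1;  // no NUMA support at all, treat as a single node
>>>>>>>>     }
>>>>>>>>     // Nodes this process is currently allowed to run on.
>>>>>>>>     struct bitmask* nodes = numa_get_run_node_mask();
>>>>>>>>     int count = 0;
>>>>>>>>     for (int i = 0; i <= numa_max_node(); i++) {
>>>>>>>>       if (numa_bitmask_isbitset(nodes, i)) {
>>>>>>>>         count++;
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>     numa_bitmask_free(nodes);
>>>>>>>>     return (count > 0) ? count : 1;
>>>>>>>>   }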
>>>>>>>>
>>>>>>>> I'd guess that some of the needed functionality to implement this
>>>>>>>> is already provided by the current Hotspot code base.
>>>>>>>>
>>>>>>>>
>>>>>>>> Ergonomics stuff is typically handled in runtime/arguments.?pp, so
>>>>>>>> that might be a better location for updating globals than putting
>>>>>>>> this detection in some os-specific initialization code.
>>>>>>>>
>>>>>>>> E.g.
>>>>>>>>
>>>>>>>>   if (FLAG_IS_DEFAULT(UseNUMA)) {
>>>>>>>>     UseNUMA = /* maybe some other conditions && */
>>>>>>>>               (os::get_num_active_numa_nodes() > 1);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> in e.g. Arguments::set_ergonomics_flags() or similar.
>>>>>>>>
>>>>>>>> Seems a lot nicer than an explicit check for some processor family.
>>>>>>>> Maybe a little more work though.
>>>>>>>>
>>>>>>>> Hth,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>
>>>
>>>
>