RFR(s): 8023905: Failing to initialize VM with small initial heap when NUMA and large pages are enabled

Stefan Johansson stefan.johansson at oracle.com
Thu Aug 25 14:36:23 UTC 2016


Hi Sangheon,

On 2016-08-24 19:53, sangheon wrote:
> Hi Stefan,
>
> Thanks for reviewing this.
>
> On 08/24/2016 06:29 AM, Stefan Johansson wrote:
>> Hi Sangheon,
>>
>> Thanks for looking at this issue.
>>
>> On 2016-08-10 19:28, sangheon wrote:
>>> Hi all,
>>>
>>> Can I have some reviews for this change?
>>>
>>> NUMA and large pages are not compatible on Linux, as large pages 
>>> cannot be uncommitted (os_linux.cpp, line 4828 [1]). So we use a 
>>> pinned region in this case. If we succeed in reserving a small 
>>> initial heap with large pages, we will later fail when freeing 
>>> memory for biasing. The reason is that when we initialize NUMA 
>>> with large pages, we change the page size to the default page 
>>> size if the allocated pages are small.
>>>
>>> I am proposing to exit the VM at that point. Adding an exception 
>>> does not seem like a good idea for such a small heap, which is 
>>> not practical for the NUMA + large pages case anyway.
>>> The added test checks the exit message when both NUMA and large 
>>> pages are supported.
>>>
>>> CR: https://bugs.openjdk.java.net/browse/JDK-8023905
>>> Webrev: http://cr.openjdk.java.net/~sangheki/8023905/webrev.0
>> A few comments.
>>
>> I agree that having the VM exit is good, but I think the exit message 
>> should include info that large pages caused this. Something like: 
>> "Failed initializing NUMA with large pages. Too small heap size"
> OK.
> "Failed initializing NUMA with large pages. Too small heap size" seems 
> much better.
>
>>
>> Another thing is the use of #ifdef to make this conditional for 
>> Linux. Is this needed? Isn't the return value of 
>> can_commit_large_page_memory() the condition we should care about? 
> We can tell whether we are using a pinned region from the return 
> value of can_commit_large_page_memory() and the UseLargePages flag.
>
> However, it is the page-size comparison in the Linux version of 
> os::pd_free_memory() that causes this problem. For other platforms 
> such as Windows, AIX and BSD, which have an empty 
> os::pd_free_memory(), it doesn't matter, and Solaris has no such 
> condition. This means that on the other platforms we don't need to 
> exit because of reverting to the default page size.
>
> What I mean is: if can_commit_large_page_memory() returns false and 
> UseLargePages is enabled, we will try to use a pinned region, but 
> NUMA could still try to use the default page size. That is okay in 
> general, except on Linux, for the reason above.
>
>> Or would we then fail on some platforms too early? If so, we could 
>> add another capability method to the os class and use that to 
>> avoid having the #ifdef in the code.
> I also briefly considered adding a new method for this decision.
> And I agree that avoiding #ifdef is better in general. But I'm not 
> sure about this case, as it is very specific to the Linux 
> implementation; i.e. the Linux version of pd_free_memory() commits 
> conditionally, after comparing against the page size. If the Linux 
> pd_free_memory() becomes empty or the condition changes, the 
> decision method would also have to change, which does not seem 
> worth it to me. This is why I stopped considering it.
>
Ok, I don't see things changing as a big problem: you could let the new 
capability just return the same as can_commit_large_page_memory() on 
Linux and have the other platforms return true. In my eyes this would 
have the same maintenance requirements as the current solution.

>>
>> Regarding the comment, I'm not sure what you mean by "pin region". 
>> I might be missing something, but I think the comment needs more 
>> information to be easier to understand.
> Looking at ReservedHeapSpace::try_reserve_heap(), line 314, there 
> is a comment about the current allocation:
> // If OS doesn't support demand paging for large page memory, we need
> // to use reserve_memory_special() to reserve and pin the entire region.
>
> And I agree adding more information is better.
> How about this? (I will also update the comment in the test)
> -     // If we are using pin region, we cannot change the page size to 
> default size
> -     // as we could free memory which is not expected for pin region 
> in Linux.
>
> +     // If we are using a pinned region, where the entire region is 
> reserved and pinned,
> +     // we cannot change the page size to the default size as we 
> could free memory,
> +     // which is not expected for a pinned region on Linux.
>
Ok, I see what you mean, I just never thought about it as "pin region". 
I would say something like:

     // Changing the page size below can lead to freeing of memory. When using large pages
     // and the memory has been both reserved and committed, some platforms do not support
     // freeing parts of it. For those platforms we fail initialization.
     if (UseLargePages && !os::can_free_parts_of_large_page_memory()) {
       vm_exit_during_initialization("Failed initializing NUMA. Too small heap size");
     }

What do you think about that?

Thanks,
Stefan
> Let me upload webrev after our discussion ended.
>
> Thanks,
> Sangheon
>
>
>>
>> Thanks,
>> Stefan
>>
>>> Testing: JPRT, manual testing on a machine supporting NUMA + 
>>> large pages.
>>>
>>> Thanks,
>>> Sangheon
>>>
>>> [1]:
>>> // With SHM and HugeTLBFS large pages we cannot uncommit a page, so 
>>> there's no way
>>> // we can make the adaptive lgrp chunk resizing work. If the user 
>>> specified
>>> // both UseNUMA and UseLargePages (or UseSHM/UseHugeTLBFS) on the 
>>> command line - warn and
>>> // disable adaptive resizing.
>>
>
