RFR(s): 8023905: Failing to initialize VM with small initial heap when NUMA and large pages are enabled

Thu Aug 25 21:02:17 UTC 2016

Hi Stefan,

On 08/25/2016 07:36 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> On 2016-08-24 19:53, sangheon wrote:
>> Hi Stefan,
>>
>> Thanks for reviewing this.
>>
>> On 08/24/2016 06:29 AM, Stefan Johansson wrote:
>>> Hi Sangheon,
>>>
>>> Thanks for looking at this issue.
>>>
>>> On 2016-08-10 19:28, sangheon wrote:
>>>> Hi all,
>>>>
>>>> Can I have some reviews for this change?
>>>>
>>>> NUMA and large pages are not compatible in Linux as large pages 
>>>> cannot uncommit pages(os_linux.cpp:line 4828 [1]). So we use pin 
>>>> region for this case. If we succeed to reserve with large pages for 
>>>> small initial heap, we will fail when free memory for biasing. The 
>>>> reason is that when we initialize NUMA with large pages, we change 
>>>> the page size to the default page size if the allocated pages are 
>>>> small.
>>>>
>>>> I am proposing to exit the VM at that time. Adding an exception 
>>>> seems not good idea for this small heap which seems not practical 
>>>> for NUMA + large page case.
>>>> The added test is checking the exit message if both NUMA and large 
>>>> pages are supported.
>>>>
>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8023905
>>>> Webrev: http://cr.openjdk.java.net/~sangheki/8023905/webrev.0
>>> A few comments.
>>>
>>> I agree that having the VM exit is good, but I think the exit 
>>> message should include info that large pages caused this. Something 
>>> like: "Failed initializing NUMA with large pages. Too small heap size"
>> OK.
>> "Failed initializing NUMA with large pages. Too small heap size" 
>> seems much better.
>>
>>>
>>> Another thing is the use of #ifdef to make this conditional for 
>>> Linux. Is this needed? Isn't the return value for 
>>> can_commit_large_page_memory() the conditional we should care about? 
>> We can know whether we are using 'pin region' from the return value 
>> of can_commit_large_page_memory() and UseLargePages flag.
>>
>> However the condition of comparison with page size in Linux version 
>> of os::pd_free_memory() makes this problem. For other platforms such 
>> as Windows, AIX and BSD which have empty os::pd_free_memory(), it 
>> doesn't matter. And Solaris doesn't have such condition. This means 
>> for other platforms we don't need to exit because of reverting to 
>> default page size.
>>
>> I mean if can_commit_large_page_memory() returns false and 
>> UseLargePages enabled, we will try to use pin region. But NUMA could 
>> try to use default page size. It is okay in general except on Linux 
>> because of above reason.
>>
>>> Or will we fail some platform too early. If so, we could add another 
>>> capability method to the os class and use that to avoid having the 
>>> #ifdef in the code.
>> I also considered shortly to add a new method to decide.
>> And I agree that not using #ifdef is better in general. But I'm not 
>> sure for this case as it is too Linux implementation specific. i.e. 
>> Linux version is implemented pd_free_memory() to conditionally commit 
>> after comparing with page size. If Linux pd_free_memory() becomes 
>> blank or the condition is changed, the decision method also should be 
>> changed which seems not worth for me. This is why I stopped 
>> considering it.
>>
> Ok, I don't see things changing as a big problem, you could let the 
> new capability just return the same as can_commit_large_page_memory() 
> for Linux and have the other platforms return true. This would have 
> the same maintenance requirements as the current solution in my eyes.
I think we have to consider not only the maintenance but also the 
necessity of the method. I don't think we could re-use it and this is 
why I described it as too Linux implementation specific.

If you strongly want to introduce a new method, may I ask what is 
expected name/roll of the method?

>
>>>
>>> Regarding the comment, I'm not sure what you mean by "pin region". I 
>>> might be missing something but I think the comment need more 
>>> information to be easier to understand.
>> Looking at the ReservedHeapSpace::try_reserve_heap() line 314, there 
>> is a comment about current allocation.
>> // If OS doesn't support demand paging for large page memory, we need
>> // to use reserve_memory_special() to reserve and pin the entire region.
>>
>> And I agree adding more information is better.
>> How about this? (I will also update the comment at test)
>> -     // If we are using pin region, we cannot change the page size 
>> to default size
>> -     // as we could free memory which is not expected for pin region 
>> in Linux.
>>
>> +     // If we are using pin region which is reserved and pinned the 
>> entire region,
>> +     // we cannot change the page size to default size as we could 
>> free memory
>> +     // which is not expected for pin region in Linux.
>>
> Ok, I see what you mean. Just never thought about it as "pin region".
We are already using the term 'pinned region (not pin region)' 5 times 
at G1 code.

> I would say something like:
>     // Changing the page size below can lead to freeing of memory. 
> When using large pages
>     // and the memory has been both reserved and committed, some 
> platforms do not support
>     // freeing parts of it. For those platforms we fail initialization.
>     if (UseLargePages && !os::can_free_parts_of_large_page_memory()) {
>       vm_exit_during_initialization("Failed initializing NUMA. Too 
> small heap size");
>     }
>
> What do you think about that?
Basically I like above but it would be better to contain 'pinned region' 
as there's no reason to avoid using it.

Thanks,
Sangheon

>
> Thanks,
> Stefan
>> Let me upload webrev after our discussion ended.
>>
>> Thanks,
>> Sangheon
>>
>>
>>>
>>> Thanks,
>>> Stefan
>>>
>>>> Testing: JPRT, manual test on NUMA + large page supported machine.
>>>>
>>>> Thanks,
>>>> Sangheon
>>>>
>>>> [1]:
>>>> // With SHM and HugeTLBFS large pages we cannot uncommit a page, so 
>>>> there's no way
>>>> // we can make the adaptive lgrp chunk resizing work. If the user 
>>>> specified
>>>> // both UseNUMA and UseLargePages (or UseSHM/UseHugeTLBFS) on the 
>>>> command line - warn and
>>>> // disable adaptive resizing.
>>>
>>
>