RFR(s): 8023905: Failing to initialize VM with small initial heap when NUMA and large pages are enabled

sangheon sangheon.kim at oracle.com
Wed Aug 24 17:53:43 UTC 2016


Hi Stefan,

Thanks for reviewing this.

On 08/24/2016 06:29 AM, Stefan Johansson wrote:
> Hi Sangheon,
>
> Thanks for looking at this issue.
>
> On 2016-08-10 19:28, sangheon wrote:
>> Hi all,
>>
>> Can I have some reviews for this change?
>>
>> NUMA and large pages are not compatible on Linux, as large pages 
>> cannot be uncommitted (os_linux.cpp, line 4828 [1]). So we use a pin 
>> region in this case. If we succeed in reserving a small initial heap 
>> with large pages, we will later fail when freeing memory for biasing. 
>> The reason is that when we initialize NUMA with large pages, we change 
>> the page size to the default page size if the allocated pages are small.
>>
>> I am proposing to exit the VM in that case. Adding an exception does 
>> not seem a good idea for such a small heap, which seems impractical 
>> for the NUMA + large page case anyway.
>> The added test is checking the exit message if both NUMA and large 
>> pages are supported.
>>
>> CR: https://bugs.openjdk.java.net/browse/JDK-8023905
>> Webrev: http://cr.openjdk.java.net/~sangheki/8023905/webrev.0
> A few comments.
>
> I agree that having the VM exit is good, but I think the exit message 
> should include info that large pages caused this. Something like: 
> "Failed initializing NUMA with large pages. Too small heap size"
OK.
"Failed initializing NUMA with large pages. Too small heap size" seems 
much better.

>
> Another thing is the use of #ifdef to make this conditional for Linux. 
> Is this needed? Isn't the return value for 
> can_commit_large_page_memory() the conditional we should care about? 
We can tell whether we are using a pin region from the return value of 
can_commit_large_page_memory() together with the UseLargePages flag.

However, the page-size comparison in the Linux version of 
os::pd_free_memory() is what causes this problem. Other platforms such as 
Windows, AIX and BSD have an empty os::pd_free_memory(), so it doesn't 
matter there, and Solaris doesn't have such a condition. This means that 
on other platforms we don't need to exit when reverting to the default 
page size.

In other words, if can_commit_large_page_memory() returns false and 
UseLargePages is enabled, we will try to use a pin region, but NUMA could 
still try to use the default page size. That is okay in general, except 
on Linux, for the reason above.
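To make the interaction concrete, here is a small, self-contained model of the decision being discussed. This is an illustrative sketch only, not HotSpot code: the struct and function names are invented, and the real logic lives in os.hpp/os_linux.cpp. Only can_commit_large_page_memory() and UseLargePages correspond to real HotSpot names.

```cpp
#include <cassert>

// Hypothetical, simplified model of the platform state relevant here.
struct Platform {
  bool can_commit_large_page_memory;    // models os::can_commit_large_page_memory()
  bool use_large_pages;                 // models -XX:+UseLargePages
  bool pd_free_memory_checks_page_size; // true only on Linux, per the discussion
};

// When large pages cannot be committed on demand, the heap is reserved
// and pinned as a whole (a "pin region").
bool uses_pin_region(const Platform& p) {
  return p.use_large_pages && !p.can_commit_large_page_memory;
}

// NUMA initialization may fall back to the default page size for a small
// heap. With a pin region on Linux, the page-size check in
// pd_free_memory() means memory could be freed that must stay pinned,
// so the only safe option is to exit the VM.
bool must_exit_on_numa_fallback(const Platform& p) {
  return uses_pin_region(p) && p.pd_free_memory_checks_page_size;
}
```

On this model, only the Linux-with-pin-region combination forces the exit; the other platforms fall through harmlessly, which matches the reasoning above.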

> Or will we fail some platform too early. If so, we could add another 
> capability method to the os class and use that to avoid having the 
> #ifdef in the code.
I also briefly considered adding a new method to make this decision.
And I agree that avoiding #ifdef is better in general. But I'm not sure 
about this case, as it is too specific to the Linux implementation, i.e. 
the Linux version of pd_free_memory() conditionally commits after 
comparing with the page size. If the Linux pd_free_memory() becomes empty 
or its condition changes, the decision method would also have to change, 
which does not seem worthwhile to me. This is why I stopped considering it.
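For reference, the capability-method alternative might look roughly like the sketch below. Everything here is hypothetical: the name free_memory_limited_by_page_size() is invented for illustration and does not exist in HotSpot, and the is_linux parameter stands in for what would really be per-platform compilation in each os_<platform>.cpp file.

```cpp
#include <cassert>

// Hypothetical capability hook (invented name). In HotSpot each platform
// would compile in its own definition rather than branch on a flag.
struct os_caps {
  // True when pd_free_memory() may skip freeing based on a page-size
  // comparison; per the discussion, only the Linux implementation does.
  static bool free_memory_limited_by_page_size(bool is_linux) {
    return is_linux;
  }
};

// The caller could then replace the "#ifdef LINUX" guard with a query:
bool should_exit_vm(bool is_linux, bool uses_pin_region) {
  return os_caps::free_memory_limited_by_page_size(is_linux) && uses_pin_region;
}
```

As noted above, the downside is that this capability method would have to track the internals of the Linux pd_free_memory(), so the indirection may not pay for itself.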

>
> Regarding the comment, I'm not sure what you mean by "pin region". I 
> might be missing something but I think the comment need more 
> information to be easier to understand.
Looking at ReservedHeapSpace::try_reserve_heap(), line 314, there is 
a comment about the current allocation:
// If OS doesn't support demand paging for large page memory, we need
// to use reserve_memory_special() to reserve and pin the entire region.

And I agree adding more information is better.
How about this? (I will also update the comment at test)
-     // If we are using pin region, we cannot change the page size to default size
-     // as we could free memory which is not expected for pin region in Linux.

+     // If we are using a pin region, where the entire region is reserved and pinned,
+     // we cannot change the page size to the default size, as we could free memory
+     // which is not expected for a pin region on Linux.

I will upload a new webrev once our discussion has concluded.

Thanks,
Sangheon


>
> Thanks,
> Stefan
>
>> Testing: JPRT, manual test on NUMA + large page supported machine.
>>
>> Thanks,
>> Sangheon
>>
>> [1]:
>> // With SHM and HugeTLBFS large pages we cannot uncommit a page, so 
>> there's no way
>> // we can make the adaptive lgrp chunk resizing work. If the user 
>> specified
>> // both UseNUMA and UseLargePages (or UseSHM/UseHugeTLBFS) on the 
>> command line - warn and
>> // disable adaptive resizing.
>




More information about the hotspot-gc-dev mailing list