RFR(s): 8023905: Failing to initialize VM with small initial heap when NUMA and large pages are enabled
Stefan Johansson
stefan.johansson at oracle.com
Thu Aug 25 14:36:23 UTC 2016
Hi Sangheon,
On 2016-08-24 19:53, sangheon wrote:
> Hi Stefan,
>
> Thanks for reviewing this.
>
> On 08/24/2016 06:29 AM, Stefan Johansson wrote:
>> Hi Sangheon,
>>
>> Thanks for looking at this issue.
>>
>> On 2016-08-10 19:28, sangheon wrote:
>>> Hi all,
>>>
>>> Can I have some reviews for this change?
>>>
>>> NUMA and large pages are not compatible on Linux, as large pages
>>> cannot be uncommitted (os_linux.cpp, line 4828 [1]). So we use pin
>>> region for this case. If we succeed in reserving large pages for a
>>> small initial heap, we will fail when freeing memory for biasing. The
>>> reason is that when we initialize NUMA with large pages, we change
>>> the page size to the default page size if the allocated pages are
>>> small.
>>>
>>> I am proposing to exit the VM at that time. Adding an exception
>>> does not seem like a good idea for such a small heap, which does not
>>> seem practical for the NUMA + large page case.
>>> The added test is checking the exit message if both NUMA and large
>>> pages are supported.
>>>
>>> CR: https://bugs.openjdk.java.net/browse/JDK-8023905
>>> Webrev: http://cr.openjdk.java.net/~sangheki/8023905/webrev.0
>> A few comments.
>>
>> I agree that having the VM exit is good, but I think the exit message
>> should include info that large pages caused this. Something like:
>> "Failed initializing NUMA with large pages. Too small heap size"
> OK.
> "Failed initializing NUMA with large pages. Too small heap size" seems
> much better.
>
>>
>> Another thing is the use of #ifdef to make this conditional for
>> Linux. Is this needed? Isn't the return value for
>> can_commit_large_page_memory() the conditional we should care about?
> We can know whether we are using 'pin region' from the return value of
> can_commit_large_page_memory() and UseLargePages flag.
>
> However, the page size comparison in the Linux version of
> os::pd_free_memory() causes this problem. For other platforms such as
> Windows, AIX and BSD, which have an empty os::pd_free_memory(), it doesn't
> matter. And Solaris doesn't have such a condition. This means that on
> other platforms we don't need to exit because of reverting to the
> default page size.
>
> I mean, if can_commit_large_page_memory() returns false and
> UseLargePages is enabled, we will try to use pin region. But NUMA could
> try to use the default page size. That is okay in general, except on
> Linux for the reason above.
>
>> Or will we fail on some platform too early? If so, we could add another
>> capability method to the os class and use that to avoid having the
>> #ifdef in the code.
> I also briefly considered adding a new method to decide.
> And I agree that not using #ifdef is better in general. But I'm not
> sure for this case, as it is too specific to the Linux implementation,
> i.e. the Linux version of pd_free_memory() is implemented to
> conditionally commit after comparing with the page size. If the Linux
> pd_free_memory() becomes empty or the condition is changed, the decision
> method should also be changed, which does not seem worth it to me. This
> is why I stopped considering it.
>
Ok, I don't see things changing as a big problem. You could let the new
capability just return the same as can_commit_large_page_memory() for
Linux and have the other platforms return true. In my eyes this would
have the same maintenance requirements as the current solution.
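
As a rough sketch of what I mean (a standalone model, not HotSpot code):
the OsModel struct, the Platform enum and must_fail_numa_init() are
hypothetical names made up for illustration, and the capability name is
only a suggestion. The value of can_commit_large_page_memory() is passed
in as a plain field, since in the VM it depends on the large page
mechanism in use.

```cpp
// Hypothetical model of the platforms mentioned in this thread.
enum class Platform { Linux, Solaris, Windows, AIX, BSD };

// Stand-in for the os class; not the real HotSpot interface.
struct OsModel {
  Platform platform;
  // Stand-in for the result of os::can_commit_large_page_memory().
  bool can_commit_large_page_memory;

  // Suggested capability: Linux mirrors can_commit_large_page_memory(),
  // all other platforms simply report true.
  bool can_free_parts_of_large_page_memory() const {
    if (platform == Platform::Linux) {
      return can_commit_large_page_memory;
    }
    return true;
  }
};

// Returns true when NUMA initialization would have to fail, i.e. large
// pages are in use and parts of the memory cannot be freed again.
bool must_fail_numa_init(const OsModel& os, bool use_large_pages) {
  return use_large_pages && !os.can_free_parts_of_large_page_memory();
}
```

With this shape the shared code only asks the capability question and
needs no #ifdef; only the Linux answer depends on the large page setup.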
>>
>> Regarding the comment, I'm not sure what you mean by "pin region". I
>> might be missing something, but I think the comment needs more
>> information to be easier to understand.
> Looking at the ReservedHeapSpace::try_reserve_heap() line 314, there
> is a comment about current allocation.
> // If OS doesn't support demand paging for large page memory, we need
> // to use reserve_memory_special() to reserve and pin the entire region.
>
> And I agree adding more information is better.
> How about this? (I will also update the comment at test)
> - // If we are using pin region, we cannot change the page size to default size
> - // as we could free memory which is not expected for pin region in Linux.
>
> + // If we are using pin region which is reserved and pinned the entire region,
> + // we cannot change the page size to default size as we could free memory
> + // which is not expected for pin region in Linux.
>
Ok, I see what you mean. Just never thought about it as "pin region". I
would say something like:
// Changing the page size below can lead to freeing of memory. When using large pages
// and the memory has been both reserved and committed, some platforms do not support
// freeing parts of it. For those platforms we fail initialization.
if (UseLargePages && !os::can_free_parts_of_large_page_memory()) {
  vm_exit_during_initialization("Failed initializing NUMA. Too small heap size");
}
What do you think about that?
Thanks,
Stefan
> Let me upload a webrev after our discussion has ended.
>
> Thanks,
> Sangheon
>
>
>>
>> Thanks,
>> Stefan
>>
>>> Testing: JPRT, manual test on NUMA + large page supported machine.
>>>
>>> Thanks,
>>> Sangheon
>>>
>>> [1]:
>>> // With SHM and HugeTLBFS large pages we cannot uncommit a page, so there's no way
>>> // we can make the adaptive lgrp chunk resizing work. If the user specified
>>> // both UseNUMA and UseLargePages (or UseSHM/UseHugeTLBFS) on the command line - warn and
>>> // disable adaptive resizing.
>>
>