Large page use crashes the JVM on some Linux systems

B. Blaser bsrbnd at gmail.com
Wed Apr 25 12:00:41 UTC 2018


[further private conversation summary]

On 24 April 2018 at 21:15, B. Blaser <bsrbnd at gmail.com> wrote:
> On 24 April 2018 at 11:47, Claes Redestad <claes.redestad at oracle.com> wrote:
>> Hi Bernard,
>>
>> On 2018-04-24 11:27, B. Blaser wrote:
>>> Hi Claes,
>>>
>>> Thanks for your feedback, I'll try to improve the fix as suggested.
>>
>> someone pointed out we already do a sanity check similar to the one you're
>> proposing..
>>
>> src/hotspot/os/linux/os_linux.cpp:
>>
>> bool os::Linux::hugetlbfs_sanity_check(bool warn, size_t page_size) {
>>   [...]
>> }
>>
>> It seems it'll warn only if you explicitly use -XX:+UseHugeTLBFS.
>> -XX:+UseLargePages
>> on linux first attempts to use UseHugeTLBFS, then falls back to -XX:+UseSHM.
>>
>> ... what errors do you see on your system when you run -version with
>> -XX:+UseLargePages,
>> -XX:+UseHugeTLBFS and -XX:+UseSHM respectively? Most systems aren't
>> configured to
>> use HugeTLBFS, so my guess is your system actually has an issue with
>> UseSHM...
>
> I'm aware of this sanity check. The problem is that on my system
> 'mmap()' always fails and then the JVM attempts to use SHM instead.
> I'll check my configuration more closely and reread the kernel vm doc:
>
> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
>
> but, in short, both 'mmap()' and SHM can access large pages (2 MB
> on my computer), but this has to be enabled (for SHM as well), which
> doesn't seem to be the case by default.
>
> So, to answer your questions:
> 1) -XX:+UseLargePages and -XX:+UseHugeTLBFS have the same effect as
> -XX:+UseSHM because 'mmap()' nicely complains when trying to use huge
> TLB and then SHM is used instead.
> 2) unfortunately, SHM doesn't complain (no problem when calling
> 'shmget' or 'shmat') but the allocated memory isn't aligned to the
> large page size (2 MB), which crashes the JVM (SHM probably allocates
> memory using the default page size even when 2 MB pages are
> requested, which I still have to verify).
>
> In conclusion, the current JVM behavior of trying to use SHM if
> 'mmap()' fails seems to be brittle.
>
> I think we have to check whether large pages are supported/enabled
> when starting the JVM.
> Checking '/proc/meminfo', '/proc/filesystems' and
> '/proc/sys/vm/nr_hugepages' would probably be faster than calling
> 'mmap()'.
>
> I'll read the kernel doc again, but I think calling 'mmap()' is a
> robust "slow" way to see if large pages can be used, although I agree
> that it doesn't tell whether they are not *enabled* or not *supported*.
>
> What do you think we should do?
>
> Bernard
>
>> /Claes
>>
>>> Thanks,
>>> Bernard

---------------------------------------------------------

On 24 April 2018 at 21:39, Claes Redestad <claes.redestad at oracle.com> wrote:
 > The root issue here could very well be that the SHM sanity test is
 > insufficient. Adding the same test as we already do for TLBFS seems like the
 > wrong approach.
 >
 > I'm not the most knowledgeable about SHM, though, in fact not knowledgeable
 > at all, so let's try and get you subscribed to hotspot-dev and spark a
 > discussion on the list.
 >
 > /Claes

In concrete terms (on my system):

 $ grep "hugetlbfs" /proc/filesystems
 nodev   hugetlbfs

 $ grep -e "HugePages_" -e "Hugepagesize" /proc/meminfo
 HugePages_Total:       0
 HugePages_Free:        0
 HugePages_Rsvd:        0
 HugePages_Surp:        0
 Hugepagesize:       2048 kB

 Which means that huge pages are supported but not configured.
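
 The same reading of the two files can be sketched in code. This is only an
 illustration, not HotSpot code: the names HugePageState, has_hugetlbfs,
 hugepages_total and classify are hypothetical, and the sketch assumes that a
 "hugetlbfs" entry in '/proc/filesystems' means "supported" and a non-zero
 'HugePages_Total' in '/proc/meminfo' means "configured".

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical classification, mirroring "supported but not configured".
enum class HugePageState { Unsupported, NotConfigured, Configured };

// True if any line of /proc/filesystems-style text names "hugetlbfs".
bool has_hugetlbfs(std::istream& filesystems) {
  std::string line;
  while (std::getline(filesystems, line)) {
    std::istringstream fields(line);
    std::string word;
    while (fields >> word) {
      if (word == "hugetlbfs") return true;
    }
  }
  return false;
}

// Value of the "HugePages_Total:" line in /proc/meminfo-style text,
// or -1 if no such line is present.
long hugepages_total(std::istream& meminfo) {
  std::string line;
  while (std::getline(meminfo, line)) {
    std::istringstream fields(line);
    std::string key;
    long value = 0;
    if ((fields >> key >> value) && key == "HugePages_Total:") return value;
  }
  return -1;
}

// A missing HugePages_Total line is treated like a pool of zero pages.
HugePageState classify(std::istream& filesystems, std::istream& meminfo) {
  if (!has_hugetlbfs(filesystems)) return HugePageState::Unsupported;
  return hugepages_total(meminfo) > 0 ? HugePageState::Configured
                                      : HugePageState::NotConfigured;
}
```

 On a real system the two streams would be std::ifstream objects opened on
 '/proc/filesystems' and '/proc/meminfo'; taking std::istream keeps the
 parsing testable without touching '/proc'.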

 $ ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:+UseLargePages -version
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  Internal Error (g1PageBasedVirtualSpace.cpp:49), pid=2914, tid=2915
 #  guarantee(is_aligned(rs.base(), page_size)) failed: Reserved space base 0x00007f5c20b10000 is not aligned to requested page size 2097152
 #
 # JRE version:  (11.0) (build )
 # Java VM: OpenJDK 64-Bit Server VM (11-internal+0-adhoc.devel.jdk, mixed mode, aot, tiered, compressed oops, g1 gc, linux-amd64)
 # Core dump will be written. Default location: core.2914 (may not exist)
 #
 # An error report file with more information is saved as:
 # /home/****/jdk/hs_err_pid2914.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.java.com/bugreport/crash.jsp
 #
 Aborted (core dumped)

 $ ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:+UseHugeTLBFS -version
 OpenJDK 64-Bit Server VM warning: HugeTLBFS is not supported by the operating system.
 openjdk version "11-internal" 2018-09-25
 OpenJDK Runtime Environment (build 11-internal+0-adhoc.devel.jdk)
 OpenJDK 64-Bit Server VM (build 11-internal+0-adhoc.devel.jdk, mixed mode)

 $ ./build/linux-x86_64-normal-server-release/jdk/bin/java -XX:+UseSHM -version
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  Internal Error (g1PageBasedVirtualSpace.cpp:49), pid=2974, tid=2975
 #  guarantee(is_aligned(rs.base(), page_size)) failed: Reserved space base 0x00007f8a06890000 is not aligned to requested page size 2097152
 #
 # JRE version:  (11.0) (build )
 # Java VM: OpenJDK 64-Bit Server VM (11-internal+0-adhoc.devel.jdk, mixed mode, aot, tiered, compressed oops, g1 gc, linux-amd64)
 # Core dump will be written. Default location: core.2974 (may not exist)
 #
 # An error report file with more information is saved as:
 # /home/****/jdk/hs_err_pid2974.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.java.com/bugreport/crash.jsp
 #
 Aborted (core dumped)
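
 To make the failing guarantee concrete, here is the arithmetic behind both
 messages. It is a minimal sketch and not the HotSpot is_aligned(): the
 helpers below are written out for illustration, the name misalignment is
 hypothetical, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// 2 MB large page size, as printed in the error messages (2097152 bytes).
const std::uint64_t kLargePageSize = 2097152;

// True if exactly one bit is set.
inline bool is_power_of_two(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Number of bytes by which addr lies past the previous alignment boundary.
inline std::uint64_t misalignment(std::uint64_t addr, std::uint64_t alignment) {
  assert(is_power_of_two(alignment));
  return addr & (alignment - 1);
}

// True if addr is a multiple of alignment.
inline bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
  return misalignment(addr, alignment) == 0;
}
```

 For the two reported bases, misalignment(0x00007f5c20b10000, kLargePageSize)
 is 0x110000 and misalignment(0x00007f8a06890000, kLargePageSize) is 0x90000.
 Both remainders are multiples of 4096, which is consistent with the memory
 having been handed out at default page granularity, although it doesn't
 prove it.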

 So, I guess the least the JVM should do is to unconditionally disable
 large page use at startup if '/proc/meminfo' reports
 'HugePages_Total:       0'.

 But I'll investigate what can be done to improve the SHM sanity check too.

 Or maybe someone on hotspot-dev would have another idea?

 Bernard

---------------------------------------------------------

On 23 April 2018 at 11:18, Claes Redestad <claes.redestad at oracle.com> wrote:
> [ /bcc amber-dev, /cc hotspot-dev ]
>
> Hi,
>
> unconditionally mapping and unmapping a large page on startup seems
> sub-optimal to me - could this be checked directly after
> -XX:+UseLargePages flag has been parsed?
>
> I'd also note that explicitly configured large pages are typically a limited
> resource: does this test distinguish between a failure due to the system not
> supporting the feature and a failure due to not having any free pages left?
> Printing a "UseLargePages is unsupported" message in the latter case
> would be misleading.
>
> I wonder if checking something like /proc/meminfo for HugePages_* is a
> more robust way to probe capabilities, and also whether this is more
> suited as a test harness feature, i.e., enhance jtreg and tag these tests
> so that they're ignored on systems that don't have any/enough huge
> pages.
>
> Thanks!
>
> /Claes
>
>
> On 2018-04-22 23:18, B. Blaser wrote:
>>
>> [ I've trouble subscribing to hotspot-dev, please forward if necessary. ]
>>
>> Hi,
>>
>> After a clean build, some hotspot tests related to large page use are
>> failing on my 64-bit Linux system, for example:
>>
>> gc/g1/TestLargePageUseForAuxMemory.java
>> [...]
>>
>> Or simply:
>>
>> $ ./build/linux-x86_64-normal-server-release/images/jdk/bin/java
>> -XX:+UseLargePages -version
>>
>> crashes the JVM because the latter assumes that large pages are
>> always supported on Linux, which appears to be wrong.
>>
>> I suggest making sure that large pages are supported when parsing the
>> arguments, as below.
>>
>> Does this look reasonable (tier1 looks better now)?
>>
>> Thanks,
>> Bernard
>>
>> diff -r 8c85a1855e10 src/hotspot/share/runtime/arguments.cpp
>> --- a/src/hotspot/share/runtime/arguments.cpp Fri Apr 13 11:14:49 2018 -0700
>> +++ b/src/hotspot/share/runtime/arguments.cpp Sun Apr 22 20:29:21 2018 +0200
>> @@ -60,6 +60,7 @@
>>   #include "utilities/defaultStream.hpp"
>>   #include "utilities/macros.hpp"
>>   #include "utilities/stringUtils.hpp"
>> +#include "sys/mman.h"
>>   #if INCLUDE_JVMCI
>>   #include "jvmci/jvmciRuntime.hpp"
>>   #endif
>> @@ -4107,6 +4108,18 @@
>>     UNSUPPORTED_OPTION(UseLargePages);
>>   #endif
>>
>> +#ifdef LINUX
>> +  void *p = mmap(NULL, os::large_page_size(), PROT_READ|PROT_WRITE,
>> +                 MAP_ANONYMOUS|MAP_PRIVATE|MAP_HUGETLB,
>> +                 -1, 0);
>> +  if (p != MAP_FAILED) {
>> +    munmap(p, os::large_page_size());
>> +  }
>> +  else {
>> +    UNSUPPORTED_OPTION(UseLargePages);
>> +  }
>> +#endif
>> +
>>     ArgumentsExt::report_unsupported_options();
>>
>>   #ifndef PRODUCT
>> diff -r 8c85a1855e10 test/hotspot/jtreg/runtime/memory/LargePages/TestLargePagesFlags.java
>> --- a/test/hotspot/jtreg/runtime/memory/LargePages/TestLargePagesFlags.java Fri Apr 13 11:14:49 2018 -0700
>> +++ b/test/hotspot/jtreg/runtime/memory/LargePages/TestLargePagesFlags.java Sun Apr 22 20:29:21 2018 +0200
>> @@ -37,7 +37,7 @@
>>   public class TestLargePagesFlags {
>>
>>     public static void main(String [] args) throws Exception {
>> -    if (!Platform.isLinux()) {
>> +      if (!Platform.isLinux() || !canUse(UseLargePages(true))) {
>>         System.out.println("Skipping. TestLargePagesFlags has only been implemented for Linux.");
>>         return;
>>       }
>
>

