RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)

Stefan Karlsson stefan.karlsson at oracle.com
Wed Apr 22 12:20:13 UTC 2015


Hi Thomas,

On 2015-04-21 18:31, Thomas Stüfe wrote:
> Hi Stefan,
>
> thanks for reviewing!
>
> Here is the new webrev: 
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/ 
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.01/webrev/>
>
> I worked in all your remarks except one: the test unfortunately cannot 
> just assert if allocation fails, because that may happen sometimes, 
> especially with small allocation sizes 
> (os::Linux::reserve_memory_special_huge_tlbfs_mixed() will refuse to 
> allocate if the area cannot accomodate one large page). The test also 
> did not assert prior to my change, so I did not want to change that 
> behaviour.
>
> I changed some more things:
>
> 1) Before my change, 
> os::Linux::reserve_memory_special_huge_tlbfs_mixed() would call 
> os::reserve_memory() or os::reserve_memory_aligned(), depending on 
> whether req_addr was NULL or not. That was a bit unclean (a 
> lower-level API calling a high-level API) and led to the weird 
> workaround at os_linux.cpp:3497, where the NMT Memory tracker had to 
> deregister a allocation which it expected to be registered again later 
> at an upper level. Yikes.
>
> Because I changed the first case to use low-level mmap, I also changed 
> the second - replaced os::reserve_memory_aligned with low level 
> coding. Actually I merged both branches together as good as possible, 
> so the new code (os_linux.cpp:3484 ff) now handles alignment as well: 
> If req_addr is set, allocation is attempted at req_addr and alignment 
> is ignored. If req_addr is NULL, alignment is honored. This way I 
> could get rid of the workaround for the NMT completely.
>
> 2) I completely rewrote the test function. It now tests three 
> scenarios, see comment. I hope the code is clear and easy to understand.

http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/src/os/linux/vm/os_linux.cpp.frames.html

---
+  start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
+    MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
+  if (start == MAP_FAILED) {
      return NULL;
    }

You've changed the behavior of this function. It used to return NULL if 
reserve_memory couldn't reserve memory at the requested address. Your 
proposed patch can return another address. This will be a problem for 
the upper layers when using compressed oops. I think it would be better 
if you keep the old behavior.

---
Instead of inlining the reserve memory code into 
reserve_memory_special_huge_tlbfs_mixed, could you extract it into 
function? So that we have one function that's responsible for reservinb 
PROT_NONE memory, and one that commits the small, large, small pages.

---
-  size_t large_page_size = os::large_page_size();
+  char* start;

+  size_t large_page_size = os::large_page_size();

There's no need anymore to declare 'start' before it's set. So this 
should be incorporated into:

3489   start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
3490     MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);

---
3480   // first reserve - but not commit - the address range in small pages.
...
3529   // Commit small-paged leading area

Could you start your comments with a capital and with a period?

---
3481   // Note: if a wish address was given

There's no other reference to a "wish address". Could you reword it to 
be req_addr or "requested address"?

---
3481   // Note: if a wish address was given, we still do not use 
MAP_FIXED - do not want to trash
3482   // existing mappings. We hand down req_addr to mmap and hope the 
OS does its best.

Could you move this comment down to the mmap call?

---
3529   // Commit small-paged leading area
...
3540   // Commit large paged area

Shouldn't it say large-paged as well?

---
3483   // Note2: if wish address was given, alignment is ignored

Previously, we asserted that our in-parameters were aligned. Why did you 
replace that with a comment?

---
+  size_t extra_size = ...
+    char* const end0 = start0 + bytes;
+    char* const end = start + extra_size;

It's not obvious from the variable names what they represent, IMHO. 
Could you try to give them more descriptive names?

I'll take a look at the tests after next review round.

Thanks,
StefanK

>
> Thanks!
>
> ..Thomas
>
>
> On Wed, Apr 15, 2015 at 1:55 PM, Stefan Karlsson 
> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>> wrote:
>
>     Hi Thomas,
>
>     Thanks for fixing this  bug.
>
>     Some comments inlined:
>
>     On 2015-04-09 17:08, Thomas Stüfe wrote:
>
>         Hi all,
>
>         please review this fix to huge page allocation on Linux.
>
>         bug: https://bugs.openjdk.java.net/browse/JDK-8077276
>         webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
>         <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/>
>
>
>     http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html
>     <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html>
>
>     +  const int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC :
>     PROT_READ|PROT_WRITE;
>         char* start;
>        if (req_addr != NULL) {
>          assert(is_ptr_aligned(req_addr, alignment), "Must be");
>          assert(is_size_aligned(bytes, alignment), "Must be");
>     -    start = os::reserve_memory(bytes, req_addr);
>     -    assert(start == NULL || start == req_addr, "Must be");
>     +    // do NOT use MAP_FIXED here - do not trash existing mappings.
>     +    start = (char*) ::mmap(req_addr, bytes, prot,
>     +      MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>     +    if (start == MAP_FAILED) {
>     +      start = NULL;
>     +    }
>     +    if (!is_ptr_aligned(start, alignment)) {
>     +      ::munmap(start, bytes);
>     +      start = NULL;
>     +    }
>
>
>     Previously, the small page area was mapped with PROT_NONE, but now
>     you change it to RW. Is there a reason why you changed that?
>
>
>        result = ::mmap(lp_start, lp_bytes, prot,
>      MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_HUGETLB,
>                        -1, 0);
>        if (result == MAP_FAILED) {
>     -    warn_on_large_pages_failure(req_addr, bytes, errno);
>     +    warn_on_large_pages_failure(start, bytes, errno);
>
>     Shouldn't we report lp_start instead of start here?
>
>
>     http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp
>     <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp>
>
>     There are a number of things that I would like to change with the
>     the test.
>
>     Can you use mapping == NULL in null checks instead of just if
>     (mapping). The same for the asserts.
>
>     I think the test will be easier to understand if you get rid of
>     the for loop and extract code into helper functions, or do a small
>     amount of code duplication:
>
>         // do the test twice: once with a req_addr=NULL, once with a
>     non-NULL
>         // req_addr; the latter shall test if we trash existing mappings
>         for (int run = 0; run < 2; run ++) {
>
>     I think you should fail or exit the test if you can't get the
>     expected mapping, instead of continuing and have :
>
>           if (addr) {
>             test_log("... mapped [" PTR_FORMAT "-" PTR_FORMAT ").",
>     addr, addr + size);
>           } else {
>             test_log("... failed.");
>           }
>
>           if (addr != NULL) {
>
>     Thanks,
>     StefanK
>
>
>
>         os::Linux::reserve_memory_special_huge_tlbfs_mixed() first
>         establishes a
>         mapping with small pages over the whole requested range, then
>         exchanges the
>         parts of the mapping which are aligned to large page size with
>         large pages.
>
>         Now, when establishing the first mapping, it uses
>         os::reserve_memory() with
>         a potentially non-null req_addr. In that case,
>         os::reserve_memory() will do
>         a mmap(MAP_FIXED).
>
>         This will trash any pre-existing mappings.
>
>         Note that I could not willingly reproduce the error with an
>         unmodified VM.
>         But I added a reproduction case which demonstrated that if one
>         were to call
>         os::Linux::reserve_memory_special_huge_tlbfs_mixed() with a
>         non-NULL
>         req_addr, existing mappings would be trashed. Depending on
>         where we are in
>         address space, we also would overwrite libc structures and crash
>         immediately.
>
>         The repro case is part of the change, see changes in
>         test_reserve_memory_special_huge_tlbfs_mixed(), and can be
>         executed with:
>
>         ./images/jdk/bin/java -XX:+UseLargePages -XX:+UseHugeTLBFS
>         -XX:-UseSHM
>         -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>
>         The fix: instead of using os::reserve_memory(),
>         os::Linux::reserve_memory_special_huge_tlbfs_mixed() now calls
>         mmap()
>         directly with the non-NULL req_addr, but without MAP_FIXED.
>         This means the
>         OS may do its best to allocate at req_addr, but will not trash any
>         mappings. This also follows the pattern in
>         os::Linux::reserve_memory_special_huge_tlbfs_only(), which is
>         a sister
>         function of os::Linux::reserve_memory_special_huge_tlbfs_mixed().
>
>         Note also discussion at:
>         http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html.
>
>         Thanks for reviewing!
>
>         Kind Regards, Thomas
>
>
>



More information about the hotspot-runtime-dev mailing list