RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)
Stefan Karlsson
stefan.karlsson at oracle.com
Wed Apr 22 12:20:13 UTC 2015
Hi Thomas,
On 2015-04-21 18:31, Thomas Stüfe wrote:
> Hi Stefan,
>
> thanks for reviewing!
>
> Here is the new webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.01/webrev/>
>
> I worked in all your remarks except one: the test unfortunately cannot
> just assert if allocation fails, because that may happen sometimes,
> especially with small allocation sizes
> (os::Linux::reserve_memory_special_huge_tlbfs_mixed() will refuse to
> allocate if the area cannot accomodate one large page). The test also
> did not assert prior to my change, so I did not want to change that
> behaviour.
>
> I changed some more things:
>
> 1) Before my change,
> os::Linux::reserve_memory_special_huge_tlbfs_mixed() would call
> os::reserve_memory() or os::reserve_memory_aligned(), depending on
> whether req_addr was NULL or not. That was a bit unclean (a
> lower-level API calling a high-level API) and led to the weird
> workaround at os_linux.cpp:3497, where the NMT Memory tracker had to
> deregister a allocation which it expected to be registered again later
> at an upper level. Yikes.
>
> Because I changed the first case to use low-level mmap, I also changed
> the second - replaced os::reserve_memory_aligned with low level
> coding. Actually I merged both branches together as good as possible,
> so the new code (os_linux.cpp:3484 ff) now handles alignment as well:
> If req_addr is set, allocation is attempted at req_addr and alignment
> is ignored. If req_addr is NULL, alignment is honored. This way I
> could get rid of the workaround for the NMT completely.
>
> 2) I completely rewrote the test function. It now tests three
> scenarios, see comment. I hope the code is clear and easy to understand.
http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/src/os/linux/vm/os_linux.cpp.frames.html
---
+ start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
+ MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
+ if (start == MAP_FAILED) {
return NULL;
}
You've changed the behavior of this function. It used to return NULL if
reserve_memory couldn't reserve memory at the requested address. Your
proposed patch can return another address. This will be a problem for
the upper layers when using compressed oops. I think it would be better
if you keep the old behavior.
---
Instead of inlining the reserve memory code into
reserve_memory_special_huge_tlbfs_mixed, could you extract it into
function? So that we have one function that's responsible for reservinb
PROT_NONE memory, and one that commits the small, large, small pages.
---
- size_t large_page_size = os::large_page_size();
+ char* start;
+ size_t large_page_size = os::large_page_size();
There's no need anymore to declare 'start' before it's set. So this
should be incorporated into:
3489 start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
3490 MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
---
3480 // first reserve - but not commit - the address range in small pages.
...
3529 // Commit small-paged leading area
Could you start your comments with a capital and with a period?
---
3481 // Note: if a wish address was given
There's no other reference to a "wish address". Could you reword it to
be req_addr or "requested address"?
---
3481 // Note: if a wish address was given, we still do not use
MAP_FIXED - do not want to trash
3482 // existing mappings. We hand down req_addr to mmap and hope the
OS does its best.
Could you move this comment down to the mmap call?
---
3529 // Commit small-paged leading area
...
3540 // Commit large paged area
Shouldn't it say large-paged as well?
---
3483 // Note2: if wish address was given, alignment is ignored
Previously, we asserted that our in-parameters were aligned. Why did you
replace that with a comment?
---
+ size_t extra_size = ...
+ char* const end0 = start0 + bytes;
+ char* const end = start + extra_size;
It's not obvious from the variable names what they represent, IMHO.
Could you try to give them more descriptive names?
I'll take a look at the tests after next review round.
Thanks,
StefanK
>
> Thanks!
>
> ..Thomas
>
>
> On Wed, Apr 15, 2015 at 1:55 PM, Stefan Karlsson
> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>> wrote:
>
> Hi Thomas,
>
> Thanks for fixing this bug.
>
> Some comments inlined:
>
> On 2015-04-09 17:08, Thomas Stüfe wrote:
>
> Hi all,
>
> please review this fix to huge page allocation on Linux.
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8077276
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/>
>
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html>
>
> + const int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC :
> PROT_READ|PROT_WRITE;
> char* start;
> if (req_addr != NULL) {
> assert(is_ptr_aligned(req_addr, alignment), "Must be");
> assert(is_size_aligned(bytes, alignment), "Must be");
> - start = os::reserve_memory(bytes, req_addr);
> - assert(start == NULL || start == req_addr, "Must be");
> + // do NOT use MAP_FIXED here - do not trash existing mappings.
> + start = (char*) ::mmap(req_addr, bytes, prot,
> + MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
> + if (start == MAP_FAILED) {
> + start = NULL;
> + }
> + if (!is_ptr_aligned(start, alignment)) {
> + ::munmap(start, bytes);
> + start = NULL;
> + }
>
>
> Previously, the small page area was mapped with PROT_NONE, but now
> you change it to RW. Is there a reason why you changed that?
>
>
> result = ::mmap(lp_start, lp_bytes, prot,
> MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_HUGETLB,
> -1, 0);
> if (result == MAP_FAILED) {
> - warn_on_large_pages_failure(req_addr, bytes, errno);
> + warn_on_large_pages_failure(start, bytes, errno);
>
> Shouldn't we report lp_start instead of start here?
>
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp>
>
> There are a number of things that I would like to change with the
> the test.
>
> Can you use mapping == NULL in null checks instead of just if
> (mapping). The same for the asserts.
>
> I think the test will be easier to understand if you get rid of
> the for loop and extract code into helper functions, or do a small
> amount of code duplication:
>
> // do the test twice: once with a req_addr=NULL, once with a
> non-NULL
> // req_addr; the latter shall test if we trash existing mappings
> for (int run = 0; run < 2; run ++) {
>
> I think you should fail or exit the test if you can't get the
> expected mapping, instead of continuing and have :
>
> if (addr) {
> test_log("... mapped [" PTR_FORMAT "-" PTR_FORMAT ").",
> addr, addr + size);
> } else {
> test_log("... failed.");
> }
>
> if (addr != NULL) {
>
> Thanks,
> StefanK
>
>
>
> os::Linux::reserve_memory_special_huge_tlbfs_mixed() first
> establishes a
> mapping with small pages over the whole requested range, then
> exchanges the
> parts of the mapping which are aligned to large page size with
> large pages.
>
> Now, when establishing the first mapping, it uses
> os::reserve_memory() with
> a potentially non-null req_addr. In that case,
> os::reserve_memory() will do
> a mmap(MAP_FIXED).
>
> This will trash any pre-existing mappings.
>
> Note that I could not willingly reproduce the error with an
> unmodified VM.
> But I added a reproduction case which demonstrated that if one
> were to call
> os::Linux::reserve_memory_special_huge_tlbfs_mixed() with a
> non-NULL
> req_addr, existing mappings would be trashed. Depending on
> where we are in
> address space, we also would overwrite libc structures and crash
> immediately.
>
> The repro case is part of the change, see changes in
> test_reserve_memory_special_huge_tlbfs_mixed(), and can be
> executed with:
>
> ./images/jdk/bin/java -XX:+UseLargePages -XX:+UseHugeTLBFS
> -XX:-UseSHM
> -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>
> The fix: instead of using os::reserve_memory(),
> os::Linux::reserve_memory_special_huge_tlbfs_mixed() now calls
> mmap()
> directly with the non-NULL req_addr, but without MAP_FIXED.
> This means the
> OS may do its best to allocate at req_addr, but will not trash any
> mappings. This also follows the pattern in
> os::Linux::reserve_memory_special_huge_tlbfs_only(), which is
> a sister
> function of os::Linux::reserve_memory_special_huge_tlbfs_mixed().
>
> Note also discussion at:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html.
>
> Thanks for reviewing!
>
> Kind Regards, Thomas
>
>
>
More information about the hotspot-runtime-dev
mailing list