RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)

Thomas Stüfe thomas.stuefe at gmail.com
Tue Apr 21 16:31:42 UTC 2015


Hi Stefan,

thanks for reviewing!

Here is the new webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/

I worked in all your remarks except one: the test unfortunately cannot just
assert if allocation fails, because that may happen sometimes, especially
with small allocation sizes
(os::Linux::reserve_memory_special_huge_tlbfs_mixed()
will refuse to allocate if the area cannot accomodate one large page). The
test also did not assert prior to my change, so I did not want to change
that behaviour.

I changed some more things:

1) Before my change, os::Linux::reserve_memory_special_huge_tlbfs_mixed()
would call os::reserve_memory() or os::reserve_memory_aligned(), depending
on whether req_addr was NULL or not. That was a bit unclean (a lower-level
API calling a high-level API) and led to the weird workaround at
os_linux.cpp:3497, where the NMT Memory tracker had to deregister a
allocation which it expected to be registered again later at an upper
level. Yikes.

Because I changed the first case to use low-level mmap, I also changed the
second - replaced os::reserve_memory_aligned with low level coding.
Actually I merged both branches together as good as possible, so the new
code (os_linux.cpp:3484 ff) now handles alignment as well: If req_addr is
set, allocation is attempted at req_addr and alignment is ignored. If
req_addr is NULL, alignment is honored. This way I could get rid of the
workaround for the NMT completely.

2) I completely rewrote the test function. It now tests three scenarios,
see comment. I hope the code is clear and easy to understand.

Thanks!

..Thomas


On Wed, Apr 15, 2015 at 1:55 PM, Stefan Karlsson <stefan.karlsson at oracle.com
> wrote:

> Hi Thomas,
>
> Thanks for fixing this  bug.
>
> Some comments inlined:
>
> On 2015-04-09 17:08, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> please review this fix to huge page allocation on Linux.
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8077276
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
>>
>
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html
>
> +  const int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC :
> PROT_READ|PROT_WRITE;
>     char* start;
>    if (req_addr != NULL) {
>      assert(is_ptr_aligned(req_addr, alignment), "Must be");
>      assert(is_size_aligned(bytes, alignment), "Must be");
> -    start = os::reserve_memory(bytes, req_addr);
> -    assert(start == NULL || start == req_addr, "Must be");
> +    // do NOT use MAP_FIXED here - do not trash existing mappings.
> +    start = (char*) ::mmap(req_addr, bytes, prot,
> +      MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
> +    if (start == MAP_FAILED) {
> +      start = NULL;
> +    }
> +    if (!is_ptr_aligned(start, alignment)) {
> +      ::munmap(start, bytes);
> +      start = NULL;
> +    }
>
>
> Previously, the small page area was mapped with PROT_NONE, but now you
> change it to RW. Is there a reason why you changed that?
>
>
>    result = ::mmap(lp_start, lp_bytes, prot,
>                    MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_HUGETLB,
>                    -1, 0);
>    if (result == MAP_FAILED) {
> -    warn_on_large_pages_failure(req_addr, bytes, errno);
> +    warn_on_large_pages_failure(start, bytes, errno);
>
> Shouldn't we report lp_start instead of start here?
>
>
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp
>
> There are a number of things that I would like to change with the the test.
>
> Can you use mapping == NULL in null checks instead of just if (mapping).
> The same for the asserts.
>
> I think the test will be easier to understand if you get rid of the for
> loop and extract code into helper functions, or do a small amount of code
> duplication:
>
>     // do the test twice: once with a req_addr=NULL, once with a non-NULL
>     // req_addr; the latter shall test if we trash existing mappings
>     for (int run = 0; run < 2; run ++) {
>
> I think you should fail or exit the test if you can't get the expected
> mapping, instead of continuing and have :
>
>       if (addr) {
>         test_log("... mapped [" PTR_FORMAT "-" PTR_FORMAT ").", addr, addr
> + size);
>       } else {
>         test_log("... failed.");
>       }
>
>       if (addr != NULL) {
>
> Thanks,
> StefanK
>
>
>
>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() first establishes a
>> mapping with small pages over the whole requested range, then exchanges
>> the
>> parts of the mapping which are aligned to large page size with large
>> pages.
>>
>> Now, when establishing the first mapping, it uses os::reserve_memory()
>> with
>> a potentially non-null req_addr. In that case, os::reserve_memory() will
>> do
>> a mmap(MAP_FIXED).
>>
>> This will trash any pre-existing mappings.
>>
>> Note that I could not willingly reproduce the error with an unmodified VM.
>> But I added a reproduction case which demonstrated that if one were to
>> call
>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() with a non-NULL
>> req_addr, existing mappings would be trashed. Depending on where we are in
>> address space, we also would overwrite libc structures and crash
>> immediately.
>>
>> The repro case is part of the change, see changes in
>> test_reserve_memory_special_huge_tlbfs_mixed(), and can be executed with:
>>
>> ./images/jdk/bin/java -XX:+UseLargePages -XX:+UseHugeTLBFS -XX:-UseSHM
>> -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>>
>> The fix: instead of using os::reserve_memory(),
>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() now calls mmap()
>> directly with the non-NULL req_addr, but without MAP_FIXED. This means the
>> OS may do its best to allocate at req_addr, but will not trash any
>> mappings. This also follows the pattern in
>> os::Linux::reserve_memory_special_huge_tlbfs_only(), which is a sister
>> function of os::Linux::reserve_memory_special_huge_tlbfs_mixed().
>>
>> Note also discussion at:
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html
>> .
>>
>> Thanks for reviewing!
>>
>> Kind Regards, Thomas
>>
>
>


More information about the hotspot-runtime-dev mailing list