RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)
Thomas Stüfe
thomas.stuefe at gmail.com
Fri Apr 24 09:41:35 UTC 2015
Hi Stefan,
ok. I see it makes sense to limit the scope of the change as much as
possible to the actual error and do semantic changes separately.
I worked for a long time with os::reserve_memory() and friends, especially
because I had to port it to AIX here at SAP. That was annoyingly difficult
because of AIX specifics, so it basically ended up being a completely new
implementation. I guess that is the reason I am a bit "looser" around this
code and tend to change too much - it is hard to shake this "ownership"
feeling.
On a side note, I am quite happy now that SAP is contributing to the
OpenJDK, so now I have the chance to actually bring some changes upstream :)
Ok, what you suggest makes sense and I will change the fix accordingly.
Some more remarks inline:
On Thu, Apr 23, 2015 at 9:53 PM, Stefan Karlsson <stefan.karlsson at oracle.com
> wrote:
> Hi Thomas,
>
> On 2015-04-23 18:09, Thomas Stüfe wrote:
>
> Hi Stefan,
>
> new webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.03/webrev/
>
>
> I think I've realized that we have different views on what the API for the
> reserve function/code should be.
>
> The code you have written has two more or less mutually exclusive execution
> modes:
> 1) req_addr != NULL => ignore the alignment
> 2) req_addr == NULL => adhere to the alignment
>
> This is reflected in the code, in the name of the new function, and in the
> comments.
>
> If we take the previous implementation of the code, before your patches:
> 3482   if (req_addr != NULL) {
> 3483     assert(is_ptr_aligned(req_addr, alignment), "Must be");
> 3484     assert(is_size_aligned(bytes, alignment), "Must be");
> 3485     start = os::reserve_memory(bytes, req_addr);
> 3486     assert(start == NULL || start == req_addr, "Must be");
> 3487   } else {
> 3488     start = os::reserve_memory_aligned(bytes, alignment);
> 3489   }
> 3490
> 3491   if (start == NULL) {
> 3492     return NULL;
> 3493   }
> 3494
> 3495   assert(is_ptr_aligned(start, alignment), "Must be");
>
>
> If we had extracted that into a function, then the API would have
> been:
> INPUT:
> alignment - requested alignment
>
So... can alignment be 0 or not? It was never tested; the old code somewhat
assumes that it is never 0.
> req_addr - requested address (must be 'alignment' aligned)
> bytes - the size of the memory chunk to reserve (must be [arguably]
> 'alignment' aligned)
>
Almost. bytes had to be aligned to alignment only for the req_addr != NULL
case, which does not make much sense.
> OUTPUT:
> address - a memory chunk that is 'alignment' aligned and 'bytes' large,
> that starts at 'req_addr' if 'req_addr' is given, or NULL on failure.
>
> Your current code is written to allow for a broader use-case that the
> current code doesn't need, where the req_addr doesn't have to be
> 'alignment' aligned. I would much prefer if you kept the INPUT and OUTPUT
> requirements above and made sure that the function implemented those. And
> _if_ we need to alter the current restrictions, we go back and do it as a
> separate change.
>
>
ok!
> The code could be written as (not tested with a compiler):
>
> static char* anon_mmap_aligned(size_t bytes, size_t alignment, char* req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes;
>   if (alignment > 0) {
>     extra_size += alignment;
>   }
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>                                MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   // Fail if a requested address was given but not obtained.
>   if (req_addr != NULL && start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   if (alignment > 0) {
>     char* const start_aligned = (char*) align_ptr_up(start, alignment);
>     char* const end_aligned = start_aligned + bytes;
>     char* const end = start + extra_size;
>     if (start_aligned > start) {
>       ::munmap(start, start_aligned - start);
>     }
>     if (end_aligned < end) {
>       ::munmap(end_aligned, end - end_aligned);
>     }
>     start = start_aligned;
>   }
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }
>
> and we could even get rid of the alignment > 0 checks, if we want to:
>
>
>
No, because you'd change the behaviour for the req_addr != NULL case compared
to the old code. You would always mmap more than needed, then discard the
rest. Aside from being unnecessary, this would also change mapping placement,
because the OS must find room for a larger mapping (bytes + alignment instead
of just bytes).
> static char* anon_mmap_aligned(size_t bytes, size_t alignment, char* req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes + alignment;
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>                                MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   // Fail if a requested address was given but not obtained.
>   if (req_addr != NULL && start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   // Potentially adjust to get an aligned mapping.
>   char* const start_aligned = (char*) align_ptr_up(start, alignment);
>   char* const end_aligned = start_aligned + bytes;
>   char* const end = start + extra_size;
>   if (start_aligned > start) {
>     ::munmap(start, start_aligned - start);
>   }
>   if (end_aligned < end) {
>     ::munmap(end_aligned, end - end_aligned);
>   }
>   start = start_aligned;
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }
>
> And as a side-note, even with these restrictions, we're not far from
> implementing anon_mmap as:
>
> static char* anon_mmap(char* requested_addr, size_t bytes, bool fixed) {
>   return anon_mmap_aligned(bytes, 0, requested_addr);
> }
>
> we just need to cater for the fixed flag. (The input parameters are in a
> different order in the current patch. That should probably be fixed.)
>
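Catering for the fixed flag could look roughly like this - an untested
sketch on top of the anon_mmap_aligned above, not the actual patch:

static char* anon_mmap(char* requested_addr, size_t bytes, bool fixed) {
  if (fixed) {
    // MAP_FIXED deliberately replaces whatever is mapped at requested_addr,
    // so this case cannot go through anon_mmap_aligned.
    char* addr = (char*) ::mmap(requested_addr, bytes, PROT_NONE,
                                MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE|MAP_FIXED,
                                -1, 0);
    return (addr == MAP_FAILED) ? NULL : addr;
  }
  return anon_mmap_aligned(bytes, 0, requested_addr);
}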
> ---
>
> Smaller style issues:
>
> - Don't NULL-check with if (p), use if (p != NULL)
> - i ++ should be written as i++
>
>
ok, will change.
> Thanks,
> StefanK
>
>
Thanks, Thomas
>
> Best Regards, Thomas
>
>
> On Thu, Apr 23, 2015 at 4:16 PM, Stefan Karlsson <
> stefan.karlsson at oracle.com> wrote:
>
>> Hi Thomas,
>>
>>
>> On 2015-04-23 14:55, Thomas Stüfe wrote:
>>
>> Hi Stefan,
>>
>> let's talk a bit more to get an understanding before I change the webrev
>> again:
>>
>> On Thu, Apr 23, 2015 at 10:48 AM, Stefan Karlsson <
>> stefan.karlsson at oracle.com> wrote:
>>
>>>
>>>
>>> On 2015-04-22 17:20, Thomas Stüfe wrote:
>>>
>>> Hi Stefan,
>>>
>>> new webrev:
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.02/webrev/
>>>
>>> See my answers inline.
>>>
>>> On Wed, Apr 22, 2015 at 2:20 PM, Stefan Karlsson <
>>> stefan.karlsson at oracle.com> wrote:
>>>
>>>> Hi Thomas,
>>>>
>>>> On 2015-04-21 18:31, Thomas Stüfe wrote:
>>>>
>>>> Hi Stefan,
>>>>
>>>> thanks for reviewing!
>>>>
>>>> Here is the new webrev:
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/
>>>>
>>>> I incorporated all your remarks except one: the test unfortunately
>>>> cannot just assert if allocation fails, because that may happen sometimes,
>>>> especially with small allocation sizes (os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>> will refuse to allocate if the area cannot accommodate one large page).
>>>> The test also did not assert prior to my change, so I did not want to
>>>> change that behaviour.
>>>>
>>>> I changed some more things:
>>>>
>>>> 1) Before my change, os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>> would call os::reserve_memory() or os::reserve_memory_aligned(), depending
>>>> on whether req_addr was NULL or not. That was a bit unclean (a lower-level
>>>> API calling a high-level API) and led to the weird workaround at
>>>> os_linux.cpp:3497, where the NMT memory tracker had to deregister an
>>>> allocation which it expected to be registered again later at an upper
>>>> level. Yikes.
>>>>
>>>> Because I changed the first case to use low-level mmap, I also changed
>>>> the second - replaced os::reserve_memory_aligned with low-level code.
>>>> Actually I merged both branches together as well as possible, so the new
>>>> code (os_linux.cpp:3484 ff) now handles alignment as well: If req_addr is
>>>> set, allocation is attempted at req_addr and alignment is ignored. If
>>>> req_addr is NULL, alignment is honored. This way I could get rid of the
>>>> NMT workaround completely.
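>>>>
>>>> Schematically, the merged branch looks something like this (an untested
>>>> sketch, simplified - not the exact webrev code):
>>>>
>>>> char* start;
>>>> if (req_addr != NULL) {
>>>>   // Try to reserve at req_addr; no MAP_FIXED, and alignment is ignored.
>>>>   start = (char*) ::mmap(req_addr, bytes, PROT_NONE,
>>>>                          MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>   if (start == MAP_FAILED) {
>>>>     start = NULL;
>>>>   } else if (start != req_addr) {
>>>>     // Did not get the requested address - fail instead of trashing mappings.
>>>>     ::munmap(start, bytes);
>>>>     start = NULL;
>>>>   }
>>>> } else {
>>>>   // No requested address: honor the alignment by over-reserving and
>>>>   // trimming (as in the anon_mmap_aligned sketch above).
>>>>   start = anon_mmap_aligned(bytes, alignment, NULL);
>>>> }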
>>>>
>>>> 2) I completely rewrote the test function. It now tests three
>>>> scenarios, see comment. I hope the code is clear and easy to understand.
>>>>
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/src/os/linux/vm/os_linux.cpp.frames.html
>>>>
>>>> ---
>>>> +  start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>>>> +                         MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>> +  if (start == MAP_FAILED) {
>>>>      return NULL;
>>>>    }
>>>>
>>>> You've changed the behavior of this function. It used to return NULL if
>>>> reserve_memory couldn't reserve memory at the requested address. Your
>>>> proposed patch can return another address. This will be a problem for the
>>>> upper layers when using compressed oops. I think it would be better if you
>>>> keep the old behavior.
>>>>
>>>
>>> Arguably the new behaviour is better: The code which allocates the
>>> heap for compressed Oops actually deals with this case quite nicely by
>>> doing a range check for the address, not an identity check (see
>>> virtualspace.cpp "ReservedHeapSpace::try_reserve_range()"). The other code
>>> path in "ReservedSpace::initialize()" at least handles this case (if
> returned address is unequal to the requested address, it releases the memory again).
>>> So if the OS cooperates by returning an address in the vicinity of the
>>> requested address, it would actually be better to use this address if it
>>> fits instead of retrying the allocation.
>>>
>>> However, I am not completely sure about other callers of
>>> os::reserve_memory_special(), and changing the behaviour would be beyond
>>> the scope of my change anyway, so I restored the old behaviour as you
>>> wished. Now we either return the requested address or NULL if a requested
>>> address was given.
>>>
>>>
>>> Sounds good to me.
>>>
>>>
>> Ok, I'll keep it that way.
>>
>>>
>>>
>>>>
>>>> ---
>>>> Instead of inlining the reserve memory code into
>>>> reserve_memory_special_huge_tlbfs_mixed, could you extract it into a
>>>> function? So that we have one function that's responsible for reserving
>>>> PROT_NONE memory, and one that commits the small, large, and small pages.
>>>>
>>>>
>>> Ok, did this. Because there is already an anon_mmap(), which fits the
>>> req_addr != NULL case, I just reused this and added a sister function
>>> "anon_mmap_aligned" for the alignment case.
>>>
>>>
>>> I didn't try to argue that you should split up the code, but rather
>>> that you should take the entire reserve code you wrote for webrev.01 and
>>> move it into a separate function. The code would then look like:
>>>
>>> char* start = reserve_memory_something(bytes, req_addr, alignment);
>>> if (start == NULL) {
>>>   return NULL;
>>> }
>>>
>>> and the reader of reserve_memory_special_huge_tlbfs_mixed could simply
>>> ignore the details of reserve_memory_something, unless they were important
>>> for some reason. That way you could also write separate tests for
>>> reserve_memory_something (or whatever it would be called).
>>>
>>> ---
>>> You are now using anon_mmap which has the comment:
>>>
>>> 3070   // anon_mmap() should only get called during VM initialization,
>>> 3071   // don't need lock (actually we can skip locking even it can be called
>>> 3072   // from multiple threads, because _highest_vm_reserved_address is just a
>>> 3073   // hint about the upper limit of non-stack memory regions.)
>>> 3074   if ((address)addr + bytes > _highest_vm_reserved_address) {
>>> 3075     _highest_vm_reserved_address = (address)addr + bytes;
>>> 3076   }
>>>
>>> So, it's not entirely clear to me that you should be using this
>>> function. I don't know what _highest_vm_reserved_address is used for. Your
>>> new function, anon_mmap_aligned, isn't updating this variable.
>>>
>>>
>> The comment is obsolete and does not reflect reality; anon_mmap is used
>> for os::reserve_memory(), which in turn can be used at any time in the life
>> of the VM.
>>
>> The comment - and the associated variable
>> "_highest_vm_reserved_address" - only affects the LinuxThreads
>> implementation, which is legacy and should be removed (I just asked this
>> question on hotspot-dev and will do the cleanup if I get the green light).
>>
>> It has also been broken forever, since a lot of allocations happen in the
>> VM and the JDK where _highest_vm_reserved_address is just not updated.
>>
>>
>> OK. It would be good to get rid of it at some point then.
>>
>>
>>
>>> Could we go back and use the reserve code you had in webrev.01, so that
>>> we don't have to consider the impact of this?
>>>
>>>
>> It actually changes less than my last webrev:
>>
>> - old code was using os::reserve_memory() and
>> os::reserve_memory_aligned().
>> os::reserve_memory() directly called anon_mmap().
>> os::reserve_memory_aligned() called os::reserve_memory(), which
>> directly called anon_mmap().
>>
>> - the new code:
>> for the req_addr != NULL case, it calls anon_mmap directly; the only
>> difference from before is that we now do not specify MAP_FIXED.
>> for the req_addr == NULL case, it calls the new function
>> anon_mmap_aligned(), which did not exist before, so no compatibility issues
>> can arise.
>>
>> I can live with both my webrev.01 proposal and this one, but the
>> solution of factoring all the allocation out into a single function would
>> be my least favorite choice: I do not really like factoring out a function
>> which only exists to be called from a single other function, and which is
>> too awkward to be used any other way.
>>
>>
>> The reason for moving it out to a function is to get better "separation
>> of concerns" and hide the implementation details; it doesn't really matter
>> if it's used only once.
>>
>> And your "reserved_memory_something()" would be quite awkward:
>> bundling mmap allocation for the req_addr case (which duplicates what
>> anon_mmap() does), plus adding support for allocating at an alignment. It
>> does not easily fit together imho.
>>
>>
>> But you almost already had the "awkward" code in webrev.01, and it was
>> quite easy to understand. With the requirement that req_addr is aligned if
>> an alignment is passed in, the code would be even simpler. It could
>> probably be used to implement anon_mmap with just a few small changes. And
>> by doing that we would get rid of some code duplication as well. Not sure
>> it's worth it, though.
>>
>>
>> Of course, it is all a matter of taste, and if you insist, I'll do it.
>> The most important thing is to fix the bug.
>>
>>
>> OK. Then, please, extract it out to a separate function so that
>> reserve_memory_special_huge_tlbfs_mixed only has one call out to a
>> function that reserves the memory. It's OK if you want to implement it as an
>> anon_mmap/anon_mmap_aligned pair or inline the code (as in webrev.01), as
>> long as I don't have to see the code in
>> reserve_memory_special_huge_tlbfs_mixed.
>>
>> Thanks,
>> StefanK
>>
>>
>>
>> ---
>>>> - size_t large_page_size = os::large_page_size();
>>>> + char* start;
>>>>
>>>> + size_t large_page_size = os::large_page_size();
>>>>
>>>> There's no need anymore to declare 'start' before it's set. So this
>>>> should be incorporated into:
>>>>
>>>> 3489   start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>>>> 3490                          MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>
>>>> ---
>>>> 3480 // first reserve - but not commit - the address range in small pages.
>>>> ...
>>>> 3529 // Commit small-paged leading area
>>>>
>>>> Could you start your comments with a capital letter and end them with a period?
>>>>
>>>
>>> Done. Note that there are comments where the first word is a parameter
>>> name - if that name is lowercase, I would rather not capitalize it.
>>>
>>>
>>> 3082 // reserve small-paged memory, optionally aligned to given boundary.
>>> 3083 // alignment must be a multiple of allocation granularity. bytes must be
>>>
>>> and some in the test as well.
>>>
>>
>> Okay, will change.
>>
>>>
>>>
>>>>
>>>> ---
>>>> 3481 // Note: if a wish address was given
>>>>
>>>> There's no other reference to a "wish address". Could you reword it to
>>>> be req_addr or "requested address"?
>>>>
>>>
>>> Done.
>>>
>>>
>>>>
>>>> ---
>>>> 3481 // Note: if a wish address was given, we still do not use
>>>> MAP_FIXED - do not want to trash
>>>> 3482 // existing mappings. We hand down req_addr to mmap and hope the
>>>> OS does its best.
>>>>
>>>> Could you move this comment down to the mmap call?
>>>>
>>>
>>> Done.
>>>
>>>
>>>>
>>>> ---
>>>> 3529 // Commit small-paged leading area
>>>> ...
>>>> 3540 // Commit large paged area
>>>>
>>>> Shouldn't it say large-paged as well?
>>>>
>>>
>>> Done.
>>>
>>>
>>>>
>>>> ---
>>>> 3483 // Note2: if wish address was given, alignment is ignored
>>>>
>>>> Previously, we asserted that our in-parameters were aligned. Why did
>>>> you replace that with a comment?
>>>>
>>>>
>>> Ok, I restored the asserts, but I really am not happy with that. I
>>> find the asserts way too strict and they limit the usefulness of the
>>> functions:
>>>
>>> - "assert(is_ptr_aligned(req_addr, alignment), "Must be");"
>>>
>>> Why must this be? Why do I even have to input an alignment value if I
>>> input a requested address? It would make more sense just to say that
>>> alignment is ignored if req_addr is set.
>>> Note that this was not really well tested beforehand, because there were
>>> no tests for a non-null req_addr.
>>>
>>> - "assert(is_size_aligned(bytes, alignment), "Must be");"
>>>
>>> This is even worse. Why should the size of the allocation be aligned
>>> to the alignment? Why can't I allocate a small chunk of memory at a larger
>>> alignment? I do not understand this limit at all.
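>>>
>>> For example, this seemingly legitimate request would trip the assert
>>> (a hypothetical call, just for illustration):
>>>
>>> // Reserve one small page at a 1M boundary: bytes (4K) is not a multiple
>>> // of alignment (1M), so the assert fires.
>>> char* p = anon_mmap_aligned(4 * K, 1 * M, NULL);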
>>>
>>> But ok, I restored the asserts and had to limit the test accordingly
>>> so as not to trigger those asserts.
>>>
>>>
>>> You are probably right, and those alignment restrictions should
>>> probably have been put in the higher layers where they're needed. But,
>>> please, don't change this kind of thing in an unrelated bug fix. It needs
>>> a separate RFE so that it gets enough scrutiny to make sure that we don't
>>> break something.
>>>
>>>
>> I agree with you. It is out of scope for this change.
>>
>> Kind Regards, Thomas
>>
>>
>>
>>> Thanks,
>>> StefanK
>>>
>>>
>>>
>>>
>>>> ---
>>>> + size_t extra_size = ...
>>>> + char* const end0 = start0 + bytes;
>>>> + char* const end = start + extra_size;
>>>>
>>>> It's not obvious from the variable names what they represent, IMHO.
>>>> Could you try to give them more descriptive names?
>>>>
>>>>
>>> Done.
>>>
>>>
>>>> I'll take a look at the tests after next review round.
>>>>
>>>>
>>> Ok, thanks for taking the time!
>>>
>>> ..Thomas
>>>
>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>> ..Thomas
>>>>
>>>>
>>>> On Wed, Apr 15, 2015 at 1:55 PM, Stefan Karlsson <
>>>> stefan.karlsson at oracle.com> wrote:
>>>>
>>>>> Hi Thomas,
>>>>>
>>>>> Thanks for fixing this bug.
>>>>>
>>>>> Some comments inlined:
>>>>>
>>>>> On 2015-04-09 17:08, Thomas Stüfe wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> please review this fix to huge page allocation on Linux.
>>>>>>
>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8077276
>>>>>> webrev:
>>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
>>>>>>
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html
>>>>>
>>>>> +  const int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE;
>>>>>    char* start;
>>>>>    if (req_addr != NULL) {
>>>>>      assert(is_ptr_aligned(req_addr, alignment), "Must be");
>>>>>      assert(is_size_aligned(bytes, alignment), "Must be");
>>>>> -    start = os::reserve_memory(bytes, req_addr);
>>>>> -    assert(start == NULL || start == req_addr, "Must be");
>>>>> +    // do NOT use MAP_FIXED here - do not trash existing mappings.
>>>>> +    start = (char*) ::mmap(req_addr, bytes, prot,
>>>>> +                           MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>> +    if (start == MAP_FAILED) {
>>>>> +      start = NULL;
>>>>> +    }
>>>>> +    if (!is_ptr_aligned(start, alignment)) {
>>>>> +      ::munmap(start, bytes);
>>>>> +      start = NULL;
>>>>> +    }
>>>>>
>>>>>
>>>>> Previously, the small page area was mapped with PROT_NONE, but now you
>>>>> change it to RW. Is there a reason why you changed that?
>>>>>
>>>>>
>>>>>      result = ::mmap(lp_start, lp_bytes, prot,
>>>>>                      MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_HUGETLB,
>>>>>                      -1, 0);
>>>>>      if (result == MAP_FAILED) {
>>>>> -      warn_on_large_pages_failure(req_addr, bytes, errno);
>>>>> +      warn_on_large_pages_failure(start, bytes, errno);
>>>>>
>>>>> Shouldn't we report lp_start instead of start here?
>>>>>
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp
>>>>>
>>>>> There are a number of things that I would like to change about the
>>>>> test.
>>>>>
>>>>> Can you use mapping == NULL in NULL checks instead of just if
>>>>> (mapping)? The same for the asserts.
>>>>>
>>>>> I think the test will be easier to understand if you get rid of the
>>>>> for loop and extract code into helper functions, or do a small amount of
>>>>> code duplication:
>>>>>
>>>>> // do the test twice: once with a req_addr=NULL, once with a non-NULL
>>>>> // req_addr; the latter shall test if we trash existing mappings
>>>>> for (int run = 0; run < 2; run ++) {
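>>>>>
>>>>> For instance, something along these lines (hypothetical helper names,
>>>>> just to sketch the shape):
>>>>>
>>>>> // One helper per scenario; each sets up its mapping and checks the result.
>>>>> static void test_reserve_memory_special_huge_tlbfs_mixed_without_addr(size_t size, size_t alignment);
>>>>> static void test_reserve_memory_special_huge_tlbfs_mixed_with_good_req_addr(size_t size, size_t alignment);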
>>>>>
>>>>> I think you should fail or exit the test if you can't get the expected
>>>>> mapping, instead of continuing with:
>>>>>
>>>>> if (addr) {
>>>>>   test_log("... mapped [" PTR_FORMAT "-" PTR_FORMAT ").", addr,
>>>>>            addr + size);
>>>>> } else {
>>>>>   test_log("... failed.");
>>>>> }
>>>>>
>>>>> if (addr != NULL) {
>>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>
>>>>>
>>>>>
>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() first
>>>>>> establishes a
>>>>>> mapping with small pages over the whole requested range, then
>>>>>> exchanges the
>>>>>> parts of the mapping which are aligned to large page size with large
>>>>>> pages.
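>>>>>>
>>>>>> Schematically, the resulting layout looks like this (an untested
>>>>>> sketch, not the actual code):
>>>>>>
>>>>>> size_t large_page_size = os::large_page_size();
>>>>>> char* const lp_start = (char*) align_ptr_up(start, large_page_size);
>>>>>> char* const lp_end   = (char*) align_ptr_down(start + bytes, large_page_size);
>>>>>> // [start,    lp_start)      small pages (leading, unaligned part)
>>>>>> // [lp_start, lp_end)        large pages (aligned middle part)
>>>>>> // [lp_end,   start + bytes) small pages (trailing, unaligned part)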
>>>>>>
>>>>>> Now, when establishing the first mapping, it uses os::reserve_memory()
>>>>>> with a potentially non-NULL req_addr. In that case, os::reserve_memory()
>>>>>> will do an mmap(MAP_FIXED).
>>>>>>
>>>>>> This will trash any pre-existing mappings.
>>>>>>
>>>>>> Note that I could not deliberately reproduce the error with an
>>>>>> unmodified VM. But I added a reproduction case which demonstrates that
>>>>>> if one were to call
>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() with a non-NULL
>>>>>> req_addr, existing mappings would be trashed. Depending on where we are
>>>>>> in the address space, we would also overwrite libc structures and crash
>>>>>> immediately.
>>>>>>
>>>>>> The repro case is part of the change, see changes in
>>>>>> test_reserve_memory_special_huge_tlbfs_mixed(), and can be executed
>>>>>> with:
>>>>>>
>>>>>> ./images/jdk/bin/java -XX:+UseLargePages -XX:+UseHugeTLBFS -XX:-UseSHM
>>>>>> -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>>>>>>
>>>>>> The fix: instead of using os::reserve_memory(),
>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() now calls mmap()
>>>>>> directly with the non-NULL req_addr, but without MAP_FIXED. This
>>>>>> means the
>>>>>> OS may do its best to allocate at req_addr, but will not trash any
>>>>>> mappings. This also follows the pattern in
>>>>>> os::Linux::reserve_memory_special_huge_tlbfs_only(), which is a sister
>>>>>> function of os::Linux::reserve_memory_special_huge_tlbfs_mixed().
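>>>>>>
>>>>>> The difference boils down to one flag (a simplified sketch - see the
>>>>>> webrev for the real code):
>>>>>>
>>>>>> // Before: MAP_FIXED forcibly replaces whatever is mapped at req_addr.
>>>>>> ::mmap(req_addr, bytes, prot, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE|MAP_FIXED, -1, 0);
>>>>>> // After: req_addr is only a hint; existing mappings are left alone, and
>>>>>> // the caller checks whether the returned address is usable.
>>>>>> ::mmap(req_addr, bytes, prot, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);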
>>>>>>
>>>>>> Note also discussion at:
>>>>>>
>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html
>>>>>> .
>>>>>>
>>>>>> Thanks for reviewing!
>>>>>>
>>>>>> Kind Regards, Thomas
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>