RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)

Stefan Karlsson stefan.karlsson at oracle.com
Thu Apr 23 20:35:39 UTC 2015


On 2015-04-23 21:53, Stefan Karlsson wrote:
> Hi Thomas,
>
> On 2015-04-23 18:09, Thomas Stüfe wrote:
>> Hi Stefan,
>>
>> new webrev: 
>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.03/webrev/ 
>> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.03/webrev/>
>
> I think I've realized that we have different views on what the API for 
> the reserve function/code should be.
>
> The code you have written has two more or less mutual exclusive 
> execution modes:
> 1) req_addr != NULL => ignore the alignment
> 2) req_addr == NULL => adhere to the alignment
>
> this reflects in the code, the name of the new function and in the 
> comments.
>
> If we take the previous implementation of the code, before your patches:
> 3482   if (req_addr != NULL) {
> 3483     assert(is_ptr_aligned(req_addr, alignment), "Must be");
> 3484     assert(is_size_aligned(bytes, alignment), "Must be");
> 3485     start = os::reserve_memory(bytes, req_addr);
> 3486     assert(start == NULL || start == req_addr, "Must be");
> 3487   } else {
> 3488     start = os::reserve_memory_aligned(bytes, alignment);
> 3489   }
> 3490
> 3491   if (start == NULL) {
> 3492     return NULL;
> 3493   }
> 3494
> 3495   assert(is_ptr_aligned(start, alignment), "Must be");
>
>
> If we would have extracted that into a function then the API would 
> have been:
> INPUT:
> alignment - requested alignment
> req_addr - requested address (must be 'alignment' aligned)
> bytes - the size of the memory chunk to reserve (must be [arguably] 
> 'alignment' aligned)
>
> OUTPUT:
> address - a memory chunk that is 'alignment' aligned and 'bytes' 
> large, that starts at 'req_addr' if 'req_addr' is given.  Or NULL at 
> failures.
>
> Your current code is written to allow for a broader use-case that the 
> current code doesn't need, where the req_addr doesn't have to be 
> 'alignment' aligned. I would much prefer if you kept the INPUT and 
> OUPUT requirements above and made sure that the function implemented 
> those. And _if_ we need to alter the current restrictions, we go back 
> and do it as a separate change.
>
> The code code be written as (not tested with a compiler):
>
> static char* anon_mmap_aligned(size_t bytes, size_t alignment, char* 
> req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes;
>   if (alignment > 0) {
>     extra_size += alignment;
>   }
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
> MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   if (start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   if (alignment > 0) {
>     char* const start_aligned = (char*) align_ptr_up(start, alignment);
>     char* const end_aligned = start_aligned + bytes;
>     char* const end = start + extra_size;
>     if (start_aligned > start) {
>       ::munmap(start, start_aligned - start);
>     }
>     if (end_aligned < end) {
>       ::munmap(end_aligned, end - end_aligned);
>     }
>     start = start_aligned;
>   }
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }
>
> and we could even get rid of the alignment > 0 checks, if we want to:
>
>
> static char* anon_map_aligned(size_t bytes, size_t alignment, char* 
> req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes + alignment;
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
> MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   if (start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   // Potentially adjust to get an aligned mmapping.
>   char* const start_aligned = (char*) align_ptr_up(start, alignment);
>   char* const end_aligned = start_aligned + bytes;
>   char* const end = start + extra_size;
>   if (start_aligned > start) {
>     ::munmap(start, start_aligned - start);
>   }
>   if (end_aligned < end) {
>     ::munmap(end_aligned, end - end_aligned);
>   }
>   start = start_aligned;
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }

Well, we can't use is_size_aligned and align_ptr_up with alignment == 0. 
On the other hand, we never call these functions with an alignment less 
than a page.

StefanK
>
> And as a side-note, even with these restrictions, we're not far from 
> implementing anon_mmap as:
>
> static char* anon_mmap(char* requested_addr, size_t bytes, bool fixed) {
>   return anon_mmap_aligned(bytes, 0, requested_addr);
> }
>
> we just need to cater for the fixed flag. (The input parameters are in 
> different order in the current patch. That should probably be fixed.).
>
> ---
>
> Smaller style issues:
>
> - Don't NULL-check with if (p), use if (p != NULL)
> - i ++ should be written as i++
>
> Thanks,
> StefanK
>
>> Best Regards, Thomas
>>
>>
>> On Thu, Apr 23, 2015 at 4:16 PM, Stefan Karlsson 
>> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>
>>     On 2015-04-23 14:55, Thomas Stüfe wrote:
>>>     Hi Stefan,
>>>
>>>     lets talk a bit more to get an understanding before I change the
>>>     webrev again:
>>>
>>>     On Thu, Apr 23, 2015 at 10:48 AM, Stefan Karlsson
>>>     <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>>
>>>     wrote:
>>>
>>>
>>>
>>>         On 2015-04-22 17:20, Thomas Stüfe wrote:
>>>>         Hi Stefan,
>>>>
>>>>         new webrev:
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.02/webrev/
>>>> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.02/webrev/>
>>>>
>>>>         See my answers inline.
>>>>
>>>>         On Wed, Apr 22, 2015 at 2:20 PM, Stefan Karlsson
>>>>         <stefan.karlsson at oracle.com
>>>>         <mailto:stefan.karlsson at oracle.com>> wrote:
>>>>
>>>>             Hi Thomas,
>>>>
>>>>             On 2015-04-21 18:31, Thomas Stüfe wrote:
>>>>>             Hi Stefan,
>>>>>
>>>>>             thanks for reviewing!
>>>>>
>>>>>             Here is the new webrev:
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/
>>>>> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.01/webrev/>
>>>>>
>>>>>             I worked in all your remarks except one: the test
>>>>>             unfortunately cannot just assert if allocation fails,
>>>>>             because that may happen sometimes, especially with
>>>>>             small allocation sizes
>>>>> (os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>>             will refuse to allocate if the area cannot accomodate
>>>>>             one large page). The test also did not assert prior to
>>>>>             my change, so I did not want to change that behaviour.
>>>>>
>>>>>             I changed some more things:
>>>>>
>>>>>             1) Before my change,
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>>             would call os::reserve_memory() or
>>>>>             os::reserve_memory_aligned(), depending on whether
>>>>>             req_addr was NULL or not. That was a bit unclean (a
>>>>>             lower-level API calling a high-level API) and led to
>>>>>             the weird workaround at os_linux.cpp:3497, where the
>>>>>             NMT Memory tracker had to deregister a allocation which
>>>>>             it expected to be registered again later at an upper
>>>>>             level. Yikes.
>>>>>
>>>>>             Because I changed the first case to use low-level mmap,
>>>>>             I also changed the second - replaced
>>>>>             os::reserve_memory_aligned with low level coding.
>>>>>             Actually I merged both branches together as good as
>>>>>             possible, so the new code (os_linux.cpp:3484 ff) now
>>>>>             handles alignment as well: If req_addr is set,
>>>>>             allocation is attempted at req_addr and alignment is
>>>>>             ignored. If req_addr is NULL, alignment is honored.
>>>>>             This way I could get rid of the workaround for the NMT
>>>>>             completely.
>>>>>
>>>>>             2) I completely rewrote the test function. It now tests
>>>>>             three scenarios, see comment. I hope the code is clear
>>>>>             and easy to understand.
>>>>
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/src/os/linux/vm/os_linux.cpp.frames.html
>>>> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.01/webrev/src/os/linux/vm/os_linux.cpp.frames.html>
>>>>
>>>>             ---
>>>>             +  start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>>>>             + MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>             +  if (start == MAP_FAILED) {
>>>>                  return NULL;
>>>>                }
>>>>
>>>>             You've changed the behavior of this function. It used to
>>>>             return NULL if reserve_memory couldn't reserve memory at
>>>>             the requested address. Your proposed patch can return
>>>>             another address. This will be a problem for the upper
>>>>             layers when using compressed oops. I think it would be
>>>>             better if you keep the old behavior.
>>>>
>>>>
>>>>         Arguably the new behaviour is better: The code which
>>>>         allocates the heap for compressed Oops actually deals with
>>>>         this case quite nicely by doing a range check for the
>>>>         address, not an identity check (see virtualspace.cpp
>>>>         "ReservedHeapSpace::try_reserve_range()"). The other code
>>>>         path in "ReservedSpace::initialize()" at least handles this
>>>>         case (if returned address is unequal to requested address,
>>>>         releases memory again). So if the OS cooperates by returning
>>>>         an address in the vicinity of the requested address, it
>>>>         would actually be better to use this address if it fits
>>>>         instead of retrying the allocation.
>>>>
>>>>         However, I am not completely sure about other callers of
>>>>         os::reserve_memory_special(), and changing the behaviour
>>>>         would be beyond the scope of my change anyway, so I restored
>>>>         the old behaviour as you wished. Now we either return the
>>>>         requested address or NULL if a requested address was given.
>>>
>>>         Sounds good to me.
>>>
>>>
>>>     Ok, I'll keep it that way.
>>>
>>>>
>>>>
>>>>             ---
>>>>             Instead of inlining the reserve memory code into
>>>>             reserve_memory_special_huge_tlbfs_mixed, could you
>>>>             extract it into function? So that we have one function
>>>>             that's responsible for reservinb PROT_NONE memory, and
>>>>             one that commits the small, large, small pages.
>>>>
>>>>
>>>>         Ok, did this. Because there is already a anon_mmap(), which
>>>>         fits the req_addr != NULL case, I just reused this and added
>>>>         a sister function "anon_mmap_aligned" for the alignment case.
>>>
>>>         I didn't try to argue that you should split up the code, but
>>>         rather move the entire reserve code that you wrote for
>>>         webrev.01, and move it into a separate function. The code
>>>         would then look like:
>>>
>>>         char* start = reserve_memory_something(bytes, req_addr,
>>>         alignment);
>>>         if (start == NULL) {
>>>           return NULL;
>>>         }
>>>
>>>         and the reader of reserve_memory_special_huge_tlbfs_mixed
>>>         could simply ignore the details of reserve_memory_something,
>>>         unless they were important for some reason. That way you
>>>         could also write separate tests for reserve_memory_something
>>>         (or whatever it would be called).
>>>
>>>         ---
>>>         You are now using anon_mmap which has the comment:
>>>
>>>         3070     // anon_mmap() should only get called during VM
>>>         initialization,
>>>         3071     // don't need lock (actually we can skip locking
>>>         even it can be called
>>>         3072     // from multiple threads, because
>>>         _highest_vm_reserved_address is just a
>>>         3073     // hint about the upper limit of non-stack memory
>>>         regions.)
>>>         3074     if ((address)addr + bytes >
>>>         _highest_vm_reserved_address) {
>>>         3075       _highest_vm_reserved_address = (address)addr + 
>>> bytes;
>>>         3076     }
>>>
>>>         So, it's not entirely clear to me that you should be using
>>>         this function. I don't know what _highest_vm_reserved_address
>>>         is used for. Your new function, anon_mmap_aligned isn't
>>>         updating this variable.
>>>
>>>
>>>     The comment is obsolete and does not reflect reality; anon_mmap
>>>     is used for os::reserve_memory(), which in turn can be used at
>>>     any time in the life of the VM.
>>>
>>>     The comment - and the associated variable
>>>     "_highest_vm_reserved_address" - only affects the LinuxThreads
>>>     implementation which is legacy and should be removed (I just did
>>>     ask this question on hotspot-dev and will do the cleanup if I get
>>>     green light).
>>>
>>>     It is also broken since forever, since a lot of allocations
>>>     happen in VM and JDK where _highest_vm_reserved_address is just
>>>     not updated.
>>
>>     OK. It would be good to get rid of it at some point then.
>>
>>>
>>>         Could we go back and use the reserve code you had in
>>>         webrev.01, so that we don't have to consider the impact of 
>>> this?
>>>
>>>
>>>     It actually changes less than my last webrev:
>>>
>>>     - old code was using os::reserve_memory() and
>>>     os::reserve_memory_aligned().
>>>        os::reserve_memory() directly called anon_mmap().
>>>        os::reserve_memory_aligned() called os::reserve_memory(),
>>>     which directly called anon_mmap().
>>>
>>>     - the new code:
>>>         for the req_addr != NULL, calls anon_mmap directly, only
>>>     difference to before is that we now do not specify MAP_FIXED.
>>>         for the req_addr==NULL case, it calls the new function
>>>     anon_mmap_aligned(), which did not exist before, so no
>>>     compatibility issues can arise.
>>>
>>>     I can live with both my webrev.01 proposal and this one, but the
>>>     solution to factor out all the allocation to a single function
>>>     would be my least favorite choice: I do not really like factoring
>>>     out a function which only exists to be called from a single other
>>>     function, and which is too awkward to be used any other way.
>>
>>     The reason for moving it out to a function is to get better
>>     "separation of concerns" and hide the implementation details, it
>>     doesn't really matter if it's used only once.
>>
>>>     And your "reserved_memory_something()"  would be quite awkward:
>>>     bundling mmap allocation for the req_addr case (which duplicates
>>>     what anon_mmap() does), plus adding support for allocating at an
>>>     alignment. It does not easily fit together imho.
>>
>>     But you almost already had the "awkward" code in webrev.01, and it
>>     was quite easy to understand. With the requirement that req_addr
>>     is aligned if an alignment is passed in, then the code would be
>>     even easier. It could probably be used to implement anon_mmap with
>>     just a few small changes. And by doing that we would get rid of
>>     some code duplication as well. Not sure it worth it though.
>>
>>>
>>>     Of course, it is all a matter of taste, and if you insist, I'll
>>>     do it. Most important is to fix the bug.
>>
>>     OK. Then, please, extract it out to a separate function so that
>>     reserve_memory_special_huge_tlbfs_mixed only have one call out to
>>     a function that reserves the memory. It's OK if you want to
>>     implement it as a anon_mmap/anon_mmap_aligned pair or inline the
>>     code (as in webrev.01), as long as I don't have to see the code in
>>     reserve_memory_special_huge_tlbfs_mixed.
>>
>>     Thanks,
>>     StefanK
>>
>>
>>>
>>>>             ---
>>>>             -  size_t large_page_size = os::large_page_size();
>>>>             +  char* start;
>>>>
>>>>             +  size_t large_page_size = os::large_page_size();
>>>>
>>>>             There's no need anymore to declare 'start' before it's
>>>>             set. So this should be incorporated into:
>>>>
>>>>             3489   start = (char*) ::mmap(req_addr, extra_size,
>>>>             PROT_NONE,
>>>>             3490 MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>
>>>>             ---
>>>>             3480   // first reserve - but not commit - the address
>>>>             range in small pages.
>>>>             ...
>>>>             3529   // Commit small-paged leading area
>>>>
>>>>             Could you start your comments with a capital and with a
>>>>             period?
>>>>
>>>>
>>>>         Done. Note that there are comments where the first word is a
>>>>         parameter name - if that name is lowercase, I would rather
>>>>         not write it upper case.
>>>
>>>         3082 // reserve small-paged memory, optionally aligned to
>>>         given boundary.
>>>         3083 // alignment must be a multiple of allocation
>>>         granularity. bytes must be
>>>
>>>         and some in the test as well.
>>>
>>>
>>>     Okay, will change.
>>>
>>>>
>>>>             ---
>>>>             3481   // Note: if a wish address was given
>>>>
>>>>             There's no other reference to a "wish address". Could
>>>>             you reword it to be req_addr or "requested address"?
>>>>
>>>>
>>>>         Done.
>>>>
>>>>
>>>>             ---
>>>>             3481   // Note: if a wish address was given, we still do
>>>>             not use MAP_FIXED - do not want to trash
>>>>             3482   // existing mappings. We hand down req_addr to
>>>>             mmap and hope the OS does its best.
>>>>
>>>>             Could you move this comment down to the mmap call?
>>>>
>>>>
>>>>         Done.
>>>>
>>>>
>>>>             ---
>>>>             3529   // Commit small-paged leading area
>>>>             ...
>>>>             3540   // Commit large paged area
>>>>
>>>>             Shouldn't it say large-paged as well?
>>>>
>>>>
>>>>         Done.
>>>>
>>>>
>>>>             ---
>>>>             3483   // Note2: if wish address was given, alignment is
>>>>             ignored
>>>>
>>>>             Previously, we asserted that our in-parameters were
>>>>             aligned. Why did you replace that with a comment?
>>>>
>>>>
>>>>         Ok, I restored the asserts, but I really am not happy with
>>>>         that. I find the asserts way too strict and they limit the
>>>>         usefulness of the functions:
>>>>
>>>>         - "assert(is_ptr_aligned(req_addr, alignment), "Must be");"
>>>>
>>>>         Why must this be? Why do I even have to input an alignment
>>>>         value if I input a requested address? It would make more
>>>>         sense just to say that alignment is ignored if req_addr is 
>>>> set.
>>>>         Note that this was not really well tested beforehand,
>>>>         because there were no tests for a non-null req_addr.
>>>>
>>>>         - "assert(is_size_aligned(bytes, alignment), "Must be");"
>>>>
>>>>         This is even worse. Why should the size of the allocation be
>>>>         aligned to the alignment? Why can't I allocate a small chunk
>>>>         of memory at a larger alignment? I do not understand this
>>>>         limit at all.
>>>>
>>>>         But ok, I restored the asserts and had to limit the test
>>>>         accordingly to not to trigger those asserts.
>>>
>>>         You are probably right, and those alignment restrictions
>>>         should probably have been put in the higher layers were it's
>>>         needed. But, please, don't change this kind of things in an
>>>         unrelated bug fix. It needs a separate RFE so that it gets
>>>         enough scrutiny to make sure that we don't break something.
>>>
>>>
>>>     I agree with you. It is out of scope for this change.
>>>
>>>     Kind Regards, Thomas
>>>
>>>         Thanks,
>>>         StefanK
>>>
>>>
>>>>             ---
>>>>             +  size_t extra_size = ...
>>>>             +    char* const end0 = start0 + bytes;
>>>>             +    char* const end = start + extra_size;
>>>>
>>>>             It's not obvious from the variable names what they
>>>>             represent, IMHO. Could you try to give them more
>>>>             descriptive names?
>>>>
>>>>
>>>>         Done.
>>>>
>>>>             I'll take a look at the tests after next review round.
>>>>
>>>>
>>>>         Ok, thanks for taking the time!
>>>>
>>>>         ..Thomas
>>>>
>>>>             Thanks,
>>>>             StefanK
>>>>
>>>>
>>>>>
>>>>>             Thanks!
>>>>>
>>>>>             ..Thomas
>>>>>
>>>>>
>>>>>             On Wed, Apr 15, 2015 at 1:55 PM, Stefan Karlsson
>>>>>             <stefan.karlsson at oracle.com
>>>>>             <mailto:stefan.karlsson at oracle.com>> wrote:
>>>>>
>>>>>                 Hi Thomas,
>>>>>
>>>>>                 Thanks for fixing this bug.
>>>>>
>>>>>                 Some comments inlined:
>>>>>
>>>>>                 On 2015-04-09 17:08, Thomas Stüfe wrote:
>>>>>
>>>>>                     Hi all,
>>>>>
>>>>>                     please review this fix to huge page allocation
>>>>>                     on Linux.
>>>>>
>>>>>                     bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8077276
>>>>>                     webrev:
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
>>>>> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/>
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html
>>>>> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html>
>>>>>
>>>>>                 +  const int prot = exec ?
>>>>>                 PROT_READ|PROT_WRITE|PROT_EXEC : 
>>>>> PROT_READ|PROT_WRITE;
>>>>>                     char* start;
>>>>>                    if (req_addr != NULL) {
>>>>>                  assert(is_ptr_aligned(req_addr, alignment), "Must
>>>>>                 be");
>>>>>                  assert(is_size_aligned(bytes, alignment), "Must 
>>>>> be");
>>>>>                 -    start = os::reserve_memory(bytes, req_addr);
>>>>>                 - assert(start == NULL || start == req_addr, "Must
>>>>>                 be");
>>>>>                 +    // do NOT use MAP_FIXED here - do not trash
>>>>>                 existing mappings.
>>>>>                 +    start = (char*) ::mmap(req_addr, bytes, prot,
>>>>>                 + MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>>                 +    if (start == MAP_FAILED) {
>>>>>                 +      start = NULL;
>>>>>                 +    }
>>>>>                 +    if (!is_ptr_aligned(start, alignment)) {
>>>>>                 + ::munmap(start, bytes);
>>>>>                 +      start = NULL;
>>>>>                 +    }
>>>>>
>>>>>
>>>>>                 Previously, the small page area was mapped with
>>>>>                 PROT_NONE, but now you change it to RW. Is there a
>>>>>                 reason why you changed that?
>>>>>
>>>>>
>>>>>                    result = ::mmap(lp_start, lp_bytes, prot,
>>>>> MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_HUGETLB,
>>>>>                      -1, 0);
>>>>>                    if (result == MAP_FAILED) {
>>>>>                 - warn_on_large_pages_failure(req_addr, bytes, 
>>>>> errno);
>>>>>                 + warn_on_large_pages_failure(start, bytes, errno);
>>>>>
>>>>>                 Shouldn't we report lp_start instead of start here?
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp
>>>>> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp>
>>>>>
>>>>>                 There are a number of things that I would like to
>>>>>                 change with the the test.
>>>>>
>>>>>                 Can you use mapping == NULL in null checks instead
>>>>>                 of just if (mapping). The same for the asserts.
>>>>>
>>>>>                 I think the test will be easier to understand if
>>>>>                 you get rid of the for loop and extract code into
>>>>>                 helper functions, or do a small amount of code
>>>>>                 duplication:
>>>>>
>>>>>                     // do the test twice: once with a
>>>>>                 req_addr=NULL, once with a non-NULL
>>>>>                     // req_addr; the latter shall test if we trash
>>>>>                 existing mappings
>>>>>                     for (int run = 0; run < 2; run ++) {
>>>>>
>>>>>                 I think you should fail or exit the test if you
>>>>>                 can't get the expected mapping, instead of
>>>>>                 continuing and have :
>>>>>
>>>>>                       if (addr) {
>>>>>                 test_log("... mapped [" PTR_FORMAT "-" PTR_FORMAT
>>>>>                 ").", addr, addr + size);
>>>>>                       } else {
>>>>>                 test_log("... failed.");
>>>>>                       }
>>>>>
>>>>>                       if (addr != NULL) {
>>>>>
>>>>>                 Thanks,
>>>>>                 StefanK
>>>>>
>>>>>
>>>>>
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>>                     first establishes a
>>>>>                     mapping with small pages over the whole
>>>>>                     requested range, then exchanges the
>>>>>                     parts of the mapping which are aligned to large
>>>>>                     page size with large pages.
>>>>>
>>>>>                     Now, when establishing the first mapping, it
>>>>>                     uses os::reserve_memory() with
>>>>>                     a potentially non-null req_addr. In that case,
>>>>>                     os::reserve_memory() will do
>>>>>                     a mmap(MAP_FIXED).
>>>>>
>>>>>                     This will trash any pre-existing mappings.
>>>>>
>>>>>                     Note that I could not willingly reproduce the
>>>>>                     error with an unmodified VM.
>>>>>                     But I added a reproduction case which
>>>>>                     demonstrated that if one were to call
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>>                     with a non-NULL
>>>>>                     req_addr, existing mappings would be trashed.
>>>>>                     Depending on where we are in
>>>>>                     address space, we also would overwrite libc
>>>>>                     structures and crash
>>>>>                     immediately.
>>>>>
>>>>>                     The repro case is part of the change, see
>>>>>                     changes in
>>>>> test_reserve_memory_special_huge_tlbfs_mixed(),
>>>>>                     and can be executed with:
>>>>>
>>>>>                     ./images/jdk/bin/java -XX:+UseLargePages
>>>>>                     -XX:+UseHugeTLBFS -XX:-UseSHM
>>>>>                     -XX:+ExecuteInternalVMTests
>>>>>                     -XX:+VerboseInternalVMTests
>>>>>
>>>>>                     The fix: instead of using os::reserve_memory(),
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>>                     now calls mmap()
>>>>>                     directly with the non-NULL req_addr, but
>>>>>                     without MAP_FIXED. This means the
>>>>>                     OS may do its best to allocate at req_addr, but
>>>>>                     will not trash any
>>>>>                     mappings. This also follows the pattern in
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_only(),
>>>>>                     which is a sister
>>>>>                     function of
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed().
>>>>>
>>>>>                     Note also discussion at:
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html.
>>>>>
>>>>>                     Thanks for reviewing!
>>>>>
>>>>>                     Kind Regards, Thomas
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>



More information about the hotspot-runtime-dev mailing list