RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)
Stefan Karlsson
stefan.karlsson at oracle.com
Thu Apr 23 20:35:39 UTC 2015
On 2015-04-23 21:53, Stefan Karlsson wrote:
> Hi Thomas,
>
> On 2015-04-23 18:09, Thomas Stüfe wrote:
>> Hi Stefan,
>>
>> new webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.03/webrev/
>
> I think I've realized that we have different views on what the API for
> the reserve function/code should be.
>
> The code you have written has two more or less mutually exclusive
> execution modes:
> 1) req_addr != NULL => ignore the alignment
> 2) req_addr == NULL => adhere to the alignment
>
> This is reflected in the code, in the name of the new function, and in
> the comments.
>
> If we take the previous implementation of the code, before your patches:
> 3482 if (req_addr != NULL) {
> 3483 assert(is_ptr_aligned(req_addr, alignment), "Must be");
> 3484 assert(is_size_aligned(bytes, alignment), "Must be");
> 3485 start = os::reserve_memory(bytes, req_addr);
> 3486 assert(start == NULL || start == req_addr, "Must be");
> 3487 } else {
> 3488 start = os::reserve_memory_aligned(bytes, alignment);
> 3489 }
> 3490
> 3491 if (start == NULL) {
> 3492 return NULL;
> 3493 }
> 3494
> 3495 assert(is_ptr_aligned(start, alignment), "Must be");
>
>
> If we would have extracted that into a function then the API would
> have been:
> INPUT:
> alignment - requested alignment
> req_addr - requested address (must be 'alignment' aligned)
> bytes - the size of the memory chunk to reserve (must be [arguably]
> 'alignment' aligned)
>
> OUTPUT:
> address - a memory chunk that is 'alignment' aligned and 'bytes'
> large, that starts at 'req_addr' if 'req_addr' is given. Or NULL at
> failures.
>
> Your current code is written to allow for a broader use-case that the
> current code doesn't need, where the req_addr doesn't have to be
> 'alignment' aligned. I would much prefer if you kept the INPUT and
> OUTPUT requirements above and made sure that the function implemented
> those. And _if_ we need to alter the current restrictions, we go back
> and do it as a separate change.
>
> The code could be written as (not tested with a compiler):
>
> static char* anon_mmap_aligned(size_t bytes, size_t alignment, char* req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes;
>   if (alignment > 0) {
>     extra_size += alignment;
>   }
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>                                MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   if (req_addr != NULL && start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   if (alignment > 0) {
>     char* const start_aligned = (char*) align_ptr_up(start, alignment);
>     char* const end_aligned = start_aligned + bytes;
>     char* const end = start + extra_size;
>     if (start_aligned > start) {
>       ::munmap(start, start_aligned - start);
>     }
>     if (end_aligned < end) {
>       ::munmap(end_aligned, end - end_aligned);
>     }
>     start = start_aligned;
>   }
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }
>
> and we could even get rid of the alignment > 0 checks, if we want to:
>
>
> static char* anon_mmap_aligned(size_t bytes, size_t alignment, char* req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes + alignment;
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>                                MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   if (req_addr != NULL && start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   // Potentially adjust to get an aligned mmapping.
>   char* const start_aligned = (char*) align_ptr_up(start, alignment);
>   char* const end_aligned = start_aligned + bytes;
>   char* const end = start + extra_size;
>   if (start_aligned > start) {
>     ::munmap(start, start_aligned - start);
>   }
>   if (end_aligned < end) {
>     ::munmap(end_aligned, end - end_aligned);
>   }
>   start = start_aligned;
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }
Well, we can't use is_size_aligned and align_ptr_up with alignment == 0.
On the other hand, we never call these functions with an alignment less
than a page.
StefanK
>
> And as a side-note, even with these restrictions, we're not far from
> implementing anon_mmap as:
>
> static char* anon_mmap(char* requested_addr, size_t bytes, bool fixed) {
> return anon_mmap_aligned(bytes, 0, requested_addr);
> }
>
> we just need to cater for the fixed flag. (The input parameters are in
> different order in the current patch. That should probably be fixed.)
>
> ---
>
> Smaller style issues:
>
> - Don't NULL-check with if (p), use if (p != NULL)
> - i ++ should be written as i++
>
> Thanks,
> StefanK
>
>> Best Regards, Thomas
>>
>>
>> On Thu, Apr 23, 2015 at 4:16 PM, Stefan Karlsson
>> <stefan.karlsson at oracle.com> wrote:
>>
>> Hi Thomas,
>>
>>
>> On 2015-04-23 14:55, Thomas Stüfe wrote:
>>> Hi Stefan,
>>>
>>> lets talk a bit more to get an understanding before I change the
>>> webrev again:
>>>
>>> On Thu, Apr 23, 2015 at 10:48 AM, Stefan Karlsson
>>> <stefan.karlsson at oracle.com> wrote:
>>>
>>>
>>>
>>> On 2015-04-22 17:20, Thomas Stüfe wrote:
>>>> Hi Stefan,
>>>>
>>>> new webrev:
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.02/webrev/
>>>>
>>>> See my answers inline.
>>>>
>>>> On Wed, Apr 22, 2015 at 2:20 PM, Stefan Karlsson
>>>> <stefan.karlsson at oracle.com> wrote:
>>>>
>>>> Hi Thomas,
>>>>
>>>> On 2015-04-21 18:31, Thomas Stüfe wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> thanks for reviewing!
>>>>>
>>>>> Here is the new webrev:
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/
>>>>>
>>>>> I worked in all your remarks except one: the test
>>>>> unfortunately cannot just assert if allocation fails,
>>>>> because that may happen sometimes, especially with
>>>>> small allocation sizes
>>>>> (os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>> will refuse to allocate if the area cannot accommodate
>>>>> one large page). The test also did not assert prior to
>>>>> my change, so I did not want to change that behaviour.
>>>>>
>>>>> I changed some more things:
>>>>>
>>>>> 1) Before my change,
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>> would call os::reserve_memory() or
>>>>> os::reserve_memory_aligned(), depending on whether
>>>>> req_addr was NULL or not. That was a bit unclean (a
>>>>> lower-level API calling a high-level API) and led to
>>>>> the weird workaround at os_linux.cpp:3497, where the
>>>>> NMT Memory tracker had to deregister an allocation which
>>>>> it expected to be registered again later at an upper
>>>>> level. Yikes.
>>>>>
>>>>> Because I changed the first case to use low-level mmap,
>>>>> I also changed the second - replaced
>>>>> os::reserve_memory_aligned with low level coding.
>>>>> Actually I merged both branches together as well as
>>>>> possible, so the new code (os_linux.cpp:3484 ff) now
>>>>> handles alignment as well: If req_addr is set,
>>>>> allocation is attempted at req_addr and alignment is
>>>>> ignored. If req_addr is NULL, alignment is honored.
>>>>> This way I could get rid of the workaround for the NMT
>>>>> completely.
>>>>>
>>>>> 2) I completely rewrote the test function. It now tests
>>>>> three scenarios, see comment. I hope the code is clear
>>>>> and easy to understand.
>>>>
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/src/os/linux/vm/os_linux.cpp.frames.html
>>>>
>>>> ---
>>>> + start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>>>> + MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>> + if (start == MAP_FAILED) {
>>>> return NULL;
>>>> }
>>>>
>>>> You've changed the behavior of this function. It used to
>>>> return NULL if reserve_memory couldn't reserve memory at
>>>> the requested address. Your proposed patch can return
>>>> another address. This will be a problem for the upper
>>>> layers when using compressed oops. I think it would be
>>>> better if you keep the old behavior.
>>>>
>>>>
>>>> Arguably the new behaviour is better: The code which
>>>> allocates the heap for compressed Oops actually deals with
>>>> this case quite nicely by doing a range check for the
>>>> address, not an identity check (see virtualspace.cpp
>>>> "ReservedHeapSpace::try_reserve_range()"). The other code
>>>> path in "ReservedSpace::initialize()" at least handles this
>>>> case (if returned address is unequal to requested address,
>>>> releases memory again). So if the OS cooperates by returning
>>>> an address in the vicinity of the requested address, it
>>>> would actually be better to use this address if it fits
>>>> instead of retrying the allocation.
>>>>
>>>> However, I am not completely sure about other callers of
>>>> os::reserve_memory_special(), and changing the behaviour
>>>> would be beyond the scope of my change anyway, so I restored
>>>> the old behaviour as you wished. Now we either return the
>>>> requested address or NULL if a requested address was given.
>>>
>>> Sounds good to me.
>>>
>>>
>>> Ok, I'll keep it that way.
>>>
>>>>
>>>>
>>>> ---
>>>> Instead of inlining the reserve memory code into
>>>> reserve_memory_special_huge_tlbfs_mixed, could you
>>>> extract it into function? So that we have one function
>>>> that's responsible for reserving PROT_NONE memory, and
>>>> one that commits the small, large, small pages.
>>>>
>>>>
>>>> Ok, did this. Because there is already an anon_mmap(), which
>>>> fits the req_addr != NULL case, I just reused this and added
>>>> a sister function "anon_mmap_aligned" for the alignment case.
>>>
>>> I didn't try to argue that you should split up the code, but
>>> rather move the entire reserve code that you wrote for
>>> webrev.01, and move it into a separate function. The code
>>> would then look like:
>>>
>>> char* start = reserve_memory_something(bytes, req_addr,
>>> alignment);
>>> if (start == NULL) {
>>> return NULL;
>>> }
>>>
>>> and the reader of reserve_memory_special_huge_tlbfs_mixed
>>> could simply ignore the details of reserve_memory_something,
>>> unless they were important for some reason. That way you
>>> could also write separate tests for reserve_memory_something
>>> (or whatever it would be called).
>>>
>>> ---
>>> You are now using anon_mmap which has the comment:
>>>
>>> 3070 // anon_mmap() should only get called during VM initialization,
>>> 3071 // don't need lock (actually we can skip locking even it can be called
>>> 3072 // from multiple threads, because _highest_vm_reserved_address is just a
>>> 3073 // hint about the upper limit of non-stack memory regions.)
>>> 3074 if ((address)addr + bytes > _highest_vm_reserved_address) {
>>> 3075 _highest_vm_reserved_address = (address)addr + bytes;
>>> 3076 }
>>>
>>> So, it's not entirely clear to me that you should be using
>>> this function. I don't know what _highest_vm_reserved_address
>>> is used for. Your new function, anon_mmap_aligned isn't
>>> updating this variable.
>>>
>>>
>>> The comment is obsolete and does not reflect reality; anon_mmap
>>> is used for os::reserve_memory(), which in turn can be used at
>>> any time in the life of the VM.
>>>
>>> The comment - and the associated variable
>>> "_highest_vm_reserved_address" - only affects the LinuxThreads
>>> implementation which is legacy and should be removed (I just did
>>> ask this question on hotspot-dev and will do the cleanup if I get
>>> green light).
>>>
>>> It is also broken since forever, since a lot of allocations
>>> happen in VM and JDK where _highest_vm_reserved_address is just
>>> not updated.
>>
>> OK. It would be good to get rid of it at some point then.
>>
>>>
>>> Could we go back and use the reserve code you had in
>>> webrev.01, so that we don't have to consider the impact of
>>> this?
>>>
>>>
>>> It actually changes less than my last webrev:
>>>
>>> - old code was using os::reserve_memory() and
>>> os::reserve_memory_aligned().
>>> os::reserve_memory() directly called anon_mmap().
>>> os::reserve_memory_aligned() called os::reserve_memory(),
>>> which directly called anon_mmap().
>>>
>>> - the new code:
>>> for the req_addr != NULL, calls anon_mmap directly, only
>>> difference to before is that we now do not specify MAP_FIXED.
>>> for the req_addr==NULL case, it calls the new function
>>> anon_mmap_aligned(), which did not exist before, so no
>>> compatibility issues can arise.
>>>
>>> I can live with both my webrev.01 proposal and this one, but the
>>> solution to factor out all the allocation to a single function
>>> would be my least favorite choice: I do not really like factoring
>>> out a function which only exists to be called from a single other
>>> function, and which is too awkward to be used any other way.
>>
>> The reason for moving it out to a function is to get better
>> "separation of concerns" and hide the implementation details, it
>> doesn't really matter if it's used only once.
>>
>>> And your "reserved_memory_something()" would be quite awkward:
>>> bundling mmap allocation for the req_addr case (which duplicates
>>> what anon_mmap() does), plus adding support for allocating at an
>>> alignment. It does not easily fit together imho.
>>
>> But you almost already had the "awkward" code in webrev.01, and it
>> was quite easy to understand. With the requirement that req_addr
>> is aligned if an alignment is passed in, then the code would be
>> even easier. It could probably be used to implement anon_mmap with
>> just a few small changes. And by doing that we would get rid of
>> some code duplication as well. Not sure it worth it though.
>>
>>>
>>> Of course, it is all a matter of taste, and if you insist, I'll
>>> do it. Most important is to fix the bug.
>>
>> OK. Then, please, extract it out to a separate function so that
>> reserve_memory_special_huge_tlbfs_mixed only has one call out to
>> a function that reserves the memory. It's OK if you want to
>> implement it as a anon_mmap/anon_mmap_aligned pair or inline the
>> code (as in webrev.01), as long as I don't have to see the code in
>> reserve_memory_special_huge_tlbfs_mixed.
>>
>> Thanks,
>> StefanK
>>
>>
>>>
>>>> ---
>>>> - size_t large_page_size = os::large_page_size();
>>>> + char* start;
>>>>
>>>> + size_t large_page_size = os::large_page_size();
>>>>
>>>> There's no need anymore to declare 'start' before it's
>>>> set. So this should be incorporated into:
>>>>
>>>> 3489 start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>>>> 3490 MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>
>>>> ---
>>>> 3480 // first reserve - but not commit - the address range in small pages.
>>>> ...
>>>> 3529 // Commit small-paged leading area
>>>>
>>>> Could you start your comments with a capital and with a
>>>> period?
>>>>
>>>>
>>>> Done. Note that there are comments where the first word is a
>>>> parameter name - if that name is lowercase, I would rather
>>>> not write it upper case.
>>>
>>> 3082 // reserve small-paged memory, optionally aligned to given boundary.
>>> 3083 // alignment must be a multiple of allocation granularity. bytes must be
>>>
>>> and some in the test as well.
>>>
>>>
>>> Okay, will change.
>>>
>>>>
>>>> ---
>>>> 3481 // Note: if a wish address was given
>>>>
>>>> There's no other reference to a "wish address". Could
>>>> you reword it to be req_addr or "requested address"?
>>>>
>>>>
>>>> Done.
>>>>
>>>>
>>>> ---
>>>> 3481 // Note: if a wish address was given, we still do not use MAP_FIXED - do not want to trash
>>>> 3482 // existing mappings. We hand down req_addr to mmap and hope the OS does its best.
>>>>
>>>> Could you move this comment down to the mmap call?
>>>>
>>>>
>>>> Done.
>>>>
>>>>
>>>> ---
>>>> 3529 // Commit small-paged leading area
>>>> ...
>>>> 3540 // Commit large paged area
>>>>
>>>> Shouldn't it say large-paged as well?
>>>>
>>>>
>>>> Done.
>>>>
>>>>
>>>> ---
>>>> 3483 // Note2: if wish address was given, alignment is ignored
>>>>
>>>> Previously, we asserted that our in-parameters were
>>>> aligned. Why did you replace that with a comment?
>>>>
>>>>
>>>> Ok, I restored the asserts, but I really am not happy with
>>>> that. I find the asserts way too strict and they limit the
>>>> usefulness of the functions:
>>>>
>>>> - "assert(is_ptr_aligned(req_addr, alignment), "Must be");"
>>>>
>>>> Why must this be? Why do I even have to input an alignment
>>>> value if I input a requested address? It would make more
>>>> sense just to say that alignment is ignored if req_addr is
>>>> set.
>>>> Note that this was not really well tested beforehand,
>>>> because there were no tests for a non-null req_addr.
>>>>
>>>> - "assert(is_size_aligned(bytes, alignment), "Must be");"
>>>>
>>>> This is even worse. Why should the size of the allocation be
>>>> aligned to the alignment? Why can't I allocate a small chunk
>>>> of memory at a larger alignment? I do not understand this
>>>> limit at all.
>>>>
>>>> But ok, I restored the asserts and had to limit the test
>>>> accordingly to not to trigger those asserts.
>>>
>>> You are probably right, and those alignment restrictions
>>> should probably have been put in the higher layers where it's
>>> needed. But, please, don't change this kind of thing in an
>>> unrelated bug fix. It needs a separate RFE so that it gets
>>> enough scrutiny to make sure that we don't break something.
>>>
>>>
>>> I agree with you. It is out of scope for this change.
>>>
>>> Kind Regards, Thomas
>>>
>>> Thanks,
>>> StefanK
>>>
>>>
>>>> ---
>>>> + size_t extra_size = ...
>>>> + char* const end0 = start0 + bytes;
>>>> + char* const end = start + extra_size;
>>>>
>>>> It's not obvious from the variable names what they
>>>> represent, IMHO. Could you try to give them more
>>>> descriptive names?
>>>>
>>>>
>>>> Done.
>>>>
>>>> I'll take a look at the tests after next review round.
>>>>
>>>>
>>>> Ok, thanks for taking the time!
>>>>
>>>> ..Thomas
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>> ..Thomas
>>>>>
>>>>>
>>>>> On Wed, Apr 15, 2015 at 1:55 PM, Stefan Karlsson
>>>>> <stefan.karlsson at oracle.com> wrote:
>>>>>
>>>>> Hi Thomas,
>>>>>
>>>>> Thanks for fixing this bug.
>>>>>
>>>>> Some comments inlined:
>>>>>
>>>>> On 2015-04-09 17:08, Thomas Stüfe wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> please review this fix to huge page allocation
>>>>> on Linux.
>>>>>
>>>>> bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8077276
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html
>>>>>
>>>>> + const int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE;
>>>>> char* start;
>>>>> if (req_addr != NULL) {
>>>>> assert(is_ptr_aligned(req_addr, alignment), "Must be");
>>>>> assert(is_size_aligned(bytes, alignment), "Must be");
>>>>> - start = os::reserve_memory(bytes, req_addr);
>>>>> - assert(start == NULL || start == req_addr, "Must be");
>>>>> + // do NOT use MAP_FIXED here - do not trash existing mappings.
>>>>> + start = (char*) ::mmap(req_addr, bytes, prot,
>>>>> + MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>> + if (start == MAP_FAILED) {
>>>>> + start = NULL;
>>>>> + }
>>>>> + if (!is_ptr_aligned(start, alignment)) {
>>>>> + ::munmap(start, bytes);
>>>>> + start = NULL;
>>>>> + }
>>>>>
>>>>>
>>>>> Previously, the small page area was mapped with
>>>>> PROT_NONE, but now you change it to RW. Is there a
>>>>> reason why you changed that?
>>>>>
>>>>>
>>>>> result = ::mmap(lp_start, lp_bytes, prot,
>>>>> MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_HUGETLB, -1, 0);
>>>>> if (result == MAP_FAILED) {
>>>>> - warn_on_large_pages_failure(req_addr, bytes, errno);
>>>>> + warn_on_large_pages_failure(start, bytes, errno);
>>>>>
>>>>> Shouldn't we report lp_start instead of start here?
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp
>>>>>
>>>>> There are a number of things that I would like to
>>>>> change with the the test.
>>>>>
>>>>> Can you use mapping == NULL in null checks instead
>>>>> of just if (mapping). The same for the asserts.
>>>>>
>>>>> I think the test will be easier to understand if
>>>>> you get rid of the for loop and extract code into
>>>>> helper functions, or do a small amount of code
>>>>> duplication:
>>>>>
>>>>> // do the test twice: once with a req_addr=NULL, once with a non-NULL
>>>>> // req_addr; the latter shall test if we trash existing mappings
>>>>> for (int run = 0; run < 2; run ++) {
>>>>>
>>>>> I think you should fail or exit the test if you
>>>>> can't get the expected mapping, instead of
>>>>> continuing and have :
>>>>>
>>>>> if (addr) {
>>>>> test_log("... mapped [" PTR_FORMAT "-" PTR_FORMAT ").", addr, addr + size);
>>>>> } else {
>>>>> test_log("... failed.");
>>>>> }
>>>>>
>>>>> if (addr != NULL) {
>>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>
>>>>>
>>>>>
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>> first establishes a
>>>>> mapping with small pages over the whole
>>>>> requested range, then exchanges the
>>>>> parts of the mapping which are aligned to large
>>>>> page size with large pages.
>>>>>
>>>>> Now, when establishing the first mapping, it
>>>>> uses os::reserve_memory() with
>>>>> a potentially non-null req_addr. In that case,
>>>>> os::reserve_memory() will do
>>>>> a mmap(MAP_FIXED).
>>>>>
>>>>> This will trash any pre-existing mappings.
>>>>>
>>>>> Note that I could not willingly reproduce the
>>>>> error with an unmodified VM.
>>>>> But I added a reproduction case which
>>>>> demonstrated that if one were to call
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>> with a non-NULL
>>>>> req_addr, existing mappings would be trashed.
>>>>> Depending on where we are in
>>>>> address space, we also would overwrite libc
>>>>> structures and crash
>>>>> immediately.
>>>>>
>>>>> The repro case is part of the change, see
>>>>> changes in
>>>>> test_reserve_memory_special_huge_tlbfs_mixed(),
>>>>> and can be executed with:
>>>>>
>>>>> ./images/jdk/bin/java -XX:+UseLargePages
>>>>> -XX:+UseHugeTLBFS -XX:-UseSHM
>>>>> -XX:+ExecuteInternalVMTests
>>>>> -XX:+VerboseInternalVMTests
>>>>>
>>>>> The fix: instead of using os::reserve_memory(),
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>> now calls mmap()
>>>>> directly with the non-NULL req_addr, but
>>>>> without MAP_FIXED. This means the
>>>>> OS may do its best to allocate at req_addr, but
>>>>> will not trash any
>>>>> mappings. This also follows the pattern in
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_only(),
>>>>> which is a sister
>>>>> function of
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed().
>>>>>
>>>>> Note also discussion at:
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html.
>>>>>
>>>>> Thanks for reviewing!
>>>>>
>>>>> Kind Regards, Thomas
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>