RFR(s): 8077276: allocating heap with UseLargePages and HugeTLBFS may trash existing memory mappings (linux)
Stefan Karlsson
stefan.karlsson at oracle.com
Mon Apr 27 08:25:10 UTC 2015
On 2015-04-24 11:41, Thomas Stüfe wrote:
> Hi Stefan,
>
> ok. I see it makes sense to limit the scope of the change as much as
> possible to the actual error and do semantic changes separately.
>
> I worked for a long time with os::reserve_memory() and friends,
> especially because I had to port it to AIX here at SAP. That was
> annoyingly difficult because of AIX specifics, so it basically ended
> up being a complete new implementation. I guess that is the reason I
> am a bit "looser" with this code and tend to change too much -
> hard to shake this feeling of "ownership".
>
> On a side note, I am quite happy now that SAP is contributing to the
> OpenJDK, so now I have the chance to actually bring some changes
> upstream :)
Thanks for fixing the bugs and cleaning out our inconsistent APIs. :)
>
> Ok, what you suggest makes sense and I will change the fix accordingly.
>
> Some more remarks inline:
>
>
>
> On Thu, Apr 23, 2015 at 9:53 PM, Stefan Karlsson
> <stefan.karlsson at oracle.com> wrote:
>
> Hi Thomas,
>
> On 2015-04-23 18:09, Thomas Stüfe wrote:
>> Hi Stefan,
>>
>> new webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.03/webrev/
>
> I think I've realized that we have different views on what the API
> for the reserve function/code should be.
>
> The code you have written has two more or less mutually exclusive
> execution modes:
> 1) req_addr != NULL => ignore the alignment
> 2) req_addr == NULL => adhere to the alignment
>
> This is reflected in the code, in the name of the new function, and in
> the comments.
>
> If we take the previous implementation of the code, before your
> patches:
> 3482   if (req_addr != NULL) {
> 3483     assert(is_ptr_aligned(req_addr, alignment), "Must be");
> 3484     assert(is_size_aligned(bytes, alignment), "Must be");
> 3485     start = os::reserve_memory(bytes, req_addr);
> 3486     assert(start == NULL || start == req_addr, "Must be");
> 3487   } else {
> 3488     start = os::reserve_memory_aligned(bytes, alignment);
> 3489   }
> 3490
> 3491   if (start == NULL) {
> 3492     return NULL;
> 3493   }
> 3494
> 3495   assert(is_ptr_aligned(start, alignment), "Must be");
>
>
> If we had extracted that into a function, the API would
> have been:
> INPUT:
> alignment - requested alignment
>
>
> So... alignment can be 0 or not? It was never tested, the old code
> somewhat assumes that it would never be 0.
Currently, we never pass in 0 as the alignment, unless it changed recently.
> req_addr - requested address (must be 'alignment' aligned)
> bytes - the size of the memory chunk to reserve (must be
> [arguably] 'alignment' aligned)
>
>
> Almost. bytes had to be aligned to alignment only for the req_addr !=
> NULL case, which does not make much sense.
The alignment check was done here:

char* os::reserve_memory_aligned(size_t size, size_t alignment) {
  assert((alignment & (os::vm_allocation_granularity() - 1)) == 0,
         "Alignment must be a multiple of allocation granularity (page size)");
  assert((size & (alignment - 1)) == 0, "size must be 'alignment' aligned");
>
>
> OUTPUT:
> address - a memory chunk that is 'alignment' aligned and 'bytes'
> large, that starts at 'req_addr' if 'req_addr' is given. Or NULL
> at failures.
>
> Your current code is written to allow for a broader use case than
> the current code needs, where the req_addr doesn't have to
> be 'alignment' aligned. I would much prefer if you kept the INPUT
> and OUTPUT requirements above and made sure that the function
> implemented those. And _if_ we need to alter the current
> restrictions, we go back and do it as a separate change.
>
> ok!
>
> The code could be written as (not tested with a compiler):
>
> static char* anon_mmap_aligned(size_t bytes, size_t alignment, char* req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes;
>   if (alignment > 0) {
>     extra_size += alignment;
>   }
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>                                MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   if (req_addr != NULL && start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   if (alignment > 0) {
>     char* const start_aligned = (char*) align_ptr_up(start, alignment);
>     char* const end_aligned = start_aligned + bytes;
>     char* const end = start + extra_size;
>     if (start_aligned > start) {
>       ::munmap(start, start_aligned - start);
>     }
>     if (end_aligned < end) {
>       ::munmap(end_aligned, end - end_aligned);
>     }
>     start = start_aligned;
>   }
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }
>
> and we could even get rid of the alignment > 0 checks, if we want to:
>
>
>
> No, because you'd change the behaviour in the req_addr != NULL case
> compared to the old code. You would always mmap more than needed,
> then discard the rest. Aside from being unnecessary, this would also
> change mapping placement, because the OS must fit a larger mapping.
Yes, realized this over the night.
Thanks,
StefanK
>
> static char* anon_mmap_aligned(size_t bytes, size_t alignment, char* req_addr) {
>   // Input requirements.
>   assert(is_ptr_aligned(req_addr, alignment), "Must be");
>   assert(is_size_aligned(bytes, alignment), "Must be");
>
>   size_t extra_size = bytes + alignment;
>
>   char* start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>                                MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE,
>                                -1, 0);
>   if (start == MAP_FAILED) {
>     return NULL;
>   }
>
>   if (req_addr != NULL && start != req_addr) {
>     ::munmap(start, extra_size);
>     return NULL;
>   }
>
>   // Potentially adjust to get an aligned mapping.
>   char* const start_aligned = (char*) align_ptr_up(start, alignment);
>   char* const end_aligned = start_aligned + bytes;
>   char* const end = start + extra_size;
>   if (start_aligned > start) {
>     ::munmap(start, start_aligned - start);
>   }
>   if (end_aligned < end) {
>     ::munmap(end_aligned, end - end_aligned);
>   }
>   start = start_aligned;
>
>   // Output requirements.
>   assert(req_addr == NULL || start == req_addr, "Must be");
>   assert(is_ptr_aligned(start, alignment), "Must be");
>   return start;
> }
>
> And as a side-note, even with these restrictions, we're not far
> from implementing anon_mmap as:
>
> static char* anon_mmap(char* requested_addr, size_t bytes, bool fixed) {
>   return anon_mmap_aligned(bytes, 0, requested_addr);
> }
>
> we just need to cater for the fixed flag. (The input parameters
> are in a different order in the current patch. That should probably
> be fixed.)
>
> ---
>
> Smaller style issues:
>
> - Don't NULL-check with if (p), use if (p != NULL)
> - i ++ should be written as i++
>
>
> ok, will change.
>
> Thanks,
> StefanK
>
>
> Thanks, Thomas
>
>
>> Best Regards, Thomas
>>
>>
>> On Thu, Apr 23, 2015 at 4:16 PM, Stefan Karlsson
>> <stefan.karlsson at oracle.com> wrote:
>>
>> Hi Thomas,
>>
>>
>> On 2015-04-23 14:55, Thomas Stüfe wrote:
>>> Hi Stefan,
>>>
>>> lets talk a bit more to get an understanding before I change
>>> the webrev again:
>>>
>>> On Thu, Apr 23, 2015 at 10:48 AM, Stefan Karlsson
>>> <stefan.karlsson at oracle.com> wrote:
>>>
>>>
>>>
>>> On 2015-04-22 17:20, Thomas Stüfe wrote:
>>>> Hi Stefan,
>>>>
>>>> new webrev:
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.02/webrev/
>>>>
>>>> See my answers inline.
>>>>
>>>> On Wed, Apr 22, 2015 at 2:20 PM, Stefan Karlsson
>>>> <stefan.karlsson at oracle.com> wrote:
>>>>
>>>> Hi Thomas,
>>>>
>>>> On 2015-04-21 18:31, Thomas Stüfe wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> thanks for reviewing!
>>>>>
>>>>> Here is the new webrev:
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/
>>>>>
>>>>> I worked in all your remarks except one: the test
>>>>> unfortunately cannot just assert if allocation
>>>>> fails, because that may happen sometimes,
>>>>> especially with small allocation sizes
>>>>> (os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>> will refuse to allocate if the area cannot
>>>>> accommodate one large page). The test also did not
>>>>> assert prior to my change, so I did not want to
>>>>> change that behaviour.
>>>>>
>>>>> I changed some more things:
>>>>>
>>>>> 1) Before my change,
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed()
>>>>> would call os::reserve_memory() or
>>>>> os::reserve_memory_aligned(), depending on whether
>>>>> req_addr was NULL or not. That was a bit unclean
>>>>> (a lower-level API calling a high-level API) and
>>>>> led to the weird workaround at os_linux.cpp:3497,
>>>>> where the NMT memory tracker had to deregister an
>>>>> allocation which it expected to be registered
>>>>> again later at an upper level. Yikes.
>>>>>
>>>>> Because I changed the first case to use low-level
>>>>> mmap, I also changed the second - replaced
>>>>> os::reserve_memory_aligned with low level coding.
>>>>> Actually I merged both branches together as well
>>>>> as possible, so the new code (os_linux.cpp:3484
>>>>> ff) now handles alignment as well: If req_addr is
>>>>> set, allocation is attempted at req_addr and
>>>>> alignment is ignored. If req_addr is NULL,
>>>>> alignment is honored. This way I could get rid of
>>>>> the workaround for the NMT completely.
>>>>>
>>>>> 2) I completely rewrote the test function. It now
>>>>> tests three scenarios, see comment. I hope the
>>>>> code is clear and easy to understand.
>>>>
>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.01/webrev/src/os/linux/vm/os_linux.cpp.frames.html
>>>>
>>>> ---
>>>> + start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>>>> +          MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>> + if (start == MAP_FAILED) {
>>>>     return NULL;
>>>>   }
>>>>
>>>> You've changed the behavior of this function. It
>>>> used to return NULL if reserve_memory couldn't
>>>> reserve memory at the requested address. Your
>>>> proposed patch can return another address. This
>>>> will be a problem for the upper layers when using
>>>> compressed oops. I think it would be better if you
>>>> keep the old behavior.
>>>>
>>>>
>>>> Arguably the new behaviour is better: The code which
>>>> allocates the heap for compressed Oops actually deals
>>>> with this case quite nicely by doing a range check for
>>>> the address, not an identity check (see
>>>> virtualspace.cpp
>>>> "ReservedHeapSpace::try_reserve_range()"). The other
>>>> code path in "ReservedSpace::initialize()" at least
>>>> handles this case (if the returned address is unequal to the
>>>> requested address, it releases the memory again). So if the OS
>>>> cooperates by returning an address in the vicinity of
>>>> the requested address, it would actually be better to
>>>> use this address if it fits instead of retrying the
>>>> allocation.
>>>>
>>>> However, I am not completely sure about other callers
>>>> of os::reserve_memory_special(), and changing the
>>>> behaviour would be beyond the scope of my change
>>>> anyway, so I restored the old behaviour as you wished.
>>>> Now we either return the requested address or NULL if a
>>>> requested address was given.
>>>
>>> Sounds good to me.
>>>
>>>
>>> Ok, I'll keep it that way.
>>>
>>>>
>>>>
>>>> ---
>>>> Instead of inlining the reserve memory code into
>>>> reserve_memory_special_huge_tlbfs_mixed, could you
>>>> extract it into a function? So that we have one
>>>> function that's responsible for reserving PROT_NONE
>>>> memory, and one that commits the small, large, and
>>>> small pages.
>>>>
>>>>
>>>> Ok, did this. Because there is already an anon_mmap(),
>>>> which fits the req_addr != NULL case, I just reused
>>>> this and added a sister function "anon_mmap_aligned"
>>>> for the alignment case.
>>>
>>> I didn't try to argue that you should split up the code,
>>> but rather that you should take the entire reserve code you
>>> wrote for webrev.01 and move it into a separate function. The
>>> code would then look like:
>>>
>>> char* start = reserve_memory_something(bytes, req_addr, alignment);
>>> if (start == NULL) {
>>>   return NULL;
>>> }
>>>
>>> and the reader of
>>> reserve_memory_special_huge_tlbfs_mixed could simply
>>> ignore the details of reserve_memory_something, unless
>>> they were important for some reason. That way you could
>>> also write separate tests for reserve_memory_something
>>> (or whatever it would be called).
>>>
>>> ---
>>> You are now using anon_mmap which has the comment:
>>>
>>> 3070 // anon_mmap() should only get called during VM initialization,
>>> 3071 // don't need lock (actually we can skip locking even it can be called
>>> 3072 // from multiple threads, because _highest_vm_reserved_address is just a
>>> 3073 // hint about the upper limit of non-stack memory regions.)
>>> 3074 if ((address)addr + bytes > _highest_vm_reserved_address) {
>>> 3075   _highest_vm_reserved_address = (address)addr + bytes;
>>> 3076 }
>>>
>>> So, it's not entirely clear to me that you should be
>>> using this function. I don't know what
>>> _highest_vm_reserved_address is used for. Your new
>>> function, anon_mmap_aligned isn't updating this variable.
>>>
>>>
>>> The comment is obsolete and does not reflect reality;
>>> anon_mmap is used for os::reserve_memory(), which in turn
>>> can be used at any time in the life of the VM.
>>>
>>> The comment - and the associated variable
>>> "_highest_vm_reserved_address" - only affects the
>>> LinuxThreads implementation which is legacy and should be
>>> removed (I just asked this question on hotspot-dev and
>>> will do the cleanup if I get green light).
>>>
>>> It has also been broken forever, since a lot of allocations
>>> happen in the VM and JDK where _highest_vm_reserved_address is
>>> just not updated.
>>
>> OK. It would be good to get rid of it at some point then.
>>
>>>
>>> Could we go back and use the reserve code you had in
>>> webrev.01, so that we don't have to consider the impact
>>> of this?
>>>
>>>
>>> It actually changes less than my last webrev:
>>>
>>> - old code was using os::reserve_memory() and
>>> os::reserve_memory_aligned().
>>> os::reserve_memory() directly called anon_mmap().
>>> os::reserve_memory_aligned() called os::reserve_memory(),
>>> which directly called anon_mmap().
>>>
>>> - the new code:
>>> for the req_addr != NULL case, it calls anon_mmap directly; the only
>>> difference from before is that we now do not specify MAP_FIXED.
>>> for the req_addr==NULL case, it calls the new function
>>> anon_mmap_aligned(), which did not exist before, so no
>>> compatibility issues can arise.
>>>
>>> I can live with both my webrev.01 proposal and this one, but
>>> the solution to factor out all the allocation to a single
>>> function would be my least favorite choice: I do not really
>>> like factoring out a function which only exists to be called
>>> from a single other function, and which is too awkward to be
>>> used any other way.
>>
>> The reason for moving it out to a function is to get better
>> "separation of concerns" and hide the implementation details,
>> it doesn't really matter if it's used only once.
>>
>>> And your "reserve_memory_something()" would be quite
>>> awkward: bundling mmap allocation for the req_addr case
>>> (which duplicates what anon_mmap() does), plus adding
>>> support for allocating at an alignment. It does not easily
>>> fit together imho.
>>
>> But you almost already had the "awkward" code in webrev.01,
>> and it was quite easy to understand. With the requirement
>> that req_addr is aligned if an alignment is passed in,
>> the code would be even simpler. It could probably be used to
>> implement anon_mmap with just a few small changes. And by
>> doing that we would get rid of some code duplication as well.
>> Not sure it's worth it though.
>>
>>>
>>> Of course, it is all a matter of taste, and if you insist,
>>> I'll do it. Most important is to fix the bug.
>>
>> OK. Then, please, extract it out to a separate function so
>> that reserve_memory_special_huge_tlbfs_mixed only has one
>> call out to a function that reserves the memory. It's OK if
>> you want to implement it as an anon_mmap/anon_mmap_aligned
>> pair or inline the code (as in webrev.01), as long as I don't
>> have to see the code in reserve_memory_special_huge_tlbfs_mixed.
>>
>> Thanks,
>> StefanK
>>
>>
>>>
>>>> ---
>>>> - size_t large_page_size = os::large_page_size();
>>>> + char* start;
>>>>
>>>> + size_t large_page_size = os::large_page_size();
>>>>
>>>> There's no need anymore to declare 'start' before
>>>> it's set. So this should be incorporated into:
>>>>
>>>> 3489   start = (char*) ::mmap(req_addr, extra_size, PROT_NONE,
>>>> 3490            MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>
>>>> ---
>>>> 3480 // first reserve - but not commit - the
>>>> address range in small pages.
>>>> ...
>>>> 3529 // Commit small-paged leading area
>>>>
>>>> Could you start your comments with a capital and
>>>> with a period?
>>>>
>>>>
>>>> Done. Note that there are comments where the first word
>>>> is a parameter name - if that name is lowercase, I
>>>> would rather not write it uppercase.
>>>
>>> 3082 // reserve small-paged memory, optionally aligned to given boundary.
>>> 3083 // alignment must be a multiple of allocation granularity. bytes must be
>>>
>>> and some in the test as well.
>>>
>>>
>>> Okay, will change.
>>>
>>>>
>>>> ---
>>>> 3481 // Note: if a wish address was given
>>>>
>>>> There's no other reference to a "wish address".
>>>> Could you reword it to be req_addr or "requested
>>>> address"?
>>>>
>>>>
>>>> Done.
>>>>
>>>>
>>>> ---
>>>> 3481 // Note: if a wish address was given, we
>>>> still do not use MAP_FIXED - do not want to trash
>>>> 3482 // existing mappings. We hand down req_addr
>>>> to mmap and hope the OS does its best.
>>>>
>>>> Could you move this comment down to the mmap call?
>>>>
>>>>
>>>> Done.
>>>>
>>>>
>>>> ---
>>>> 3529 // Commit small-paged leading area
>>>> ...
>>>> 3540 // Commit large paged area
>>>>
>>>> Shouldn't it say large-paged as well?
>>>>
>>>>
>>>> Done.
>>>>
>>>>
>>>> ---
>>>> 3483 // Note2: if wish address was given,
>>>> alignment is ignored
>>>>
>>>> Previously, we asserted that our in-parameters were
>>>> aligned. Why did you replace that with a comment?
>>>>
>>>>
>>>> Ok, I restored the asserts, but I really am not happy
>>>> with that. I find the asserts way too strict and they
>>>> limit the usefulness of the functions:
>>>>
>>>> - "assert(is_ptr_aligned(req_addr, alignment), "Must be");"
>>>>
>>>> Why must this be? Why do I even have to input an
>>>> alignment value if I input a requested address? It
>>>> would make more sense just to say that alignment is
>>>> ignored if req_addr is set.
>>>> Note that this was not really well tested beforehand,
>>>> because there were no tests for a non-null req_addr.
>>>>
>>>> - "assert(is_size_aligned(bytes, alignment), "Must be");"
>>>>
>>>> This is even worse. Why should the size of the
>>>> allocation be aligned to the alignment? Why can't I
>>>> allocate a small chunk of memory at a larger alignment?
>>>> I do not understand this limit at all.
>>>>
>>>> But ok, I restored the asserts and had to limit the
>>>> test accordingly so as not to trigger those asserts.
>>>
>>> You are probably right, and those alignment restrictions
>>> should probably have been put in the higher layers where
>>> they're needed. But, please, don't change this kind of
>>> thing in an unrelated bug fix. It needs a separate RFE
>>> so that it gets enough scrutiny to make sure that we
>>> don't break something.
>>>
>>>
>>> I agree with you. It is out of scope for this change.
>>>
>>> Kind Regards, Thomas
>>>
>>> Thanks,
>>> StefanK
>>>
>>>
>>>> ---
>>>> + size_t extra_size = ...
>>>> + char* const end0 = start0 + bytes;
>>>> + char* const end = start + extra_size;
>>>>
>>>> It's not obvious from the variable names what they
>>>> represent, IMHO. Could you try to give them more
>>>> descriptive names?
>>>>
>>>>
>>>> Done.
>>>>
>>>> I'll take a look at the tests after the next review round.
>>>>
>>>>
>>>> Ok, thanks for taking the time!
>>>>
>>>> ..Thomas
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>> ..Thomas
>>>>>
>>>>>
>>>>> On Wed, Apr 15, 2015 at 1:55 PM, Stefan Karlsson
>>>>> <stefan.karlsson at oracle.com> wrote:
>>>>>
>>>>> Hi Thomas,
>>>>>
>>>>> Thanks for fixing this bug.
>>>>>
>>>>> Some comments inlined:
>>>>>
>>>>> On 2015-04-09 17:08, Thomas Stüfe wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> please review this fix to huge page
>>>>> allocation on Linux.
>>>>>
>>>>> bug:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8077276
>>>>> webrev:
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/src/os/linux/vm/os_linux.cpp.udiff.html
>>>>>
>>>>> + const int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC
>>>>> +                       : PROT_READ|PROT_WRITE;
>>>>>   char* start;
>>>>>   if (req_addr != NULL) {
>>>>>     assert(is_ptr_aligned(req_addr, alignment), "Must be");
>>>>>     assert(is_size_aligned(bytes, alignment), "Must be");
>>>>> -   start = os::reserve_memory(bytes, req_addr);
>>>>> -   assert(start == NULL || start == req_addr, "Must be");
>>>>> +   // do NOT use MAP_FIXED here - do not trash existing mappings.
>>>>> +   start = (char*) ::mmap(req_addr, bytes, prot,
>>>>> +            MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
>>>>> +   if (start == MAP_FAILED) {
>>>>> +     start = NULL;
>>>>> +   }
>>>>> +   if (!is_ptr_aligned(start, alignment)) {
>>>>> +     ::munmap(start, bytes);
>>>>> +     start = NULL;
>>>>> +   }
>>>>>
>>>>>
>>>>> Previously, the small page area was mapped
>>>>> with PROT_NONE, but now you change it to RW.
>>>>> Is there a reason why you changed that?
>>>>>
>>>>>
>>>>>   result = ::mmap(lp_start, lp_bytes, prot,
>>>>>            MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED|MAP_HUGETLB, -1, 0);
>>>>>   if (result == MAP_FAILED) {
>>>>> -   warn_on_large_pages_failure(req_addr, bytes, errno);
>>>>> +   warn_on_large_pages_failure(start, bytes, errno);
>>>>>
>>>>> Shouldn't we report lp_start instead of start
>>>>> here?
>>>>>
>>>>>
>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8077276/webrev.00/webrev/raw_files/new/src/os/linux/vm/os_linux.cpp
>>>>>
>>>>> There are a number of things that I would like
>>>>> to change with the test.
>>>>>
>>>>> Can you use mapping == NULL in null checks
>>>>> instead of just if (mapping)? The same for the
>>>>> asserts.
>>>>>
>>>>> I think the test will be easier to understand
>>>>> if you get rid of the for loop and extract
>>>>> code into helper functions, or do a small
>>>>> amount of code duplication:
>>>>>
>>>>> // do the test twice: once with a req_addr=NULL, once with a
>>>>> // non-NULL req_addr; the latter shall test if we trash existing mappings
>>>>> for (int run = 0; run < 2; run ++) {
>>>>>
>>>>> I think you should fail or exit the test if
>>>>> you can't get the expected mapping, instead of
>>>>> continuing and having:
>>>>>
>>>>> if (addr) {
>>>>>   test_log("... mapped [" PTR_FORMAT "-" PTR_FORMAT ").", addr, addr + size);
>>>>> } else {
>>>>>   test_log("... failed.");
>>>>> }
>>>>>
>>>>> if (addr != NULL) {
>>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>
>>>>>
>>>>>
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() first establishes
>>>>> a mapping with small pages over the whole requested range, then
>>>>> exchanges the parts of the mapping which are aligned to large page
>>>>> size with large pages.
>>>>>
>>>>> Now, when establishing the first mapping, it uses
>>>>> os::reserve_memory() with a potentially non-NULL req_addr. In
>>>>> that case, os::reserve_memory() will do a mmap(MAP_FIXED).
>>>>>
>>>>> This will trash any pre-existing mappings.
>>>>>
>>>>> Note that I could not reproduce the error at will with an
>>>>> unmodified VM. But I added a reproduction case which demonstrated
>>>>> that if one were to call
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() with a
>>>>> non-NULL req_addr, existing mappings would be trashed. Depending
>>>>> on where we are in the address space, we would also overwrite
>>>>> libc structures and crash immediately.
>>>>>
>>>>> The repro case is part of the change, see the changes in
>>>>> test_reserve_memory_special_huge_tlbfs_mixed(), and can be
>>>>> executed with:
>>>>>
>>>>> ./images/jdk/bin/java -XX:+UseLargePages -XX:+UseHugeTLBFS
>>>>>   -XX:-UseSHM -XX:+ExecuteInternalVMTests -XX:+VerboseInternalVMTests
>>>>>
>>>>> The fix: instead of using os::reserve_memory(),
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed() now calls
>>>>> mmap() directly with the non-NULL req_addr, but without
>>>>> MAP_FIXED. This means the OS may do its best to allocate at
>>>>> req_addr, but will not trash any mappings. This also follows the
>>>>> pattern in os::Linux::reserve_memory_special_huge_tlbfs_only(),
>>>>> which is a sister function of
>>>>> os::Linux::reserve_memory_special_huge_tlbfs_mixed().
>>>>>
>>>>> Note also the discussion at:
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2015-April/017823.html.
>>>>>
>>>>> Thanks for reviewing!
>>>>>
>>>>> Kind Regards, Thomas