RFR: 8017629: G1: UseSHM in combination with a G1HeapRegionSize > os::large_page_size() falls back to use small pages
Stefan Karlsson
stefan.karlsson at oracle.com
Wed Apr 13 11:23:49 UTC 2016
Hi Thomas,
On 2016-04-13 12:44, Thomas Stüfe wrote:
> Hi Stefan,
>
> On Tue, Apr 12, 2016 at 5:41 PM, Stefan Karlsson
> <stefan.karlsson at oracle.com> wrote:
>
> Hi Thomas,
>
>
> On 2016-04-12 16:23, Thomas Stüfe wrote:
>> Hi Stefan,
>>
>>
>> On Mon, Apr 11, 2016 at 3:52 PM, Stefan Karlsson
>> <stefan.karlsson at oracle.com> wrote:
>>
>> Hi Thomas,
>>
>> On 2016-04-11 14:39, Thomas Stüfe wrote:
>>> Hi Stefan,
>>>
>>> short question, why the mmap before the shmat? Why not
>>> shmat right away at the requested address?
>>
>> If we have a requested_address we do exactly what you propose.
>>
>> if (req_addr == NULL && alignment > os::large_page_size()) {
>>   return shmat_with_large_alignment(shmid, bytes, alignment);
>> } else {
>>   return shmat_with_normal_alignment(shmid, req_addr);
>> }
>>
>> ...
>>
>> static char* shmat_with_normal_alignment(int shmid, char* req_addr) {
>>   char* addr = (char*)shmat(shmid, req_addr, 0);
>>
>>   if ((intptr_t)addr == -1) {
>>     shm_warning_with_errno("Failed to attach shared memory.");
>>     return NULL;
>>   }
>>
>>   return addr;
>> }
>>
>>
>> It's when you don't have a requested address that mmap is
>> used to find a large enough virtual memory area.
>>
>>
>> Sorry, it seems I did not look at this code thoroughly enough. I
>> understand now that you use mmap to allocate and then cut away the
>> extra pre-/post-space, something that would not be possible with
>> shmat, since shared memory segments cannot be unmapped page-wise.
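>>
>> If I read it correctly, the trick is roughly the following (a much
>> simplified sketch from my side, not the actual webrev code; error
>> handling omitted, needs <sys/mman.h> and <sys/shm.h>):
>>
>> static char* shmat_with_large_alignment(int shmid, size_t bytes, size_t alignment) {
>>   // Over-reserve with mmap so that an aligned range of 'bytes' is
>>   // guaranteed to fit somewhere inside the reservation.
>>   size_t extra_size = bytes + alignment;
>>   char* start = (char*)mmap(NULL, extra_size, PROT_NONE,
>>                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
>>   char* aligned = (char*)(((uintptr_t)start + alignment - 1) & ~(uintptr_t)(alignment - 1));
>>
>>   // Cut away the unused pre- and post-space - possible with munmap,
>>   // but not with shmdt, which is why mmap is used for the reservation.
>>   if (aligned > start) {
>>     munmap(start, aligned - start);
>>   }
>>   if (start + extra_size > aligned + bytes) {
>>     munmap(aligned + bytes, (start + extra_size) - (aligned + bytes));
>>   }
>>
>>   // Attach the SysV segment on top of the remaining reservation.
>>   // SHM_REMAP lets shmat replace the existing anonymous mapping.
>>   return (char*)shmat(shmid, aligned, SHM_REMAP);
>> }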
>>
>> But I am still not sure why we do it this way:
>>
>> 3429 static char* shmat_with_alignment(int shmid, size_t bytes, size_t alignment, char* req_addr) {
>> 3430   // If there's no requested address, the shmat call can return memory that is not
>> 3431   // 'alignment' aligned, if the given alignment is larger than the large page size.
>> 3432   // Special care needs to be taken to ensure that we get aligned memory back.
>> 3433   if (req_addr == NULL && alignment > os::large_page_size()) {
>> 3434     return shmat_with_large_alignment(shmid, bytes, alignment);
>> 3435   } else {
>> 3436     return shmat_with_normal_alignment(shmid, req_addr);
>> 3437   }
>> 3438 }
>>
>> For req_addr==0 and big alignment, we attach at the given
>> alignment ("shmat_with_large_alignment").
>> For req_addr!=0, we attach at the given requested address
>> ("shmat_with_normal_alignment").
>> For req_addr==0 and smaller alignment, we ignore the alignment
>> and attach anywhere?
>>
>> Maybe I am slow, but why does it matter if the alignment is large
>> or small? Why not just distinguish between:
>>
>> 1) address given (req_addr!=0): in this case we attach at this
>> req_addr and rely on the user having aligned the address properly
>> for his purposes. We specify 0 for flags, so we will attach at
>> exactly the given address or fail. In this case we could simply
>> ignore the given alignment - if one was given - or just use it to
>> sanity-check req_addr.
>>
>> 2) alignment given (req_addr==0 and alignment > 0): attach at the
>> given alignment using mmap-before-shmat (sketched below). This could
>> be done for any alignment, be it large or small.
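>>
>> In code, the distinction I have in mind would be roughly (just a
>> sketch, the names are only illustrative):
>>
>> static char* shmat_dispatch(int shmid, size_t bytes, size_t alignment, char* req_addr) {
>>   if (req_addr != NULL) {
>>     // Case 1: attach exactly at req_addr (flags == 0) or fail; the
>>     // caller is responsible for having aligned req_addr suitably.
>>     return shmat_at_address(shmid, req_addr);
>>   }
>>   // Case 2: no address given - use the mmap-before-shmat trick for
>>   // whatever alignment was requested, large or small.
>>   return shmat_with_alignment(shmid, bytes, alignment);
>> }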
>
> What you propose doesn't work.
>
> We're allocating large pages with SHM_HUGETLB, and if we try to
> attach to an address that is not large_page_size aligned the shmat
> call returns EINVAL.
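>
> I.e. roughly (just a sketch to illustrate; the shmget flags follow the
> existing large-page code, and 'unaligned_addr' stands for any address
> that is not large-page aligned):
>
>   int shmid = shmget(IPC_PRIVATE, bytes, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
>   // Attaching a SHM_HUGETLB segment at an address that is not
>   // large-page aligned fails with EINVAL:
>   char* addr = (char*)shmat(shmid, unaligned_addr, 0);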
>
>
> I was aware of this. What I meant was:
>
> You have "shmat_with_large_alignment" which takes an alignment and
> does its best to shmat with that alignment using the mmap trick. This
> coding does not need to know anything about huge pages, and actually
> does not do anything huge-pagey, apart from the asserts - it would
> just as well work with small pages, because the only place where the
> code needs to know about huge pages is in the layer above, in
> reserve_memory_special - where we pass SHM_HUGETLB to shmget. (Btw, I
> always wondered about the "reserve_memory_special" naming.)
>
> I think my point is that by renaming this to "shmat_with_alignment"
> and removing the huge-page-related asserts, the function would become
> both simpler and more versatile, and could be reused for small
> alignments as well as large ones. The fact that shmat returns EINVAL
> for bad alignments instead of asserting would not be a problem - we
> would return an error instead of asserting on bad alignment, and both
> handling this error and asserting on huge-page alignment could be
> done better in reserve_memory_special.
>
> To put it another way, I think "shmat_with_large_alignment" does not
> need to know about huge pages; this should be the responsibility of
> reserve_memory_special.
>
> About "shmat_with_normal_alignment", this is actually only a raw shmat
> call and exists for the req_addr!=NULL case and for the case where we
> do not pass neither req_addr nor alignment. So the only thing it does
> not handle is alignment, so it is misnamed and also should not be
> called for the req_addr==NULL-and-small-alignments-case.
The reserve_memory_special_shm function and the associated helper
functions I'm adding are specifically written to support large page
allocations. The names "normal_alignment" and "large_alignment" are
intended to refer to the alignment size compared to the large page
size. I grant you that this isn't obvious from the names, and we can
rename them to make it clearer.
I want to provide a small bug fix for this large pages bug, while you
are suggesting that we repurpose the code to support small page
allocations as well. Your suggestions might be good, but may I suggest
that you create a patch and an RFE that motivates why we should make
this code more generic to support small pages as well?
Thanks,
StefanK
>>
>> The functions would become simpler and could also be named more
>> clearly (e.g. "shmat_at_address" and "shmat_with_alignment",
>> respectively).
>
> Maybe I should rename the functions to make it more obvious that
> these are large-page-specific functions?
>
>>
>> ----
>>
>> This:
>>
>> 3402   if ((intptr_t)addr == -1) {
>> 3403     shm_warning_with_errno("Failed to attach shared memory.");
>> 3404     // Since we don't know if the kernel unmapped the pre-reserved memory area
>> 3405     // we can't unmap it, since that would potentially unmap memory that was
>> 3406     // mapped from other threads.
>> 3407     return NULL;
>> 3408   }
>>
>> seems scary. It means that every time this happens, we leak the
>> reserved (but not committed) address space?
>
> Yes, that's unfortunate.
>
> An alternative would be to use this sequence (sketched below):
> 1) Use anon_mmap_aligned to find a suitable VA range
> 2) Immediately unmap the VA range
> 3) Try to attach at that VA range _without_ SHM_REMAP
>
> That would remove the risk of leaking the reserved address space,
> but instead we risk failing at (3) if another thread manages to
> allocate memory inside the found VA range. This will cause some
> users to unnecessarily fail to get large pages, though. We've had
> other problems when pre-existing threads used mmap while we were
> initializing the VM. See: JDK-8007074.
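>
> In rough pseudo-code, the alternative would look something like this
> (sketch only, simplified; the anon_mmap_aligned signature here is
> approximate):
>
>   // 1) Find a suitably aligned VA range with an anonymous mapping.
>   char* aligned_addr = anon_mmap_aligned(bytes, alignment, NULL /* req_addr */);
>
>   // 2) Immediately give the range back to the OS.
>   munmap(aligned_addr, bytes);
>
>   // 3) Try to attach at that address _without_ SHM_REMAP. This is
>   //    inherently racy: if another thread mapped something into the
>   //    range between (2) and (3), the attach fails and we don't get
>   //    large pages.
>   char* addr = (char*)shmat(shmid, aligned_addr, 0);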
>
>
> Yes; btw you also could do this with shmget/shmat instead of mmap.
>
> Note that similar unclean tricks are already done in other places, see
> e.g. the Windows version of os::pd_split_reserved_memory(), which
> deals with VirtualAlloc() being unable, like shmget, to deallocate
> piece-wise.
>
>
>
>> For most cases (anything but ENOMEM, actually) could we at least
>> assert?
>>
>> EACCES - should not happen: we created the shared memory and are its owner.
>> EIDRM  - should not happen.
>> EINVAL - should not happen (you now check the attach address for
>>          alignment to SHMLBA, so this is covered).
>
> Sure. I'll add asserts for these.
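>
> Something along these lines (just a sketch, exact messages to be
> polished):
>
>   if ((intptr_t)addr == -1) {
>     // None of these should be possible given how we create and attach
>     // the segment, so treat them as programming errors in debug builds.
>     assert(errno != EACCES, "shmat failed with EACCES - we created and own the segment");
>     assert(errno != EIDRM,  "shmat failed with EIDRM - the segment was removed");
>     assert(errno != EINVAL, "shmat failed with EINVAL - attach address not properly aligned?");
>     shm_warning_with_errno("Failed to attach shared memory.");
>     return NULL;
>   }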
>
>>
>> ---
>>
>> Smaller nits:
>>
>> Functions called "shmat_..." suggest shmat-like behaviour, so
>> could we have them return -1 instead of NULL in case of error?
>
> That would add clutter to reserve_memory_special_shm, and it might
> also suggest that it would be OK to check errno for the failure
> reason, which probably wouldn't work. I'll let other Reviewers chime
> in and help decide if we should change this.
>
>
> You are right. If one returns -1, one would have to preserve errno for
> the caller too.
>
> Thanks for reviewing this,
> StefanK
>
>
> You are welcome!
>
> Kind Regards, Thomas
>
>
>
>>
>> Kind Regards, Thomas
>>
>>>
>>> Also note that mmap- and shmat-allocated memory may have
>>> different alignment requirements: mmap requires a
>>> page-aligned request address, whereas shmat requires
>>> alignment to SHMLBA, which may be multiple pages (e.g. for
>>> ARM:
>>> http://lxr.free-electrons.com/source/arch/arm/include/asm/shmparam.h#L9).
>>> So, for this shmat-over-mmap trick to work, the request address
>>> has to be aligned to SHMLBA, not just to the page size.
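>>>
>>> Something like the following check (just a sketch; SHMLBA comes from
>>> <sys/shm.h>):
>>>
>>>   assert(((uintptr_t)req_addr % SHMLBA) == 0,
>>>          "shmat request address must be SHMLBA aligned");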
>>>
>>> I see that you assert alignment of the request address to
>>> os::large_page_size(), which I would assume is a multiple of
>>> SHMLBA, but I am not sure of this.
>>
>> I've added some defensive code and asserts to catch this
>> if/when this assumption fails:
>>
>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02.delta/
>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02
>>
>> I need to verify that this works on other machines than my
>> local Linux x64 machine.
>>
>> Thanks,
>> StefanK
>>
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>>
>>> On Mon, Apr 11, 2016 at 1:03 PM, Stefan Karlsson
>>> <stefan.karlsson at oracle.com> wrote:
>>>
>>> Hi all,
>>>
>>> Please review this patch to enable SHM large page
>>> allocations even when the requested alignment is larger
>>> than os::large_page_size().
>>>
>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.01
>>> https://bugs.openjdk.java.net/browse/JDK-8017629
>>>
>>> G1 is affected by this bug since it requires the heap to
>>> start at an address that is aligned with the heap region
>>> size. The patch fixes this by changing the UseSHM large
>>> pages allocation code. First, virtual memory with
>>> correct alignment is pre-reserved and then the large
>>> pages are attached to this memory area.
>>>
>>> Tested with vm.gc.testlist and ExecuteInternalVMTests.
>>>
>>> Thanks,
>>> StefanK
>>>
>>>
>>
>>
>
>