RFR: 8017629: G1: UseSHM in combination with a G1HeapRegionSize > os::large_page_size() falls back to use small pages
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Apr 13 10:44:58 UTC 2016
Hi Stefan,
On Tue, Apr 12, 2016 at 5:41 PM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
> Hi Thomas,
>
>
> On 2016-04-12 16:23, Thomas Stüfe wrote:
>
> Hi Stefan,
>
>
> On Mon, Apr 11, 2016 at 3:52 PM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
>
>> Hi Thomas,
>>
>> On 2016-04-11 14:39, Thomas Stüfe wrote:
>>
>> Hi Stefan,
>>
>> short question, why the mmap before the shmat? Why not shmat right away
>> at the requested address?
>>
>>
>> If we have a requested_address we do exactly what you propose.
>>
>> if (req_addr == NULL && alignment > os::large_page_size()) {
>>   return shmat_with_large_alignment(shmid, bytes, alignment);
>> } else {
>>   return shmat_with_normal_alignment(shmid, req_addr);
>> }
>>
>> ...
>>
>> static char* shmat_with_normal_alignment(int shmid, char* req_addr) {
>>   char* addr = (char*)shmat(shmid, req_addr, 0);
>>
>>   if ((intptr_t)addr == -1) {
>>     shm_warning_with_errno("Failed to attach shared memory.");
>>     return NULL;
>>   }
>>
>>   return addr;
>> }
>>
>>
>> It's when you don't have a requested address that mmap is used to find a
>> large enough virtual memory area.
>>
>>
> Sorry, it seems I did not look at this code thoroughly enough. I understand
> now that you mmap to allocate and then cut away the extra pre-/post-space,
> something which would not be possible with shmat, since a shmat'ed segment
> cannot be unmapped page-wise.
>
> But I am still not sure why we do it this way:
>
> 3429 static char* shmat_with_alignment(int shmid, size_t bytes, size_t alignment, char* req_addr) {
> 3430   // If there's no requested address, the shmat call can return memory that is not
> 3431   // 'alignment' aligned, if the given alignment is larger than the large page size.
> 3432   // Special care needs to be taken to ensure that we get aligned memory back.
> 3433   if (req_addr == NULL && alignment > os::large_page_size()) {
> 3434     return shmat_with_large_alignment(shmid, bytes, alignment);
> 3435   } else {
> 3436     return shmat_with_normal_alignment(shmid, req_addr);
> 3437   }
> 3438 }
>
> For req_addr==0 and big alignment, we attach at the given alignment
> ("shmat_with_large_alignment").
> For req_addr!=0, we attach at the given requested address
> ("shmat_with_normal_alignment").
> For req_addr==0 and smaller alignment, we ignore the alignment and attach
> anywhere?
>
> Maybe I am slow, but why does it matter if the alignment is large or
> small? Why not just distinguish between:
>
> 1) address given (req_addr!=0): in this case we attach at this req_addr
> and rely on the user having aligned the address properly for his purposes.
> We specify 0 for flags, so we will attach at exactly the given address or
> fail. In this case we could simply ignore the given alignment - if one was
> given - or just use it to counter-check the req_addr.
>
> 2) alignment given (req_addr==0 and alignment > 0): attach at the given
> alignment using mmap-before-shmat. This could be done for any alignment, be
> it large or small.
>
>
> What you propose doesn't work.
>
> We're allocating large pages with SHM_HUGETLB, and if we try to attach to
> an address that is not large_page_size aligned the shmat call returns
> EINVAL.
>
>
I was aware of this. What I meant was:
You have "shmat_with_large_alignment" which takes an alignment and does its
best to shmat with that alignment using the mmap trick. This coding does
not need to know anything about huge pages, and actually does not do
anything huge-pagey, apart from the asserts - it would just as well work
with small pages, because the only place where the code needs to know about
huge pages is in the layer above, in reserve_memory_special - where we
pass SHM_HUGETLB
to shmget. (Btw, I always wondered about the "reserve_memory_special"
naming.)
I think my point is that by renaming this to "shmat_with_alignment" and
removing the huge-page-related asserts, the function would become both
simpler and more versatile, and could be reused for small alignments as well
as large ones. The fact that shmat returns EINVAL for a badly aligned
address instead of us asserting would not be a problem - we would return an
error instead of asserting on bad alignment, and both handling this error
and asserting on huge-page alignment could be done better in
reserve_memory_special.
To put it another way: I think "shmat_with_large_alignment" does not need to
know about huge pages; that should be the responsibility of
reserve_memory_special.
About "shmat_with_normal_alignment", this is actually only a raw shmat call
and exists for the req_addr!=NULL case and for the case where we do not
pass neither req_addr nor alignment. So the only thing it does not handle
is alignment, so it is misnamed and also should not be called for the
req_addr==NULL-and-small-alignments-case.
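Just to illustrate, here is a rough and untested sketch of the dispatch I
have in mind (the names are of course only suggestions; shmat_with_alignment
would be the current shmat_with_large_alignment minus the huge-page asserts,
shmat_at_address the current shmat_with_normal_alignment):

static char* shmat_at_address(int shmid, char* req_addr);  // raw shmat wrapper
static char* shmat_with_alignment(int shmid, size_t bytes, size_t alignment);

static char* shmat_dispatch(int shmid, size_t bytes, size_t alignment, char* req_addr) {
  if (req_addr != NULL) {
    // Address given: attach exactly there or fail. The alignment argument
    // could at most be used to counter-check req_addr.
    return shmat_at_address(shmid, req_addr);
  }
  if (alignment > 0) {
    // Alignment given: mmap-before-shmat works for any alignment, large or
    // small - no huge-page knowledge is needed at this level.
    return shmat_with_alignment(shmid, bytes, alignment);
  }
  // Neither given: plain shmat, let the kernel pick the address.
  return shmat_at_address(shmid, NULL);
}

All knowledge about huge pages - passing SHM_HUGETLB to shmget, asserting
that the alignment is a multiple of os::large_page_size() - would then live
in reserve_memory_special.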
>
> Functions would become simpler and could also be named more clearly (e.g.
> "shmat_at_address" and "shmat_with_alignment", respectively).
>
>
> Maybe I should rename the functions to make it more obvious that these are
> large-page-specific functions?
>
>
> ----
>
> This:
>
> 3402   if ((intptr_t)addr == -1) {
> 3403     shm_warning_with_errno("Failed to attach shared memory.");
> 3404     // Since we don't know if the kernel unmapped the pre-reserved memory area
> 3405     // we can't unmap it, since that would potentially unmap memory that was
> 3406     // mapped from other threads.
> 3407     return NULL;
> 3408   }
>
> seems scary. It means that every time this happens, we leak the reserved
> (but not committed) address space?
>
>
> Yes, that's unfortunate.
>
> An alternative would be to use this sequence:
> 1) Use anon_mmap_aligned to find a suitable VA range
> 2) Immediately unmap the VA range
> 3) Try to attach at that VA range _without_ SHM_REMAP
>
> That would remove the risk of leaking the reserved address space, but
> instead we risk failing at (3) if another thread manages to allocate memory
> inside the found VA range. This will cause some users to unnecessarily fail
> to get large pages, though. We've had other problems when pre-existing
> threads used mmap while we were initializing the VM. See: JDK-8007074.
>
Yes; btw, you could also do this with shmget/shmat instead of mmap.
Note that similar unclean tricks are already done in other places; see e.g.
the Windows version of os::pd_split_reserved_memory(), which deals with
VirtualAlloc() being unable, like shmget, to deallocate piece-wise.
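For reference, the sequence you describe would look roughly like this
(untested sketch; I am assuming a helper anon_mmap_aligned(bytes, alignment)
along the lines of the one in your webrev, and most error handling is
omitted):

static char* shmat_no_leak(int shmid, size_t bytes, size_t alignment) {
  // 1) Find a suitable aligned VA range.
  char* addr = anon_mmap_aligned(bytes, alignment);
  if (addr == NULL) {
    return NULL;
  }
  // 2) Give the range back to the kernel right away, so nothing can leak.
  ::munmap(addr, bytes);
  // 3) Attach at that address _without_ SHM_REMAP. This can fail spuriously
  //    if another thread mapped something into the range between 2) and 3).
  char* attached = (char*)::shmat(shmid, addr, 0);
  if ((intptr_t)attached == -1) {
    return NULL;  // lost the race, or a genuine attach failure
  }
  return attached;
}

So it trades the address space leak for a spurious-failure window, as you
say.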
>
>
> For most error cases (anything but ENOMEM, actually), could we at least
> assert?
>
> EACCES - should not happen: we created the shared memory and are its owner.
> EIDRM - should not happen.
> EINVAL - should not happen (you now check the attach address for alignment
> to SHMLBA, so this is covered).
>
>
> Sure. I'll add asserts for these.
>
>
> ---
>
> Smaller nits:
>
> Functions called "shmat_..." suggest shmat-like behaviour, so could we
> have them return -1 instead of NULL in case of error?
>
>
> That would add clutter to reserve_memory_special_shm, and it might also
> suggest that it would be OK to check errno for the failure reason, which
> probably wouldn't work. I'll let other Reviewers chime in and help decide
> if we should change this.
>
>
You are right. If one returns -1, one would have to preserve errno for the
caller too.
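E.g. something like this (hypothetical, just to illustrate the errno point;
shm_warning_with_errno as in your webrev):

static char* shmat_at_address(int shmid, char* req_addr) {
  char* addr = (char*)shmat(shmid, req_addr, 0);
  if ((intptr_t)addr == -1) {
    int err = errno;   // save: the warning below may clobber errno
    shm_warning_with_errno("Failed to attach shared memory.");
    errno = err;       // restore, so the caller can inspect the failure reason
    return (char*)-1;  // shmat-like contract instead of NULL
  }
  return addr;
}

Probably not worth the extra trouble, as you say.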
> Thanks for reviewing this,
> StefanK
>
You are welcome!
Kind Regards, Thomas
>
>
>
> Kind Regards, Thomas
>
>>
>> Also note that mmap- and shmat-allocated memory may have different
>> alignment requirements: mmap requires a page-aligned request address,
>> whereas shmat requires alignment to SHMLBA, which may be multiple pages
>> (e.g. for ARM:
>> http://lxr.free-electrons.com/source/arch/arm/include/asm/shmparam.h#L9).
>> So, for this shmat-over-mmap trick to work, the request address has to be
>> aligned to SHMLBA, not just to page size.
>>
>> I see that you assert alignment of the req address to os::large_page_size(),
>> which I would assume is a multiple of SHMLBA, but I am not sure about this.
>>
>>
>> I've added some defensive code and asserts to catch this if/when this
>> assumption fails:
>>
>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02.delta/
>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02
>>
>> I need to verify that this works on other machines than my local Linux
>> x64 machine.
>>
>> Thanks,
>> StefanK
>>
>>
>> Kind Regards, Thomas
>>
>>
>>
>> On Mon, Apr 11, 2016 at 1:03 PM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
>>
>>> Hi all,
>>>
>>> Please review this patch to enable SHM large page allocations even when
>>> the requested alignment is larger than os::large_page_size().
>>>
>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.01
>>> https://bugs.openjdk.java.net/browse/JDK-8017629
>>>
>>> G1 is affected by this bug since it requires the heap to start at an
>>> address that is aligned with the heap region size. The patch fixes this by
>>> changing the UseSHM large pages allocation code. First, virtual memory with
>>> correct alignment is pre-reserved and then the large pages are attached to
>>> this memory area.
>>>
>>> Tested with vm.gc.testlist and ExecuteInternalVMTests
>>>
>>> Thanks,
>>> StefanK
>>>
>>
>>
>>
>
>