RFR: 8017629: G1: UseSHM in combination with a G1HeapRegionSize > os::large_page_size() falls back to use small pages
Thomas Stüfe
thomas.stuefe at gmail.com
Wed Apr 13 13:59:44 UTC 2016
Hi Stefan,
On Wed, Apr 13, 2016 at 1:23 PM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
> Hi Thomas,
>
>
> On 2016-04-13 12:44, Thomas Stüfe wrote:
>
> Hi Stefan,
>
> On Tue, Apr 12, 2016 at 5:41 PM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
>
>> Hi Thomas,
>>
>>
>> On 2016-04-12 16:23, Thomas Stüfe wrote:
>>
>> Hi Stefan,
>>
>>
>> On Mon, Apr 11, 2016 at 3:52 PM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
>>
>>> Hi Thomas,
>>>
>>> On 2016-04-11 14:39, Thomas Stüfe wrote:
>>>
>>> Hi Stefan,
>>>
>>> short question, why the mmap before the shmat? Why not shmat right away
>>> at the requested address?
>>>
>>>
>>> If we have a requested_address we do exactly what you propose.
>>>
>>> if (req_addr == NULL && alignment > os::large_page_size()) {
>>>   return shmat_with_large_alignment(shmid, bytes, alignment);
>>> } else {
>>>   return shmat_with_normal_alignment(shmid, req_addr);
>>> }
>>>
>>> ...
>>>
>>> static char* shmat_with_normal_alignment(int shmid, char* req_addr) {
>>>   char* addr = (char*)shmat(shmid, req_addr, 0);
>>>
>>>   if ((intptr_t)addr == -1) {
>>>     shm_warning_with_errno("Failed to attach shared memory.");
>>>     return NULL;
>>>   }
>>>
>>>   return addr;
>>> }
>>>
>>>
>>> It's when you don't have a requested address that mmap is used to find a
>>> large enough virtual memory area.
>>>
>>>
>> Sorry, it seems I did not look at this code thoroughly enough. I
>> understand now that you use mmap to allocate and then cut away the extra
>> pre-/post-space - something that would not be possible with shmat, since
>> an attached segment cannot be unmapped page-wise.
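>>
>> For my own notes, I picture the trick roughly like this (untested sketch;
>> reserve_aligned is just an illustrative name, not a function in your
>> patch, and error handling is minimal):
>>
>> #include <sys/mman.h>
>> #include <stdint.h>
>>
>> static char* reserve_aligned(size_t bytes, size_t alignment) {
>>   // Over-reserve so that an aligned sub-range must exist, then trim.
>>   size_t extra = bytes + alignment;
>>   char* start = (char*) mmap(NULL, extra, PROT_NONE,
>>                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
>>   if (start == MAP_FAILED) {
>>     return NULL;
>>   }
>>   char* aligned = (char*) (((uintptr_t) start + alignment - 1) & ~((uintptr_t) alignment - 1));
>>   // Cut away the pre- and post-space page-wise - exactly what a
>>   // shmat-attached segment would not let us do.
>>   if (aligned > start) {
>>     munmap(start, aligned - start);
>>   }
>>   char* end = start + extra;
>>   if (aligned + bytes < end) {
>>     munmap(aligned + bytes, end - (aligned + bytes));
>>   }
>>   return aligned;
>> }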
>>
>> But I am still not sure why we do it this way:
>>
>> static char* shmat_with_alignment(int shmid, size_t bytes, size_t alignment, char* req_addr) {
>>   // If there's no requested address, the shmat call can return memory
>>   // that is not 'alignment' aligned, if the given alignment is larger
>>   // than the large page size. Special care needs to be taken to ensure
>>   // that we get aligned memory back.
>>   if (req_addr == NULL && alignment > os::large_page_size()) {
>>     return shmat_with_large_alignment(shmid, bytes, alignment);
>>   } else {
>>     return shmat_with_normal_alignment(shmid, req_addr);
>>   }
>> }
>>
>> For req_addr==0 and big alignment, we attach at the given alignment
>> ("shmat_with_large_alignment").
>> For req_addr!=0, we attach at the given requested address
>> ("shmat_with_normal_alignment").
>> For req_addr==0 and smaller alignment, we ignore the alignment and attach
>> anywhere?
>>
>> Maybe I am slow, but why does it matter if the alignment is large or
>> small? Why not just distinguish between:
>>
>> 1) address given (req_addr != 0): in this case we attach at req_addr and
>> rely on the user having aligned the address properly for his purposes.
>> We specify 0 for flags, so we will attach at exactly the given address or
>> fail. In this case we could simply ignore the given alignment - if one
>> was given - or just use it to sanity-check req_addr.
>>
>> 2) alignment given (req_addr==0 and alignment > 0): attach at the given
>> alignment using mmap-before-shmat. This could be done for any alignment, be
>> it large or small.
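>>
>> In code, something like this (sketch only; the two helper names are the
>> ones I suggest further down, not functions in your patch):
>>
>> static char* shmat_dispatch(int shmid, size_t bytes, size_t alignment, char* req_addr) {
>>   if (req_addr != NULL) {
>>     // Case 1: attach exactly at req_addr or fail; alignment is the
>>     // caller's responsibility.
>>     return shmat_at_address(shmid, req_addr);
>>   }
>>   // Case 2: no address given - honor the alignment, be it large or
>>   // small, using the mmap-before-shmat trick.
>>   return shmat_with_alignment(shmid, bytes, alignment);
>> }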
>>
>>
>> What you propose doesn't work.
>>
>> We're allocating large pages with SHM_HUGETLB, and if we try to attach at
>> an address that is not large_page_size-aligned, the shmat call returns
>> EINVAL.
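>>
>> For illustration (sketch; assumes 2M large pages, and shm_size and
>> addr_2m_aligned are made-up variables, with shm_size a multiple of 2M):
>>
>> int shmid = shmget(IPC_PRIVATE, shm_size, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
>> char* ok  = (char*) shmat(shmid, addr_2m_aligned, 0);        // attaches fine
>> char* bad = (char*) shmat(shmid, addr_2m_aligned + 4096, 0); // fails, errno == EINVAL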
>>
>>
> I was aware of this. What I meant was:
>
> You have "shmat_with_large_alignment", which takes an alignment and does
> its best to shmat with that alignment using the mmap trick. This code does
> not need to know anything about huge pages, and actually does not do
> anything huge-pagey apart from the asserts - it would work just as well
> with small pages, because the only place where the code needs to know
> about huge pages is the layer above, reserve_memory_special, where we pass
> SHM_HUGETLB to shmget. (Btw, I have always wondered about the
> "reserve_memory_special" naming.)
>
> I think my point is that by renaming this to "shmat_with_alignment" and
> removing the huge-page-related asserts, the function would become both
> simpler and more versatile, and could be reused for small alignments as
> well as large ones. Getting EINVAL for a bad alignment instead of
> asserting would not be a problem - we would just return an error, and both
> handling that error and asserting on huge-page alignment could be done
> better in reserve_memory_special.
>
> To put it another way, I think "shmat_with_large_alignment" does not need
> to know about huge pages; this should be the responsibility of
> reserve_memory_special.
>
> About "shmat_with_normal_alignment": this is actually just a raw shmat
> call, and it exists for the req_addr != NULL case and for the case where
> we pass neither req_addr nor alignment. So the only thing it does not
> handle is alignment, which means it is misnamed and also should not be
> called for the req_addr==NULL-and-small-alignment case.
>
>
> The reserve_memory_special_shm function and the associated helper
> functions I'm adding are specifically written to support large-page
> allocations. The names "normal_alignment" and "large_alignment" are
> intended to refer to alignment sizes compared to the large page size. I
> grant you that it's not obvious from the names, and we can rename them to
> make this clearer.
>
> I want to provide a small bug fix for this large pages bug, while you are
> suggesting that we re-purpose the code to support small-page allocations
> as well. Your suggestions might be good, but may I suggest that you create
> a patch and an RFE that motivates why we should make this code more
> generic to support small pages as well?
>
> Thanks,
> StefanK
>
>
Ok, we can do that. I was just worried that the code becomes more difficult
to understand. But let's wait for some more reviews.
Kind Regards, Thomas
>
>
>
>>
>> Functions would become simpler and could also be named more clearly
>> (e.g. "shmat_at_address" and "shmat_with_alignment", respectively).
>>
>>
>> Maybe I should rename the functions to make it more obvious that these
>> are large-pages-specific functions?
>>
>>
>> ----
>>
>> This:
>>
>> if ((intptr_t)addr == -1) {
>>   shm_warning_with_errno("Failed to attach shared memory.");
>>   // Since we don't know if the kernel unmapped the pre-reserved memory
>>   // area, we can't unmap it, since that would potentially unmap memory
>>   // that was mapped from other threads.
>>   return NULL;
>> }
>>
>> seems scary. Does this mean that every time this happens, we leak the
>> reserved (not committed) address space?
>>
>>
>> Yes, that's unfortunate.
>>
>> An alternative would be to use this sequence:
>> 1) Use anon_mmap_aligned to find a suitable VA range
>> 2) Immediately unmap the VA range
>> 3) Try to attach at that VA range _without_ SHM_REMAP
>>
>> That would remove the risk of leaking the reserved address space, but
>> instead we risk failing at (3) if another thread manages to allocate memory
>> inside the found VA range. This will cause some users to unnecessarily fail
>> to get large pages, though. We've had other problems when pre-existing
>> threads used mmap while we were initializing the VM. See: JDK-8007074.
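>>
>> As a sketch (reserve_aligned here stands for a hypothetical helper that
>> mmaps an aligned PROT_NONE range; this sequence is inherently racy by
>> design):
>>
>> char* hint = reserve_aligned(bytes, alignment); // 1) find an aligned VA range
>> munmap(hint, bytes);                            // 2) release it again right away
>> char* addr = (char*) shmat(shmid, hint, 0);     // 3) attach without SHM_REMAP;
>>                                                 //    fails if another thread
>>                                                 //    mapped into the range first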
>>
>
> Yes; btw, you could also do this with shmget/shmat instead of mmap.
>
> Note that similar unclean tricks are already done in other places - see
> e.g. the Windows version of os::pd_split_reserved_memory(), which deals
> with VirtualAlloc() being unable, like shmget, to deallocate piece-wise.
>
>
>>
>>
>> For most cases (anything but ENOMEM, actually), could we at least assert?
>>
>> EACCES - should not happen: we created the shared memory and are its owner.
>> EIDRM - should not happen.
>> EINVAL - should not happen (you already check the attach address for
>> alignment to SHMLBA, so this is covered).
>>
>>
>> Sure. I'll add asserts for these.
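>>
>> Something like this, perhaps (just a sketch, not the final patch code):
>>
>> if ((intptr_t)addr == -1) {
>>   int err = errno; // save it; the warning below may clobber errno
>>   shm_warning_with_errno("Failed to attach shared memory.");
>>   assert(err != EACCES, "we created the segment and are its owner");
>>   assert(err != EIDRM,  "nobody should have removed the segment");
>>   assert(err != EINVAL, "the attach address was checked against SHMLBA");
>>   return NULL; // ENOMEM remains the expected failure mode
>> }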
>>
>>
>> ---
>>
>> Smaller nits:
>>
>> Functions called "shmat_..." suggest shmat-like behaviour, so could we
>> have them return -1 instead of NULL in case of error?
>>
>>
>> That would add clutter to reserve_memory_special_shm, and it might also
>> suggest that it would be OK to check errno for the failure reason, which
>> probably wouldn't work. I'll let other Reviewers chime in and help decide
>> whether we should change this.
>>
>>
> You are right. If one returns -1, one would have to preserve errno for the
> caller too.
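>
> I.e. roughly (sketch only):
>
> char* addr = (char*) shmat(shmid, req_addr, 0);
> if ((intptr_t) addr == -1) {
>   int saved_errno = errno;  // the warning may clobber errno
>   shm_warning_with_errno("Failed to attach shared memory.");
>   errno = saved_errno;      // restore it so the caller can inspect it
>   return (char*) -1;        // shmat-style error return
> }
> return addr;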
>
>
>> Thanks for reviewing this,
>> StefanK
>>
>
> You are welcome!
>
> Kind Regards, Thomas
>
>
>>
>>
>>
>> Kind Regards, Thomas
>>
>>>
>>> Also note that mmap- and shmat-allocated memory may have different
>>> alignment requirements: mmap requires a page-aligned request address,
>>> whereas shmat requires alignment to SHMLBA, which may be multiple pages
>>> (e.g. for ARM:
>>> http://lxr.free-electrons.com/source/arch/arm/include/asm/shmparam.h#L9).
>>> So, for this shmat-over-mmap trick to work, the request address has to
>>> be aligned to SHMLBA, not just to page size.
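>>>
>>> E.g. (sketch; SHMLBA is defined in <sys/shm.h>):
>>>
>>> #include <sys/shm.h> // defines SHMLBA
>>> // Round the request address up to SHMLBA; on x86 SHMLBA equals the
>>> // page size, but on e.g. ARM it can span several pages:
>>> char* shm_addr = (char*) (((uintptr_t) req_addr + SHMLBA - 1) & ~((uintptr_t) SHMLBA - 1));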
>>>
>>> I see that you assert alignment of the request address to
>>> os::large_page_size(), which I would assume is a multiple of SHMLBA, but
>>> I am not sure of this.
>>>
>>>
>>> I've added some defensive code and asserts to catch this if/when this
>>> assumption fails:
>>>
>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02.delta/
>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02
>>>
>>> I need to verify that this works on other machines than my local Linux
>>> x64 machine.
>>>
>>> Thanks,
>>> StefanK
>>>
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>>
>>> On Mon, Apr 11, 2016 at 1:03 PM, Stefan Karlsson <stefan.karlsson at oracle.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Please review this patch to enable SHM large page allocations even when
>>>> the requested alignment is larger than os::large_page_size().
>>>>
>>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.01
>>>> https://bugs.openjdk.java.net/browse/JDK-8017629
>>>>
>>>> G1 is affected by this bug since it requires the heap to start at an
>>>> address that is aligned to the heap region size. The patch fixes this
>>>> by changing the UseSHM large pages allocation code. First, virtual
>>>> memory with the correct alignment is pre-reserved, and then the large
>>>> pages are attached to this memory area.
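>>>>
>>>> In outline, the approach is roughly this (simplified sketch; the
>>>> reservation helper is hypothetical and error handling is omitted):
>>>>
>>>> // 1) Pre-reserve an address range with the requested alignment:
>>>> char* vaddr = reserve_memory_aligned(bytes, alignment);
>>>> // 2) Attach the SHM_HUGETLB segment on top of the reservation;
>>>> //    SHM_REMAP (Linux-specific) lets shmat replace the existing
>>>> //    mapping:
>>>> char* addr = (char*) shmat(shmid, vaddr, SHM_REMAP);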
>>>>
>>>> Tested with vm.gc.testlist and ExecuteInternalVMTests.
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>
>>>
>>>
>>
>>
>
>