RFR: 8017629: G1: UseSHM in combination with a G1HeapRegionSize > os::large_page_size() falls back to use small pages

Stefan Karlsson stefan.karlsson at oracle.com
Wed Apr 13 11:23:49 UTC 2016


Hi Thomas,

On 2016-04-13 12:44, Thomas Stüfe wrote:
> Hi Stefan,
>
> On Tue, Apr 12, 2016 at 5:41 PM, Stefan Karlsson
> <stefan.karlsson at oracle.com> wrote:
>
>     Hi Thomas,
>
>
>     On 2016-04-12 16:23, Thomas Stüfe wrote:
>>     Hi Stefan,
>>
>>
>>     On Mon, Apr 11, 2016 at 3:52 PM, Stefan Karlsson
>>     <stefan.karlsson at oracle.com> wrote:
>>
>>         Hi Thomas,
>>
>>         On 2016-04-11 14:39, Thomas Stüfe wrote:
>>>         Hi Stefan,
>>>
>>>         short question, why the mmap before the shmat? Why not
>>>         shmat right away at the requested address?
>>
>>         If we have a requested_address we do exactly what you propose.
>>
>>           if (req_addr == NULL && alignment > os::large_page_size()) {
>>             return shmat_with_large_alignment(shmid, bytes, alignment);
>>           } else {
>>             return shmat_with_normal_alignment(shmid, req_addr);
>>           }
>>
>>         ...
>>
>>         static char* shmat_with_normal_alignment(int shmid, char* req_addr) {
>>           char* addr = (char*)shmat(shmid, req_addr, 0);
>>
>>           if ((intptr_t)addr == -1) {
>>             shm_warning_with_errno("Failed to attach shared memory.");
>>             return NULL;
>>           }
>>
>>           return addr;
>>         }
>>
>>
>>         It's when you don't have a requested address that mmap is
>>         used to find a large enough virtual memory area.
>>
>>
>>     Sorry, it seems I did not look at this code thoroughly enough. I
>>     understand now that you mmap to allocate and then cut away the
>>     extra pre-/post-space, something which would not be possible with
>>     shmat, since shmat-attached memory cannot be unmapped page-wise.
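>>
>>     If I spell out my understanding as code (a simplified sketch with
>>     a made-up helper name, not the actual patch code, and assuming
>>     alignment is a power of two):
>>
>>     #include <stddef.h>
>>     #include <stdint.h>
>>     #include <sys/mman.h>
>>
>>     static char* reserve_aligned_va(size_t bytes, size_t alignment) {
>>       // Over-reserve so that an aligned start is guaranteed to exist.
>>       const size_t extra = bytes + alignment;
>>       char* start = (char*)mmap(NULL, extra, PROT_NONE,
>>                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
>>       if (start == MAP_FAILED) {
>>         return NULL;
>>       }
>>       char* aligned = (char*)(((uintptr_t)start + alignment - 1) & ~(uintptr_t)(alignment - 1));
>>       // Cut away the pre-/post-space. This trimming is exactly what
>>       // shmat cannot do, since shmdt only detaches whole segments.
>>       if (aligned > start) {
>>         munmap(start, aligned - start);
>>       }
>>       char* end = start + extra;
>>       if (aligned + bytes < end) {
>>         munmap(aligned + bytes, end - (aligned + bytes));
>>       }
>>       return aligned;  // later: shmat(shmid, aligned, SHM_REMAP)
>>     }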
>>
>>     But I am still not sure why we do it this way:
>>
>>     static char* shmat_with_alignment(int shmid, size_t bytes, size_t alignment, char* req_addr) {
>>       // If there's no requested address, the shmat call can return memory that is not
>>       // 'alignment' aligned, if the given alignment is larger than the large page size.
>>       // Special care needs to be taken to ensure that we get aligned memory back.
>>       if (req_addr == NULL && alignment > os::large_page_size()) {
>>         return shmat_with_large_alignment(shmid, bytes, alignment);
>>       } else {
>>         return shmat_with_normal_alignment(shmid, req_addr);
>>       }
>>     }
>>
>>     For req_addr==0 and big alignment, we attach at the given
>>     alignment ("shmat_with_large_alignment").
>>     For req_addr!=0, we attach at the given requested address
>>     ("shmat_with_normal_alignment").
>>     For req_addr==0 and smaller alignment, we ignore the alignment
>>     and attach anywhere?
>>
>>     Maybe I am slow, but why does it matter if the alignment is large
>>     or small? Why not just distinguish between:
>>
>>     1) address given (req_addr!=0): in this case we attach at this
>>     req_addr and rely on the user having aligned the address properly
>>     for his purposes. We specify 0 for flags, so we will attach at
>>     exactly the given address or fail. In this case we could simply
>>     ignore the given alignment - if one was given - or just use it to
>>     cross-check req_addr.
>>
>>     2) alignment given (req_addr==0 and alignment > 0): attach at the
>>     given alignment using mmap-before-shmat. This could be done for
>>     any alignment, be it large or small.
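>>
>>     In code, the distinction I have in mind would roughly be this (a
>>     sketch; shmat_reserve is a made-up name, and the two helpers are
>>     the renamed functions I suggest further down):
>>
>>     static char* shmat_reserve(int shmid, size_t bytes, size_t alignment, char* req_addr) {
>>       if (req_addr != NULL) {
>>         // Case 1: attach exactly at req_addr or fail; alignment is
>>         // the caller's responsibility (or merely cross-checked).
>>         return shmat_at_address(shmid, req_addr);
>>       }
>>       // Case 2: attach anywhere, but aligned; works for any
>>       // alignment, large or small, via mmap-before-shmat.
>>       return shmat_with_alignment(shmid, bytes, alignment);
>>     }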
>
>     What you propose doesn't work.
>
>     We're allocating large pages with SHM_HUGETLB, and if we try to
>     attach at an address that is not large_page_size-aligned, the
>     shmat call returns EINVAL.
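>
>     For example (a sketch of the failing sequence; bytes and
>     small_page_aligned_addr are placeholders):
>
>     #include <sys/ipc.h>
>     #include <sys/shm.h>
>
>     int shmid = shmget(IPC_PRIVATE, bytes, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
>     char* addr = (char*)shmat(shmid, small_page_aligned_addr, 0);
>     // shmat returns (char*)-1 with errno == EINVAL here.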
>
>
> I was aware of this. What I meant was:
>
> You have "shmat_with_large_alignment" which takes an alignment and 
> does its best to shmat with that alignment using the mmap trick. This 
> coding does not need to know anything about huge pages, and actually 
> does not do anything huge-pagey, apart from the asserts - it would 
> just as well work with small pages, because the only place where the 
> code needs to know about huge pages is in the layer above, in 
> reserve_memory_special - where we pass SHM_HUGETLB to shmget. (Btw, I 
> always wondered about the "reserve_memory_special" naming.)
>
> I think my point is that by renaming this to "shmat_with_alignment" 
> and removing the huge-page-related asserts, the function would become 
> both simpler and more versatile, and could be reused for small 
> alignments as well as large ones. The fact that shmat returns EINVAL 
> for bad alignments instead of our asserting would not be a problem - 
> we would return an error instead of asserting, and both handling that 
> error and asserting on huge-page alignment could be done better in 
> reserve_memory_special.
>
> To put it another way, I think "shmat_with_large_alignment" does not 
> need to know about huge pages; this should be the responsibility of 
> reserve_memory_special.
>
> About "shmat_with_normal_alignment", this is actually only a raw shmat 
> call and exists for the req_addr!=NULL case and for the case where we 
> do not pass neither req_addr nor alignment. So the only thing it does 
> not handle is alignment, so it is misnamed and also should not be 
> called for the req_addr==NULL-and-small-alignments-case.

The reserve_memory_special_shm function and the associated helper 
functions I'm adding are specifically written to support large page 
allocations. The names "normal_alignment" and "large_alignment" are 
intended to refer to alignment sizes relative to the large page size. I 
grant you that this is not obvious from the names, and we can rename 
them to make it clearer.

I want to provide a small fix for this large pages bug, while you are 
suggesting that we repurpose the code to support small page allocations 
as well. Your suggestions might be good, but may I suggest that you 
create a patch and an RFE that motivates why we should make this code 
generic enough to also support small pages?

Thanks,
StefanK

>>
>>     Functions would become simpler and could also be named more
>>     clearly (e.g. "shmat_at_address" and "shmat_with_alignment",
>>     respectively).
>
>     Maybe I should rename the functions to make it more obvious that
>     they are specific to large pages?
>
>>
>>     ----
>>
>>     This:
>>
>>     if ((intptr_t)addr == -1) {
>>       shm_warning_with_errno("Failed to attach shared memory.");
>>       // Since we don't know if the kernel unmapped the pre-reserved memory area
>>       // we can't unmap it, since that would potentially unmap memory that was
>>       // mapped from other threads.
>>       return NULL;
>>     }
>>
>>     seems scary. Does this mean that every time this happens, we leak
>>     the reserved (but not committed) address space?
>
>     Yes, that's unfortunate.
>
>     An alternative would be to use this sequence:
>     1) Use anon_mmap_aligned to find a suitable VA range
>     2) Immediately unmap the VA range
>     3) Try to attach at that VA range _without_ SHM_REMAP
>
>     That would remove the risk of leaking the reserved address space,
>     but instead we risk failing at (3) if another thread manages to
>     allocate memory inside the found VA range. This will cause some
>     users to unnecessarily fail to get large pages, though. We've had
>     other problems when pre-existing threads used mmap while we were
>     initializing the VM. See: JDK-8007074.
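>
>     In code, roughly (a sketch; anon_mmap_aligned is the existing
>     helper used in step 1, signature simplified):
>
>     char* addr = anon_mmap_aligned(bytes, alignment); // 1) find a suitable VA range
>     munmap(addr, bytes);                              // 2) immediately give it back
>     char* shm = (char*)shmat(shmid, addr, 0);         // 3) attach without SHM_REMAP;
>                                                       //    racy: fails if another thread
>                                                       //    grabbed the range in between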
>
>
> Yes; btw you also could do this with shmget/shmat instead of mmap.
>
> Note that similar unclean tricks are already done in other places; see 
> e.g. the Windows version of os::pd_split_reserved_memory(), which 
> deals with VirtualAlloc() being unable, like shmget, to deallocate 
> piece-wise.
>
>
>
>>     For most cases (anything but ENOMEM, actually), could we at least
>>     assert?
>>
>>     EACCES - should not happen: we created the shared memory and are
>>     its owner
>>     EIDRM  - should not happen.
>>     EINVAL - should not happen. (you now check the attach address for
>>     alignment to SHMLBA, so this is covered)
>
>     Sure. I'll add asserts for these.
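>
>     Something along these lines (a sketch; the exact form may differ):
>
>     char* addr = (char*)shmat(shmid, req_addr, 0);
>     if ((intptr_t)addr == -1) {
>       int err = errno;  // save before it can be clobbered
>       shm_warning_with_errno("Failed to attach shared memory.");
>       assert(err != EACCES, "we created the segment and are its owner");
>       assert(err != EIDRM,  "the segment should not have been removed");
>       assert(err != EINVAL, "req_addr is pre-checked for SHMLBA alignment");
>       return NULL;
>     }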
>
>>
>>     ---
>>
>>     Smaller nits:
>>
>>     Functions called "shmat_..." suggest shmat-like behaviour, so
>>     could we have them return -1 instead of NULL in case of error?
>
>     That would add clutter to reserve_memory_special_shm, and it
>     might also suggest that it would be OK to check errno for the
>     failure reason, which probably wouldn't work. I'll let other
>     Reviewers chime in and help decide if we should change this.
>
>
> You are right. If one returns -1, one would have to preserve errno for 
> the caller too.
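>
> E.g. (sketch of the shmat-like failure protocol):
>
> char* addr = (char*)shmat(shmid, req_addr, 0);
> if ((intptr_t)addr == -1) {
>   int saved_errno = errno;
>   shm_warning_with_errno("Failed to attach shared memory.");
>   errno = saved_errno;  // let the caller inspect the real cause
>   return (char*)-1;
> }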
>
>     Thanks for reviewing this,
>     StefanK
>
>
> You are welcome!
>
> Kind Regards, Thomas
>
>
>
>>
>>     Kind Regards, Thomas
>>
>>>
>>>         Also note that mmap- and shmat-allocated memory may have
>>>         different alignment requirements: mmap requires a
>>>         page-aligned request address, whereas shmat requires
>>>         alignment to SHMLBA, which may be multiple pages (e.g. for
>>>         ARM:
>>>         http://lxr.free-electrons.com/source/arch/arm/include/asm/shmparam.h#L9).
>>>         So, for this shmat-over-mmap trick to work, the request
>>>         address has to be aligned to SHMLBA, not just to page size.
>>>
>>>         I see that you assert alignment of req_addr to
>>>         os::large_page_size(), which I would assume is a multiple of
>>>         SHMLBA, but I am not sure of this.
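>>>
>>>         E.g. a defensive check could look like this (sketch):
>>>
>>>         #include <sys/shm.h>  // defines SHMLBA
>>>         assert(os::large_page_size() % SHMLBA == 0,
>>>                "large page size expected to be a multiple of SHMLBA");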
>>
>>         I've added some defensive code and asserts to catch this
>>         if/when this assumption fails:
>>
>>         http://cr.openjdk.java.net/~stefank/8017629/webrev.02.delta/
>>         http://cr.openjdk.java.net/~stefank/8017629/webrev.02
>>
>>         I need to verify that this works on machines other than my
>>         local Linux x64 machine.
>>
>>         Thanks,
>>         StefanK
>>
>>>
>>>         Kind Regards, Thomas
>>>
>>>
>>>
>>>         On Mon, Apr 11, 2016 at 1:03 PM, Stefan Karlsson
>>>         <stefan.karlsson at oracle.com> wrote:
>>>
>>>             Hi all,
>>>
>>>             Please review this patch to enable SHM large page
>>>             allocations even when the requested alignment is larger
>>>             than os::large_page_size().
>>>
>>>             http://cr.openjdk.java.net/~stefank/8017629/webrev.01
>>>             https://bugs.openjdk.java.net/browse/JDK-8017629
>>>
>>>             G1 is affected by this bug since it requires the heap to
>>>             start at an address that is aligned to the heap region
>>>             size. The patch fixes this by changing the UseSHM large
>>>             page allocation code: first, virtual memory with the
>>>             correct alignment is pre-reserved, and then the large
>>>             pages are attached to this memory area.
>>>
>>>             Tested with vm.gc.testlist and ExecuteInternalVMTests.
>>>
>>>             Thanks,
>>>             StefanK
>>>
>>>
>>
>>
>
>


