RFR: 8017629: G1: UseSHM in combination with a G1HeapRegionSize > os::large_page_size() falls back to use small pages

Wed Apr 20 11:37:12 UTC 2016

Thanks, StefanJ.

StefanK

On 2016-04-20 13:36, Stefan Johansson wrote:
> Hi StefanK,
>
> On 2016-04-18 12:04, Stefan Karlsson wrote:
>> Hi Thomas,
>>
>> I discussed the code with Per and updated the names and changed the 
>> code slightly.
>>
>> http://cr.openjdk.java.net/~stefank/8017629/webrev.03.delta
>> http://cr.openjdk.java.net/~stefank/8017629/webrev.03
>>
> Looks good,
>
> StefanJ
>> 1) shmat_with_large_alignment was renamed to shmat_with_alignment and 
>> all references to large pages were removed.
>>
>> 2) shmat_with_normal_alignment was renamed to shmat_at_address and 
>> all references to pages sizes were removed.
>>
>> 3) shmat_with_alignment was renamed to shmat_large_pages and all 
>> large pages specific code were kept in that function.
>>
>> 4) shmat_large_pages was restructured to have one section for the 
>> req_addr != NULL case, and another section for req_addr == NULL. I 
>> know that you suggested to call shmat_with_alignment (previously 
>> shmat_with_normal_alignment) for both cases in the req_addr == NULL 
>> section, but I would like to only have to use shmat_with_alignment 
>> when it's really necessary.
>>
>> Thanks,
>> StefanK
>>
>> On 2016-04-13 15:59, Thomas Stüfe wrote:
>>> Hi Stefan,
>>>
>>> On Wed, Apr 13, 2016 at 1:23 PM, Stefan Karlsson 
>>> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>> wrote:
>>>
>>>     Hi Thomas,
>>>
>>>
>>>     On 2016-04-13 12:44, Thomas Stüfe wrote:
>>>>     Hi Stefan,
>>>>
>>>>     On Tue, Apr 12, 2016 at 5:41 PM, Stefan Karlsson
>>>>     <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>>
>>>>     wrote:
>>>>
>>>>         Hi Thomas,
>>>>
>>>>
>>>>         On 2016-04-12 16:23, Thomas Stüfe wrote:
>>>>>         Hi Stefan,
>>>>>
>>>>>
>>>>>         On Mon, Apr 11, 2016 at 3:52 PM, Stefan Karlsson
>>>>>         <stefan.karlsson at oracle.com
>>>>>         <mailto:stefan.karlsson at oracle.com>> wrote:
>>>>>
>>>>>             Hi Thomas,
>>>>>
>>>>>             On 2016-04-11 14:39, Thomas Stüfe wrote:
>>>>>>             Hi Stefan,
>>>>>>
>>>>>>             short question, why the mmap before the shmat? Why not
>>>>>>             shmat right away at the requested address?
>>>>>
>>>>>             If we have a requested_address we do exactly what you
>>>>>             propose.
>>>>>
>>>>>               if (req_addr == NULL && alignment >
>>>>>             os::large_page_size()) {
>>>>>                 return shmat_with_large_alignment(shmid, bytes,
>>>>>             alignment);
>>>>>               } else {
>>>>>                 return shmat_with_normal_alignment(shmid, req_addr);
>>>>>               }
>>>>>
>>>>>             ...
>>>>>
>>>>>             static char* shmat_with_normal_alignment(int shmid,
>>>>>             char* req_addr) {
>>>>>               char* addr = (char*)shmat(shmid, req_addr, 0);
>>>>>
>>>>>               if ((intptr_t)addr == -1) {
>>>>>             shm_warning_with_errno("Failed to attach shared 
>>>>> memory.");
>>>>>                 return NULL;
>>>>>               }
>>>>>
>>>>>               return addr;
>>>>>             }
>>>>>
>>>>>
>>>>>             It's when you don't have a requested address that mmap
>>>>>             is used to find a large enough virtual memory area.
>>>>>
>>>>>
>>>>>         Sorry, seems I did not look at this coding thoroughly
>>>>>         enough. I understand now that you do mmap to allocate and
>>>>>         then to cut away the extra pre-/post-space, something which
>>>>>         would not be possible with shmat, which cannot be unmapped
>>>>>         page-wise.
>>>>>
>>>>>         But I am still not sure why we do it his way:
>>>>>
>>>>>         3429 static char* shmat_with_alignment(int shmid, size_t
>>>>>         bytes, size_t alignment, char* req_addr) {
>>>>>         3430   // If there's no requested address, the shmat call
>>>>>         can return memory that is not
>>>>>         3431   // 'alignment' aligned, if the given alignment is
>>>>>         larger than the large page size.
>>>>>         3432   // Special care needs to be taken to ensure that we
>>>>>         get aligned memory back.
>>>>>         3433   if (req_addr == NULL && alignment >
>>>>>         os::large_page_size()) {
>>>>>         3434     return shmat_with_large_alignment(shmid, bytes,
>>>>>         alignment);
>>>>>         3435   } else {
>>>>>         3436     return shmat_with_normal_alignment(shmid, req_addr);
>>>>>         3437   }
>>>>>         3438 }
>>>>>
>>>>>         For req_addr==0 and big alignment, we attach at the given
>>>>>         alignment ("shmat_with_large_alignment").
>>>>>         For req_addr!=0, we attach at the given requested address
>>>>>         ("shmat_with_normal_alignment").
>>>>>         For req_addr==0 and smaller alignment, we ignore the
>>>>>         alignment and attach anywhere?
>>>>>
>>>>>         Maybe I am slow, but why does it matter if the alignment is
>>>>>         large or small? Why not just distinguish between:
>>>>>
>>>>>         1) address given (req_addr!=0): in this case we attach at
>>>>>         this req_addr and rely on the user having aligned the
>>>>>         address properly for his purposes. We specify 0 for flags,
>>>>>         so we will attach at exactly the given address or fail. In
>>>>>         this case we could simply ignore the given alignment - if
>>>>>         one was given - or just use it to counter-check the req_addr.
>>>>>
>>>>>         2) alignment given (req_addr==0 and alignment > 0): attach
>>>>>         at the given alignment using mmap-before-shmat. This could
>>>>>         be done for any alignment, be it large or small.
>>>>
>>>>         What you propose doesn't work.
>>>>
>>>>         We're allocating large pages with SHM_HUGETLB, and if we try
>>>>         to attach to an address that is not large_page_size aligned
>>>>         the shmat call returns EINVAL.
>>>>
>>>>
>>>>     I was aware of this. What I meant was:
>>>>
>>>>     You have "shmat_with_large_alignment" which takes an alignment
>>>>     and does its best to shmat with that alignment using the mmap
>>>>     trick. This coding does not need to know anything about huge
>>>>     pages, and actually does not do anything huge-pagey, apart from
>>>>     the asserts - it would just as well work with small pages,
>>>>     because the only place where the code needs to know about huge
>>>>     pages is in the layer above, in reserve_memory_special - where we
>>>>     pass SHM_HUGETLB to shmget. (Btw, I always wondered about the
>>>>     "reserve_memory_special" naming.)
>>>>
>>>>     I think my point is that by renaming this to
>>>>     "shmat_with_alignment" and removing the huge-page-related asserts
>>>>     the function would become both simpler and more versatile and
>>>>     could be reused for small alignments as well as large ones. The
>>>>     fact that it returns EINVAL for alignments instead of asserting
>>>>     would not be a problem - we would return an error instead of
>>>>     asserting because of bad alignment, and both handling this error
>>>>     and asserting for huge-page-alignment could be handled better in
>>>>     reserve_memory_special.
>>>>
>>>>     To put it another way, I think "shmat_with_large_alignment" does
>>>>     not need to know about huge pages; this should be the
>>>>     responsibility of reserve_memory_special.
>>>>
>>>>     About "shmat_with_normal_alignment", this is actually only a raw
>>>>     shmat call and exists for the req_addr!=NULL case and for the
>>>>     case where we do not pass neither req_addr nor alignment. So the
>>>>     only thing it does not handle is alignment, so it is misnamed and
>>>>     also should not be called for the
>>>>     req_addr==NULL-and-small-alignments-case.
>>>
>>>     The reserve_memory_special_shm function and the associated helper
>>>     functions I'm adding are specifically written to support large
>>>     pages allocations. The names "normal_alignment" and
>>>     "large_alignment" are intended to refer to alignment sizes
>>>     compared to the large pages size. I grant you that it's not
>>>     obvious from the name, and we can rename them to make it more 
>>> clear.
>>>
>>>     I want to provide a small bug fix for this large pages bug, while
>>>     you are suggesting that we re-purpose the code into supporting
>>>     small page allocations as well. Your suggestions might be good,
>>>     but may I suggest that you create a patch and an RFE that
>>>     motivates why we should make this code more generic to support
>>>     small pages as well?
>>>
>>>     Thanks,
>>>     StefanK
>>>
>>>
>>> Ok, we can do that. I was just worried that the code becomes more 
>>> difficult to understand. But lets wait for some more reviews.
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>>>>
>>>>>         Functions would become simpler and also could be clearer
>>>>>         named (e.g. "shmat_at_address" and "shmat_with_alignment",
>>>>>         respectivly).
>>>>
>>>>         Maybe I should rename the functions to make it more obvious
>>>>         that these are large pages specific functions?
>>>>
>>>>>
>>>>>         ----
>>>>>
>>>>>         This:
>>>>>
>>>>>         3402   if ((intptr_t)addr == -1) {
>>>>>         3403 shm_warning_with_errno("Failed to attach shared 
>>>>> memory.");
>>>>>         3404     // Since we don't know if the kernel unmapped the
>>>>>         pre-reserved memory area
>>>>>         3405     // we can't unmap it, since that would potentially
>>>>>         unmap memory that was
>>>>>         3406     // mapped from other threads.
>>>>>         3407     return NULL;
>>>>>         3408   }
>>>>>
>>>>>         seems scary. Means for every call this happens, we leak the
>>>>>         reserved (not committed) address space?
>>>>
>>>>         Yes, that's unfortunate.
>>>>
>>>>         An alternative would be to use this sequence:
>>>>         1) Use anon_mmap_aligned to find a suitable VA range
>>>>         2) Immediately unmap the VA range
>>>>         3) Try to attach at that VA range _without_ SHM_REMAP
>>>>
>>>>         That would remove the risk of leaking the reserved address
>>>>         space, but instead we risk failing at (3) if another thread
>>>>         manages to allocate memory inside the found VA range. This
>>>>         will cause some users to unnecessarily fail to get large
>>>>         pages, though. We've had other problems when pre-existing
>>>>         threads used mmap while we were initializing the VM. See:
>>>>         JDK-8007074.
>>>>
>>>>
>>>>     Yes; btw you also could do this with shmget/shmat instead of mmap.
>>>>
>>>>     Note that similar unclean tricks are already done in other
>>>>     places, see e.g. the windows version of
>>>>     os::pd_split_reserved_memory(). Which deals with VirtualAlloc()
>>>>     being unable, like shmget, to deallocate piece-wise.
>>>>
>>>>
>>>>
>>>>>         For most cases (anything but ENOMEM, actually) could we at
>>>>>         least assert?:
>>>>>
>>>>>         EACCES - should not happen: we created the shared memory and
>>>>>         are its owner
>>>>>         EIDRM  - should not happen.
>>>>>         EINVAL - should not happen. (you already check now the
>>>>>         attach address for alignment to SHMLBA, so this is covered)
>>>>
>>>>         Sure. I'll add asserts for these.
>>>>
>>>>>
>>>>>         ---
>>>>>
>>>>>         Smaller nits:
>>>>>
>>>>>         Functions called "shmat_..." suggest shmat-like behaviour,
>>>>>         so could we have them return -1 instead of NULL in case of
>>>>>         error?
>>>>
>>>>         That would add clutter to the reserve_memory_special_shm, and
>>>>         it might also suggest that it would be OK to check errno for
>>>>         the failure reason, which probably wouldn't work. I'll let
>>>>         other Reviewers chime in and help decide if we should change
>>>>         this.
>>>>
>>>>
>>>>     You are right. If one returns -1, one would have to preserve
>>>>     errno for the caller too.
>>>>
>>>>         Thanks for reviewing this,
>>>>         StefanK
>>>>
>>>>
>>>>     You are welcome!
>>>>
>>>>     Kind Regards, Thomas
>>>>
>>>>
>>>>
>>>>>
>>>>>         Kind Regards, Thomas
>>>>>
>>>>>>
>>>>>>             Also note that mmap- and shmat-allocated memory may
>>>>>>             have different alignment requirements: mmap requires a
>>>>>>             page-aligned request address, whereas shmat requires
>>>>>>             alignment to SHMLBA, which may be multiple pages (e.g.
>>>>>>             for ARM:
>>>>>> http://lxr.free-electrons.com/source/arch/arm/include/asm/shmparam.h#L9). 
>>>>>>
>>>>>>             So, for this shat-over-mmap trick to work, request
>>>>>>             address has to be aligned to SHMLBA, not just page size.
>>>>>>
>>>>>>             I see that you assert alignment of requ address to
>>>>>>             os::large_page_size(), which I would assume is a
>>>>>>             multiple of SHMLBA, but I am not sure of this.
>>>>>
>>>>>             I've added some defensive code and asserts to catch this
>>>>>             if/when this assumption fails:
>>>>>
>>>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02.delta/
>>>>> <http://cr.openjdk.java.net/%7Estefank/8017629/webrev.02.delta/>
>>>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02
>>>>> <http://cr.openjdk.java.net/%7Estefank/8017629/webrev.02>
>>>>>
>>>>>             I need to verify that this works on other machines than
>>>>>             my local Linux x64 machine.
>>>>>
>>>>>             Thanks,
>>>>>             StefanK
>>>>>
>>>>>>
>>>>>>             Kind Regards, Thomas
>>>>>>
>>>>>>
>>>>>>
>>>>>>             On Mon, Apr 11, 2016 at 1:03 PM, Stefan Karlsson
>>>>>>             <stefan.karlsson at oracle.com
>>>>>> <mailto:stefan.karlsson at oracle.com>> wrote:
>>>>>>
>>>>>>                 Hi all,
>>>>>>
>>>>>>                 Please review this patch to enable SHM large page
>>>>>>                 allocations even when the requested alignment is
>>>>>>                 larger than os::large_page_size().
>>>>>>
>>>>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.01
>>>>>> <http://cr.openjdk.java.net/%7Estefank/8017629/webrev.01>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8017629
>>>>>>
>>>>>>                 G1 is affected by this bug since it requires the
>>>>>>                 heap to start at an address that is aligned with
>>>>>>                 the heap region size. The patch fixes this by
>>>>>>                 changing the UseSHM large pages allocation code.
>>>>>>                 First, virtual memory with correct alignment is
>>>>>>                 pre-reserved and then the large pages are attached
>>>>>>                 to this memory area.
>>>>>>
>>>>>>                 Tested with vm.gc.testlist and ExecuteInternaVMTests
>>>>>>
>>>>>>                 Thanks,
>>>>>>                 StefanK
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>