RFR: 8017629: G1: UseSHM in combination with a G1HeapRegionSize > os::large_page_size() falls back to use small pages
Stefan Johansson
stefan.johansson at oracle.com
Wed Apr 20 11:36:29 UTC 2016
Hi StefanK,
On 2016-04-18 12:04, Stefan Karlsson wrote:
> Hi Thomas,
>
> I discussed the code with Per and updated the names and changed the
> code slightly.
>
> http://cr.openjdk.java.net/~stefank/8017629/webrev.03.delta
> http://cr.openjdk.java.net/~stefank/8017629/webrev.03
>
Looks good,
StefanJ
> 1) shmat_with_large_alignment was renamed to shmat_with_alignment and
> all references to large pages were removed.
>
> 2) shmat_with_normal_alignment was renamed to shmat_at_address and all
> references to pages sizes were removed.
>
> 3) shmat_with_alignment was renamed to shmat_large_pages and all large
> pages specific code were kept in that function.
>
> 4) shmat_large_pages was restructured to have one section for the
> req_addr != NULL case, and another section for req_addr == NULL. I
> know that you suggested to call shmat_with_alignment (previously
> shmat_with_normal_alignment) for both cases in the req_addr == NULL
> section, but I would like to only have to use shmat_with_alignment
> when it's really necessary.
>
> Thanks,
> StefanK
>
> On 2016-04-13 15:59, Thomas Stüfe wrote:
>> Hi Stefan,
>>
>> On Wed, Apr 13, 2016 at 1:23 PM, Stefan Karlsson
>> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>> wrote:
>>
>> Hi Thomas,
>>
>>
>> On 2016-04-13 12:44, Thomas Stüfe wrote:
>>> Hi Stefan,
>>>
>>> On Tue, Apr 12, 2016 at 5:41 PM, Stefan Karlsson
>>> <stefan.karlsson at oracle.com <mailto:stefan.karlsson at oracle.com>>
>>> wrote:
>>>
>>> Hi Thomas,
>>>
>>>
>>> On 2016-04-12 16:23, Thomas Stüfe wrote:
>>>> Hi Stefan,
>>>>
>>>>
>>>> On Mon, Apr 11, 2016 at 3:52 PM, Stefan Karlsson
>>>> <stefan.karlsson at oracle.com
>>>> <mailto:stefan.karlsson at oracle.com>> wrote:
>>>>
>>>> Hi Thomas,
>>>>
>>>> On 2016-04-11 14:39, Thomas Stüfe wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> short question, why the mmap before the shmat? Why not
>>>>> shmat right away at the requested address?
>>>>
>>>> If we have a requested_address we do exactly what you
>>>> propose.
>>>>
>>>> if (req_addr == NULL && alignment >
>>>> os::large_page_size()) {
>>>> return shmat_with_large_alignment(shmid, bytes,
>>>> alignment);
>>>> } else {
>>>> return shmat_with_normal_alignment(shmid, req_addr);
>>>> }
>>>>
>>>> ...
>>>>
>>>> static char* shmat_with_normal_alignment(int shmid,
>>>> char* req_addr) {
>>>> char* addr = (char*)shmat(shmid, req_addr, 0);
>>>>
>>>> if ((intptr_t)addr == -1) {
>>>> shm_warning_with_errno("Failed to attach shared memory.");
>>>> return NULL;
>>>> }
>>>>
>>>> return addr;
>>>> }
>>>>
>>>>
>>>> It's when you don't have a requested address that mmap
>>>> is used to find a large enough virtual memory area.
>>>>
>>>>
>>>> Sorry, seems I did not look at this coding thoroughly
>>>> enough. I understand now that you do mmap to allocate and
>>>> then to cut away the extra pre-/post-space, something which
>>>> would not be possible with shmat, which cannot be unmapped
>>>> page-wise.
>>>>
>>>> But I am still not sure why we do it his way:
>>>>
>>>> 3429 static char* shmat_with_alignment(int shmid, size_t
>>>> bytes, size_t alignment, char* req_addr) {
>>>> 3430 // If there's no requested address, the shmat call
>>>> can return memory that is not
>>>> 3431 // 'alignment' aligned, if the given alignment is
>>>> larger than the large page size.
>>>> 3432 // Special care needs to be taken to ensure that we
>>>> get aligned memory back.
>>>> 3433 if (req_addr == NULL && alignment >
>>>> os::large_page_size()) {
>>>> 3434 return shmat_with_large_alignment(shmid, bytes,
>>>> alignment);
>>>> 3435 } else {
>>>> 3436 return shmat_with_normal_alignment(shmid, req_addr);
>>>> 3437 }
>>>> 3438 }
>>>>
>>>> For req_addr==0 and big alignment, we attach at the given
>>>> alignment ("shmat_with_large_alignment").
>>>> For req_addr!=0, we attach at the given requested address
>>>> ("shmat_with_normal_alignment").
>>>> For req_addr==0 and smaller alignment, we ignore the
>>>> alignment and attach anywhere?
>>>>
>>>> Maybe I am slow, but why does it matter if the alignment is
>>>> large or small? Why not just distinguish between:
>>>>
>>>> 1) address given (req_addr!=0): in this case we attach at
>>>> this req_addr and rely on the user having aligned the
>>>> address properly for his purposes. We specify 0 for flags,
>>>> so we will attach at exactly the given address or fail. In
>>>> this case we could simply ignore the given alignment - if
>>>> one was given - or just use it to counter-check the req_addr.
>>>>
>>>> 2) alignment given (req_addr==0 and alignment > 0): attach
>>>> at the given alignment using mmap-before-shmat. This could
>>>> be done for any alignment, be it large or small.
>>>
>>> What you propose doesn't work.
>>>
>>> We're allocating large pages with SHM_HUGETLB, and if we try
>>> to attach to an address that is not large_page_size aligned
>>> the shmat call returns EINVAL.
>>>
>>>
>>> I was aware of this. What I meant was:
>>>
>>> You have "shmat_with_large_alignment" which takes an alignment
>>> and does its best to shmat with that alignment using the mmap
>>> trick. This coding does not need to know anything about huge
>>> pages, and actually does not do anything huge-pagey, apart from
>>> the asserts - it would just as well work with small pages,
>>> because the only place where the code needs to know about huge
>>> pages is in the layer above, in reserve_memory_special - where we
>>> pass SHM_HUGETLB to shmget. (Btw, I always wondered about the
>>> "reserve_memory_special" naming.)
>>>
>>> I think my point is that by renaming this to
>>> "shmat_with_alignment" and removing the huge-page-related asserts
>>> the function would become both simpler and more versatile and
>>> could be reused for small alignments as well as large ones. The
>>> fact that it returns EINVAL for alignments instead of asserting
>>> would not be a problem - we would return an error instead of
>>> asserting because of bad alignment, and both handling this error
>>> and asserting for huge-page-alignment could be handled better in
>>> reserve_memory_special.
>>>
>>> To put it another way, I think "shmat_with_large_alignment" does
>>> not need to know about huge pages; this should be the
>>> responsibility of reserve_memory_special.
>>>
>>> About "shmat_with_normal_alignment", this is actually only a raw
>>> shmat call and exists for the req_addr!=NULL case and for the
>>> case where we do not pass neither req_addr nor alignment. So the
>>> only thing it does not handle is alignment, so it is misnamed and
>>> also should not be called for the
>>> req_addr==NULL-and-small-alignments-case.
>>
>> The reserve_memory_special_shm function and the associated helper
>> functions I'm adding are specifically written to support large
>> pages allocations. The names "normal_alignment" and
>> "large_alignment" are intended to refer to alignment sizes
>> compared to the large pages size. I grant you that it's not
>> obvious from the name, and we can rename them to make it more clear.
>>
>> I want to provide a small bug fix for this large pages bug, while
>> you are suggesting that we re-purpose the code into supporting
>> small page allocations as well. Your suggestions might be good,
>> but may I suggest that you create a patch and an RFE that
>> motivates why we should make this code more generic to support
>> small pages as well?
>>
>> Thanks,
>> StefanK
>>
>>
>> Ok, we can do that. I was just worried that the code becomes more
>> difficult to understand. But lets wait for some more reviews.
>>
>> Kind Regards, Thomas
>>
>>
>>>>
>>>> Functions would become simpler and also could be clearer
>>>> named (e.g. "shmat_at_address" and "shmat_with_alignment",
>>>> respectivly).
>>>
>>> Maybe I should rename the functions to make it more obvious
>>> that these are large pages specific functions?
>>>
>>>>
>>>> ----
>>>>
>>>> This:
>>>>
>>>> 3402 if ((intptr_t)addr == -1) {
>>>> 3403 shm_warning_with_errno("Failed to attach shared
>>>> memory.");
>>>> 3404 // Since we don't know if the kernel unmapped the
>>>> pre-reserved memory area
>>>> 3405 // we can't unmap it, since that would potentially
>>>> unmap memory that was
>>>> 3406 // mapped from other threads.
>>>> 3407 return NULL;
>>>> 3408 }
>>>>
>>>> seems scary. Means for every call this happens, we leak the
>>>> reserved (not committed) address space?
>>>
>>> Yes, that's unfortunate.
>>>
>>> An alternative would be to use this sequence:
>>> 1) Use anon_mmap_aligned to find a suitable VA range
>>> 2) Immediately unmap the VA range
>>> 3) Try to attach at that VA range _without_ SHM_REMAP
>>>
>>> That would remove the risk of leaking the reserved address
>>> space, but instead we risk failing at (3) if another thread
>>> manages to allocate memory inside the found VA range. This
>>> will cause some users to unnecessarily fail to get large
>>> pages, though. We've had other problems when pre-existing
>>> threads used mmap while we were initializing the VM. See:
>>> JDK-8007074.
>>>
>>>
>>> Yes; btw you also could do this with shmget/shmat instead of mmap.
>>>
>>> Note that similar unclean tricks are already done in other
>>> places, see e.g. the windows version of
>>> os::pd_split_reserved_memory(). Which deals with VirtualAlloc()
>>> being unable, like shmget, to deallocate piece-wise.
>>>
>>>
>>>
>>>> For most cases (anything but ENOMEM, actually) could we at
>>>> least assert?:
>>>>
>>>> EACCES - should not happen: we created the shared memory and
>>>> are its owner
>>>> EIDRM - should not happen.
>>>> EINVAL - should not happen. (you already check now the
>>>> attach address for alignment to SHMLBA, so this is covered)
>>>
>>> Sure. I'll add asserts for these.
>>>
>>>>
>>>> ---
>>>>
>>>> Smaller nits:
>>>>
>>>> Functions called "shmat_..." suggest shmat-like behaviour,
>>>> so could we have them return -1 instead of NULL in case of
>>>> error?
>>>
>>> That would add clutter to the reserve_memory_special_shm, and
>>> it might also suggest that it would be OK to check errno for
>>> the failure reason, which probably wouldn't work. I'll let
>>> other Reviewers chime in and help decide if we should change
>>> this.
>>>
>>>
>>> You are right. If one returns -1, one would have to preserve
>>> errno for the caller too.
>>>
>>> Thanks for reviewing this,
>>> StefanK
>>>
>>>
>>> You are welcome!
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>>
>>>>
>>>> Kind Regards, Thomas
>>>>
>>>>>
>>>>> Also note that mmap- and shmat-allocated memory may
>>>>> have different alignment requirements: mmap requires a
>>>>> page-aligned request address, whereas shmat requires
>>>>> alignment to SHMLBA, which may be multiple pages (e.g.
>>>>> for ARM:
>>>>> http://lxr.free-electrons.com/source/arch/arm/include/asm/shmparam.h#L9).
>>>>> So, for this shat-over-mmap trick to work, request
>>>>> address has to be aligned to SHMLBA, not just page size.
>>>>>
>>>>> I see that you assert alignment of requ address to
>>>>> os::large_page_size(), which I would assume is a
>>>>> multiple of SHMLBA, but I am not sure of this.
>>>>
>>>> I've added some defensive code and asserts to catch this
>>>> if/when this assumption fails:
>>>>
>>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02.delta/
>>>> <http://cr.openjdk.java.net/%7Estefank/8017629/webrev.02.delta/>
>>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.02
>>>> <http://cr.openjdk.java.net/%7Estefank/8017629/webrev.02>
>>>>
>>>> I need to verify that this works on other machines than
>>>> my local Linux x64 machine.
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>>
>>>>> Kind Regards, Thomas
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 11, 2016 at 1:03 PM, Stefan Karlsson
>>>>> <stefan.karlsson at oracle.com
>>>>> <mailto:stefan.karlsson at oracle.com>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Please review this patch to enable SHM large page
>>>>> allocations even when the requested alignment is
>>>>> larger than os::large_page_size().
>>>>>
>>>>> http://cr.openjdk.java.net/~stefank/8017629/webrev.01
>>>>> <http://cr.openjdk.java.net/%7Estefank/8017629/webrev.01>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8017629
>>>>>
>>>>> G1 is affected by this bug since it requires the
>>>>> heap to start at an address that is aligned with
>>>>> the heap region size. The patch fixes this by
>>>>> changing the UseSHM large pages allocation code.
>>>>> First, virtual memory with correct alignment is
>>>>> pre-reserved and then the large pages are attached
>>>>> to this memory area.
>>>>>
>>>>> Tested with vm.gc.testlist and ExecuteInternaVMTests
>>>>>
>>>>> Thanks,
>>>>> StefanK
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
More information about the hotspot-dev
mailing list