Request for review (s): 7173959 : Jvm crashed during coherence exabus (tmb) testing

Thu Dec 13 15:44:34 PST 2012

Nice detective work and thanks for the answers.

Vitaly

Sent from my phone
On Dec 13, 2012 4:23 PM, "Bengt Rutisson" <bengt.rutisson at oracle.com> wrote:

>
> Hi Vitaly,
>
> Thanks for looking at this!
>
> On 12/13/12 2:44 PM, Vitaly Davidovich wrote:
>
> Hi Bengt,
>
> Great write-up! What was in the hs_err file that led you (or Mikael) here?
>
>
> Using the core file we noticed that the hotspot internal data structures
> that keep track of the bitmap sizes did not match the mapped memory listed
> in the hs_err file. We could see that the mapped memory for the bitmap was
> smaller than what hotspot had requested. After convincing ourselves that we
> were passing the correct size to mmap we figured that someone must have
> been unmapping part of the memory that we had mapped.
>
>  One question I have is why alignment is a multiple of page size and not
> page size? I imagine callers want to reserve X space and have it page
> aligned up if necessary.  When would someone request multiple page size
> alignment rather than just increase requested allocation amount and keep
> alignment at page size? If huge pages are used then
> vm_allocation_granularity() respects that, right? Or is this for
> transparent huge page support? Hope that makes sense.
>
> One such example is with G1. It wants the heap size to be aligned with the
> region size, which may be a multiple of the page size. Even with large
> pages.
>
> Actually we also need this for our large page support. The ReservedSpace
> doesn't really know of large pages. We use it to map memory but then we
> later on use madvice to make the mapped memory use huge pages. This means
> that at the time we map the memory we can only rely on the small page size,
> but if we know we will use it for large pages we have to make sure that the
> memory is aligned to the large page size. So the alignment will be a
> multiple of the small page size.
>
> The other question is whether it's worth it to check for overflow in the
> new os_posix function? There are some additions and subtractions on size_t
> values and maybe on 32 bit some of those adds can overflow a temporary?
> Specifically, line 118.  Disregard if that's not needed :).
>
>
> Good point. I think this change is not really increasing the risk of an
> overflow compared to the existing code. We should not overflow if we are
> passed sane values. An overflow will indicate that someone higher up was
> putting together a too large size. I think I would rather have the upper
> layers verify that they calculate reasonable values.
>
> Thanks for your feedback!
> Bengt
>
>  Thanks
>
> Sent from my phone
> On Dec 13, 2012 7:59 AM, "Bengt Rutisson" <bengt.rutisson at oracle.com>
> wrote:
>
>>
>> Hi David,
>>
>> Updated webrev:
>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.02/
>>
>> I moved the alignment of "size" back into ReservedSpace::initialize()
>> since we actually store the size in an instance variable (_size), so I
>> think it is a bit risky to just do the align_up in
>> os::reserve_memory_aligned(). We probably want the instance variables _size
>> and _alignment in ReservedSpace to be consistent.
>>
>> I added an assert in os::reserve_memory_aligned() to verify that the size
>> is correctly aligned. I also added the assert you suggested to check that
>> alignment is page size aligned.
>>
>> Bengt
>>
>> On 12/13/12 11:50 AM, David Holmes wrote:
>>
>>> On 13/12/2012 8:37 PM, Bengt Rutisson wrote:
>>>
>>>>
>>>> Hi again,
>>>>
>>>> Updated webrev:
>>>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/
>>>>
>>>> I removed the comment and the alignment.
>>>>
>>>> But I did not add an assert just yet.
>>>>
>>>> At the top of ReservedSpace::initialize() we have this code:
>>>>
>>>> const size_t granularity = os::vm_allocation_granularity();
>>>> assert((size & (granularity - 1)) == 0,
>>>> "size not aligned to os::vm_allocation_granularity()");
>>>> assert((alignment & (granularity - 1)) == 0,
>>>> "alignment not aligned to os::vm_allocation_granularity()");
>>>>
>>>>
>>>> Where os::vm_allocation_granularity() is implemented as page size on all
>>>> platforms except Windows. There we look it up from the Windows API. I
>>>> assume that is a page size too, but since the Windows code in our patch
>>>> does not unmap based on the alignment it should be safe either way.
>>>>
>>>> Is this assert enough or would you like me to add an assert closer to
>>>> the code block were I did the changes?
>>>>
>>>
>>> As this is a separate method now I think an assert in the method itself
>>> would not go astray.
>>>
>>> David
>>> -----
>>>
>>>  Thanks,
>>>> Bengt
>>>>
>>>>
>>>> On 12/13/12 11:02 AM, Bengt Rutisson wrote:
>>>>
>>>>>
>>>>> Hi David,
>>>>>
>>>>> Thanks for looking at this!
>>>>>
>>>>> On 12/13/12 8:33 AM, David Holmes wrote:
>>>>>
>>>>>> Hi Bengt,
>>>>>>
>>>>>> This has to be one of the absolute best review requests I've ever
>>>>>> read :) Thank you.
>>>>>>
>>>>>
>>>>> Wow! Thanks! :)
>>>>>
>>>>>
>>>>>> Just a couple of queries.
>>>>>>
>>>>>> os_posix.cpp:
>>>>>>
>>>>>> Love the diagram :)
>>>>>>
>>>>>
>>>>> It was Mikael's way of making sure we got it right.
>>>>>
>>>>>  I'm assuming that "alignment" has to be >= the underlying page size,
>>>>>> and in fact must be a multiple of the underlying page size ? (As I
>>>>>> assume you can only unmap whole numbers of pages). If so does that
>>>>>> need to be checked somewhere?
>>>>>>
>>>>>
>>>>> Good point. I'll add an assert to check that.
>>>>>
>>>>>  In virtualSpace.cpp:
>>>>>>
>>>>>> // Reserve size large enough to do manual alignment and
>>>>>> // increase size to a multiple of the desired alignment
>>>>>> size = align_size_up(size, alignment);
>>>>>> ! base = os::reserve_memory_aligned(size, alignment);
>>>>>>
>>>>>> The comment doesn't seem necessary now, and the align_size_up seems
>>>>>> redundant.
>>>>>>
>>>>>
>>>>> You are right. I'll remove the comment and the alignment.
>>>>>
>>>>> Thanks,
>>>>> Bengt
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 13/12/2012 4:42 PM, Bengt Rutisson wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Could I have a couple of reviews for this change?
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/
>>>>>>>
>>>>>>> This is for a bug originally reported by the Coherence team:
>>>>>>>
>>>>>>> 7173959 : Jvm crashed during coherence exabus (tmb) testing
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959
>>>>>>>
>>>>>>> The original analysis and proposed fix was done by Mikael Gerdin and
>>>>>>> me
>>>>>>> together. I'll handle the webrev and push since Mikael is on vacation
>>>>>>> starting today. But Mikael did a great job tracking down this very
>>>>>>> difficult bug, so he should have a large part of the credit for this
>>>>>>> bug
>>>>>>> fix,
>>>>>>>
>>>>>>> Description from the CR:
>>>>>>>
>>>>>>> The reason that we crash is due to how we re-map memory when we want
>>>>>>> to
>>>>>>> align it for large pages in ReservedSpace::initialize().
>>>>>>>
>>>>>>> Here is what happens:
>>>>>>>
>>>>>>> The reservation of memory is split up to a few steps. When we want a
>>>>>>> chunk of memory with large pages we first just reserve some memory
>>>>>>> large
>>>>>>> enough for what we need. Then we realize that we want large pages,
>>>>>>> so we
>>>>>>> want to re-map the reserved memory to use large pages. But since this
>>>>>>> requires that we have a large-page-aligned memory chunk we may have
>>>>>>> to
>>>>>>> fix the recently reserved memory chunk up a bit.
>>>>>>>
>>>>>>> We do this in ReservedSpace::initialize() by first releasing the
>>>>>>> memory
>>>>>>> we just reserved. Then requesting more memory than we actually need
>>>>>>> to
>>>>>>> make sure that we have enough space to do manual large page
>>>>>>> alignment.
>>>>>>> After we have gotten this memory we figure out a nicely aligned start
>>>>>>> address. We then release the memory again and then reserve just
>>>>>>> enough
>>>>>>> memory but using the aligned start address as a fixed address to make
>>>>>>> sure that we get the memory we wanted in an aligned way.
>>>>>>>
>>>>>>> This is done in a loop to make sure that we eventually get some
>>>>>>> memory.
>>>>>>> The interesting code looks like this:
>>>>>>>
>>>>>>> do {
>>>>>>> char* extra_base = os::reserve_memory(extra_size, NULL, alignment);
>>>>>>> if (extra_base == NULL) return;
>>>>>>> // Do manual alignement
>>>>>>> base = (char*) align_size_up((uintptr_t) extra_base, alignment);
>>>>>>> assert(base >= extra_base, "just checking");
>>>>>>> // Re-reserve the region at the aligned base address.
>>>>>>> os::release_memory(extra_base, extra_size); // (1)
>>>>>>> base = os::reserve_memory(size, base); // (2)
>>>>>>> } while (base == NULL);
>>>>>>>
>>>>>>>
>>>>>>> There is a race here between releasing the memory in (1) and
>>>>>>> re-reserving it in (2). But the loop is supposed to handle this race.
>>>>>>>
>>>>>>> The problem is that on posix platforms you can remap the same memory
>>>>>>> area several times. The call in (2) will use mmap with MAP_FIXED.
>>>>>>> This
>>>>>>> means that the OS will think that you know exactly what you are
>>>>>>> doing.
>>>>>>> So, if part of the memory has been mapped already by the process it
>>>>>>> will
>>>>>>> just go ahead and reuse that memory.
>>>>>>>
>>>>>>> This means that if we are having multiple threads that do mmap. We
>>>>>>> can
>>>>>>> end up with a situation where we release our mapping in (1). Then
>>>>>>> another thread comes in and maps part of the memory that we used to
>>>>>>> have. Then we remap over that memory again in (2) with MAP_FIXED.
>>>>>>> Now we
>>>>>>> have a situation where two threads in our process have mapped the
>>>>>>> same
>>>>>>> memory. If both threads try to use it or if one of the threads unmap
>>>>>>> part or all of the memory we will crash.
>>>>>>>
>>>>>>> On posix it is possible to unmap any part of a mapped chunk. So, our
>>>>>>> proposed solution to the race described above is to not unmap all
>>>>>>> memory
>>>>>>> in (1) but rather just unmap the section at the start and at the end
>>>>>>> of
>>>>>>> the chunk that we mapped to get alignment. This also removes the need
>>>>>>> for the loop.
>>>>>>>
>>>>>>> However, on Windows you can only unmap _all_ of the memory that you
>>>>>>> have
>>>>>>> mapped. On the other hand Windows also will not allow you to map over
>>>>>>> other mappings, so the original code is actually safe. If we keep
>>>>>>> the loop.
>>>>>>>
>>>>>>> So, our solution is to treat this differently on Windows and posix
>>>>>>> platforms.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bengt
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20121213/aebd5895/attachment-0001.html