Request for review (s): 7173959 : Jvm crashed during coherence exabus (tmb) testing
David Holmes
david.holmes at oracle.com
Thu Dec 13 02:50:20 PST 2012
On 13/12/2012 8:37 PM, Bengt Rutisson wrote:
>
> Hi again,
>
> Updated webrev:
> http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/
>
> I removed the comment and the alignment.
>
> But I did not add an assert just yet.
>
> At the top of ReservedSpace::initialize() we have this code:
>
> const size_t granularity = os::vm_allocation_granularity();
> assert((size & (granularity - 1)) == 0,
> "size not aligned to os::vm_allocation_granularity()");
> assert((alignment & (granularity - 1)) == 0,
> "alignment not aligned to os::vm_allocation_granularity()");
>
>
> Where os::vm_allocation_granularity() is implemented as page size on all
> platforms except Windows. There we look it up from the Windows API. I
> assume that is a page size too, but since the Windows code in our patch
> does not unmap based on the alignment it should be safe either way.
>
> Is this assert enough or would you like me to add an assert closer to
> the code block were I did the changes?
As this is a separate method now I think an assert in the method itself
would not go astray.
David
-----
> Thanks,
> Bengt
>
>
> On 12/13/12 11:02 AM, Bengt Rutisson wrote:
>>
>> Hi David,
>>
>> Thanks for looking at this!
>>
>> On 12/13/12 8:33 AM, David Holmes wrote:
>>> Hi Bengt,
>>>
>>> This has to be one of the absolute best review requests I've ever
>>> read :) Thank you.
>>
>> Wow! Thanks! :)
>>
>>>
>>> Just a couple of queries.
>>>
>>> os_posix.cpp:
>>>
>>> Love the diagram :)
>>
>> It was Mikael's way of making sure we got it right.
>>
>>> I'm assuming that "alignment" has to be >= the underlying page size,
>>> and in fact must be a multiple of the underlying page size ? (As I
>>> assume you can only unmap whole numbers of pages). If so does that
>>> need to be checked somewhere?
>>
>> Good point. I'll add an assert to check that.
>>
>>> In virtualSpace.cpp:
>>>
>>> // Reserve size large enough to do manual alignment and
>>> // increase size to a multiple of the desired alignment
>>> size = align_size_up(size, alignment);
>>> ! base = os::reserve_memory_aligned(size, alignment);
>>>
>>> The comment doesn't seem necessary now, and the align_size_up seems
>>> redundant.
>>
>> You are right. I'll remove the comment and the alignment.
>>
>> Thanks,
>> Bengt
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 13/12/2012 4:42 PM, Bengt Rutisson wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Could I have a couple of reviews for this change?
>>>>
>>>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/
>>>>
>>>> This is for a bug originally reported by the Coherence team:
>>>>
>>>> 7173959 : Jvm crashed during coherence exabus (tmb) testing
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959
>>>>
>>>> The original analysis and proposed fix was done by Mikael Gerdin and me
>>>> together. I'll handle the webrev and push since Mikael is on vacation
>>>> starting today. But Mikael did a great job tracking down this very
>>>> difficult bug, so he should have a large part of the credit for this
>>>> bug
>>>> fix,
>>>>
>>>> Description from the CR:
>>>>
>>>> The reason that we crash is due to how we re-map memory when we want to
>>>> align it for large pages in ReservedSpace::initialize().
>>>>
>>>> Here is what happens:
>>>>
>>>> The reservation of memory is split up to a few steps. When we want a
>>>> chunk of memory with large pages we first just reserve some memory
>>>> large
>>>> enough for what we need. Then we realize that we want large pages,
>>>> so we
>>>> want to re-map the reserved memory to use large pages. But since this
>>>> requires that we have a large-page-aligned memory chunk we may have to
>>>> fix the recently reserved memory chunk up a bit.
>>>>
>>>> We do this in ReservedSpace::initialize() by first releasing the memory
>>>> we just reserved. Then requesting more memory than we actually need to
>>>> make sure that we have enough space to do manual large page alignment.
>>>> After we have gotten this memory we figure out a nicely aligned start
>>>> address. We then release the memory again and then reserve just enough
>>>> memory but using the aligned start address as a fixed address to make
>>>> sure that we get the memory we wanted in an aligned way.
>>>>
>>>> This is done in a loop to make sure that we eventually get some memory.
>>>> The interesting code looks like this:
>>>>
>>>> do {
>>>> char* extra_base = os::reserve_memory(extra_size, NULL, alignment);
>>>> if (extra_base == NULL) return;
>>>> // Do manual alignement
>>>> base = (char*) align_size_up((uintptr_t) extra_base, alignment);
>>>> assert(base >= extra_base, "just checking");
>>>> // Re-reserve the region at the aligned base address.
>>>> os::release_memory(extra_base, extra_size); // (1)
>>>> base = os::reserve_memory(size, base); // (2)
>>>> } while (base == NULL);
>>>>
>>>>
>>>> There is a race here between releasing the memory in (1) and
>>>> re-reserving it in (2). But the loop is supposed to handle this race.
>>>>
>>>> The problem is that on posix platforms you can remap the same memory
>>>> area several times. The call in (2) will use mmap with MAP_FIXED. This
>>>> means that the OS will think that you know exactly what you are doing.
>>>> So, if part of the memory has been mapped already by the process it
>>>> will
>>>> just go ahead and reuse that memory.
>>>>
>>>> This means that if we are having multiple threads that do mmap. We can
>>>> end up with a situation where we release our mapping in (1). Then
>>>> another thread comes in and maps part of the memory that we used to
>>>> have. Then we remap over that memory again in (2) with MAP_FIXED.
>>>> Now we
>>>> have a situation where two threads in our process have mapped the same
>>>> memory. If both threads try to use it or if one of the threads unmap
>>>> part or all of the memory we will crash.
>>>>
>>>> On posix it is possible to unmap any part of a mapped chunk. So, our
>>>> proposed solution to the race described above is to not unmap all
>>>> memory
>>>> in (1) but rather just unmap the section at the start and at the end of
>>>> the chunk that we mapped to get alignment. This also removes the need
>>>> for the loop.
>>>>
>>>> However, on Windows you can only unmap _all_ of the memory that you
>>>> have
>>>> mapped. On the other hand Windows also will not allow you to map over
>>>> other mappings, so the original code is actually safe. If we keep
>>>> the loop.
>>>>
>>>> So, our solution is to treat this differently on Windows and posix
>>>> platforms.
>>>>
>>>>
>>>> Thanks,
>>>> Bengt
>>
>
More information about the hotspot-runtime-dev
mailing list