Request for review (s): 7173959 : Jvm crashed during coherence exabus (tmb) testing
Bengt Rutisson
bengt.rutisson at oracle.com
Thu Dec 13 02:02:21 PST 2012
Hi David,
Thanks for looking at this!
On 12/13/12 8:33 AM, David Holmes wrote:
> Hi Bengt,
>
> This has to be one of the absolute best review requests I've ever read
> :) Thank you.
Wow! Thanks! :)
>
> Just a couple of queries.
>
> os_posix.cpp:
>
> Love the diagram :)
It was Mikael's way of making sure we got it right.
> I'm assuming that "alignment" has to be >= the underlying page size,
> and in fact must be a multiple of the underlying page size ? (As I
> assume you can only unmap whole numbers of pages). If so does that
> need to be checked somewhere?
Good point. I'll add an assert to check that.
> In virtualSpace.cpp:
>
> // Reserve size large enough to do manual alignment and
> // increase size to a multiple of the desired alignment
> size = align_size_up(size, alignment);
> ! base = os::reserve_memory_aligned(size, alignment);
>
> The comment doesn't seem necessary now, and the align_size_up seems
> redundant.
You are right. I'll remove the comment and the alignment.
Thanks,
Bengt
>
> Thanks,
> David
>
> On 13/12/2012 4:42 PM, Bengt Rutisson wrote:
>>
>> Hi all,
>>
>> Could I have a couple of reviews for this change?
>>
>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/
>>
>> This is for a bug originally reported by the Coherence team:
>>
>> 7173959 : Jvm crashed during coherence exabus (tmb) testing
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959
>>
>> The original analysis and proposed fix was done by Mikael Gerdin and me
>> together. I'll handle the webrev and push since Mikael is on vacation
>> starting today. But Mikael did a great job tracking down this very
>> difficult bug, so he should have a large part of the credit for this bug
>> fix,
>>
>> Description from the CR:
>>
>> The reason that we crash is due to how we re-map memory when we want to
>> align it for large pages in ReservedSpace::initialize().
>>
>> Here is what happens:
>>
>> The reservation of memory is split up to a few steps. When we want a
>> chunk of memory with large pages we first just reserve some memory large
>> enough for what we need. Then we realize that we want large pages, so we
>> want to re-map the reserved memory to use large pages. But since this
>> requires that we have a large-page-aligned memory chunk we may have to
>> fix the recently reserved memory chunk up a bit.
>>
>> We do this in ReservedSpace::initialize() by first releasing the memory
>> we just reserved. Then requesting more memory than we actually need to
>> make sure that we have enough space to do manual large page alignment.
>> After we have gotten this memory we figure out a nicely aligned start
>> address. We then release the memory again and then reserve just enough
>> memory but using the aligned start address as a fixed address to make
>> sure that we get the memory we wanted in an aligned way.
>>
>> This is done in a loop to make sure that we eventually get some memory.
>> The interesting code looks like this:
>>
>> do {
>> char* extra_base = os::reserve_memory(extra_size, NULL, alignment);
>> if (extra_base == NULL) return;
>> // Do manual alignement
>> base = (char*) align_size_up((uintptr_t) extra_base, alignment);
>> assert(base >= extra_base, "just checking");
>> // Re-reserve the region at the aligned base address.
>> os::release_memory(extra_base, extra_size); // (1)
>> base = os::reserve_memory(size, base); // (2)
>> } while (base == NULL);
>>
>>
>> There is a race here between releasing the memory in (1) and
>> re-reserving it in (2). But the loop is supposed to handle this race.
>>
>> The problem is that on posix platforms you can remap the same memory
>> area several times. The call in (2) will use mmap with MAP_FIXED. This
>> means that the OS will think that you know exactly what you are doing.
>> So, if part of the memory has been mapped already by the process it will
>> just go ahead and reuse that memory.
>>
>> This means that if we are having multiple threads that do mmap. We can
>> end up with a situation where we release our mapping in (1). Then
>> another thread comes in and maps part of the memory that we used to
>> have. Then we remap over that memory again in (2) with MAP_FIXED. Now we
>> have a situation where two threads in our process have mapped the same
>> memory. If both threads try to use it or if one of the threads unmap
>> part or all of the memory we will crash.
>>
>> On posix it is possible to unmap any part of a mapped chunk. So, our
>> proposed solution to the race described above is to not unmap all memory
>> in (1) but rather just unmap the section at the start and at the end of
>> the chunk that we mapped to get alignment. This also removes the need
>> for the loop.
>>
>> However, on Windows you can only unmap _all_ of the memory that you have
>> mapped. On the other hand Windows also will not allow you to map over
>> other mappings, so the original code is actually safe. If we keep the
>> loop.
>>
>> So, our solution is to treat this differently on Windows and posix
>> platforms.
>>
>>
>> Thanks,
>> Bengt
More information about the hotspot-runtime-dev
mailing list