Request for review (s): 7173959 : Jvm crashed during coherence exabus (tmb) testing

Bengt Rutisson bengt.rutisson at oracle.com
Mon Dec 17 00:38:01 PST 2012


Hi Vitaly,

On 12/14/12 3:46 PM, Vitaly Davidovich wrote:
>
> Hi Bengt,
>
> I'd add an assert(begin_offset < end_offset, "sanity") as well but up 
> to you.  Looks good.
>

Actually begin_offset and end_offset are the number of bytes we want to 
umap. So, it is quite possible for end_offset to be less than begin_offset.

Bengt

> Thanks
>
> Sent from my phone
>
> On Dec 14, 2012 3:49 AM, "Bengt Rutisson" <bengt.rutisson at oracle.com 
> <mailto:bengt.rutisson at oracle.com>> wrote:
>
>
>     Hi again Runtime team,
>
>     I need one more review for this change. It is a P1 bug with a
>     fairly small change. Any takers?
>
>     Latest webrev based on comments from David Holmes and Vitaly
>     Davidovich:
>     http://cr.openjdk.java.net/~brutisso/7173959/webrev.03/
>     <http://cr.openjdk.java.net/%7Ebrutisso/7173959/webrev.03/>
>
>     Thanks,
>     Bengt
>
>
>     On 12/14/12 9:29 AM, Bengt Rutisson wrote:
>
>
>         Hi David,
>
>         On 12/14/12 1:38 AM, David Holmes wrote:
>
>             Hi Bengt,
>
>             That all sounds good to me.
>
>             With regard to potential overflow perhaps a simple check
>             that extra_size does not overflow? If that is true and
>             extra_base is not null then I think all the other
>             calculations must be valid.
>
>
>         Yes, that makes sense. I'll add an overflow check for the
>         extra_size.
>
>         Thanks,
>         Bengt
>
>
>             Thanks,
>             David
>
>             On 13/12/2012 10:58 PM, Bengt Rutisson wrote:
>
>
>                 Hi David,
>
>                 Updated webrev:
>                 http://cr.openjdk.java.net/~brutisso/7173959/webrev.02/ <http://cr.openjdk.java.net/%7Ebrutisso/7173959/webrev.02/>
>
>                 I moved the alignment of "size" back into
>                 ReservedSpace::initialize()
>                 since we actually store the size in an instance
>                 variable (_size), so I
>                 think it is a bit risky to just do the align_up in
>                 os::reserve_memory_aligned(). We probably want the
>                 instance variables
>                 _size and _alignment in ReservedSpace to be consistent.
>
>                 I added an assert in os::reserve_memory_aligned() to
>                 verify that the
>                 size is correctly aligned. I also added the assert you
>                 suggested to
>                 check that alignment is page size aligned.
>
>                 Bengt
>
>                 On 12/13/12 11:50 AM, David Holmes wrote:
>
>                     On 13/12/2012 8:37 PM, Bengt Rutisson wrote:
>
>
>                         Hi again,
>
>                         Updated webrev:
>                         http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/
>                         <http://cr.openjdk.java.net/%7Ebrutisso/7173959/webrev.01/>
>
>                         I removed the comment and the alignment.
>
>                         But I did not add an assert just yet.
>
>                         At the top of ReservedSpace::initialize() we
>                         have this code:
>
>                         const size_t granularity =
>                         os::vm_allocation_granularity();
>                         assert((size & (granularity - 1)) == 0,
>                         "size not aligned to
>                         os::vm_allocation_granularity()");
>                         assert((alignment & (granularity - 1)) == 0,
>                         "alignment not aligned to
>                         os::vm_allocation_granularity()");
>
>
>                         Where os::vm_allocation_granularity() is
>                         implemented as page size on all
>                         platforms except Windows. There we look it up
>                         from the Windows API. I
>                         assume that is a page size too, but since the
>                         Windows code in our patch
>                         does not unmap based on the alignment it
>                         should be safe either way.
>
>                         Is this assert enough or would you like me to
>                         add an assert closer to
>                         the code block were I did the changes?
>
>
>                     As this is a separate method now I think an assert
>                     in the method
>                     itself would not go astray.
>
>                     David
>                     -----
>
>                         Thanks,
>                         Bengt
>
>
>                         On 12/13/12 11:02 AM, Bengt Rutisson wrote:
>
>
>                             Hi David,
>
>                             Thanks for looking at this!
>
>                             On 12/13/12 8:33 AM, David Holmes wrote:
>
>                                 Hi Bengt,
>
>                                 This has to be one of the absolute
>                                 best review requests I've ever
>                                 read :) Thank you.
>
>
>                             Wow! Thanks! :)
>
>
>                                 Just a couple of queries.
>
>                                 os_posix.cpp:
>
>                                 Love the diagram :)
>
>
>                             It was Mikael's way of making sure we got
>                             it right.
>
>                                 I'm assuming that "alignment" has to
>                                 be >= the underlying page size,
>                                 and in fact must be a multiple of the
>                                 underlying page size ? (As I
>                                 assume you can only unmap whole
>                                 numbers of pages). If so does that
>                                 need to be checked somewhere?
>
>
>                             Good point. I'll add an assert to check that.
>
>                                 In virtualSpace.cpp:
>
>                                 // Reserve size large enough to do
>                                 manual alignment and
>                                 // increase size to a multiple of the
>                                 desired alignment
>                                 size = align_size_up(size, alignment);
>                                 ! base =
>                                 os::reserve_memory_aligned(size,
>                                 alignment);
>
>                                 The comment doesn't seem necessary
>                                 now, and the align_size_up seems
>                                 redundant.
>
>
>                             You are right. I'll remove the comment and
>                             the alignment.
>
>                             Thanks,
>                             Bengt
>
>
>                                 Thanks,
>                                 David
>
>                                 On 13/12/2012 4:42 PM, Bengt Rutisson
>                                 wrote:
>
>
>                                     Hi all,
>
>                                     Could I have a couple of reviews
>                                     for this change?
>
>                                     http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/
>                                     <http://cr.openjdk.java.net/%7Ebrutisso/7173959/webrev.00/>
>
>                                     This is for a bug originally
>                                     reported by the Coherence team:
>
>                                     7173959 : Jvm crashed during
>                                     coherence exabus (tmb) testing
>                                     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959
>
>                                     The original analysis and proposed
>                                     fix was done by Mikael Gerdin
>                                     and me
>                                     together. I'll handle the webrev
>                                     and push since Mikael is on vacation
>                                     starting today. But Mikael did a
>                                     great job tracking down this very
>                                     difficult bug, so he should have a
>                                     large part of the credit for this
>                                     bug
>                                     fix,
>
>                                     Description from the CR:
>
>                                     The reason that we crash is due to
>                                     how we re-map memory when we
>                                     want to
>                                     align it for large pages in
>                                     ReservedSpace::initialize().
>
>                                     Here is what happens:
>
>                                     The reservation of memory is split
>                                     up to a few steps. When we want a
>                                     chunk of memory with large pages
>                                     we first just reserve some memory
>                                     large
>                                     enough for what we need. Then we
>                                     realize that we want large pages,
>                                     so we
>                                     want to re-map the reserved memory
>                                     to use large pages. But since this
>                                     requires that we have a
>                                     large-page-aligned memory chunk we may
>                                     have to
>                                     fix the recently reserved memory
>                                     chunk up a bit.
>
>                                     We do this in
>                                     ReservedSpace::initialize() by
>                                     first releasing the
>                                     memory
>                                     we just reserved. Then requesting
>                                     more memory than we actually
>                                     need to
>                                     make sure that we have enough
>                                     space to do manual large page
>                                     alignment.
>                                     After we have gotten this memory
>                                     we figure out a nicely aligned start
>                                     address. We then release the
>                                     memory again and then reserve just
>                                     enough
>                                     memory but using the aligned start
>                                     address as a fixed address to make
>                                     sure that we get the memory we
>                                     wanted in an aligned way.
>
>                                     This is done in a loop to make
>                                     sure that we eventually get some
>                                     memory.
>                                     The interesting code looks like this:
>
>                                     do {
>                                     char* extra_base =
>                                     os::reserve_memory(extra_size,
>                                     NULL, alignment);
>                                     if (extra_base == NULL) return;
>                                     // Do manual alignement
>                                     base = (char*)
>                                     align_size_up((uintptr_t)
>                                     extra_base, alignment);
>                                     assert(base >= extra_base, "just
>                                     checking");
>                                     // Re-reserve the region at the
>                                     aligned base address.
>                                     os::release_memory(extra_base,
>                                     extra_size); // (1)
>                                     base = os::reserve_memory(size,
>                                     base); // (2)
>                                     } while (base == NULL);
>
>
>                                     There is a race here between
>                                     releasing the memory in (1) and
>                                     re-reserving it in (2). But the
>                                     loop is supposed to handle this race.
>
>                                     The problem is that on posix
>                                     platforms you can remap the same
>                                     memory
>                                     area several times. The call in
>                                     (2) will use mmap with MAP_FIXED.
>                                     This
>                                     means that the OS will think that
>                                     you know exactly what you are
>                                     doing.
>                                     So, if part of the memory has been
>                                     mapped already by the process it
>                                     will
>                                     just go ahead and reuse that memory.
>
>                                     This means that if we are having
>                                     multiple threads that do mmap. We
>                                     can
>                                     end up with a situation where we
>                                     release our mapping in (1). Then
>                                     another thread comes in and maps
>                                     part of the memory that we used to
>                                     have. Then we remap over that
>                                     memory again in (2) with MAP_FIXED.
>                                     Now we
>                                     have a situation where two threads
>                                     in our process have mapped the
>                                     same
>                                     memory. If both threads try to use
>                                     it or if one of the threads unmap
>                                     part or all of the memory we will
>                                     crash.
>
>                                     On posix it is possible to unmap
>                                     any part of a mapped chunk. So, our
>                                     proposed solution to the race
>                                     described above is to not unmap all
>                                     memory
>                                     in (1) but rather just unmap the
>                                     section at the start and at the
>                                     end of
>                                     the chunk that we mapped to get
>                                     alignment. This also removes the need
>                                     for the loop.
>
>                                     However, on Windows you can only
>                                     unmap _all_ of the memory that you
>                                     have
>                                     mapped. On the other hand Windows
>                                     also will not allow you to map over
>                                     other mappings, so the original
>                                     code is actually safe. If we keep
>                                     the loop.
>
>                                     So, our solution is to treat this
>                                     differently on Windows and posix
>                                     platforms.
>
>
>                                     Thanks,
>                                     Bengt
>
>
>
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20121217/02ce114a/attachment-0001.html 


More information about the hotspot-runtime-dev mailing list