Fwd: Re: Request for review (s): 7173959 : Jvm crashed during coherence exabus (tmb) testing

Zhengyu Gu zhengyu.gu at oracle.com
Fri Dec 14 06:59:14 PST 2012


Looks good to me.

 From NMT perspective, as long as os::xxxx_memory methods are used, you 
don't need to add NMT hooks, but you do have to be careful about tagging 
the memory pointers returned by os::reserve_memory() methods. But for 
VirtualSpace and ReservedSpace, etc. utility classes, they are tagged by 
their callers.

Thanks,

-Zhengyu

On 12/14/2012 7:44 AM, Coleen Phillimore wrote:
>
> Does NMT track this?   Will this mess up NMT?
> Coleen
>
>
> -------- Original Message --------
> Subject: 	Re: Request for review (s): 7173959 : Jvm crashed during 
> coherence exabus (tmb) testing
> Date: 	Fri, 14 Dec 2012 10:38:34 +1000
> From: 	David Holmes <david.holmes at oracle.com>
> Organization: 	Oracle Corporation
> To: 	Bengt Rutisson <bengt.rutisson at oracle.com>
> CC: 	hotspot-runtime-dev at openjdk.java.net
>
>
>
> Hi Bengt,
>
> That all sounds good to me.
>
> With regard to potential overflow perhaps a simple check that extra_size
> does not overflow? If that is true and extra_base is not null then I
> think all the other calculations must be valid.
>
> Thanks,
> David
>
> On 13/12/2012 10:58 PM, Bengt Rutisson wrote:
> >
> >  Hi David,
> >
> >  Updated webrev:
> >  http://cr.openjdk.java.net/~brutisso/7173959/webrev.02/
> >
> >  I moved the alignment of "size" back into ReservedSpace::initialize()
> >  since we actually store the size in an instance variable (_size), so I
> >  think it is a bit risky to just do the align_up in
> >  os::reserve_memory_aligned(). We probably want the instance variables
> >  _size and _alignment in ReservedSpace to be consistent.
> >
> >  I added an assert in os::reserve_memory_aligned() to verify that the
> >  size is correctly aligned. I also added the assert you suggested to
> >  check that alignment is page size aligned.
> >
> >  Bengt
> >
> >  On 12/13/12 11:50 AM, David Holmes wrote:
> >>  On 13/12/2012 8:37 PM, Bengt Rutisson wrote:
> >>>
> >>>  Hi again,
> >>>
> >>>  Updated webrev:
> >>>  http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/
> >>>
> >>>  I removed the comment and the alignment.
> >>>
> >>>  But I did not add an assert just yet.
> >>>
> >>>  At the top of ReservedSpace::initialize() we have this code:
> >>>
> >>>  const size_t granularity = os::vm_allocation_granularity();
> >>>  assert((size&  (granularity - 1)) == 0,
> >>>  "size not aligned to os::vm_allocation_granularity()");
> >>>  assert((alignment&  (granularity - 1)) == 0,
> >>>  "alignment not aligned to os::vm_allocation_granularity()");
> >>>
> >>>
> >>>  Where os::vm_allocation_granularity() is implemented as page size on all
> >>>  platforms except Windows. There we look it up from the Windows API. I
> >>>  assume that is a page size too, but since the Windows code in our patch
> >>>  does not unmap based on the alignment it should be safe either way.
> >>>
> >>>  Is this assert enough or would you like me to add an assert closer to
> >>>  the code block were I did the changes?
> >>
> >>  As this is a separate method now I think an assert in the method
> >>  itself would not go astray.
> >>
> >>  David
> >>  -----
> >>
> >>>  Thanks,
> >>>  Bengt
> >>>
> >>>
> >>>  On 12/13/12 11:02 AM, Bengt Rutisson wrote:
> >>>>
> >>>>  Hi David,
> >>>>
> >>>>  Thanks for looking at this!
> >>>>
> >>>>  On 12/13/12 8:33 AM, David Holmes wrote:
> >>>>>  Hi Bengt,
> >>>>>
> >>>>>  This has to be one of the absolute best review requests I've ever
> >>>>>  read :) Thank you.
> >>>>
> >>>>  Wow! Thanks! :)
> >>>>
> >>>>>
> >>>>>  Just a couple of queries.
> >>>>>
> >>>>>  os_posix.cpp:
> >>>>>
> >>>>>  Love the diagram :)
> >>>>
> >>>>  It was Mikael's way of making sure we got it right.
> >>>>
> >>>>>  I'm assuming that "alignment" has to be>= the underlying page size,
> >>>>>  and in fact must be a multiple of the underlying page size ? (As I
> >>>>>  assume you can only unmap whole numbers of pages). If so does that
> >>>>>  need to be checked somewhere?
> >>>>
> >>>>  Good point. I'll add an assert to check that.
> >>>>
> >>>>>  In virtualSpace.cpp:
> >>>>>
> >>>>>  // Reserve size large enough to do manual alignment and
> >>>>>  // increase size to a multiple of the desired alignment
> >>>>>  size = align_size_up(size, alignment);
> >>>>>  ! base = os::reserve_memory_aligned(size, alignment);
> >>>>>
> >>>>>  The comment doesn't seem necessary now, and the align_size_up seems
> >>>>>  redundant.
> >>>>
> >>>>  You are right. I'll remove the comment and the alignment.
> >>>>
> >>>>  Thanks,
> >>>>  Bengt
> >>>>
> >>>>>
> >>>>>  Thanks,
> >>>>>  David
> >>>>>
> >>>>>  On 13/12/2012 4:42 PM, Bengt Rutisson wrote:
> >>>>>>
> >>>>>>  Hi all,
> >>>>>>
> >>>>>>  Could I have a couple of reviews for this change?
> >>>>>>
> >>>>>>  http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/
> >>>>>>
> >>>>>>  This is for a bug originally reported by the Coherence team:
> >>>>>>
> >>>>>>  7173959 : Jvm crashed during coherence exabus (tmb) testing
> >>>>>>  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959
> >>>>>>
> >>>>>>  The original analysis and proposed fix was done by Mikael Gerdin
> >>>>>>  and me
> >>>>>>  together. I'll handle the webrev and push since Mikael is on vacation
> >>>>>>  starting today. But Mikael did a great job tracking down this very
> >>>>>>  difficult bug, so he should have a large part of the credit for this
> >>>>>>  bug
> >>>>>>  fix,
> >>>>>>
> >>>>>>  Description from the CR:
> >>>>>>
> >>>>>>  The reason that we crash is due to how we re-map memory when we
> >>>>>>  want to
> >>>>>>  align it for large pages in ReservedSpace::initialize().
> >>>>>>
> >>>>>>  Here is what happens:
> >>>>>>
> >>>>>>  The reservation of memory is split up to a few steps. When we want a
> >>>>>>  chunk of memory with large pages we first just reserve some memory
> >>>>>>  large
> >>>>>>  enough for what we need. Then we realize that we want large pages,
> >>>>>>  so we
> >>>>>>  want to re-map the reserved memory to use large pages. But since this
> >>>>>>  requires that we have a large-page-aligned memory chunk we may
> >>>>>>  have to
> >>>>>>  fix the recently reserved memory chunk up a bit.
> >>>>>>
> >>>>>>  We do this in ReservedSpace::initialize() by first releasing the
> >>>>>>  memory
> >>>>>>  we just reserved. Then requesting more memory than we actually
> >>>>>>  need to
> >>>>>>  make sure that we have enough space to do manual large page
> >>>>>>  alignment.
> >>>>>>  After we have gotten this memory we figure out a nicely aligned start
> >>>>>>  address. We then release the memory again and then reserve just
> >>>>>>  enough
> >>>>>>  memory but using the aligned start address as a fixed address to make
> >>>>>>  sure that we get the memory we wanted in an aligned way.
> >>>>>>
> >>>>>>  This is done in a loop to make sure that we eventually get some
> >>>>>>  memory.
> >>>>>>  The interesting code looks like this:
> >>>>>>
> >>>>>>  do {
> >>>>>>  char* extra_base = os::reserve_memory(extra_size, NULL, alignment);
> >>>>>>  if (extra_base == NULL) return;
> >>>>>>  // Do manual alignement
> >>>>>>  base = (char*) align_size_up((uintptr_t) extra_base, alignment);
> >>>>>>  assert(base>= extra_base, "just checking");
> >>>>>>  // Re-reserve the region at the aligned base address.
> >>>>>>  os::release_memory(extra_base, extra_size); // (1)
> >>>>>>  base = os::reserve_memory(size, base); // (2)
> >>>>>>  } while (base == NULL);
> >>>>>>
> >>>>>>
> >>>>>>  There is a race here between releasing the memory in (1) and
> >>>>>>  re-reserving it in (2). But the loop is supposed to handle this race.
> >>>>>>
> >>>>>>  The problem is that on posix platforms you can remap the same memory
> >>>>>>  area several times. The call in (2) will use mmap with MAP_FIXED.
> >>>>>>  This
> >>>>>>  means that the OS will think that you know exactly what you are
> >>>>>>  doing.
> >>>>>>  So, if part of the memory has been mapped already by the process it
> >>>>>>  will
> >>>>>>  just go ahead and reuse that memory.
> >>>>>>
> >>>>>>  This means that if we are having multiple threads that do mmap. We
> >>>>>>  can
> >>>>>>  end up with a situation where we release our mapping in (1). Then
> >>>>>>  another thread comes in and maps part of the memory that we used to
> >>>>>>  have. Then we remap over that memory again in (2) with MAP_FIXED.
> >>>>>>  Now we
> >>>>>>  have a situation where two threads in our process have mapped the
> >>>>>>  same
> >>>>>>  memory. If both threads try to use it or if one of the threads unmap
> >>>>>>  part or all of the memory we will crash.
> >>>>>>
> >>>>>>  On posix it is possible to unmap any part of a mapped chunk. So, our
> >>>>>>  proposed solution to the race described above is to not unmap all
> >>>>>>  memory
> >>>>>>  in (1) but rather just unmap the section at the start and at the
> >>>>>>  end of
> >>>>>>  the chunk that we mapped to get alignment. This also removes the need
> >>>>>>  for the loop.
> >>>>>>
> >>>>>>  However, on Windows you can only unmap _all_ of the memory that you
> >>>>>>  have
> >>>>>>  mapped. On the other hand Windows also will not allow you to map over
> >>>>>>  other mappings, so the original code is actually safe. If we keep
> >>>>>>  the loop.
> >>>>>>
> >>>>>>  So, our solution is to treat this differently on Windows and posix
> >>>>>>  platforms.
> >>>>>>
> >>>>>>
> >>>>>>  Thanks,
> >>>>>>  Bengt
> >>>>
> >>>
> >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20121214/723cc23f/attachment.html 


More information about the hotspot-runtime-dev mailing list