Fwd: Re: Request for review (s): 7173959 : Jvm crashed during coherence exabus (tmb) testing
Zhengyu Gu
zhengyu.gu at oracle.com
Fri Dec 14 06:59:14 PST 2012
Looks good to me.
From NMT perspective, as long as os::xxxx_memory methods are used, you
don't need to add NMT hooks, but you do have to be careful about tagging
the memory pointers returned by os::reserve_memory() methods. But for
VirtualSpace and ReservedSpace, etc. utility classes, they are tagged by
their callers.
Thanks,
-Zhengyu
On 12/14/2012 7:44 AM, Coleen Phillimore wrote:
>
> Does NMT track this? Will this mess up NMT?
> Coleen
>
>
> -------- Original Message --------
> Subject: Re: Request for review (s): 7173959 : Jvm crashed during
> coherence exabus (tmb) testing
> Date: Fri, 14 Dec 2012 10:38:34 +1000
> From: David Holmes <david.holmes at oracle.com>
> Organization: Oracle Corporation
> To: Bengt Rutisson <bengt.rutisson at oracle.com>
> CC: hotspot-runtime-dev at openjdk.java.net
>
>
>
> Hi Bengt,
>
> That all sounds good to me.
>
> With regard to potential overflow perhaps a simple check that extra_size
> does not overflow? If that is true and extra_base is not null then I
> think all the other calculations must be valid.
>
> Thanks,
> David
>
> On 13/12/2012 10:58 PM, Bengt Rutisson wrote:
> >
> > Hi David,
> >
> > Updated webrev:
> > http://cr.openjdk.java.net/~brutisso/7173959/webrev.02/
> >
> > I moved the alignment of "size" back into ReservedSpace::initialize()
> > since we actually store the size in an instance variable (_size), so I
> > think it is a bit risky to just do the align_up in
> > os::reserve_memory_aligned(). We probably want the instance variables
> > _size and _alignment in ReservedSpace to be consistent.
> >
> > I added an assert in os::reserve_memory_aligned() to verify that the
> > size is correctly aligned. I also added the assert you suggested to
> > check that alignment is page size aligned.
> >
> > Bengt
> >
> > On 12/13/12 11:50 AM, David Holmes wrote:
> >> On 13/12/2012 8:37 PM, Bengt Rutisson wrote:
> >>>
> >>> Hi again,
> >>>
> >>> Updated webrev:
> >>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/
> >>>
> >>> I removed the comment and the alignment.
> >>>
> >>> But I did not add an assert just yet.
> >>>
> >>> At the top of ReservedSpace::initialize() we have this code:
> >>>
> >>> const size_t granularity = os::vm_allocation_granularity();
> >>> assert((size& (granularity - 1)) == 0,
> >>> "size not aligned to os::vm_allocation_granularity()");
> >>> assert((alignment& (granularity - 1)) == 0,
> >>> "alignment not aligned to os::vm_allocation_granularity()");
> >>>
> >>>
> >>> Where os::vm_allocation_granularity() is implemented as page size on all
> >>> platforms except Windows. There we look it up from the Windows API. I
> >>> assume that is a page size too, but since the Windows code in our patch
> >>> does not unmap based on the alignment it should be safe either way.
> >>>
> >>> Is this assert enough or would you like me to add an assert closer to
> >>> the code block were I did the changes?
> >>
> >> As this is a separate method now I think an assert in the method
> >> itself would not go astray.
> >>
> >> David
> >> -----
> >>
> >>> Thanks,
> >>> Bengt
> >>>
> >>>
> >>> On 12/13/12 11:02 AM, Bengt Rutisson wrote:
> >>>>
> >>>> Hi David,
> >>>>
> >>>> Thanks for looking at this!
> >>>>
> >>>> On 12/13/12 8:33 AM, David Holmes wrote:
> >>>>> Hi Bengt,
> >>>>>
> >>>>> This has to be one of the absolute best review requests I've ever
> >>>>> read :) Thank you.
> >>>>
> >>>> Wow! Thanks! :)
> >>>>
> >>>>>
> >>>>> Just a couple of queries.
> >>>>>
> >>>>> os_posix.cpp:
> >>>>>
> >>>>> Love the diagram :)
> >>>>
> >>>> It was Mikael's way of making sure we got it right.
> >>>>
> >>>>> I'm assuming that "alignment" has to be>= the underlying page size,
> >>>>> and in fact must be a multiple of the underlying page size ? (As I
> >>>>> assume you can only unmap whole numbers of pages). If so does that
> >>>>> need to be checked somewhere?
> >>>>
> >>>> Good point. I'll add an assert to check that.
> >>>>
> >>>>> In virtualSpace.cpp:
> >>>>>
> >>>>> // Reserve size large enough to do manual alignment and
> >>>>> // increase size to a multiple of the desired alignment
> >>>>> size = align_size_up(size, alignment);
> >>>>> ! base = os::reserve_memory_aligned(size, alignment);
> >>>>>
> >>>>> The comment doesn't seem necessary now, and the align_size_up seems
> >>>>> redundant.
> >>>>
> >>>> You are right. I'll remove the comment and the alignment.
> >>>>
> >>>> Thanks,
> >>>> Bengt
> >>>>
> >>>>>
> >>>>> Thanks,
> >>>>> David
> >>>>>
> >>>>> On 13/12/2012 4:42 PM, Bengt Rutisson wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> Could I have a couple of reviews for this change?
> >>>>>>
> >>>>>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/
> >>>>>>
> >>>>>> This is for a bug originally reported by the Coherence team:
> >>>>>>
> >>>>>> 7173959 : Jvm crashed during coherence exabus (tmb) testing
> >>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959
> >>>>>>
> >>>>>> The original analysis and proposed fix was done by Mikael Gerdin
> >>>>>> and me
> >>>>>> together. I'll handle the webrev and push since Mikael is on vacation
> >>>>>> starting today. But Mikael did a great job tracking down this very
> >>>>>> difficult bug, so he should have a large part of the credit for this
> >>>>>> bug
> >>>>>> fix,
> >>>>>>
> >>>>>> Description from the CR:
> >>>>>>
> >>>>>> The reason that we crash is due to how we re-map memory when we
> >>>>>> want to
> >>>>>> align it for large pages in ReservedSpace::initialize().
> >>>>>>
> >>>>>> Here is what happens:
> >>>>>>
> >>>>>> The reservation of memory is split up to a few steps. When we want a
> >>>>>> chunk of memory with large pages we first just reserve some memory
> >>>>>> large
> >>>>>> enough for what we need. Then we realize that we want large pages,
> >>>>>> so we
> >>>>>> want to re-map the reserved memory to use large pages. But since this
> >>>>>> requires that we have a large-page-aligned memory chunk we may
> >>>>>> have to
> >>>>>> fix the recently reserved memory chunk up a bit.
> >>>>>>
> >>>>>> We do this in ReservedSpace::initialize() by first releasing the
> >>>>>> memory
> >>>>>> we just reserved. Then requesting more memory than we actually
> >>>>>> need to
> >>>>>> make sure that we have enough space to do manual large page
> >>>>>> alignment.
> >>>>>> After we have gotten this memory we figure out a nicely aligned start
> >>>>>> address. We then release the memory again and then reserve just
> >>>>>> enough
> >>>>>> memory but using the aligned start address as a fixed address to make
> >>>>>> sure that we get the memory we wanted in an aligned way.
> >>>>>>
> >>>>>> This is done in a loop to make sure that we eventually get some
> >>>>>> memory.
> >>>>>> The interesting code looks like this:
> >>>>>>
> >>>>>> do {
> >>>>>> char* extra_base = os::reserve_memory(extra_size, NULL, alignment);
> >>>>>> if (extra_base == NULL) return;
> >>>>>> // Do manual alignement
> >>>>>> base = (char*) align_size_up((uintptr_t) extra_base, alignment);
> >>>>>> assert(base>= extra_base, "just checking");
> >>>>>> // Re-reserve the region at the aligned base address.
> >>>>>> os::release_memory(extra_base, extra_size); // (1)
> >>>>>> base = os::reserve_memory(size, base); // (2)
> >>>>>> } while (base == NULL);
> >>>>>>
> >>>>>>
> >>>>>> There is a race here between releasing the memory in (1) and
> >>>>>> re-reserving it in (2). But the loop is supposed to handle this race.
> >>>>>>
> >>>>>> The problem is that on posix platforms you can remap the same memory
> >>>>>> area several times. The call in (2) will use mmap with MAP_FIXED.
> >>>>>> This
> >>>>>> means that the OS will think that you know exactly what you are
> >>>>>> doing.
> >>>>>> So, if part of the memory has been mapped already by the process it
> >>>>>> will
> >>>>>> just go ahead and reuse that memory.
> >>>>>>
> >>>>>> This means that if we are having multiple threads that do mmap. We
> >>>>>> can
> >>>>>> end up with a situation where we release our mapping in (1). Then
> >>>>>> another thread comes in and maps part of the memory that we used to
> >>>>>> have. Then we remap over that memory again in (2) with MAP_FIXED.
> >>>>>> Now we
> >>>>>> have a situation where two threads in our process have mapped the
> >>>>>> same
> >>>>>> memory. If both threads try to use it or if one of the threads unmap
> >>>>>> part or all of the memory we will crash.
> >>>>>>
> >>>>>> On posix it is possible to unmap any part of a mapped chunk. So, our
> >>>>>> proposed solution to the race described above is to not unmap all
> >>>>>> memory
> >>>>>> in (1) but rather just unmap the section at the start and at the
> >>>>>> end of
> >>>>>> the chunk that we mapped to get alignment. This also removes the need
> >>>>>> for the loop.
> >>>>>>
> >>>>>> However, on Windows you can only unmap _all_ of the memory that you
> >>>>>> have
> >>>>>> mapped. On the other hand Windows also will not allow you to map over
> >>>>>> other mappings, so the original code is actually safe. If we keep
> >>>>>> the loop.
> >>>>>>
> >>>>>> So, our solution is to treat this differently on Windows and posix
> >>>>>> platforms.
> >>>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bengt
> >>>>
> >>>
> >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20121214/723cc23f/attachment.html
More information about the hotspot-runtime-dev
mailing list