Request for review (s): 7173959 : Jvm crashed during coherence exabus (tmb) testing

Fri Dec 14 07:24:30 PST 2012

Hi Coleen,

Thanks for looking at this!

On 12/14/12 2:21 PM, Coleen Phillimore wrote:
> On 12/14/2012 3:48 AM, Bengt Rutisson wrote:
>>
>> Hi again Runtime team,
>>
>> I need one more review for this change. It is a P1 bug with a fairly 
>> small change. Any takers?
>>
>> Latest webrev based on comments from David Holmes and Vitaly Davidovich:
>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.03/
>
> Hi Bengt,   This code looks good to me but I asked Zhengyu to check it 
> to make sure it doesn't have any bad NMT interactions.

Thanks! He just replied.
>
> Also the writeup is great below, but can you add these comments to the 
> code in os_posix.cpp.   I had to go back to the email to figure out 
> why you are doing this.

I'm not really sure where to put such a comment. I think the writeup I 
did mostly makes sense when you compare it to the old code. Also, the 
affected code is in the middle of ReservedSpace::initialize(). I think 
it would look strange with a large comment there. In the 
os_posix/windows files the comment does not make much sense in my opinion.

I could add a comment in ReservedSpace::initialize() that refers to the 
Jira issue. But I'm not a big fan of such comments.

>
> I don't understand why there are racing threads.   I thought all the 
> heap space was initialized at the start of the vm when it is single 
> threaded?

I would have to look up exactly when we set up the heap to see if we are 
single threaded at that point. The issue we saw was with setting up the 
GC bitmaps. At that point we are multithreaded and we are for example 
mapping memory for the GC worker threads at the same time.

Thanks again for looking at this!
Bengt

>
> thanks,
> Coleen
>
>>
>> Thanks,
>> Bengt
>>
>>
>> On 12/14/12 9:29 AM, Bengt Rutisson wrote:
>>>
>>> Hi David,
>>>
>>> On 12/14/12 1:38 AM, David Holmes wrote:
>>>> Hi Bengt,
>>>>
>>>> That all sounds good to me.
>>>>
>>>> With regard to potential overflow perhaps a simple check that 
>>>> extra_size does not overflow? If that is true and extra_base is not 
>>>> null then I think all the other calculations must be valid.
>>>
>>> Yes, that makes sense. I'll add an overflow check for the extra_size.
>>>
>>> Thanks,
>>> Bengt
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 13/12/2012 10:58 PM, Bengt Rutisson wrote:
>>>>>
>>>>> Hi David,
>>>>>
>>>>> Updated webrev:
>>>>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.02/
>>>>>
>>>>> I moved the alignment of "size" back into ReservedSpace::initialize()
>>>>> since we actually store the size in an instance variable (_size), 
>>>>> so I
>>>>> think it is a bit risky to just do the align_up in
>>>>> os::reserve_memory_aligned(). We probably want the instance variables
>>>>> _size and _alignment in ReservedSpace to be consistent.
>>>>>
>>>>> I added an assert in os::reserve_memory_aligned() to verify that the
>>>>> size is correctly aligned. I also added the assert you suggested to
>>>>> check that alignment is page size aligned.
>>>>>
>>>>> Bengt
>>>>>
>>>>> On 12/13/12 11:50 AM, David Holmes wrote:
>>>>>> On 13/12/2012 8:37 PM, Bengt Rutisson wrote:
>>>>>>>
>>>>>>> Hi again,
>>>>>>>
>>>>>>> Updated webrev:
>>>>>>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/
>>>>>>>
>>>>>>> I removed the comment and the alignment.
>>>>>>>
>>>>>>> But I did not add an assert just yet.
>>>>>>>
>>>>>>> At the top of ReservedSpace::initialize() we have this code:
>>>>>>>
>>>>>>> const size_t granularity = os::vm_allocation_granularity();
>>>>>>> assert((size & (granularity - 1)) == 0,
>>>>>>> "size not aligned to os::vm_allocation_granularity()");
>>>>>>> assert((alignment & (granularity - 1)) == 0,
>>>>>>> "alignment not aligned to os::vm_allocation_granularity()");
>>>>>>>
>>>>>>>
>>>>>>> Where os::vm_allocation_granularity() is implemented as page 
>>>>>>> size on all
>>>>>>> platforms except Windows. There we look it up from the Windows 
>>>>>>> API. I
>>>>>>> assume that is a page size too, but since the Windows code in 
>>>>>>> our patch
>>>>>>> does not unmap based on the alignment it should be safe either way.
>>>>>>>
>>>>>>> Is this assert enough or would you like me to add an assert 
>>>>>>> closer to
>>>>>>> the code block were I did the changes?
>>>>>>
>>>>>> As this is a separate method now I think an assert in the method
>>>>>> itself would not go astray.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks,
>>>>>>> Bengt
>>>>>>>
>>>>>>>
>>>>>>> On 12/13/12 11:02 AM, Bengt Rutisson wrote:
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Thanks for looking at this!
>>>>>>>>
>>>>>>>> On 12/13/12 8:33 AM, David Holmes wrote:
>>>>>>>>> Hi Bengt,
>>>>>>>>>
>>>>>>>>> This has to be one of the absolute best review requests I've ever
>>>>>>>>> read :) Thank you.
>>>>>>>>
>>>>>>>> Wow! Thanks! :)
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Just a couple of queries.
>>>>>>>>>
>>>>>>>>> os_posix.cpp:
>>>>>>>>>
>>>>>>>>> Love the diagram :)
>>>>>>>>
>>>>>>>> It was Mikael's way of making sure we got it right.
>>>>>>>>
>>>>>>>>> I'm assuming that "alignment" has to be >= the underlying page 
>>>>>>>>> size,
>>>>>>>>> and in fact must be a multiple of the underlying page size ? 
>>>>>>>>> (As I
>>>>>>>>> assume you can only unmap whole numbers of pages). If so does 
>>>>>>>>> that
>>>>>>>>> need to be checked somewhere?
>>>>>>>>
>>>>>>>> Good point. I'll add an assert to check that.
>>>>>>>>
>>>>>>>>> In virtualSpace.cpp:
>>>>>>>>>
>>>>>>>>> // Reserve size large enough to do manual alignment and
>>>>>>>>> // increase size to a multiple of the desired alignment
>>>>>>>>> size = align_size_up(size, alignment);
>>>>>>>>> ! base = os::reserve_memory_aligned(size, alignment);
>>>>>>>>>
>>>>>>>>> The comment doesn't seem necessary now, and the align_size_up 
>>>>>>>>> seems
>>>>>>>>> redundant.
>>>>>>>>
>>>>>>>> You are right. I'll remove the comment and the alignment.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bengt
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 13/12/2012 4:42 PM, Bengt Rutisson wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Could I have a couple of reviews for this change?
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/
>>>>>>>>>>
>>>>>>>>>> This is for a bug originally reported by the Coherence team:
>>>>>>>>>>
>>>>>>>>>> 7173959 : Jvm crashed during coherence exabus (tmb) testing
>>>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959
>>>>>>>>>>
>>>>>>>>>> The original analysis and proposed fix was done by Mikael Gerdin
>>>>>>>>>> and me
>>>>>>>>>> together. I'll handle the webrev and push since Mikael is on 
>>>>>>>>>> vacation
>>>>>>>>>> starting today. But Mikael did a great job tracking down this 
>>>>>>>>>> very
>>>>>>>>>> difficult bug, so he should have a large part of the credit 
>>>>>>>>>> for this
>>>>>>>>>> bug
>>>>>>>>>> fix,
>>>>>>>>>>
>>>>>>>>>> Description from the CR:
>>>>>>>>>>
>>>>>>>>>> The reason that we crash is due to how we re-map memory when we
>>>>>>>>>> want to
>>>>>>>>>> align it for large pages in ReservedSpace::initialize().
>>>>>>>>>>
>>>>>>>>>> Here is what happens:
>>>>>>>>>>
>>>>>>>>>> The reservation of memory is split up to a few steps. When we 
>>>>>>>>>> want a
>>>>>>>>>> chunk of memory with large pages we first just reserve some 
>>>>>>>>>> memory
>>>>>>>>>> large
>>>>>>>>>> enough for what we need. Then we realize that we want large 
>>>>>>>>>> pages,
>>>>>>>>>> so we
>>>>>>>>>> want to re-map the reserved memory to use large pages. But 
>>>>>>>>>> since this
>>>>>>>>>> requires that we have a large-page-aligned memory chunk we may
>>>>>>>>>> have to
>>>>>>>>>> fix the recently reserved memory chunk up a bit.
>>>>>>>>>>
>>>>>>>>>> We do this in ReservedSpace::initialize() by first releasing the
>>>>>>>>>> memory
>>>>>>>>>> we just reserved. Then requesting more memory than we actually
>>>>>>>>>> need to
>>>>>>>>>> make sure that we have enough space to do manual large page
>>>>>>>>>> alignment.
>>>>>>>>>> After we have gotten this memory we figure out a nicely 
>>>>>>>>>> aligned start
>>>>>>>>>> address. We then release the memory again and then reserve just
>>>>>>>>>> enough
>>>>>>>>>> memory but using the aligned start address as a fixed address 
>>>>>>>>>> to make
>>>>>>>>>> sure that we get the memory we wanted in an aligned way.
>>>>>>>>>>
>>>>>>>>>> This is done in a loop to make sure that we eventually get some
>>>>>>>>>> memory.
>>>>>>>>>> The interesting code looks like this:
>>>>>>>>>>
>>>>>>>>>> do {
>>>>>>>>>> char* extra_base = os::reserve_memory(extra_size, NULL, 
>>>>>>>>>> alignment);
>>>>>>>>>> if (extra_base == NULL) return;
>>>>>>>>>> // Do manual alignement
>>>>>>>>>> base = (char*) align_size_up((uintptr_t) extra_base, alignment);
>>>>>>>>>> assert(base >= extra_base, "just checking");
>>>>>>>>>> // Re-reserve the region at the aligned base address.
>>>>>>>>>> os::release_memory(extra_base, extra_size); // (1)
>>>>>>>>>> base = os::reserve_memory(size, base); // (2)
>>>>>>>>>> } while (base == NULL);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There is a race here between releasing the memory in (1) and
>>>>>>>>>> re-reserving it in (2). But the loop is supposed to handle 
>>>>>>>>>> this race.
>>>>>>>>>>
>>>>>>>>>> The problem is that on posix platforms you can remap the same 
>>>>>>>>>> memory
>>>>>>>>>> area several times. The call in (2) will use mmap with 
>>>>>>>>>> MAP_FIXED.
>>>>>>>>>> This
>>>>>>>>>> means that the OS will think that you know exactly what you are
>>>>>>>>>> doing.
>>>>>>>>>> So, if part of the memory has been mapped already by the 
>>>>>>>>>> process it
>>>>>>>>>> will
>>>>>>>>>> just go ahead and reuse that memory.
>>>>>>>>>>
>>>>>>>>>> This means that if we are having multiple threads that do 
>>>>>>>>>> mmap. We
>>>>>>>>>> can
>>>>>>>>>> end up with a situation where we release our mapping in (1). 
>>>>>>>>>> Then
>>>>>>>>>> another thread comes in and maps part of the memory that we 
>>>>>>>>>> used to
>>>>>>>>>> have. Then we remap over that memory again in (2) with 
>>>>>>>>>> MAP_FIXED.
>>>>>>>>>> Now we
>>>>>>>>>> have a situation where two threads in our process have mapped 
>>>>>>>>>> the
>>>>>>>>>> same
>>>>>>>>>> memory. If both threads try to use it or if one of the 
>>>>>>>>>> threads unmap
>>>>>>>>>> part or all of the memory we will crash.
>>>>>>>>>>
>>>>>>>>>> On posix it is possible to unmap any part of a mapped chunk. 
>>>>>>>>>> So, our
>>>>>>>>>> proposed solution to the race described above is to not unmap 
>>>>>>>>>> all
>>>>>>>>>> memory
>>>>>>>>>> in (1) but rather just unmap the section at the start and at the
>>>>>>>>>> end of
>>>>>>>>>> the chunk that we mapped to get alignment. This also removes 
>>>>>>>>>> the need
>>>>>>>>>> for the loop.
>>>>>>>>>>
>>>>>>>>>> However, on Windows you can only unmap _all_ of the memory 
>>>>>>>>>> that you
>>>>>>>>>> have
>>>>>>>>>> mapped. On the other hand Windows also will not allow you to 
>>>>>>>>>> map over
>>>>>>>>>> other mappings, so the original code is actually safe. If we 
>>>>>>>>>> keep
>>>>>>>>>> the loop.
>>>>>>>>>>
>>>>>>>>>> So, our solution is to treat this differently on Windows and 
>>>>>>>>>> posix
>>>>>>>>>> platforms.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Bengt
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>