<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix"><br>
Hi all,<br>
<br>
One final update on this. I pushed this change both in to the
hotspot-gc repository and to the hs24 repository:<br>
<br>
<a class="moz-txt-link-freetext" href="http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/730cc4ddd550">http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/730cc4ddd550</a><br>
<a class="moz-txt-link-freetext" href="http://hg.openjdk.java.net/hsx/hsx24/hotspot/rev/e1d9b04b560b">http://hg.openjdk.java.net/hsx/hsx24/hotspot/rev/e1d9b04b560b</a><br>
<br>
But I forgot to add "Contributed-by" for Mikael Gerdin (
<meta http-equiv="content-type" content="text/html;
charset=ISO-8859-1">
mgerdin). That should definitely have been the case since he
deserves a large part of the credit for this fix.<br>
<br>
Hopefully this email message will be of some help in recording his
work for this fix for future reference.<br>
<br>
Bengt<br>
<br>
On 12/17/12 8:36 AM, Bengt Rutisson wrote:<br>
</div>
<blockquote cite="mid:50CECB6F.3050806@oracle.com" type="cite">
<br>
Hi Coleen,
<br>
<br>
Thanks for suggesting the comments. I added them to the os files.
<br>
<br>
<br>
David, Coleen, Zhengyu and Vitaly,
<br>
<br>
Thanks for the reviews! All set to push this now.
<br>
<br>
Bengt
<br>
<br>
<br>
On 12/14/12 4:49 PM, Coleen Phillimore wrote:
<br>
<blockquote type="cite">On 12/14/2012 10:24 AM, Bengt Rutisson
wrote:
<br>
<blockquote type="cite">
<br>
Hi Coleen,
<br>
<br>
Thanks for looking at this!
<br>
<br>
On 12/14/12 2:21 PM, Coleen Phillimore wrote:
<br>
<blockquote type="cite">On 12/14/2012 3:48 AM, Bengt Rutisson
wrote:
<br>
<blockquote type="cite">
<br>
Hi again Runtime team,
<br>
<br>
I need one more review for this change. It is a P1 bug
with a fairly small change. Any takers?
<br>
<br>
Latest webrev based on comments from David Holmes and
Vitaly Davidovich:
<br>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~brutisso/7173959/webrev.03/">http://cr.openjdk.java.net/~brutisso/7173959/webrev.03/</a>
<br>
</blockquote>
<br>
Hi Bengt, This code looks good to me but I asked Zhengyu
to check it to make sure it doesn't have any bad NMT
interactions.
<br>
</blockquote>
<br>
Thanks! He just replied.
<br>
<blockquote type="cite">
<br>
Also the writeup is great below, but can you add these
comments to the code in os_posix.cpp. I had to go back to
the email to figure out why you are doing this.
<br>
</blockquote>
<br>
I'm not really sure where to put such a comment. I think the
writeup I did mostly makes sense when you compare it to the
old code. Also, the affected code is in the middle of
ReservedSpace::initialize(). I think it would look strange
with a large comment there. In the os_posix/windows files the
comment does not make much sense in my opinion.
<br>
<br>
I could add a comment in ReservedSpace::initialize() that
refers to the Jira issue. But I'm not a big fan of such
comments.
<br>
</blockquote>
<br>
I agree we shouldn't point to bugs in the source code. A
comment at the top of the os_posix version of this function that
paraphrases below why you are not re-reserving in a loop like
windows would be sufficient. That trimming memory at either
end is allowed on posix-like os's and is thread-safe, something
like:
<br>
<br>
// Multiple threads can race in this code, and can remap over
each other with MAP_FIXED, so on posix, unmap the section
<br>
// at the start and at the end of the chunk that we mapped to
get requested alignment.
<br>
<br>
os_windows:
<br>
<br>
// Multiple threads can race in this code but it's not possible
to remap with a section of virtual space to get requested
alignment, like posix-like os's.
<br>
// Windows prevents multiple thread from remapping over each
other so this loop is thread-safe.
<br>
<br>
I just want something to document why they are different.
<br>
<br>
<blockquote type="cite">
<br>
<blockquote type="cite">
<br>
I don't understand why there are racing threads. I thought
all the heap space was initialized at the start of the vm
when it is single threaded?
<br>
</blockquote>
<br>
I would have to look up exactly when we set up the heap to see
if we are single threaded at that point. The issue we saw was
with setting up the GC bitmaps. At that point we are
multithreaded and we are for example mapping memory for the GC
worker threads at the same time.
<br>
</blockquote>
<br>
I see.
<br>
<br>
Coleen
<br>
<blockquote type="cite">
<br>
<br>
Thanks again for looking at this!
<br>
Bengt
<br>
<br>
<blockquote type="cite">
<br>
thanks,
<br>
Coleen
<br>
<br>
<blockquote type="cite">
<br>
Thanks,
<br>
Bengt
<br>
<br>
<br>
On 12/14/12 9:29 AM, Bengt Rutisson wrote:
<br>
<blockquote type="cite">
<br>
Hi David,
<br>
<br>
On 12/14/12 1:38 AM, David Holmes wrote:
<br>
<blockquote type="cite">Hi Bengt,
<br>
<br>
That all sounds good to me.
<br>
<br>
With regard to potential overflow perhaps a simple
check that extra_size does not overflow? If that is
true and extra_base is not null then I think all the
other calculations must be valid.
<br>
</blockquote>
<br>
Yes, that makes sense. I'll add an overflow check for
the extra_size.
<br>
<br>
Thanks,
<br>
Bengt
<br>
<br>
<blockquote type="cite">
<br>
Thanks,
<br>
David
<br>
<br>
On 13/12/2012 10:58 PM, Bengt Rutisson wrote:
<br>
<blockquote type="cite">
<br>
Hi David,
<br>
<br>
Updated webrev:
<br>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~brutisso/7173959/webrev.02/">http://cr.openjdk.java.net/~brutisso/7173959/webrev.02/</a>
<br>
<br>
I moved the alignment of "size" back into
ReservedSpace::initialize()
<br>
since we actually store the size in an instance
variable (_size), so I
<br>
think it is a bit risky to just do the align_up in
<br>
os::reserve_memory_aligned(). We probably want the
instance variables
<br>
_size and _alignment in ReservedSpace to be
consistent.
<br>
<br>
I added an assert in os::reserve_memory_aligned() to
verify that the
<br>
size is correctly aligned. I also added the assert
you suggested to
<br>
check that alignment is page size aligned.
<br>
<br>
Bengt
<br>
<br>
On 12/13/12 11:50 AM, David Holmes wrote:
<br>
<blockquote type="cite">On 13/12/2012 8:37 PM, Bengt
Rutisson wrote:
<br>
<blockquote type="cite">
<br>
Hi again,
<br>
<br>
Updated webrev:
<br>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/">http://cr.openjdk.java.net/~brutisso/7173959/webrev.01/</a>
<br>
<br>
I removed the comment and the alignment.
<br>
<br>
But I did not add an assert just yet.
<br>
<br>
At the top of ReservedSpace::initialize() we
have this code:
<br>
<br>
const size_t granularity =
os::vm_allocation_granularity();
<br>
assert((size & (granularity - 1)) == 0,
<br>
"size not aligned to
os::vm_allocation_granularity()");
<br>
assert((alignment & (granularity - 1)) == 0,
<br>
"alignment not aligned to
os::vm_allocation_granularity()");
<br>
<br>
<br>
Where os::vm_allocation_granularity() is
implemented as page size on all
<br>
platforms except Windows. There we look it up
from the Windows API. I
<br>
assume that is a page size too, but since the
Windows code in our patch
<br>
does not unmap based on the alignment it should
be safe either way.
<br>
<br>
Is this assert enough or would you like me to
add an assert closer to
<br>
the code block were I did the changes?
<br>
</blockquote>
<br>
As this is a separate method now I think an assert
in the method
<br>
itself would not go astray.
<br>
<br>
David
<br>
-----
<br>
<br>
<blockquote type="cite">Thanks,
<br>
Bengt
<br>
<br>
<br>
On 12/13/12 11:02 AM, Bengt Rutisson wrote:
<br>
<blockquote type="cite">
<br>
Hi David,
<br>
<br>
Thanks for looking at this!
<br>
<br>
On 12/13/12 8:33 AM, David Holmes wrote:
<br>
<blockquote type="cite">Hi Bengt,
<br>
<br>
This has to be one of the absolute best
review requests I've ever
<br>
read :) Thank you.
<br>
</blockquote>
<br>
Wow! Thanks! :)
<br>
<br>
<blockquote type="cite">
<br>
Just a couple of queries.
<br>
<br>
os_posix.cpp:
<br>
<br>
Love the diagram :)
<br>
</blockquote>
<br>
It was Mikael's way of making sure we got it
right.
<br>
<br>
<blockquote type="cite">I'm assuming that
"alignment" has to be >= the underlying
page size,
<br>
and in fact must be a multiple of the
underlying page size ? (As I
<br>
assume you can only unmap whole numbers of
pages). If so does that
<br>
need to be checked somewhere?
<br>
</blockquote>
<br>
Good point. I'll add an assert to check that.
<br>
<br>
<blockquote type="cite">In virtualSpace.cpp:
<br>
<br>
// Reserve size large enough to do manual
alignment and
<br>
// increase size to a multiple of the
desired alignment
<br>
size = align_size_up(size, alignment);
<br>
! base = os::reserve_memory_aligned(size,
alignment);
<br>
<br>
The comment doesn't seem necessary now, and
the align_size_up seems
<br>
redundant.
<br>
</blockquote>
<br>
You are right. I'll remove the comment and the
alignment.
<br>
<br>
Thanks,
<br>
Bengt
<br>
<br>
<blockquote type="cite">
<br>
Thanks,
<br>
David
<br>
<br>
On 13/12/2012 4:42 PM, Bengt Rutisson wrote:
<br>
<blockquote type="cite">
<br>
Hi all,
<br>
<br>
Could I have a couple of reviews for this
change?
<br>
<br>
<a class="moz-txt-link-freetext" href="http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/">http://cr.openjdk.java.net/~brutisso/7173959/webrev.00/</a>
<br>
<br>
This is for a bug originally reported by
the Coherence team:
<br>
<br>
7173959 : Jvm crashed during coherence
exabus (tmb) testing
<br>
<a class="moz-txt-link-freetext" href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959">http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7173959</a>
<br>
<br>
The original analysis and proposed fix was
done by Mikael Gerdin
<br>
and me
<br>
together. I'll handle the webrev and push
since Mikael is on vacation
<br>
starting today. But Mikael did a great job
tracking down this very
<br>
difficult bug, so he should have a large
part of the credit for this
<br>
bug
<br>
fix,
<br>
<br>
Description from the CR:
<br>
<br>
The reason that we crash is due to how we
re-map memory when we
<br>
want to
<br>
align it for large pages in
ReservedSpace::initialize().
<br>
<br>
Here is what happens:
<br>
<br>
The reservation of memory is split up to a
few steps. When we want a
<br>
chunk of memory with large pages we first
just reserve some memory
<br>
large
<br>
enough for what we need. Then we realize
that we want large pages,
<br>
so we
<br>
want to re-map the reserved memory to use
large pages. But since this
<br>
requires that we have a large-page-aligned
memory chunk we may
<br>
have to
<br>
fix the recently reserved memory chunk up
a bit.
<br>
<br>
We do this in ReservedSpace::initialize()
by first releasing the
<br>
memory
<br>
we just reserved. Then requesting more
memory than we actually
<br>
need to
<br>
make sure that we have enough space to do
manual large page
<br>
alignment.
<br>
After we have gotten this memory we figure
out a nicely aligned start
<br>
address. We then release the memory again
and then reserve just
<br>
enough
<br>
memory but using the aligned start address
as a fixed address to make
<br>
sure that we get the memory we wanted in
an aligned way.
<br>
<br>
This is done in a loop to make sure that
we eventually get some
<br>
memory.
<br>
The interesting code looks like this:
<br>
<br>
do {
<br>
char* extra_base =
os::reserve_memory(extra_size, NULL,
alignment);
<br>
if (extra_base == NULL) return;
<br>
// Do manual alignement
<br>
base = (char*) align_size_up((uintptr_t)
extra_base, alignment);
<br>
assert(base >= extra_base, "just
checking");
<br>
// Re-reserve the region at the aligned
base address.
<br>
os::release_memory(extra_base,
extra_size); // (1)
<br>
base = os::reserve_memory(size, base); //
(2)
<br>
} while (base == NULL);
<br>
<br>
<br>
There is a race here between releasing the
memory in (1) and
<br>
re-reserving it in (2). But the loop is
supposed to handle this race.
<br>
<br>
The problem is that on posix platforms you
can remap the same memory
<br>
area several times. The call in (2) will
use mmap with MAP_FIXED.
<br>
This
<br>
means that the OS will think that you know
exactly what you are
<br>
doing.
<br>
So, if part of the memory has been mapped
already by the process it
<br>
will
<br>
just go ahead and reuse that memory.
<br>
<br>
This means that if we are having multiple
threads that do mmap. We
<br>
can
<br>
end up with a situation where we release
our mapping in (1). Then
<br>
another thread comes in and maps part of
the memory that we used to
<br>
have. Then we remap over that memory again
in (2) with MAP_FIXED.
<br>
Now we
<br>
have a situation where two threads in our
process have mapped the
<br>
same
<br>
memory. If both threads try to use it or
if one of the threads unmap
<br>
part or all of the memory we will crash.
<br>
<br>
On posix it is possible to unmap any part
of a mapped chunk. So, our
<br>
proposed solution to the race described
above is to not unmap all
<br>
memory
<br>
in (1) but rather just unmap the section
at the start and at the
<br>
end of
<br>
the chunk that we mapped to get alignment.
This also removes the need
<br>
for the loop.
<br>
<br>
However, on Windows you can only unmap
_all_ of the memory that you
<br>
have
<br>
mapped. On the other hand Windows also
will not allow you to map over
<br>
other mappings, so the original code is
actually safe. If we keep
<br>
the loop.
<br>
<br>
So, our solution is to treat this
differently on Windows and posix
<br>
platforms.
<br>
<br>
<br>
Thanks,
<br>
Bengt
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>