<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Bengt,<br>
<br>
Thanks for the review. Replies inline...<br>
<br>
On 09/21/12 07:17, Bengt Rutisson wrote:
<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<div class="moz-cite-prefix"><br>
Hi John,<br>
<br>
Thanks for doing the thorough analysis and providing the numbers.
Interesting that G1 does 21703 mallocs at startup whereas Parallel only
does 3500...<br>
<br>
However, I think I need some more background on this change. We are
choosing to allocating into a single VirtualSpace instead of using
separate mallocs. I understand that this will reduce the number of
mallocs we do, but is that really the problem? The CR says that we run
out of memory due to the fact that the GC threads allocate too much
memory. We will still allocate the same amount of memory just mmapped
instead of malloced, right? Do we have any test cases that fail before
your fix but pass now?<br>
</div>
</blockquote>
We didn't run out of memory - we ran out of C heap. <br>
<br>
There is an issue on Solaris (I have to check whether Ramki's email
describes the underlying problem) where by we asked for a 26Gb heap
(which we got) but the C heap was only given 2Gb. Vladimir saw the
issue during COOPs testing. In this situation we ran out of C heap
trying to allocate the work queues for the marking threads.<br>
<br>
Azeem submitted an issue a few weeks ago as well that ran into the same
problem (this time with ParallelGC/ParallelOld) and the process died
while trying to allocate the PSPromotionManager instances (and their
internal work queue) - see CR 7167969. I believe that 7197666 is
similar.<br>
<br>
A real reduction in memory use will come from reducing the task queue
size. I want to do that in a separate CR (for some or all of the
collectors) provided there is no real performance difference.<br>
<br>
<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">
<div class="moz-cite-prefix"> <br>
In fact, isn't the risk higher (at least on 32 bit platforms) that we
fail to allocate the bitmaps now that we try to allocate them from one
consecutive chunk of memory instead of several smaller ones? I'm
thinking that if the memory is fragmented we might not get the
contiguous memory we need for the VirtualSpace. But I am definitely no
expert in this.<br>
</div>
</blockquote>
<br>
Good question and I wish I knew the answer. The occupancy of the task
card bitmaps depends on the # of GC threads and the heap size (1 bit
per card) - I guess it might be possible. Should I then only be
allocating out of the backing store in a 64-bit JVM?<br>
<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">
<div class="moz-cite-prefix"> <br>
With the above reservation I think your change looks good. Here are a
couple of minor comments:<br>
<br>
CMMarkStack::allocate(size_t size) <br>
"size" is kind of an overloaded name for an "allocate" method. Could we
call it "capacity" or "number_of_entries"? <br>
</div>
</blockquote>
<br>
Sure. Done. I renamed 'size' to 'capacity' - which matches the field
name. But you bring up a good point about this confusion. If we look at
the CMS version of the same routine:<br>
<br>
bool CMSMarkStack::allocate(size_t size) {<br>
// allocate a stack of the requisite depth<br>
ReservedSpace rs(ReservedSpace::allocation_align_size_up(<br>
size * sizeof(oop)));<br>
if (!rs.is_reserved()) {<br>
warning("CMSMarkStack allocation failure");<br>
return false;<br>
}<br>
if (!_virtual_space.initialize(rs, rs.size())) {<br>
warning("CMSMarkStack backing store failure");<br>
return false;<br>
}<br>
assert(_virtual_space.committed_size() == rs.size(),<br>
"didn't reserve backing store for all of CMS stack?");<br>
_base = (oop*)(_virtual_space.low());<br>
_index = 0;<br>
_capacity = size;<br>
NOT_PRODUCT(_max_depth = 0);<br>
return true;<br>
}<br>
<br>
it used the parameter as the real size in bytes of the marking stack.
In G1, since we used NEW_C_HEAP_ARRAY, the size was multiplied by the
size of an oop and we got a larger marking stack than requested.
Instead of a 16 Mb stack (128 * 128K), we got a 128Mb stack (8 * 128 *
128K). Basically we asked for the equivalent of the task queues for
another 128 threads:<br>
<br>
dr-evil{jcuthber}:265> ./test.sh -d64 -XX:-ZapUnusedHeapArea
-XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g
-XX:+UseG1GC -XX:+PrintMallocStatistics<br>
Using java runtime at:
/net/jre.us.oracle.com/p/v42/jdk/7/fcs/b147/binaries/solaris-sparcv9/jre<br>
size: 16777216, MarkStackSize: 16777216, TASKQUEUE_SIZE: 131072<br>
os::malloc 134217728 bytes --> 0x00000001010709e8<br>
java version "1.7.0"<br>
Java(TM) SE Runtime Environment (build 1.7.0-b147)<br>
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b03-internal-fastdebug,
mixed mode)<br>
allocation stats: 24489 mallocs (212MB), 1147 frees (0MB), 4MB resrc<br>
<br>
Mimicking CMS is probably the way to go here.<br>
<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">
<div class="moz-cite-prefix"> <br>
<br>
<title>Snippet</title>
In ConcurrentMark::ConcurrentMark() we use both shifting and division
to accomplish the same thing on the lines after each other. Could we
maybe use the same style for both cases?<br>
<br>
// Card liveness bitmap size (in bits) <br>
BitMap::idx_t card_bm_size = (heap_rs.size() +
CardTableModRefBS::card_size - 1) <br>
>> CardTableModRefBS::card_shift;
<br>
// Card liveness bitmap size (in bytes) <br>
size_t card_bm_size_bytes = (card_bm_size + (BitsPerByte - 1)) /
BitsPerByte; <br>
</div>
</blockquote>
<br>
Sure. Changed the latter one to use LogBitsPerByte.<br>
<br>
<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">
<div class="moz-cite-prefix"> Also in
ConcurrentMark::ConcurrentMark() I think that
"marked_bytes_size_bytes" is not such a great name. Probably we could
skip the first "bytes" in "marked_bytes_size" and just call
"marked_bytes_size_bytes" for "marked_size_bytes". <br>
</div>
</blockquote>
<br>
OK. What about just marked_bytes_size?<br>
<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">
<div class="moz-cite-prefix"> <br>
I think it would be nice to factor some of the new stuff in
ConcurrentMark::ConcurrentMark() out into methods. Both the
calculations of the sizes and the creation/setting of the bitmaps. But
I admit that this is just a style issue. <br>
</div>
</blockquote>
<br>
OK. I'll try a few things to pretty it up.<br>
<br>
Expect a new webrev soon.<br>
<br>
JohnC<br>
</body>
</html>