<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Hi Bengt,<br>

<br>

Thanks for the review. Replies inline...<br>

<br>

On 09/21/12 07:17, Bengt Rutisson wrote:

<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

  <meta content="text/html; charset=ISO-8859-1"

 http-equiv="Content-Type">

  <div class="moz-cite-prefix"><br>

Hi John,<br>

  <br>

Thanks for doing the thorough analysis and providing the numbers.

Interesting that G1 does 21703 mallocs at startup whereas Parallel only

does 3500...<br>

  <br>

However, I think I need some more background on this change. We are

choosing to allocating into a single VirtualSpace instead of using

separate mallocs. I understand that this will reduce the number of

mallocs we do, but is that really the problem? The CR says that we run

out of memory due to the fact that the GC threads allocate too much

memory. We will still allocate the same amount of memory just mmapped

instead of malloced, right? Do we have any test cases that fail before

your fix but pass now?<br>

  </div>

</blockquote>

We didn't run out of memory - we ran out of C heap. <br>

<br>

There is an issue on Solaris (I have to check whether Ramki's email

describes the underlying problem) where by we asked for a 26Gb heap

(which we got) but the C heap was only given 2Gb. Vladimir saw the

issue during COOPs testing. In this situation we ran out of C heap

trying to allocate the work queues for the marking threads.<br>

<br>

Azeem submitted an issue a few weeks ago as well that ran into the same

problem (this time with ParallelGC/ParallelOld) and the process died

while trying to allocate the PSPromotionManager instances (and their

internal work queue) - see CR 7167969. I believe that 7197666 is

similar.<br>

<br>

A real reduction in memory use will come from reducing the task queue

size. I want to do that in a separate CR (for some or all of the

collectors)  provided there is no real performance difference.<br>

<br>

<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

  <div class="moz-cite-prefix"> <br>

In fact, isn't the risk higher (at least on 32 bit platforms) that we

fail to allocate the bitmaps now that we try to allocate them from one

consecutive chunk of memory instead of several smaller ones? I'm

thinking that if the memory is fragmented we might not get the

contiguous memory we need for the VirtualSpace. But I am definitely no

expert in this.<br>

  </div>

</blockquote>

<br>

Good question and I wish I knew the answer. The occupancy of the task

card bitmaps depends on the # of GC threads and the heap size (1 bit

per card) - I guess it might be possible. Should I then only be

allocating out of the backing store in a 64-bit JVM?<br>

<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

  <div class="moz-cite-prefix"> <br>

With the above reservation I think your change looks good. Here are a

couple of minor comments:<br>

  <br>

CMMarkStack::allocate(size_t size) <br>

"size" is kind of an overloaded name for an "allocate" method. Could we

call it "capacity" or "number_of_entries"? <br>

  </div>

</blockquote>

<br>

Sure. Done. I renamed 'size' to 'capacity' - which matches the field

name. But you bring up a good point about this confusion. If we look at

the CMS version of the same routine:<br>

<br>

bool CMSMarkStack::allocate(size_t size) {<br>

  // allocate a stack of the requisite depth<br>

  ReservedSpace rs(ReservedSpace::allocation_align_size_up(<br>

                   size * sizeof(oop)));<br>

  if (!rs.is_reserved()) {<br>

    warning("CMSMarkStack allocation failure");<br>

    return false;<br>

  }<br>

  if (!_virtual_space.initialize(rs, rs.size())) {<br>

    warning("CMSMarkStack backing store failure");<br>

    return false;<br>

  }<br>

  assert(_virtual_space.committed_size() == rs.size(),<br>

         "didn't reserve backing store for all of CMS stack?");<br>

  _base = (oop*)(_virtual_space.low());<br>

  _index = 0;<br>

  _capacity = size;<br>

  NOT_PRODUCT(_max_depth = 0);<br>

  return true;<br>

}<br>

<br>

it used the parameter as the real size in bytes of the marking stack.

In G1, since we used NEW_C_HEAP_ARRAY, the size was multiplied by the

size of an oop and we got a larger marking stack than requested.

Instead of a 16 Mb stack (128 * 128K), we got a 128Mb stack (8 * 128 *

128K). Basically we asked for the equivalent of the task queues for

another 128 threads:<br>

<br>

dr-evil{jcuthber}:265> ./test.sh -d64 -XX:-ZapUnusedHeapArea

-XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g

-XX:+UseG1GC -XX:+PrintMallocStatistics<br>

Using java runtime at:

/net/jre.us.oracle.com/p/v42/jdk/7/fcs/b147/binaries/solaris-sparcv9/jre<br>

size: 16777216, MarkStackSize: 16777216, TASKQUEUE_SIZE: 131072<br>

os::malloc 134217728 bytes --> 0x00000001010709e8<br>

java version "1.7.0"<br>

Java(TM) SE Runtime Environment (build 1.7.0-b147)<br>

Java HotSpot(TM) 64-Bit Server VM (build 25.0-b03-internal-fastdebug,

mixed mode)<br>

allocation stats: 24489 mallocs (212MB), 1147 frees (0MB), 4MB resrc<br>

<br>

Mimicking CMS is probably the way to go here.<br>

<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

  <div class="moz-cite-prefix"> <br>

  <br>

  <title>Snippet</title>

In ConcurrentMark::ConcurrentMark() we use both shifting and division

to accomplish the same thing on the lines after each other. Could we

maybe use the same style for both cases?<br>

  <br>

  // Card liveness bitmap size (in bits) <br>

  BitMap::idx_t card_bm_size = (heap_rs.size() +

CardTableModRefBS::card_size - 1) <br>

                                >> CardTableModRefBS::card_shift;

  <br>

  // Card liveness bitmap size (in bytes) <br>

  size_t card_bm_size_bytes = (card_bm_size + (BitsPerByte - 1)) /

BitsPerByte; <br>

  </div>

</blockquote>

<br>

Sure. Changed the latter one to use LogBitsPerByte.<br>

<br>

<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

  <div class="moz-cite-prefix"> Also in

ConcurrentMark::ConcurrentMark() I think  that

"marked_bytes_size_bytes" is not such a great name. Probably we could

skip the first "bytes" in "marked_bytes_size" and just call

"marked_bytes_size_bytes" for "marked_size_bytes". <br>

  </div>

</blockquote>

<br>

OK. What about just marked_bytes_size?<br>

<blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

  <div class="moz-cite-prefix"> <br>

I think it would be nice to factor some of the new stuff in

ConcurrentMark::ConcurrentMark() out into methods. Both the

calculations of the sizes and the creation/setting of the bitmaps. But

I admit that this is just a style issue. <br>

  </div>

</blockquote>

<br>

OK. I'll try a few things to pretty it up.<br>

<br>

Expect a new webrev soon.<br>

<br>

JohnC<br>

</body>

</html>