<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix"><br>

      Hi John,<br>

      <br>

      Thanks for the extra explanations. They helped a lot! And I think

      your suggestions, for addressing the comments I had, sounds good.

      In particular it makes sense to treat the task queue size the same

      way in CMS and G1.<br>

      <br>

      I'll look at your updated webrev when you send it out.<br>

      <br>

      Regarding whether or not to only use VirtualSpaces on 64 bit I

      feel a bit unsure. Using VirtualSpaces already make the code more

      complex than it was before with C heap allocations. Introducing

      platform dependencies on top of that seems to create a big mess.

      If it somehow is possible to abstract the allocation away so we

      keep it in one place it might be worth treating 32 and 64 bit

      differently.<br>

      <br>

      Not sure if this is a good thought; but if we would make it

      optional to use VirtualSpaces or CHeap to support 32/64 bit

      differences, would it make sense to only use VirtualSpaces on

      Solaris? You mentioned that the out-of-C-heap issue seem to happen

      due to a Solaris bug.<br>

      <br>

      Cheers,<br>

      Bengt<br>

      <br>

      On 2012-09-26 02:10, John Cuthbertson wrote:<br>

    </div>

    <blockquote cite="mid:506247DF.80709@oracle.com" type="cite">

      <meta content="text/html; charset=ISO-8859-1"

        http-equiv="Content-Type">

      Hi Bengt,<br>

      <br>

      Thanks for the review. Replies inline...<br>

      <br>

      On 09/21/12 07:17, Bengt Rutisson wrote:

      <blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

        <meta content="text/html; charset=ISO-8859-1"

          http-equiv="Content-Type">

        <div class="moz-cite-prefix"><br>

          Hi John,<br>

          <br>

          Thanks for doing the thorough analysis and providing the

          numbers.

          Interesting that G1 does 21703 mallocs at startup whereas

          Parallel only

          does 3500...<br>

          <br>

          However, I think I need some more background on this change.

          We are

          choosing to allocating into a single VirtualSpace instead of

          using

          separate mallocs. I understand that this will reduce the

          number of

          mallocs we do, but is that really the problem? The CR says

          that we run

          out of memory due to the fact that the GC threads allocate too

          much

          memory. We will still allocate the same amount of memory just

          mmapped

          instead of malloced, right? Do we have any test cases that

          fail before

          your fix but pass now?<br>

        </div>

      </blockquote>

      We didn't run out of memory - we ran out of C heap. <br>

      <br>

      There is an issue on Solaris (I have to check whether Ramki's

      email

      describes the underlying problem) where by we asked for a 26Gb

      heap

      (which we got) but the C heap was only given 2Gb. Vladimir saw the

      issue during COOPs testing. In this situation we ran out of C heap

      trying to allocate the work queues for the marking threads.<br>

      <br>

      Azeem submitted an issue a few weeks ago as well that ran into the

      same

      problem (this time with ParallelGC/ParallelOld) and the process

      died

      while trying to allocate the PSPromotionManager instances (and

      their

      internal work queue) - see CR 7167969. I believe that 7197666 is

      similar.<br>

      <br>

      A real reduction in memory use will come from reducing the task

      queue

      size. I want to do that in a separate CR (for some or all of the

      collectors)  provided there is no real performance difference.<br>

      <br>

      <blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

        <div class="moz-cite-prefix"> <br>

          In fact, isn't the risk higher (at least on 32 bit platforms)

          that we

          fail to allocate the bitmaps now that we try to allocate them

          from one

          consecutive chunk of memory instead of several smaller ones?

          I'm

          thinking that if the memory is fragmented we might not get the

          contiguous memory we need for the VirtualSpace. But I am

          definitely no

          expert in this.<br>

        </div>

      </blockquote>

      <br>

      Good question and I wish I knew the answer. The occupancy of the

      task

      card bitmaps depends on the # of GC threads and the heap size (1

      bit

      per card) - I guess it might be possible. Should I then only be

      allocating out of the backing store in a 64-bit JVM?<br>

      <blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

        <div class="moz-cite-prefix"> <br>

          With the above reservation I think your change looks good.

          Here are a

          couple of minor comments:<br>

          <br>

          CMMarkStack::allocate(size_t size) <br>

          "size" is kind of an overloaded name for an "allocate" method.

          Could we

          call it "capacity" or "number_of_entries"? <br>

        </div>

      </blockquote>

      <br>

      Sure. Done. I renamed 'size' to 'capacity' - which matches the

      field

      name. But you bring up a good point about this confusion. If we

      look at

      the CMS version of the same routine:<br>

      <br>

      bool CMSMarkStack::allocate(size_t size) {<br>

        // allocate a stack of the requisite depth<br>

        ReservedSpace rs(ReservedSpace::allocation_align_size_up(<br>

                         size * sizeof(oop)));<br>

        if (!rs.is_reserved()) {<br>

          warning("CMSMarkStack allocation failure");<br>

          return false;<br>

        }<br>

        if (!_virtual_space.initialize(rs, rs.size())) {<br>

          warning("CMSMarkStack backing store failure");<br>

          return false;<br>

        }<br>

        assert(_virtual_space.committed_size() == rs.size(),<br>

               "didn't reserve backing store for all of CMS stack?");<br>

        _base = (oop*)(_virtual_space.low());<br>

        _index = 0;<br>

        _capacity = size;<br>

        NOT_PRODUCT(_max_depth = 0);<br>

        return true;<br>

      }<br>

      <br>

      it used the parameter as the real size in bytes of the marking

      stack.

      In G1, since we used NEW_C_HEAP_ARRAY, the size was multiplied by

      the

      size of an oop and we got a larger marking stack than requested.

      Instead of a 16 Mb stack (128 * 128K), we got a 128Mb stack (8 *

      128 *

      128K). Basically we asked for the equivalent of the task queues

      for

      another 128 threads:<br>

      <br>

      dr-evil{jcuthber}:265> ./test.sh -d64 -XX:-ZapUnusedHeapArea

      -XX:CICompilerCount=1 -XX:ParallelGCThreads=10 -Xms20g -Xmx20g

      -XX:+UseG1GC -XX:+PrintMallocStatistics<br>

      Using java runtime at:

/net/jre.us.oracle.com/p/v42/jdk/7/fcs/b147/binaries/solaris-sparcv9/jre<br>

      size: 16777216, MarkStackSize: 16777216, TASKQUEUE_SIZE: 131072<br>

      os::malloc 134217728 bytes --> 0x00000001010709e8<br>

      java version "1.7.0"<br>

      Java(TM) SE Runtime Environment (build 1.7.0-b147)<br>

      Java HotSpot(TM) 64-Bit Server VM (build

      25.0-b03-internal-fastdebug,

      mixed mode)<br>

      allocation stats: 24489 mallocs (212MB), 1147 frees (0MB), 4MB

      resrc<br>

      <br>

      Mimicking CMS is probably the way to go here.<br>

      <blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

        <div class="moz-cite-prefix"> <br>

          <br>

          <title>Snippet</title>

          In ConcurrentMark::ConcurrentMark() we use both shifting and

          division

          to accomplish the same thing on the lines after each other.

          Could we

          maybe use the same style for both cases?<br>

          <br>

            // Card liveness bitmap size (in bits) <br>

            BitMap::idx_t card_bm_size = (heap_rs.size() +

          CardTableModRefBS::card_size - 1) <br>

                                          >>

          CardTableModRefBS::card_shift; <br>

            // Card liveness bitmap size (in bytes) <br>

            size_t card_bm_size_bytes = (card_bm_size + (BitsPerByte -

          1)) /

          BitsPerByte; <br>

        </div>

      </blockquote>

      <br>

      Sure. Changed the latter one to use LogBitsPerByte.<br>

      <br>

      <blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

        <div class="moz-cite-prefix"> Also in

          ConcurrentMark::ConcurrentMark() I think  that

          "marked_bytes_size_bytes" is not such a great name. Probably

          we could

          skip the first "bytes" in "marked_bytes_size" and just call

          "marked_bytes_size_bytes" for "marked_size_bytes". <br>

        </div>

      </blockquote>

      <br>

      OK. What about just marked_bytes_size?<br>

      <blockquote cite="mid:505C76EC.5030902@oracle.com" type="cite">

        <div class="moz-cite-prefix"> <br>

          I think it would be nice to factor some of the new stuff in

          ConcurrentMark::ConcurrentMark() out into methods. Both the

          calculations of the sizes and the creation/setting of the

          bitmaps. But

          I admit that this is just a style issue. <br>

        </div>

      </blockquote>

      <br>

      OK. I'll try a few things to pretty it up.<br>

      <br>

      Expect a new webrev soon.<br>

      <br>

      JohnC<br>

    </blockquote>

    <br>

  </body>

</html>