<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Hi all,</p>

    <p><br>

    </p>

    <p>First, thanks to all for your help. We're confirming about TPH

      with our sysadmins, but it is by default enabled on RHEL 6 as we

      understand.</p>

    <p>Unfortunately, this log was collected when the traffic was low

      for our standards, but we saw that the live set was going up and

      old generation peaks at about 180GB when traffic is full.</p>

    <p>Will try your suggestion and will post again with the results if

      you're OK with this.</p>

    <p>Thanks again<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 2017-10-20 02:34 PM, Kirk Pepperdine

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:AB73EFF2-BE4A-4938-8645-E5471CF39BA2@gmail.com">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      Hi Hector,

      <div class=""><br class="">

      </div>

      <div class="">This log lit up our CPU analytic. Out of 3042 GC

        pause events, kernel times exceeded User times <font

          class="code">25</font> times and GC threads collected an

        abnormal amount of kernel time 352 times.</div>

      <div class="">There could be a number of reasons for this. If the

        I/O channels are overloaded then writing perf data to temp could

        result in kernel time being accumulated. Transparent Huge Pages

        is also a common issue. Given the distribution of the problem,

        this could be likely.</div>

      <div class=""><br class="">

      </div>

      <div class="">The heap appears to be large, maybe even too large.

        That said, the live set is increasing over time. However, the

        runtime isn’t long enough to say if this is a problem or not.</div>

      <div class=""><br class="">

      </div>

      <div class="">The real issue allocation rates are insanely high.

        To top that off, there is a problem with promotions that suggest

        young is being resized too small. I don’t see the run up in

        memory that you were seeing but my belief is that the high

        allocation rate with the undersized young would be responsible

        for this. You survive because IHOP at 15% means your making up

        for the small young by running more mixed collections. This is

        all going to add to overhead.</div>

      <div class=""><br class="">

      </div>

      <div class="">The dominating cost in the collection is

        consistently object copy. If set the min size of young to a

        larger value you might be able to reduce the overall overhead by

        simply reducing the volume of data being prematurely promotion

        to tenured. The big win will come with a reduction in allocation

        rates.</div>

      <div class=""><br class="">

      </div>

      <div class="">This is a start.</div>

      <div class=""><br class="">

      </div>

      <div class="">Kind regards,</div>

      <div class="">Kirk Pepperdine</div>

      <div class=""><br class="">

      </div>

      <div class=""><br class="">

      </div>

      <div class=""><br class="">

      </div>

      <div class="">

        <div>

          <blockquote type="cite" class="">

            <div class="">On Oct 20, 2017, at 5:43 PM, Hector Caballero

              <<a href="mailto:hector.caballero@ericsson.com"

                class="" moz-do-not-send="true">hector.caballero@ericsson.com</a>>

              wrote:</div>

            <br class="Apple-interchange-newline">

            <div class="">

              <div class="">Hi,<br class="">

                <br class="">

                We've been troubleshooting a big traffic application for

                some time now and have discovered GC times are impacting

                performance. Next paragraph just to give some context:<br

                  class="">

                <br class="">

                We are using G1 GC with JDK 8u131 on a RHEL 6 server

                with latest patches and these JVM parameters:<br

                  class="">

                <br class="">

                        javaOptions = -Xms128g<br class="">

                        javaOptions = -Xmx128g<br class="">

                        javaOptions = -XX:MetaspaceSize=128m<br class="">

                        javaOptions = -XX:+UseG1GC<br class="">

                        javaOptions =

                -XX:InitiatingHeapOccupancyPercent=15<br class="">

                        javaOptions = -XX:MaxGCPauseMillis=200<br

                  class="">

                        javaOptions = -XX:+ParallelRefProcEnabled<br

                  class="">

                        javaOptions = -XX:SoftRefLRUPolicyMSPerMB=0<br

                  class="">

                        javaOptions = -XX:ParallelGCThreads=24<br

                  class="">

                        javaOptions = -XX:+PrintGCDetails<br class="">

                        javaOptions = -XX:+PrintGCDateStamps<br class="">

                        javaOptions = -XX:+PrintTenuringDistribution<br

                  class="">

                        javaOptions = -XX:+PrintAdaptiveSizePolicy<br

                  class="">

                        javaOptions = -XX:+PrintReferenceGC<br class="">

                        javaOptions = -XX:+UnlockDiagnosticVMOptions<br

                  class="">

                        javaOptions = -XX:+G1SummarizeRSetStats<br

                  class="">

                        javaOptions =

                -Xloggc:/opt/gerrit/review_site/logs/jvm/jvm_gc-%t.log<br

                  class="">

                        javaOptions = -XX:+UseGCLogFileRotation<br

                  class="">

                        javaOptions = -XX:GCLogFileSize=40M<br class="">

                        javaOptions = -XX:NumberOfGCLogFiles=20<br

                  class="">

                        javaOptions =

                -Djava.util.Arrays.useLegacyMergeSort=true<br class="">

                <br class="">

                and the system was stable. Some precision: The IHOP

                value was reduced as a way to avoid having full GCs that

                were killing us and it worked well for this purpose. The

                only problem we saw with these parameters was a bit of

                impact on throughput but, as said, the system was still

                very usable.<br class="">

                <br class="">

                After upgrading the application to latest version, the

                memory pattern changed dramatically and full GCs went

                back making the application unusable. The main change we

                saw was a very fast increase in old generation (up to

                115GB) while young generation kept low. As an urgent

                measure, the system memory was upgraded and the heap

                size put to 256GB with which there was a dramatic

                improvement in application performance but we're still

                seeing a big regression in some operations that are

                taking 3-4 times more. We understand we need to look at

                the code itself (this application is an open source one)

                to see how object creation has changed, but we have a

                lot of pressure of users and management to solve this

                right now so changes in code need to wait a bit. Our

                goal right now is to make the application as usable as

                possible tuning memory usage to give us a bit of room to

                start looking at the code.<br class="">

                <br class="">

                The increase in heap size led to very low throughput,

                around 70%. In order to try to improve it, we decided to

                relax pause goal to 300ms. Given doing this didn't

                affect much the application (at least it was not worse)

                and things improved a bit but not enough. Next we

                decided to try to start collecting a bit later to reduce

                the number of mixed collections so we increase IHOP to

                18% (very modest modification). After that, looking at

                the GC logs we saw that, without having full GCs, there

                are young collections taking long time (up to a minute

                in a case). In some of those cases, the sys time was a

                lot higher than the user time, which is kind of strange

                and we haven't found much information about the causes

                of this but we see is affecting us a lot:<br class="">

                <br class="">

                Timestamp    User Time (secs)    Sys Time (secs)    Real

                Time (secs)<br class="">

                2017-10-20T14:19:41    1.68    9.5    0.48<br class="">

                2017-10-20T14:19:59    2.26    9.56    0.56<br class="">

                2017-10-20T14:34:45    12.26    14.01    1.23<br

                  class="">

                2017-10-20T15:26:33    9.82    31.73    2.39<br class="">

                2017-10-20T15:26:37    15.68    46.47    3.19<br

                  class="">

                2017-10-20T15:26:41    10.11    39.64    2.44<br

                  class="">

                2017-10-20T15:28:18    15.53    45.14    3.08<br

                  class="">

                2017-10-20T15:29:01    5.4    41.63    2.43<br class="">

                2017-10-20T15:29:05    74.69    93.91    11.51<br

                  class="">

                2017-10-20T15:29:18    80.67    137.35    18.38<br

                  class="">

                2017-10-20T15:30:55    19.65    23.99    2.16<br

                  class="">

                2017-10-20T15:31:03    11.03    29.28    1.82<br

                  class="">

                2017-10-20T15:31:28    4.88    8.44    0.77<br class="">

                2017-10-20T15:31:30    1.58    59.97    3.21<br class="">

                2017-10-20T15:31:35    57.13    135.82    14.96<br

                  class="">

                2017-10-20T15:32:23    25.46    150.54    14.79<br

                  class="">

                2017-10-20T15:33:11    87.92    112.98    16.46<br

                  class="">

                2017-10-20T15:35:13    17.99    32.15    2.33<br

                  class="">

                2017-10-20T15:35:54    154.7    191.82    29.67<br

                  class="">

                <br class="">

                Any hints here?<br class="">

                <br class="">

                Second question is about general tuning for this big

                heap as seems we're not going well so far. We've been

                reading a lot of documentation about the tuning of this

                algorithm and thought we had a fairly acceptable

                understanding of the process but this has proven wrong

                as some of our tunings (as increasing the IHOP to bigger

                values and increasing the number of parallel threads)

                haven't show improvement and rather regressions in some

                cases, as causing longer young collections or dropping

                in throughput. I'm joining a chunk of the log so you can

                have a quick look if possible and probably point to some

                ideas.<br class="">

                <br class="">

                Thanks a lot,<br class="">

                <br class="">

                Hector Caballero<br class="">

                <br class="">

                <br class="">

                <br class="">

                <br class="">

                <span

                  id="cid:B592A967-A92C-42AE-A0BF-8AB64A485352@chello.hu"><jvm_gc-2017-10-20_14-19-14.log.0.current.gz></span></div>

            </div>

          </blockquote>

        </div>

        <br class="">

      </div>

    </blockquote>

    <br>

  </body>

</html>