High sys time in G1 GC logs and questions about big heap tuning
Vitaly Davidovich
vitalyd at gmail.com
Fri Oct 20 18:20:08 UTC 2017
On Fri, Oct 20, 2017 at 11:43 AM, Hector Caballero <
hector.caballero at ericsson.com> wrote:
> Hi,
>
> We've been troubleshooting a high-traffic application for some time now and
> have discovered that GC times are impacting performance. The next paragraph
> just gives some context:
>
> We are using G1 GC with JDK 8u131 on a RHEL 6 server with latest patches
> and these JVM parameters:
>
> javaOptions = -Xms128g
> javaOptions = -Xmx128g
> javaOptions = -XX:MetaspaceSize=128m
> javaOptions = -XX:+UseG1GC
> javaOptions = -XX:InitiatingHeapOccupancyPercent=15
> javaOptions = -XX:MaxGCPauseMillis=200
> javaOptions = -XX:+ParallelRefProcEnabled
> javaOptions = -XX:SoftRefLRUPolicyMSPerMB=0
> javaOptions = -XX:ParallelGCThreads=24
> javaOptions = -XX:+PrintGCDetails
> javaOptions = -XX:+PrintGCDateStamps
> javaOptions = -XX:+PrintTenuringDistribution
> javaOptions = -XX:+PrintAdaptiveSizePolicy
> javaOptions = -XX:+PrintReferenceGC
> javaOptions = -XX:+UnlockDiagnosticVMOptions
> javaOptions = -XX:+G1SummarizeRSetStats
> javaOptions = -Xloggc:/opt/gerrit/review_site/logs/jvm/jvm_gc-%t.log
> javaOptions = -XX:+UseGCLogFileRotation
> javaOptions = -XX:GCLogFileSize=40M
> javaOptions = -XX:NumberOfGCLogFiles=20
> javaOptions = -Djava.util.Arrays.useLegacyMergeSort=true
>
> and the system was stable. One clarification: the IHOP value was lowered to
> avoid the full GCs that were killing us, and it worked well for that
> purpose. The only problem we saw with these parameters was a slight impact
> on throughput but, as said, the system was still very usable.
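>
> (For context: with the 128GB heap, IHOP=15 meant concurrent marking was
> triggered at roughly 0.15 * 128GB = ~19GB of heap occupancy, versus ~58GB
> at the default value of 45.)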
>
> After upgrading the application to the latest version, the memory pattern
> changed dramatically and full GCs came back, making the application
> unusable. The main change we saw was a very fast increase in the old
> generation (up to 115GB) while the young generation stayed small. As an
> urgent measure, the system memory was upgraded and the heap size raised to
> 256GB, which brought a dramatic improvement in application performance, but
> we're still seeing a big regression in some operations, which are taking
> 3-4 times longer. We understand we need to look at the code itself (this is
> an open source application) to see how object creation has changed, but we
> are under a lot of pressure from users and management to solve this right
> now, so code changes need to wait a bit. Our goal right now is to make the
> application as usable as possible by tuning memory usage, to give us some
> room to start looking at the code.
>
> The increase in heap size led to very low throughput, around 70%. To try to
> improve it, we relaxed the pause goal to 300ms. This didn't noticeably
> affect the application (at least it was no worse), and things improved a
> bit, but not enough. Next, to start collecting a bit later and reduce the
> number of mixed collections, we increased the IHOP to 18% (a very modest
> change; the exact flag changes are listed after the table below). After
> that, the GC logs showed that, even without full GCs, some young
> collections are taking a long time (up to a minute in one case). In some of
> those cases, the sys time was far higher than the user time, which is
> strange; we haven't found much information about the causes, but we can see
> it is affecting us a lot:
>
> Timestamp            User Time (secs)  Sys Time (secs)  Real Time (secs)
> 2017-10-20T14:19:41              1.68             9.5              0.48
> 2017-10-20T14:19:59              2.26             9.56             0.56
> 2017-10-20T14:34:45             12.26            14.01             1.23
> 2017-10-20T15:26:33              9.82            31.73             2.39
> 2017-10-20T15:26:37             15.68            46.47             3.19
> 2017-10-20T15:26:41             10.11            39.64             2.44
> 2017-10-20T15:28:18             15.53            45.14             3.08
> 2017-10-20T15:29:01              5.4             41.63             2.43
> 2017-10-20T15:29:05             74.69            93.91            11.51
> 2017-10-20T15:29:18             80.67           137.35            18.38
> 2017-10-20T15:30:55             19.65            23.99             2.16
> 2017-10-20T15:31:03             11.03            29.28             1.82
> 2017-10-20T15:31:28              4.88             8.44             0.77
> 2017-10-20T15:31:30              1.58            59.97             3.21
> 2017-10-20T15:31:35             57.13           135.82            14.96
> 2017-10-20T15:32:23             25.46           150.54            14.79
> 2017-10-20T15:33:11             87.92           112.98            16.46
> 2017-10-20T15:35:13             17.99            32.15             2.33
> 2017-10-20T15:35:54            154.7            191.82            29.67
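>
> For reference, the exact flag changes mentioned above, relative to the
> original settings, were:
>
> javaOptions = -XX:MaxGCPauseMillis=300                 (was 200)
> javaOptions = -XX:InitiatingHeapOccupancyPercent=18    (was 15)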
>
> Any hints here?
Do you have transparent huge pages (THP) enabled? This is a fairly common
cause of outsized sys times.
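
To check (assuming RHEL 6's backported THP, whose controls live under
redhat_transparent_hugepage; upstream kernels use transparent_hugepage
instead):

    cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
    cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

If those report [always], try setting them to never and see whether the
sys-time spikes go away:

    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

The kernel's page compaction/khugepaged work tends to show up exactly as
sys time charged to the GC threads during a pause.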
>
> The second question is about general tuning for this big heap, since it
> seems we're not doing well so far. We've read a lot of documentation about
> tuning this collector and thought we had a fairly good understanding of the
> process, but that has proven wrong: some of our changes (such as raising
> the IHOP further and increasing the number of parallel threads) have shown
> no improvement, and in some cases regressions, such as longer young
> collections or drops in throughput. I'm attaching a chunk of the log so you
> can have a quick look if possible and perhaps point us to some ideas.
>
> Thanks a lot,
>
> Hector Caballero
>