Large heap size, slow concurrent marking causing frequent full GC

Fri Nov 17 07:56:26 UTC 2017

Hi James,

On Thu, 2017-11-16 at 20:02 +0000, James Sun wrote:
> Dear
>  
> We observed frequent full GCs due to long concurrent marking phase
> (about 30 seconds to a minute). The GC log with heap histogram during
> full GC is attached.
> The Java version we use is 8_144 with G1 GC.  The machines are with
> 56 cores and a heap size around 180 – 210GB.
>  
> Example concurrent mark duration:
> 2017-11-16T09:32:04.565-0800: 167543.159: [GC concurrent-mark-end,
> 45.7020802 secs]
> 2017-11-16T09:33:16.314-0800: 167614.908: [GC concurrent-mark-end,
> 51.0809053 secs]
> 2017-11-16T09:34:28.343-0800: 167686.938: [GC concurrent-mark-end,
> 48.7335047 secs]
>  
>  
> Wonder if anyone could help in terms of:
> How in general we can make concurrent marking faster. We bumped up
> the ConcGCThread to 20 but it didn’t help that much.

There are known performance and scalability issues with JDK8(u). See
the reason in some JavaOne 2016 talk about scaling G1 for huge heaps
[0]. Unfortunately it seems you covered all other options already
(bumping mark stack size to avoid overflows, increasing the number of
heaps).

> We also turned on -XX:+UnlockDiagnosticVMOptions
> -XX:+G1SummarizeConcMark but nothing related to marking shows up.
> General advice in tuning GC in other aspects
>  

Apart from the full gc, and the mixed gcs I am discussing below, is
there anything else of concern?

> Thanks in advance
>  
> James
>  
>  
> Here is the JVM config we have
>  
> -Xss2048k
> -XX:MaxMetaspaceSize=4G
> -XX:+PreserveFramePointer
> -XX:-UseBiasedLocking
> -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGCApplicationStoppedTime
> -XX:+UnlockExperimentalVMOptions
> -XX:+UseG1GC
> -XX:+ExplicitGCInvokesConcurrent
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UseGCOverheadLimit
> -XX:+ExitOnOutOfMemoryError
> -agentpath:/packages/presto.presto/bin/libjvmkill.so
> -agentpath:/packages/presto.presto/bin/libperfagent.so
> -XX:+PrintReferenceGC
> -XX:+PrintGCCause
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -XX:+PrintClassHistogramAfterFullGC
> -XX:+PrintClassHistogramBeforeFullGC
> -XX:PrintFLSStatistics=2

This one is CMS specific. Can remove.

> -XX:+PrintAdaptiveSizePolicy
> -XX:+PrintSafepointStatistics
> -XX:PrintSafepointStatisticsCount=1
> -XX:+PrintJNIGCStalls
> -XX:+UnlockDiagnosticVMOptions
> -XX:+AlwaysPreTouch
> -XX:+G1SummarizeRSetStats
> -XX:G1SummarizeRSetStatsPeriod=100
> -Dorg.eclipse.jetty.io.SelectorManager.submitKeyUpdates=true
> -XX:-OmitStackTraceInFastThrow
> -XX:ReservedCodeCacheSize=1G
> -Djdk.nio.maxCachedBufferSize=30000000
> -XX:G1MaxNewSizePercent=20
> -XX:G1HeapRegionSize=32M

At that heap size, the region size would be 32M anyway, may remove.

> -Xms180G
> -Xmx180G
> -XX:MarkStackSize=64M
> -XX:G1HeapWastePercent=2

That is very aggressive. That causes the long mixed gcs at the end of a
 old gen space reclamation phase. Either increase that to 5 to cut off
the "long" mixed gcs (still mostly within your 500ms pause time goal),
or increase G1MixedGCCountTarget to something like 16 to spread out the
work (you are observing the "added expensive regions to CSet" message).
See the documentation [1] for more information.

> -XX:ConcGCThreads=20
> -XX:MaxGCPauseMillis=500
> -XX:GCLockerRetryAllocationCount=5
> -XX:MarkStackSizeMax=256M
> -XX:G1OldCSetRegionThresholdPercent=20
> -XX:InitiatingHeapOccupancyPercent=40
> 

Thanks,
  Thomas

[0] https://www.youtube.com/watch?v=LppgqvKOUKs at 32:04; quoting from
the video: "marking runs 50x faster". It also shows other jdk9
improvements particularly applicable to running larger heaps.

[1] https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-col
lector-tuning.htm#GUID-D2B6ADCE-6766-4FF8-AA9D-B7F4F3D0F469