Tuning ShenandoahGC with 420 GB heaps

Edwin Graven edwin at graven-ict.nl
Tue Mar 5 08:17:42 UTC 2019


Aleksey

Thanks for looking into this. I'm a colleague of Jeroen and can answer 
most of the questions.

We start the application with the option -XX:+AlwaysPreTouch.

If I look at the ps output, it shows the option as well:

acv      35165     1 99 Mar04 ?        9-09:00:00 java -server
-Drss.configuration.propertiesFile=/appl/acv/server/conf/server.properties
  -Djava.library.path=bin:/opt/mqm/java/lib64:/appl/acv/kafka/libs
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8005
  -Dcom.sun.management.jmxremote.authenticate=true
-Dcom.sun.management.jmxremote.password.file=/appl/acv/server/conf/jmxauth/jmxremote.password
  
-Dcom.sun.management.jmxremote.access.file=/appl/acv/server/conf/jmxauth/jmxremote.access
  -Dcom.sun.management.jmxremote.ssl=false
  -Djsse.enableSNIExtension=false -XX:+UseShenandoahGC -XX:+UseNUMA
-XX:+UseTransparentHugePages -XX:+UnlockDiagnosticVMOptions
-XX:+ShenandoahAllocationTrace -XX:+LogVMOutput -XX:ParallelGCThreads=44
  -XX:ShenandoahGCHeuristics=adaptive -XX:ConcGCThreads=16
  -XX:+UnlockExperimentalVMOptions -XX:+ClassUnloadingWithConcurrentMark
-XX:ShenandoahAllocSpikeFactor=7 -XX:ShenandoahPacingMaxDelay=75
-XX:+AlwaysPreTouch -verbosegc -Xms420g -Xmx420g
-XX:+UseGCLogFileRotation -XX:+PrintGCDetails -XX:NumberOfGCLogFiles=10
  -XX:GCLogFileSize=10M -XX:MaxTenuringThreshold=2
-Xloggc:/appl/acv/log/server/gc_Shenandoah_Mon.log -classpath
lib/.:lib/*:/opt/mqm/java/lib/*:/appl/acv/kafka/libs/*:/appl/acv/server/custom/*
  com.riskshield.server.Starter -i/appl/acv/server/conf/server.ini
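
One way to compare the options of a running JVM with the "CommandLine flags" line of a GC log is to parse both into dictionaries and list the differences. This is only a minimal sketch: the helper names parse_xx_flags and diff_flags are hypothetical, and the two strings are short excerpts of the flags that appear in this thread, not the full command lines.

```python
import re


def parse_xx_flags(command_line):
    """Collect -XX options; a later occurrence overwrites an earlier one."""
    flags = {}
    pattern = r"-XX:([+-]?)(\w+)(?:=(\S+))?"
    for sign, name, value in re.findall(pattern, command_line):
        # Boolean flags carry +/-; all other flags carry a value after '='.
        flags[name] = (sign == "+") if sign else value
    return flags


def diff_flags(left, right):
    """Return {name: (left_value, right_value)} for every flag that differs."""
    names = sorted(set(left) | set(right))
    return {n: (left.get(n), right.get(n))
            for n in names if left.get(n) != right.get(n)}


# Excerpt of the ps output in this mail.
ps_excerpt = ("-XX:+UseShenandoahGC -XX:ShenandoahAllocSpikeFactor=7 "
              "-XX:ShenandoahPacingMaxDelay=75 -XX:+AlwaysPreTouch")
# Excerpt of the "CommandLine flags" line quoted from the GC log.
log_excerpt = ("-XX:-AlwaysPreTouch -XX:ShenandoahAllocSpikeFactor=5 "
               "-XX:ShenandoahPacingMaxDelay=50 -XX:+UseShenandoahGC")

differences = diff_flags(parse_xx_flags(ps_excerpt), parse_xx_flags(log_excerpt))
for name, (in_ps, in_log) in differences.items():
    print(f"{name}: ps={in_ps} log={in_log}")
```

If flags other than AlwaysPreTouch differ as well, that may indicate the log was written by a run started with a different command line.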

I'm not sure why the gc log reports -XX:-AlwaysPreTouch. We are 
currently testing with 2 machines and we see this behavior on both. 
Both machines have 512GB installed, and swap space is set to 2GB on 
both. After running for a while (3 days), both machines still have 60G 
available (1G free mem and 59G filecache). I'm going to try to install 
the 8u nightlies today.

Another strange thing we encounter is that we use the option 
UseTransparentHugePages, but hardly any pages end up in hugepages. 
When we start the application, we see around 12G in hugepages and it 
stays at that level.
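
The hugepage usage mentioned above can be read from the AnonHugePages line of /proc/meminfo, and the active THP mode from /sys/kernel/mm/transparent_hugepage/enabled. The following is a minimal sketch under the assumption of a Linux machine; the function names are hypothetical, and the parsers take text as input so they can be tried without access to the files.

```python
import re


def anon_huge_pages_gib(meminfo_text):
    """Return the AnonHugePages value of /proc/meminfo in GiB, or None."""
    match = re.search(r"^AnonHugePages:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) / (1024 * 1024)  # kB (KiB) -> GiB


def active_thp_mode(enabled_text):
    """Return the bracketed mode, e.g. 'madvise' from 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", enabled_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # These reads only work on Linux; the paths are standard kernel interfaces.
    with open("/proc/meminfo") as f:
        print("AnonHugePages (GiB):", anon_huge_pages_gib(f.read()))
    with open("/sys/kernel/mm/transparent_hugepage/enabled") as f:
        print("THP mode:", active_thp_mode(f.read()))
```

The same AnonHugePages field also appears per mapping in /proc/<pid>/smaps, which shows how much of a single java process is backed by hugepages.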

If we start the G1 collector with TransparentHugepages, we see it 
running up to 290G. We currently use G1 on our production system with 
largepages (vm.nr_hugepages). We also tried this option in combination 
with Shenandoah, but then java reports that there is not enough memory; 
it does not look at the reserved block (vm.nr_hugepages).

Edwin Graven

On 3/4/19 6:28 PM, Aleksey Shipilev wrote:

> On 3/4/19 5:48 PM, Jeroen Borgers wrote:
>> Please find 3 gc log files at: 
>> www.jpinpoint.com/resources/gc_Shenandoah_Thu.log.0.current.zip 
>> <http://www.jpinpoint.com/resources/gc_Shenandoah_Thu.log.0.current.zip> 
>> gc_Shenandoah_Thu.log.1.zip gc_shenandoah_Weekend_25.zip
> Briefly looking at one of the logs, Weekend_25.zip.
>
> *) First things first, command line options:
>
> CommandLine flags: -XX:-AlwaysPreTouch 
> -XX:+ClassUnloadingWithConcurrentMark -XX:ConcGCThreads=16 
> -XX:GCLogFileSize=10485760 -XX:InitialHeapSize=450971566080 
> -XX:InitialTenuringThreshold=2 -XX:+LogVMOutput -XX:+ManagementServer 
> -XX:MaxHeapSize=450971566080 -XX:MaxTenuringThreshold=2 
> -XX:NumberOfGCLogFiles=10 -XX:ParallelGCThreads=44 -XX:+PrintGC 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
> -XX:ShenandoahAllocSpikeFactor=5 -XX:+ShenandoahAllocationTrace 
> -XX:ShenandoahPacingMaxDelay=50 -XX:+UnlockDiagnosticVMOptions 
> -XX:+UnlockExperimentalVMOptions -XX:+UseGCLogFileRotation 
> -XX:+UseNUMA -XX:+UseNUMAInterleaving -XX:+UseShenandoahGC 
> -XX:+UseTransparentHugePages
>
>  - Consider enabling -XX:+AlwaysPreTouch, especially as you are 
>    running with NUMA turned on and THP turned on. If memory allocator 
>    and/or defragger kicks in at unfortunate times, it might stall the 
>    collector enough to trip off the concurrent mode;
>
>  - These options are useless for Shenandoah: 
>    "-XX:InitialTenuringThreshold=2 -XX:MaxTenuringThreshold=2"
>
>  - "-XX:InitialHeapSize=450971566080 -XX:MaxHeapSize=450971566080", 
>    but machine has "physical 528157524k(460445396k free)". There is 
>    some native memory spent on top of heap size, are you sure the 
>    machine never swaps?
>
> *) 8u191 Shenandoah is a bit old, which might not have all the 
> performance touchups. Sorry about that. Can you try our 8u nightlies? 
> https://builds.shipilev.net/openjdk-shenandoah-jdk8/
>
> *) From very far out, it seems that the normal GC cycle takes around 
> 16 seconds in that config. And the live data set is around 260G (the 
> heap size after Full GC) of 420G max. Which means, any allocation 
> spike at 10+ GB/sec would tank the whole thing. I'd say if you cannot 
> crank up the heap, then you'd have to choose whether you want much 
> larger allocation pacing delay (i.e. seconds) or accept more 
> Degen/Full GCs.
>
> *) Looking at progression of some counters:
>
> $ grep "Actual Free" gc_shenandoah_Weekend_25.logfull | less
>
> ...it seems it slowly goes down, until Full GC recovers from it. We 
> have seen those as heuristics issues before, and it should be fixed in 
> recent 8u's, but probably not in 8u191.
>
> -Aleksey
>
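
The sizing figures in the quoted reply can be checked with a few lines of arithmetic. This is a rough sketch, not a model of the Shenandoah heuristics: it assumes the live set stays at 260G for the whole cycle, that no memory is reclaimed until the cycle ends, and that the "k" in the physical memory line means KiB. The function names are hypothetical.

```python
GIB = 1024 ** 3          # bytes per GiB
KIB_PER_GIB = 1024 ** 2  # KiB per GiB


def bytes_to_gib(n_bytes):
    return n_bytes / GIB


def kib_to_gib(n_kib):
    return n_kib / KIB_PER_GIB


def alloc_rate_limit_gib_per_s(heap_gib, live_gib, cycle_seconds):
    """Free headroom divided by the cycle time: allocating faster than this
    exhausts the heap before one concurrent cycle can finish."""
    return (heap_gib - live_gib) / cycle_seconds


heap_gib = bytes_to_gib(450971566080)     # -XX:MaxHeapSize from the log
physical_gib = kib_to_gib(528157524)      # "physical 528157524k"
free_at_start_gib = kib_to_gib(460445396) # "(460445396k free)"
rate_limit = alloc_rate_limit_gib_per_s(heap_gib, live_gib=260, cycle_seconds=16)

print(f"heap:            {heap_gib:.0f} GiB")
print(f"physical memory: {physical_gib:.1f} GiB")
print(f"free minus heap: {free_at_start_gib - heap_gib:.1f} GiB")
print(f"allocation rate limit: {rate_limit:.1f} GiB/s")
```

With 420G of heap, 260G live and a 16 second cycle, the headroom is 160G, which gives the 10 GB/sec figure mentioned in the reply.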

