G1 with Solr - thread from dev at lucene.apache.org
Thomas Schatzl
thomas.schatzl at oracle.com
Wed Dec 17 20:51:53 UTC 2014
Hi Shawn,
Shawn Heisey wrote:
> On 12/17/2014 8:50 AM, Thomas Schatzl wrote:
> > could you provide some logs to look at? It is impossible to give good
> > recommendations without having at least some more detail about what's
> > going on.
> >
> > Preferably logs with at least the mentioned options they used to tune
> > the workload, i.e. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps and
> > -XX:+PrintAdaptiveSizePolicy
> >
> > It might also be a good idea to start with the options given in the
> > cloudera blog entry:
> >
> > -XX:MaxGCPauseMillis=100 // the max pause time you want
> > -XX:+ParallelRefProcEnabled // not sure, only if Solr uses lots of
> > soft or weak references.
> > -XX:-ResizePLAB // that's minor
> > -XX:G1NewSizePercent=1 // that may help in achieving the
> > pause time goal
> > -Xms<heap size>M
> > -Xmx<heap size>M
> >
> > I do not think there is need to set the ParallelGCThreads according to
> > that formula. This has been the default formula for calculating the
> > number of threads for all collectors for a long time (but then again it
> > might have changed sometime in jdk7).
> >
> > You may also want to use a JDK 8 build, preferably (for me :) some 8u40
> > EA build (e.g. from https://jdk8.java.net/download.html); there have
> > been a lot of improvements to G1 in JDK8, and in particular 8u40.
>
> Strange, I seem to have only received the copy of this message sent
> directly to me, I never got the list copy.
Not sure why. One copy has been archived in the mailing list archives though...
> Here's the options I'm using for G1 on 7u72:
>
> JVM_OPTS=" \
> -XX:+UseG1GC \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
>
> Here's the options I used for CMS on 7u25:
>
> JVM_OPTS=" \
> -XX:NewRatio=3 \
> -XX:SurvivorRatio=4 \
> -XX:TargetSurvivorRatio=90 \
> -XX:MaxTenuringThreshold=8 \
> -XX:+UseConcMarkSweepGC \
> -XX:+CMSScavengeBeforeRemark \
> -XX:PretenureSizeThreshold=64m \
> -XX:CMSFullGCsBeforeCompaction=1 \
> -XX:+UseCMSInitiatingOccupancyOnly \
> -XX:CMSInitiatingOccupancyFraction=70 \
> -XX:CMSTriggerPermRatio=80 \
> -XX:CMSMaxAbortablePrecleanTime=6000 \
> -XX:+CMSParallelRemarkEnabled \
> -XX:+ParallelRefProcEnabled \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
>
> In both cases, I used -Xms4096M and -Xmx6144M. These are the GC logging
> options:
>
> GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails"
>
> Here's the GC logs that I already have:
>
> https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0
> https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0
>
Please also add -XX:+PrintReferenceGC, and definitely use
-XX:+ParallelRefProcEnabled.
GC is spending a significant amount of the time in soft/weak reference
processing. -XX:+ParallelRefProcEnabled will help, but there will be
spikes still. I saw that GC sometimes spends 1000ms just processing
those references; using 8 threads this should get better.
That alone will likely make it hard to reach a 100ms pause time goal:
even with perfect scaling, 1000ms/8 = 125ms.
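For reference, the suggestions so far can be put together as a rough sketch. It starts from the G1 option block and logging options quoted above, adds -XX:+ParallelRefProcEnabled, -XX:+PrintReferenceGC and -XX:+PrintAdaptiveSizePolicy, and then redoes the best-case arithmetic. The variable names ref_proc_ms and gc_threads are only illustrative, and perfect scaling across the GC threads is an assumption:

```shell
# Sketch: the quoted G1 options plus parallel reference processing.
JVM_OPTS=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

# Sketch: the quoted logging options plus the two extra diagnostics.
GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps \
-XX:+PrintGCDetails -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy"

# Best-case estimate only: assumes reference processing scales perfectly
# across the GC threads, which real workloads rarely achieve.
ref_proc_ms=1000   # worst reference-processing time seen in the log
gc_threads=8       # number of parallel GC threads
best_case_ms=$((ref_proc_ms / gc_threads))
echo "best-case parallel reference processing: ${best_case_ms} ms"
```

Even this optimistic figure is above the 100ms goal, so the reference processing spikes would still show up as pause time outliers.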
CMS has the same problems, and while on average it has ~215ms pauses,
there seem to be many that are much longer. Reference processing
also takes very long, even with -XX:+ParallelRefProcEnabled.
I am not sure about the cause of the full GCs: either the pause time
prediction in G1 in that version is poor and it tries to use far too
large a young gen, or there are a few very large objects around.
Depending on the log output and the impact of the other options we might
want to cap the maximum young gen size.
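A minimal sketch of what such a cap could look like, assuming the experimental G1MaxNewSizePercent flag is the knob used. That flag is not named above, its availability in a given JDK build should be checked, and the value 30 is purely a placeholder rather than a recommendation. Experimental flags have to be unlocked with -XX:+UnlockExperimentalVMOptions first:

```shell
# Hypothetical cap on the G1 young generation, as a percentage of the heap.
# 30 is a placeholder; choose a value only after looking at the new logs.
YOUNG_CAP_OPTS="-XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=30"

# Append to the existing options (JVM_OPTS as in the quoted configuration;
# defaults to empty here so the snippet runs on its own).
JVM_OPTS="${JVM_OPTS:-} $YOUNG_CAP_OPTS"
echo "$JVM_OPTS"
```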
> I believe that Lucene does use a lot of references.
I saw that. Must be millions. -XX:+PrintReferenceGC should show that
(also in CMS).
> I am more familiar
> with Solr code than Lucene, but even on Solr, I am not well-versed in
> the lower-level details.
>
> I will get PrintAdaptiveSizePolicy added to my GC logging options.
>
> Unless the performance improvement in Java 8 is significant, I don't
> think I can make a compelling case to switch from Java 7 yet.
Off the top of my head:
- logging is better
- parallelized a few more GC phases
- class unloading after concurrent mark (not only during full gc) - but
that does not seem to be a problem
- prediction fixes
- much improved handling of large objects - does not seem to be a
problem here
- slew of bugfixes
I am mostly missing the improved logging for analysis, and the
improvements in pause times.
> Although I have UseLargePages, I do not have any huge pages allocated in
> the CentOS 6 operating system, so this is not actually doing anything.
Thanks,
Thomas