G1 with Solr - thread from dev at lucene.apache.org

Wed Dec 17 15:50:56 UTC 2014

Hi Shawn,

> On 12/6/2014 3:00 PM, Shawn Heisey wrote:
> > On 12/5/2014 2:42 PM, Erick Erickson wrote:
> > > Saw this on the Cloudera website:
> > > 
> > > http://blog.cloudera.com/blog/2014/12/tuning-java-garbage-collection-for-hbase/
> > > 
> > > O[...]
> Here's a graph of a GC log lasting over two weeks with a tuned CMS
> collector and Oracle Java 7u25 and a 6GB heap.
> 
> https://www.dropbox.com/s/mygjeviyybqqnqd/cms-7u25.png?dl=0
> 
> CMS was tuned using these settings:
> 
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
> 
> This graph shows that virtually all collection pauses were a little
> under half a second.  There were exactly three full garbage collections,
> and each one took around six seconds.  While that is a significant
> pause, having only three such collections over a period of 16 days
> sounds pretty good to me.
> 
> Here's about half as much runtime (8 days) on the same server running G1
> with Oracle 7u72 and the same 6GB heap.  G1 is untuned because I do not
> know how to tune it:
>
> https://www.dropbox.com/s/2kgx60gj988rflj/g1-7u72.png?dl=0
> 
> Most of these collections took around a tenth of a second, which is
> certainly better than nearly half a second, but there were a LOT of
> collections that took longer than a second, and a fair number of them
> took between 3 and 5 seconds.
> 
> It's difficult to say which of these graphs is actually better.  The CMS
> graph is certainly more consistent and has a LOT fewer full GCs, but is
> the 4 to 1 improvement in a typical GC enough to produce significantly
> better performance?  My instinct says that it would NOT be enough,
> especially with so many collections taking 1-3 seconds.
> 
> If the server were really busy (mine isn't), I wonder whether the GC
> graph would look similar or really different.  A busy server would need
> to collect a lot more garbage, so I fear that the yellow and black parts
> of the G1 graph would dominate more than they do in my graph, which
> would be a bad thing overall.  Only real testing on busy servers can
> tell us that.
> 
> I can tell you for sure that the G1 graph looks a lot better than it did
> in early Java 7 releases, but additional work by Oracle (and perhaps
> some G1 tuning options) might significantly improve it.
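The comparison above comes down to a typical pause versus the slow tail, and both runs can be reduced to the same few numbers. A minimal sketch, assuming the pause durations (in seconds) have already been extracted from each GC log by some other tool; the class and field names are hypothetical, and the 1-second threshold simply mirrors the discussion:

```java
import java.util.Arrays;

/**
 * Hypothetical helper: summarizes GC pause durations (seconds) into a
 * typical value (median) and tail indicators (max, count over 1 s, total).
 */
final class PauseSummary {
    final int count;
    final double median;
    final double max;
    final double total;
    final int overOneSecond;

    private PauseSummary(int count, double median, double max, double total, int overOneSecond) {
        this.count = count;
        this.median = median;
        this.max = max;
        this.total = total;
        this.overOneSecond = overOneSecond;
    }

    static PauseSummary of(double[] pausesSecs) {
        if (pausesSecs.length == 0) {
            throw new IllegalArgumentException("no pauses given");
        }
        double[] sorted = pausesSecs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // Median: the middle element, or the mean of the two middle elements.
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        double total = 0.0;
        int over = 0;
        for (double p : sorted) {
            total += p;
            if (p > 1.0) {
                over++; // the long pauses that stand out in the G1 graph
            }
        }
        return new PauseSummary(n, median, sorted[n - 1], total, over);
    }

    @Override
    public String toString() {
        return String.format("n=%d median=%.3fs max=%.3fs total=%.1fs over1s=%d",
                count, median, max, total, overOneSecond);
    }
}
```

Printing the summary for the CMS run and the G1 run side by side would show whether G1's lower median is paid for with a larger total pause time or a longer tail.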

Could you provide some GC logs to look at? It is impossible to give good
recommendations without at least some more detail about what's going on.

Preferably logs produced with at least the options mentioned in the blog
post that were used to tune the workload, i.e. -XX:+PrintGCDetails,
-XX:+PrintGCTimeStamps and -XX:+PrintAdaptiveSizePolicy.
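Until proper logs are available, the running JVM can at least report which collector it uses and which -XX flags it was started with, through the standard java.lang.management API. A minimal sketch with a hypothetical class name; with G1 on HotSpot the collectors typically show up as "G1 Young Generation" and "G1 Old Generation". It complements a -XX:+PrintGCDetails log rather than replacing it:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a quick snapshot of GC-related JVM state. */
final class GcSnapshot {
    /** Returns the -XX options this JVM was started with. */
    static List<String> xxFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    /** One line per collector: name, collection count, accumulated time in ms. */
    static List<String> collectorTotals() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String flag : xxFlags()) {
            System.out.println(flag);
        }
        for (String line : collectorTotals()) {
            System.out.println(line);
        }
    }
}
```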

It might also be a good idea to start with the options given in the
Cloudera blog entry:

  -XX:MaxGCPauseMillis=100        // the max pause time you want
  -XX:+ParallelRefProcEnabled     // not sure; only if Solr uses lots of soft or weak references
  -XX:-ResizePLAB                 // that's minor
  -XX:G1NewSizePercent=1          // that may help in achieving the pause time goal
  -Xms<heap size>M
  -Xmx<heap size>M
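To confirm that such flags actually took effect, the HotSpot-specific diagnostic MXBean can report each option's effective value and where it came from. A minimal sketch with a hypothetical class name; it assumes a HotSpot JVM and that -XX:+UseG1GC is passed as well. As far as I know, G1NewSizePercent is an experimental flag on JDK 8 that is only accepted together with -XX:+UnlockExperimentalVMOptions, which is why the sketch tolerates a flag being unavailable:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports the effective value of HotSpot flags. */
final class FlagCheck {
    static String describe(String name) {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption opt = hs.getVMOption(name);
            // Origin VM_CREATION typically means the value came from the command line.
            return name + "=" + opt.getValue() + " (" + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when this JVM does not expose an option with that name.
            return name + ": not available in this JVM";
        }
    }

    public static void main(String[] args) {
        String[] names = {"UseG1GC", "MaxGCPauseMillis", "ParallelRefProcEnabled",
                          "ResizePLAB", "G1NewSizePercent"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Running it with the same options as Solr, e.g. `java -XX:+UseG1GC -XX:MaxGCPauseMillis=100 FlagCheck`, should show the requested values next to their origin.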

I do not think there is any need to set ParallelGCThreads according to
the formula in that post. It has been the default formula for calculating
the number of GC threads for all collectors for a long time (though it
might have changed at some point in JDK 7).
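For reference, the formula can be written out as below. This is a sketch of HotSpot's default worker-thread count as I understand it (all CPUs up to 8, then 5/8 of each additional CPU, rounded down); the class name is hypothetical, and newer JDKs may adjust the result further, for example when running in containers:

```java
/** Hypothetical helper mirroring HotSpot's default ParallelGCThreads formula. */
final class ParallelGcThreads {
    static int defaultFor(int cpus) {
        if (cpus <= 8) {
            return cpus; // one GC worker per CPU on small machines
        }
        // 8 + (cpus - 8) * 5/8, using integer arithmetic (rounds down).
        return 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> ParallelGCThreads=" + defaultFor(cpus));
    }
}
```

On a 16-CPU machine it gives 13, which is what the JVM would pick on its own, so setting the flag explicitly to that value changes nothing.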

You may also want to try a JDK 8 build, preferably (for me :) an 8u40
EA build (e.g. from https://jdk8.java.net/download.html); there have been
a lot of improvements to G1 in JDK 8, and in 8u40 in particular.

Thanks,
  Thomas