G1 with Solr - thread from hotspot-gc-use at openjdk.java.net

Shawn Heisey java at elyograg.org
Sat Dec 20 01:28:55 UTC 2014


On 12/17/2014 1:51 PM, Thomas Schatzl wrote:
>> In both cases, I used -Xms4096M and -Xmx6144M.  These are the GC logging
>> options:
>>
>> GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps
>> -XX:+PrintGCDetails"
>>
>> Here's the GC logs that I already have:
>>
>> https://www.dropbox.com/s/4uy95g9zmc28xkn/gc-idxa1-cms-7u25.log?dl=0
>> https://www.dropbox.com/s/loyo6u0tqcba6sh/gc-idxa1-g1-7u72.log?dl=0
>>
> 
>   please also add -XX:+PrintReferenceGC, and definitely use
> -XX:+ParallelRefProcEnabled.
> 
> GC is spending a significant amount of time in soft/weak reference
> processing. -XX:+ParallelRefProcEnabled will help, but there will
> still be spikes. I saw that GC sometimes spends 1000ms just processing
> those references; with 8 threads this should get better.
> 
> That alone will likely make it hard to reach a 100ms pause time goal
> (1000ms/8 = 125ms...).
> 
> CMS has the same problems: while its pauses average ~215ms, there
> seem to be many that are much longer. Reference processing also takes
> a very long time, even with -XX:+ParallelRefProcEnabled.
> 
> I am not sure about the cause of the full GCs: either the pause time
> prediction in G1 in that version is too poor and it tries to use a
> young gen that is far too large, or there are a few very large
> objects around.
> 
> Depending on the log output and the impact of the other options, we
> might want to cap the maximum young gen size.
> 
>> I believe that Lucene does use a lot of references.
> 
> I saw that. Must be millions. -XX:+PrintReferenceGC should show that
> (also in CMS).

I still did not get the list message, but I figured out why.  The list
subscription has an option "Avoid duplicate copies of messages" that I
just had to turn off.  I prefer to reply to messages from the list
because I know for sure that all the right headers are included.

I would not be surprised if there are millions of references.  My whole
index is over 98 million documents and half of those documents are
present in shards on each server, taking up about 60GB of disk space per
server.

I already have ParallelRefProcEnabled and I have just added
PrintReferenceGC.
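
With PrintReferenceGC added, the GC logging options are now:

GCLOG_OPTS="-verbose:gc -Xloggc:logs/gc.log -XX:+PrintGCDateStamps \
-XX:+PrintGCDetails -XX:+PrintReferenceGC"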

For reference, here are my options for CMS:

JVM_OPTS=" \
-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:CMSFullGCsBeforeCompaction=1 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:CMSTriggerPermRatio=80 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

Which of these options will apply to G1, and are any of them worthwhile
to include?  I haven't got any tuning options at all for G1, and I'm
looking for suggestions.  This is my current G1 option list:

JVM_OPTS=" \
-XX:+UseG1GC \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

Based on some recent list activity unrelated to this discussion, I also
opted to disable transparent huge pages on the Solr servers.  I haven't
noticed any real difference in the server resource graphs (CPU, load,
etc.).
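
In case it's useful to anyone, the change itself is just the usual
sysfs toggle (the exact path can vary by kernel; older Red Hat kernels
use redhat_transparent_hugepage instead):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag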

I've started an internal discussion about Java 8 to see how receptive
everyone will be to an upgrade.

Thanks,
Shawn


