Help processing G1GC logs (visualization tools) - also, mysterious great performance after inexplicable full GC?
Monica Beckwith
monica.beckwith at oracle.com
Wed Oct 17 13:53:51 PDT 2012
Hi Aaron,
I'd like to start by noting that by explicitly setting NewRatio, you are
not letting your nursery adaptively grow (up to 80% of the heap) or
shrink (down to 20% of the heap), which G1 needs in order to meet its
pause-time goal. In your case it happens to work out OK, though.
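If you want to try letting G1 size the nursery on its own, a minimal
sketch (just a suggestion to experiment with, not a tested setting)
would be to drop the ratio and keep only the pause goal:

   current  :  -XX:NewRatio=6  -XX:MaxGCPauseMillis=50
   suggested:  -XX:MaxGCPauseMillis=50     (no explicit young-gen sizing)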
Anyway, here's what I understand from your log -
* After your first 4 young collections the heap occupancy crosses 45%
  of 8G (3.6G), and hence you see the "initial mark" piggy-backed on
  the young collection
    o The time spent in copying here was by itself 1.1 sec (on
      average) across the 33 GC threads
* After plotting your heap occupancy and the GCHisto output, it seems
  that your heap occupancy stays around 3.5G until the first expansion
  to ~10G and then settles at around 4G (note: with this expansion the
  new marking threshold is around 4.5G), and it appears that you are
  allocating "humongous objects" (a quick region-size calculation
  follows this list)
    o The reason: the subsequent initial marks occur without Eden ever
      reaching its full capacity.
    o The assumption is that, since the heap stays near its marking
      threshold, it is the humongous object allocations that keep
      triggering the initial marks.
* I also spotted 528 "concurrent-mark-reset-for-overflow" entries;
  when I consulted with John Cuthbertson, he suggested increasing the
  MarkStackSize to alleviate that issue.
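As a rough back-of-the-envelope for the humongous threshold - I am
assuming the default region sizing here, so please double-check it
against your setup:

   region size ~= ((Xms + Xmx) / 2) / 2048, rounded down to a power of 2
               ~= ((8G + 16G) / 2) / 2048 ~= 6M  ->  4M regions
   humongous    = any single allocation bigger than half a region,
                  i.e. anything over ~2M here

If that matches what you are seeing, one knob you could also experiment
with is -XX:G1HeapRegionSize, which raises the region size (and hence
the humongous threshold) explicitly.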
So, quick recommendations based on the application "phases" that we see
in your GC logs -
1. Increase your InitiatingHeapOccupancyPercent; 70 could be a safe
   number for you.
2. Increase your MarkStackSize.
3. Print with GC cause enabled (PrintGCCause), so that you can
   understand what is causing all those pauses. (A combined sketch of
   1-3 follows right after this list.)
4. Looking at your command line, I also wanted to quickly highlight
   that UseBiasedLocking and UseTLAB are probably enabled by default,
   and that UseNUMA will have no effect.
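Putting 1 through 3 together, a minimal sketch of the additions (the
MarkStackSize value is only a starting guess on my part - please tune
it until the overflow/reset messages disappear):

   -XX:InitiatingHeapOccupancyPercent=70
   -XX:MarkStackSize=16M
   -XX:+PrintGCCause

And, per point 4, -XX:+UseNUMA, -XX:+UseBiasedLocking and -XX:+UseTLAB
should be droppable from the command line without changing behavior.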
Hope the above helps your application.
Regards,
Monica Beckwith
On 10/16/2012 9:01 PM, Aaron Daubman wrote:
> Simone, Monica,
>
> Thanks for the helpful pointers to presentations - I'm psyched to make
> it through quite a few of the JavaOne presentations. In the meantime,
> it looks like one of the JVMs I'm monitoring experienced the
> mysterious full GC that improves performance afterward, at ~23:26
> GMT today. I'm attaching some zipped screenshots that should provide
> some context on memory utilization, as well as the full GC log file
> (zipped, as it is ~18M). Any insight or suggestions would be
> wonderful: how to improve performance, how to set up the JVM to avoid
> this full GC, or especially how to start the JVM up in such a way that
> it performs as it does after these rare full GC events occur. (Some
> responses to your questions are in-line at the bottom.)
> You can't tell from the scaled-out AppDynamics graph, but avg GC time
> goes from a consistent ~125ms/min before the full GC to a periodic
> 0-20ms/min after. (7 minor collections per minute down to 0-1 per minute)
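> (If my arithmetic is right, that works out to roughly 125 / 7 ~= 18 ms
> per young pause before the full GC, versus at most ~20 ms spread over
> 0-1 pauses per minute afterward - so the win appears to be mostly in
> pause frequency rather than per-pause cost.)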
>
> Relevant JVM flags used for this log file were:
> -XX:-UseSplitVerifier
> -XX:+UseG1GC
> -XX:MaxGCPauseMillis=50
> -Xms8g
> -Xmx16g
> -XX:NewRatio=6
> -XX:+UseNUMA
> -XX:+UseBiasedLocking
> -XX:+UseTLAB
> -XX:+AggressiveOpts
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/sas01/solr.dump
> -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails
> -Xloggc:/var/log/solr/GCLogFile.log
>
> To jump right to the full GC event, just look for:
> 2012-10-16T23:26:08.048+0000: 45896.289: [Full GC
> 11734M->3646M(16384M), 16.2904780 secs]
> It is the only "Full GC" event in the log.
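> In case it helps, something along these lines should pull out that
> event plus all the initial-mark pauses (using the log path from the
> startup flags above):
>
>   grep -n -e "Full GC" -e "initial-mark" /var/log/solr/GCLogFile.log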
>
> The screenshots from https://github.com/chewiebug/GCViewer center
> around the Full GC event, and the AppDynamics graphs should make the
> event timing pretty obvious.
>
> On Tue, Oct 16, 2012 at 9:42 AM, Monica Beckwith
> <monica.beckwith at oracle.com> wrote:
>
> ... We should also be able to work on improving (or at least
> explaining) your response times, but we would like to know a) what
> is your goal, b) what are the important factors (e.g. sizing
> limitations, etc) and c) if you are working off some comparison
> (e.g. a previously tuned garbage collector).
>
>
> I am currently trying to get a grip on real application requirements -
> this particular JVM shares a 48-core, 256G-RAM system (with
> over-provisioned Fusion-io disk) with several other currently
> poorly-tuned large JVMs - so suffice it to say that sizing
> limitations, within reason, are not a first-rate concern right now.
> (I recently dropped its previous 50G heap down to the current 16G,
> and see typical heap utilization of only 4-8G.)
>
> a) the near-term goal is to reduce overall (mean) response time, as
> well as to smooth the incredible (10ms to 30,000ms) variance in
> response time
> Some background: This is a Jetty servlet servicing requests by
> processing data from a ~70G Solr index file. There is an unfortunate
> amount of garbage produced per request (allocation of several large
> objects per request is unavoidable - object pooling might be worth
> looking into here, and string interning in the just-released Solr 4.0
> may also help, but both of these would be longer-term efforts). ~10MB
> of dynamically created JSON is generated and transmitted per request
> by this servlet. We are currently at the stage where higher request
> rates actually yield better performance (likely because staying more
> active prevents some paging-out of the 70G mmapped Solr index file).
>
> b) it would be nice to reduce the JVM memory footprint to enable
> deployment of this JVM on more commodity-like hardware (32-128G RAM,
> 16 cores, etc.), but this is secondary to improving performance and
> significantly reducing response-time variance. I can't think of any
> other really important limiting factors.
>
> c) a previous untuned configuration (JDK6 with the default collector,
> 50G heap), and the still-untuned CMS configuration I moved it to, both
> had better overall response times and less variance than the current
> configuration. However, I've been unable to stick to the "change one
> thing at a time" principle due to difficulties making changes to
> production, etc., and so many changes (JDK6 to 7, jetty 6 to 9, solr
> 3.3 to 3.6.1, java collections to fastutil primitive collections,
> etc.) were made all at once, so it is difficult to point just at GC.
> However, going back to the strangely better performance after these
> sometimes-daily full GC events, it "smells" like at least a GC-related
> issue.
>
> Thanks again!
> Aaron
>
--
Monica Beckwith | Java Performance Engineer
Oracle | Texas
VOIP: +1 512 401 1274