Minor GC Degradation Over Time
Мамонтов Иван
ivan.mamontov at gmail.com
Sat Dec 21 02:24:47 PST 2013
Hi,
While analyzing long-term performance trends we discovered that garbage
collection time on our JVMs grows steadily over time.
Minor GC pauses increase from 0.1 s to 1 s. This chart shows GC time
growing over time:
https://drive.google.com/file/d/0B3vklY0UY_wHODRkSWgyRjNhTms/edit?usp=sharing
This also causes some degradation in average query latency.
We did not observe any increase in GC frequency, only a large increase in
the time each minor GC takes.
This problem happens regardless of OS (Debian 6-7 64-bit, RHEL 5.9 64-bit),
JVM version (from 1.6.30 to 1.7.45), application, or hardware.
Only one thing unites them: the servlet container, Spring DM. There is a
defect in this container: it creates a new thread for each request. After
we fixed that, GC time was stable:
https://drive.google.com/file/d/0B3vklY0UY_wHWVRJSU5QTnJOcG8/edit?usp=sharing
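The defect class can be sketched as follows (hypothetical handler code, not
Spring DM's actual source; the class name and the 8-thread pool size are my
own assumptions):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RequestDispatch {
    // Defect pattern: a brand-new thread per request. Every live thread's
    // stack is a GC root that each minor collection must scan.
    static Thread handlePerRequest(Runnable request) {
        Thread t = new Thread(request);
        t.start();
        return t;
    }

    // Fix: a bounded pool reuses a fixed set of threads, so the root set
    // seen by the collector stays constant over time.
    static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    static Future<?> handlePooled(Runnable request) {
        return POOL.submit(request);
    }
}
```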
Right now I am trying to work out the root cause of this problem.
Below are some facts and figures to help clarify what happens.
After some period of time (which depends on load), minor GC time starts
to grow:
https://drive.google.com/file/d/0B3vklY0UY_wHRjh3eE9NN21FNWc/edit?usp=sharing
   - First of all, I turned off all built-in GC ergonomics in the JVM
     (UsePSAdaptiveSurvivorSizePolicy,
     UseAdaptiveGenerationSizePolicyAtMinorCollection,
     UseAdaptiveGenerationSizePolicyAtMajorCollection,
     AdaptiveSizeThroughPutPolicy, UseAdaptiveSizePolicyFootprintGoal,
     UseAdaptiveSizePolicy, UseCMSBestFit), but the result was the same.
   - As a next step I disabled all TLAB/PLAB resizing
     (-XX:MinTLABSize=2m -XX:-ResizeTLAB -XX:TLABSize=2m -XX:-ResizePLAB),
     but nothing changed.
   - I think I have excluded a memory problem from the list of
     possibilities: all combinations of AlwaysPreTouch and UseLargePages
     were tested.
   - Pinning and sizing the GC threads did not help either:
     -XX:+BindGCTaskThreadsToCPUs -XX:+UseGCTaskAffinity
     -XX:ParallelGCThreads=8 -XX:ConcGCThreads=8.
   - As a last resort I built my own JVM with all critical sections
     instrumented with GCTraceTime. Almost immediately I found the only
     place where the time grows over time:
void ParNewGeneration::collect(bool   full,
                               bool   clear_all_soft_refs,
                               size_t size,
                               bool   is_tlab) {
  ...
  if (n_workers > 1) {
    GenCollectedHeap::StrongRootsScope srs(gch);
    workers->run_task(&tsk);   // <-- time grows here
  } else {
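Outside the container, the suspected mechanism — every live thread's stack
belongs to the strong-root set that these parallel workers walk during the
stop-the-world pause — can be reproduced with a small sketch (the parked
daemon threads and the thread counting are my own construction, not part of
the original report):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.LockSupport;

public class ThreadRootGrowth {
    // Spawn n threads that stay alive (parked forever), mimicking a
    // container that creates a thread per request and never lets it die.
    // Each of these adds one more stack for the GC root-scanning phase.
    public static int spawnParked(int n) {
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> LockSupport.park());
            t.setDaemon(true);  // do not keep the VM alive on exit
            t.start();
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return mx.getThreadCount();  // live threads == stacks to scan
    }
}
```

Watching minor GC times while the live-thread count climbs (e.g. with
-verbose:gc) should show the pause growing with the root set.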
So the question is: how and why does this happen? How can the number of
created threads affect stop-the-world parallel collection time over
time (fragmentation?)?
Maybe someone knows, or can suggest how to identify the problem. Of course
I can find it myself (I hope), but it will take longer without any advice.
I have run tests with different JVM parameters, but currently I'm using
the following:
-XX:PermSize=256m \
-XX:MaxPermSize=256m \
-Xms4096m -Xmx4096m \
-XX:MaxTenuringThreshold=5 \
-XX:NewRatio=2 \
-XX:MinTLABSize=2m \
-XX:-ResizeTLAB \
-XX:TLABSize=2m \
-XX:-PrintTLAB \
-verbose:gc \
-XX:+UseConcMarkSweepGC \
-XX:+PrintGCDetails \
-XX:+PrintGCCause \
-XX:+PrintGCTimeStamps \
-XX:+PrintGCDateStamps \
-XX:-ResizeOldPLAB \
-XX:-ResizePLAB \
-XX:ParallelGCThreads=8 \
-XX:ConcGCThreads=8 \
-XX:-UseAdaptiveSizePolicy \
-XX:-UseCMSBestFit \
-XX:+AlwaysPreTouch \
-XX:-UseLargePages \
-XX:+UseCondCardMark \
-XX:-UsePSAdaptiveSurvivorSizePolicy \
-XX:-UseAdaptiveGenerationSizePolicyAtMinorCollection \
-XX:-UseAdaptiveGenerationSizePolicyAtMajorCollection \
-XX:AdaptiveSizeThroughPutPolicy=1 \
-XX:-UseAdaptiveSizePolicyFootprintGoal \
-XX:+DisableExplicitGC \
-XX:+UseXMMForArrayCopy \
-XX:+UseUnalignedLoadStores \
-XX:-UseBiasedLocking \
-XX:+BindGCTaskThreadsToCPUs \
-XX:+UseGCTaskAffinity \
--
Thanks,
Ivan