unusually long GC latencies

Nikolay Diakov nikolay.diakov at fredhopper.com
Mon Jan 11 08:35:40 PST 2010


Dear all,

We have a server application which, under some load, appears to perform 
unusually long garbage collections; our clients experience these as a 
denial of service for 2+ minutes around each long collection. A closer 
look at the GC log shows that these happen during young generation 
collections:

GC log lines exemplifying the issue
-----------------------------------
server1:
290059.768: [GC [PSYoungGen: 1804526K->83605K(2047744K)] 
4574486K->2873501K(5030720K), 0.0220940 secs] [Times: user=0.22 
sys=0.00, real=0.03 secs]
290297.525: [GC [PSYoungGen: 1790101K->92128K(2070592K)] 
4579997K->2882024K(5053568K), 183.0278480 secs] [Times: user=0.21 
sys=0.12, real=183.03 secs]
290575.578: [GC [PSYoungGen: 1831520K->57896K(2057792K)] 
4621416K->2892809K(5040768K), 0.0245410 secs] [Times: user=0.26 
sys=0.01, real=0.03 secs]

server2:
15535.809: [GC 15535.809: [ParNew: 2233185K->295969K(2340032K), 
0.0327680 secs] 4854859K->2917643K(5313116K), 0.0328570 secs] [Times: 
user=0.37 sys=0.00, real=0.03 secs]
15628.580: [GC 15628.580: [ParNew: 2246049K->301210K(2340032K), 
185.3817530 secs] 4867723K->2944629K(5313116K), 185.3836340 secs] 
[Times: user=0.38 sys=0.16, real=185.38 secs]
15821.983: [GC 15821.983: [ParNew: 2251290K->341977K(2340032K), 
0.0513300 secs] 4894709K->3012947K(5313116K), 0.0514380 secs] [Times: 
user=0.48 sys=0.03, real=0.05 secs]

If interested, I have the full logs - the issue appears 4-5 times a day.
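
For reference, such entries can be pulled out of the full log automatically. 
The sketch below is only an illustration, not part of our setup: the file 
name "gc.log" and the 10x threshold are arbitrary assumptions, and it assumes 
each GC record, including the [Times: ...] part, sits on a single line in the 
raw -XX:+PrintGCDetails log. It flags collections whose wall-clock time 
("real") greatly exceeds the CPU time spent ("user" + "sys"), which is the 
pattern visible in the excerpts above.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: scan a -XX:+PrintGCDetails log and print collections whose
// wall-clock ("real") time is far larger than the CPU time ("user" + "sys").
// The file name and the 10x threshold are illustrative assumptions only.
public class LongGcPauseFinder {
    private static final Pattern TIMES = Pattern.compile(
            "\\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\\]");

    public static void main(String[] args) throws IOException {
        String logFile = args.length > 0 ? args[0] : "gc.log";  // assumed name
        BufferedReader in = new BufferedReader(new FileReader(logFile));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = TIMES.matcher(line);
                if (!m.find()) {
                    continue;
                }
                double user = Double.parseDouble(m.group(1));
                double sys = Double.parseDouble(m.group(2));
                double real = Double.parseDouble(m.group(3));
                // Flag pauses where the JVM stalled far longer than the GC
                // actually worked on the CPUs (e.g. real=185.38 vs user+sys=0.54).
                if (real > 1.0 && real > 10.0 * (user + sys)) {
                    System.out.println(line);
                }
            }
        } finally {
            in.close();
        }
    }
}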

We have tried tuning the GC options of our server application, so far 
without success - we still get the same long collections. We suspect the 
16-core machine is involved, because when we perform the same test on an 
8-core machine we do not observe the long garbage collections at all.

We also tried both the CMS and the ParallelGC collectors - we get the 
delays on the 16-core machine in both cases, and in both cases the 
server works fine on the 8-core machine.

Would you have a look at what is going on? Below I post information about 
the OS, Java version, application and hardware. Attached please find the 
GC logs from two servers - one running CMS and the other running ParallelGC.

OS
--
Ubuntu 9.04.3 LTS

uname -a
Linux fas1 2.6.24-25-server #1 SMP Tue Oct 20 07:20:02 UTC 2009 x86_64 
GNU/Linux

HW
--
POWEREDGE R710 RACK CHASSIS with 4X INTEL XEON E5520 PROCESSOR 2.26GHZ
12GB MEMORY (6X2GB DUAL RANK)
2 x 300GB SAS 15K 3.5" HD HOT PLUG IN RAID-1
PERC 6/I RAID CONTROLLER CARD 256MB PCIE

Java
----
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

VM settings server1
-------------------
-server -Xms2461m -Xmx7000m -XX:MaxPermSize=128m -XX:+UseParallelGC 
-XX:+UseParallelOldGC -XX:ParallelGCThreads=10 -XX:MaxNewSize=2333m 
-XX:NewSize=2333m -XX:SurvivorRatio=5 -XX:+DisableExplicitGC 
-XX:+UseBiasedLocking -XX:+UseMembar -verbosegc -XX:+PrintGCDetails

VM settings server2
-------------------
-server -Xms2794m -Xmx8000m -XX:MaxPermSize=128m 
-XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
-XX:+CMSParallelRemarkEnabled -XX:MaxNewSize=2666m -XX:SurvivorRatio=5 
-XX:+DisableExplicitGC -XX:+UseBiasedLocking 
-XX:+UseMembar -verbosegc -XX:+PrintGCDetails

We have also observed the issue with all GC-related settings removed, 
i.e. with the VM running on defaults - same result.

Application
-----------
Our application runs in a JBoss container and processes HTTP requests. 
We use up to 50 processing threads and handle about 10-20 requests per 
second during the test that exposed the issue. Our server produces quite 
a substantial amount of garbage.
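
To give an idea of the allocation pattern, here is a purely hypothetical 
sketch (not our actual code; the class and the sizes are made up): each 
request builds sizable intermediate data that becomes garbage as soon as 
the response is written, so nearly all collection work lands in the young 
generation.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical illustration of the allocation pattern only - not the real application.
public class ExampleRequestServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Build a temporary, per-request result set (sizes are made up).
        List<String> results = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            results.add("result-" + i + "-for-" + req.getQueryString());
        }
        resp.setContentType("text/plain");
        resp.getWriter().println(results.size() + " results");
        // 'results' is dropped here, so these objects die young and are
        // reclaimed by the next minor (young generation) collection.
    }
}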

Yours sincerely,
   Nikolay Diakov