Long pauses with parallel collector

Y Srinivas Ramakrishna Y.S.Ramakrishna at Sun.COM
Tue Mar 18 19:37:37 UTC 2008


Hi Karl --

When the long response times are seen, the free memory jumps from about ~300 MB
free to about ~4GB free. So it would seem as though the 60 s pauses are likely the
serial compaction of the 12 GB heap. I would suggest either using -XX:+UseParalleOldGC
(to compact the heap in parallel on the 4 cpus), or if that does not help enough (recall that
1.5.0_XX was the first version of parallel compaction and many performance enhancements
were made to it subsequently in 6uXX which are not in 1.5.0_XX), then try -XX:+UseConcMarkSweepGC.

Tuning docs etc can be found starting from:

    http://java.sun.com/javase/technologies/hotspot/index.jsp
    http://java.sun.com/javase/technologies/hotspot/gc/index.jsp

GC logs using -XX:+PrintGCDetails will of course clarify matters more. There may be other
flags with each of the parallel old and cms collectors that would provide more statistics
as to where the time is going.

HTHS.
-- ramki


----- Original Message -----
From: Karl Rosenbaum <kalle at rosenbaum.se>
Date: Tuesday, March 18, 2008 7:59 am
Subject: Long pauses with parallel collector
To: hotspot-gc-use at openjdk.java.net
Cc: mikael.lonneberg at telelogic.com, Mattias Fredsberg <mattias.fredsberg at focalpoint.se>


> Hi
> 
> We're experiencing long pauses (>60 seconds) using the parallel 
> collector and the serial compacting old generation collector (the 
> default choice on a server-class machine, right?). Unfortunately we 
> have 
> no logs from the gc, since this is a customer's installation and we 
> can't play around with that. But I do have memory statistics along 
> with 
> response time statistics. Before the problems started we didn't really 
> 
> notice the collections but recently the collections took more than 60 
> 
> seconds. This is measured using wget on a "ping"-servlet that returns 
> 
> some memory statistics. The RESPTIME is measured by timing the 
> wget-command. FREE MEM, MEM TOTAL and MEM ALLOCATED are the results of
> 
>                  long freeMem = Runtime.getRuntime().freeMemory();
>                  long totalMem = Runtime.getRuntime().totalMemory();
>                  long allocMem = Runtime.getRuntime().maxMemory();
> 
> 
> This is an example of the statistics before the problems started:
> 
> TIME        	FREE MEM	MEM TOTAL	MEM ALLOCATED	RESPTIME
> 10:59:49	1499233752	12724273152	12724273152	0.20
> 10:59:59	1431576712	12724273152	12724273152	0.16
> 11:00:09	1392930432	12724273152	12724273152	0.14
> 11:00:19	1385974568	12724273152	12724273152	0.13
> 11:00:30	1365510896	12724273152	12724273152	0.13
> 11:00:40	1248830048	12724273152	12724273152	0.19
> 11:00:50	1164298568	12724273152	12724273152	0.13
> 11:01:00	1122874896	12724273152	12724273152	0.15
> 11:01:10	1085027216	12724273152	12724273152	0.13
> 11:01:20	952254960	12724273152	12724273152	0.14
> 11:01:31	385638352	12724273152	12724273152	0.30
> 11:01:41	3905940544	12717326336	12717326336	0.13
> 11:01:51	3286915952	12717326336	12717326336	0.34
> 11:02:01	1862466680	12717326336	12717326336	0.17
> 11:02:12	1214561176	12717326336	12717326336	0.13
> 11:02:22	937846336	12717326336	12717326336	0.13
> 11:02:32	830626208	12717326336	12717326336	0.13
> 11:02:42	591847776	12717326336	12717326336	0.13
> 11:02:52	486858808	12717326336	12717326336	0.15
> 11:03:02	405118840	12717326336	12717326336	0.12
> 11:03:12	379997776	12717326336	12717326336	0.17
> 11:03:23	4231164944	12723027968	12723027968	0.13
> 11:03:33	2869501088	12723027968	12723027968	0.14
> 11:03:43	2848014968	12723027968	12723027968	0.15
> 11:03:53	2789025152	12723027968	12723027968	0.13
> 11:04:03	2141846392	12723027968	12723027968	0.14
> 11:04:13	1936386592	12723027968	12723027968	0.13
> 11:04:24	1922345160	12723027968	12723027968	0.13
> 11:04:34	1907951992	12723027968	12723027968	0.14
> 11:04:44	1884581280	12723027968	12723027968	0.15
> 11:04:54	1842599720	12723027968	12723027968	0.14
> 11:05:04	1833357144	12723027968	12723027968	0.14
> 11:05:14	819218528	12723027968	12723027968	0.15
> 11:05:25	766194152	12723027968	12723027968	0.14
> 11:05:35	738061968	12723027968	12723027968	0.14
> 11:05:45	716541120	12723027968	12723027968	0.13
> 11:05:55	690620328	12723027968	12723027968	0.13
> 11:06:05	601896808	12723027968	12723027968	0.13
> 11:06:15	3986587288	12721192960	12721192960	0.19
> 11:06:26	3981820120	12721192960	12721192960	0.19
> 11:06:36	3908686896	12721192960	12721192960	0.13
> 11:06:46	3896673928	12721192960	12721192960	0.13
> 
> 
> This is an example of the statistics after the problems started:
> 
> TIME        	FREE MEM	MEM TOTAL	MEM ALLOCATED	RESPTIME
> 15:26:13	132104472	12878741504	12878741504	0.11
> 15:26:23	89372112	12878741504	12878741504	0.21
> 15:26:33	61732520	12878741504	12878741504	0.12
> 15:26:43	15401280	12878741504	12878741504	12.14
> 15:27:05	4074793624	12878741504	12878741504	64.57
> 15:28:20	3378764376	12878741504	12878741504	0.12
> 15:28:30	2858626704	12878741504	12878741504	0.11
> 15:28:40	491173584	12878741504	12878741504	0.11
> 15:28:50	414802056	12878741504	12878741504	0.92
> 15:29:01	298133928	12878741504	12878741504	0.20
> 15:29:12	4164253720	12878741504	12878741504	64.36
> 15:30:26	3662076152	12878741504	12878741504	0.15
> 15:30:36	1823630328	12878741504	12878741504	0.11
> 15:30:46	1338510584	12878741504	12878741504	0.12
> 15:30:56	505714912	12878741504	12878741504	0.11
> 15:31:07	320903520	12878741504	12878741504	0.11
> 15:31:17	4179464600	12878741504	12878741504	71.62
> 15:32:38	3001104680	12878741504	12878741504	0.47
> 15:32:49	2593399336	12878741504	12878741504	0.13
> 15:32:59	2478360368	12878741504	12878741504	0.11
> 15:33:09	2391502968	12878741504	12878741504	0.11
> 15:33:19	785466304	12878741504	12878741504	0.13
> 15:33:29	4004574384	12878741504	12878741504	62.87
> 15:34:42	2643636392	12878741504	12878741504	0.11
> 15:34:52	2387674096	12878741504	12878741504	0.15
> 15:35:03	2226430872	12878741504	12878741504	0.11
> 15:35:13	1016694456	12878741504	12878741504	0.11
> 15:35:23	728910648	12878741504	12878741504	0.11
> 15:35:33	238095528	12878741504	12878741504	7.11
> 15:35:50	215109328	12878741504	12878741504	0.11
> 15:36:00	4025070120	12878741504	12878741504	69.14
> 15:37:19	2294767768	12878741504	12878741504	0.12
> 15:37:30	1897269848	12878741504	12878741504	0.11
> 15:37:40	1832375800	12878741504	12878741504	0.30
> 15:37:50	1741032976	12878741504	12878741504	0.11
> 15:38:00	1641480032	12878741504	12878741504	0.18
> 15:38:10	774869968	12878741504	12878741504	7.20
> 15:38:28	634489992	12878741504	12878741504	0.10
> 15:38:38	274370880	12878741504	12878741504	3.11
> 15:38:51	68896328	12878741504	12878741504	0.11
> 15:39:01	31211120	12878741504	12878741504	0.15
> 15:39:11	3997617032	12878741504	12878741504	60.77
> 
> As you can see, every collection results in a >60 seconds pause time. 
> 
> The server have been restarted once between the two log snippets and 
> they are from different dates. The problem lasted indefinitely and all 
> 
> collections seem to take this long.
> 
> We have now applied the CMS collector to "kill" these symptoms, but we 
> 
> still need to understand what was going on here.
> 
> Do you have any suggestions on what the cause of this might be?
> 
> The server is a Solaris 10 sparc machine with 16GB memory and 4 cpus. 
> 
> The java version is 1.5.0_11
> 
> These are the java parameters:
> root at fp # pargs 10502
> 
> 10502:  /usr/jdk/instances/jdk1.5.0/bin/sparcv9/java -Xmx12288m 
> -Dmx=12288 -Djava.awt.h
> argv[0]: /usr/jdk/instances/jdk1.5.0/bin/sparcv9/java
> argv[1]: -Xmx12288m
> argv[2]: -Dmx=12288
> argv[3]: -Djava.awt.headless=true
> argv[4]: -Dfile.encoding=UTF-8
> argv[5]: -Djava.library.path=/opt/FOCAfp/lib
> argv[6]: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> argv[7]: -Djava.endorsed.dirs=/opt/FOCAtomcat/common/endorsed
> argv[8]: -classpath
> argv[9]: 
> :/opt/FOCAtomcat/bin/bootstrap.jar:/opt/FOCAtomcat/bin/commons-logging-api.jar
> argv[10]: -Dcatalina.base=/opt/FOCAtomcat
> argv[11]: -Dcatalina.home=/opt/FOCAtomcat
> argv[12]: -Djava.io.tmpdir=/opt/FOCAtomcat/temp
> argv[13]: org.apache.catalina.startup.Bootstrap
> argv[14]: start
> 
> 
> Best regards,
> Karl Rosenbaum
> _______________________________________________
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use



More information about the hotspot-gc-dev mailing list