Long pauses with parallel collector

Karl Rosenbaum kalle at rosenbaum.se
Tue Mar 18 20:49:49 UTC 2008


Yes, it should be the serial compaction that we see, but then why don't I see any 
long pauses in the first set of statistics? The JVM is obviously able to run an 
old generation collection there without a 60 s pause.

To clarify, my question is: "Why do I see the 60 s pauses now, when I didn't see 
them a few days earlier?"

As I said, we have switched to CMS now, but we still need to understand what was 
going on. Unfortunately we couldn't do any GC logging, because it is a production 
server that we can't experiment with.

I should also mention that the objects in the ~8 GB of constantly occupied space 
rarely die, so there shouldn't be much compacting to do every ~2 minutes.

Regards,
Karl


Y Srinivas Ramakrishna wrote:
> Hi Karl --
> 
> When the long response times are seen, the free memory jumps from ~300 MB free
> to ~4 GB free. So it would seem as though the 60 s pauses are likely the
> serial compaction of the 12 GB heap. I would suggest either using -XX:+UseParallelOldGC
> (to compact the heap in parallel on the 4 cpus), or if that does not help enough (recall that
> 1.5.0_XX was the first version of parallel compaction and many performance enhancements
> were made to it subsequently in 6uXX which are not in 1.5.0_XX), then try -XX:+UseConcMarkSweepGC.
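> 
> For example, the options could look roughly like this on the java command line
> (just a sketch; keep the existing heap size and other options as they are):
> 
>     java -Xmx12288m -XX:+UseParallelOldGC ...       (parallel compaction of the old generation)
>     java -Xmx12288m -XX:+UseConcMarkSweepGC ...     (mostly concurrent old generation collection)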
> 
> Tuning docs etc can be found starting from:
> 
>     http://java.sun.com/javase/technologies/hotspot/index.jsp
>     http://java.sun.com/javase/technologies/hotspot/gc/index.jsp
> 
> GC logs using -XX:+PrintGCDetails will of course clarify matters more. There may be other
> flags with each of the parallel old and cms collectors that would provide more statistics
> as to where the time is going.
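> 
> For example, something along these lines (a sketch; the log file path is just a
> placeholder):
> 
>     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/tmp/gc.log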
> 
> HTHS.
> -- ramki
> 
> 
> ----- Original Message -----
> From: Karl Rosenbaum <kalle at rosenbaum.se>
> Date: Tuesday, March 18, 2008 7:59 am
> Subject: Long pauses with parallel collector
> To: hotspot-gc-use at openjdk.java.net
> Cc: mikael.lonneberg at telelogic.com, Mattias Fredsberg <mattias.fredsberg at focalpoint.se>
> 
> 
>> Hi
>>
>> We're experiencing long pauses (>60 seconds) using the parallel 
>> collector and the serial compacting old generation collector (the 
>> default choice on a server-class machine, right?). Unfortunately we 
>> have no logs from the GC, since this is a customer's installation and 
>> we can't play around with that. But I do have memory statistics along 
>> with response time statistics. Before the problems started we didn't 
>> really notice the collections, but recently the collections took more 
>> than 60 seconds. This is measured using wget on a "ping"-servlet that 
>> returns some memory statistics. The RESPTIME is measured by timing the 
>> wget command. FREE MEM, MEM TOTAL and MEM ALLOCATED are the results of:
>>
>>                  long freeMem = Runtime.getRuntime().freeMemory();
>>                  long totalMem = Runtime.getRuntime().totalMemory();
>>                  long allocMem = Runtime.getRuntime().maxMemory();
>>
>>
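>> For reference, a minimal sketch of what such a ping servlet might look like
>> (the class name and exact output format are guesses; only the three Runtime
>> calls are taken from the snippet above):
>>
>>     import java.io.IOException;
>>     import java.io.PrintWriter;
>>     import javax.servlet.ServletException;
>>     import javax.servlet.http.HttpServlet;
>>     import javax.servlet.http.HttpServletRequest;
>>     import javax.servlet.http.HttpServletResponse;
>>
>>     // Returns the same three Runtime figures that wget is timed against.
>>     public class PingServlet extends HttpServlet {
>>         protected void doGet(HttpServletRequest req, HttpServletResponse resp)
>>                 throws ServletException, IOException {
>>             Runtime rt = Runtime.getRuntime();
>>             resp.setContentType("text/plain");
>>             PrintWriter out = resp.getWriter();
>>             out.println("FREE MEM\t" + rt.freeMemory());      // free space within the committed heap
>>             out.println("MEM TOTAL\t" + rt.totalMemory());    // currently committed heap
>>             out.println("MEM ALLOCATED\t" + rt.maxMemory());  // heap ceiling (roughly -Xmx)
>>         }
>>     }
>>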
>> This is an example of the statistics before the problems started:
>>
>> TIME        	FREE MEM	MEM TOTAL	MEM ALLOCATED	RESPTIME
>> 10:59:49	1499233752	12724273152	12724273152	0.20
>> 10:59:59	1431576712	12724273152	12724273152	0.16
>> 11:00:09	1392930432	12724273152	12724273152	0.14
>> 11:00:19	1385974568	12724273152	12724273152	0.13
>> 11:00:30	1365510896	12724273152	12724273152	0.13
>> 11:00:40	1248830048	12724273152	12724273152	0.19
>> 11:00:50	1164298568	12724273152	12724273152	0.13
>> 11:01:00	1122874896	12724273152	12724273152	0.15
>> 11:01:10	1085027216	12724273152	12724273152	0.13
>> 11:01:20	952254960	12724273152	12724273152	0.14
>> 11:01:31	385638352	12724273152	12724273152	0.30
>> 11:01:41	3905940544	12717326336	12717326336	0.13
>> 11:01:51	3286915952	12717326336	12717326336	0.34
>> 11:02:01	1862466680	12717326336	12717326336	0.17
>> 11:02:12	1214561176	12717326336	12717326336	0.13
>> 11:02:22	937846336	12717326336	12717326336	0.13
>> 11:02:32	830626208	12717326336	12717326336	0.13
>> 11:02:42	591847776	12717326336	12717326336	0.13
>> 11:02:52	486858808	12717326336	12717326336	0.15
>> 11:03:02	405118840	12717326336	12717326336	0.12
>> 11:03:12	379997776	12717326336	12717326336	0.17
>> 11:03:23	4231164944	12723027968	12723027968	0.13
>> 11:03:33	2869501088	12723027968	12723027968	0.14
>> 11:03:43	2848014968	12723027968	12723027968	0.15
>> 11:03:53	2789025152	12723027968	12723027968	0.13
>> 11:04:03	2141846392	12723027968	12723027968	0.14
>> 11:04:13	1936386592	12723027968	12723027968	0.13
>> 11:04:24	1922345160	12723027968	12723027968	0.13
>> 11:04:34	1907951992	12723027968	12723027968	0.14
>> 11:04:44	1884581280	12723027968	12723027968	0.15
>> 11:04:54	1842599720	12723027968	12723027968	0.14
>> 11:05:04	1833357144	12723027968	12723027968	0.14
>> 11:05:14	819218528	12723027968	12723027968	0.15
>> 11:05:25	766194152	12723027968	12723027968	0.14
>> 11:05:35	738061968	12723027968	12723027968	0.14
>> 11:05:45	716541120	12723027968	12723027968	0.13
>> 11:05:55	690620328	12723027968	12723027968	0.13
>> 11:06:05	601896808	12723027968	12723027968	0.13
>> 11:06:15	3986587288	12721192960	12721192960	0.19
>> 11:06:26	3981820120	12721192960	12721192960	0.19
>> 11:06:36	3908686896	12721192960	12721192960	0.13
>> 11:06:46	3896673928	12721192960	12721192960	0.13
>>
>>
>> This is an example of the statistics after the problems started:
>>
>> TIME        	FREE MEM	MEM TOTAL	MEM ALLOCATED	RESPTIME
>> 15:26:13	132104472	12878741504	12878741504	0.11
>> 15:26:23	89372112	12878741504	12878741504	0.21
>> 15:26:33	61732520	12878741504	12878741504	0.12
>> 15:26:43	15401280	12878741504	12878741504	12.14
>> 15:27:05	4074793624	12878741504	12878741504	64.57
>> 15:28:20	3378764376	12878741504	12878741504	0.12
>> 15:28:30	2858626704	12878741504	12878741504	0.11
>> 15:28:40	491173584	12878741504	12878741504	0.11
>> 15:28:50	414802056	12878741504	12878741504	0.92
>> 15:29:01	298133928	12878741504	12878741504	0.20
>> 15:29:12	4164253720	12878741504	12878741504	64.36
>> 15:30:26	3662076152	12878741504	12878741504	0.15
>> 15:30:36	1823630328	12878741504	12878741504	0.11
>> 15:30:46	1338510584	12878741504	12878741504	0.12
>> 15:30:56	505714912	12878741504	12878741504	0.11
>> 15:31:07	320903520	12878741504	12878741504	0.11
>> 15:31:17	4179464600	12878741504	12878741504	71.62
>> 15:32:38	3001104680	12878741504	12878741504	0.47
>> 15:32:49	2593399336	12878741504	12878741504	0.13
>> 15:32:59	2478360368	12878741504	12878741504	0.11
>> 15:33:09	2391502968	12878741504	12878741504	0.11
>> 15:33:19	785466304	12878741504	12878741504	0.13
>> 15:33:29	4004574384	12878741504	12878741504	62.87
>> 15:34:42	2643636392	12878741504	12878741504	0.11
>> 15:34:52	2387674096	12878741504	12878741504	0.15
>> 15:35:03	2226430872	12878741504	12878741504	0.11
>> 15:35:13	1016694456	12878741504	12878741504	0.11
>> 15:35:23	728910648	12878741504	12878741504	0.11
>> 15:35:33	238095528	12878741504	12878741504	7.11
>> 15:35:50	215109328	12878741504	12878741504	0.11
>> 15:36:00	4025070120	12878741504	12878741504	69.14
>> 15:37:19	2294767768	12878741504	12878741504	0.12
>> 15:37:30	1897269848	12878741504	12878741504	0.11
>> 15:37:40	1832375800	12878741504	12878741504	0.30
>> 15:37:50	1741032976	12878741504	12878741504	0.11
>> 15:38:00	1641480032	12878741504	12878741504	0.18
>> 15:38:10	774869968	12878741504	12878741504	7.20
>> 15:38:28	634489992	12878741504	12878741504	0.10
>> 15:38:38	274370880	12878741504	12878741504	3.11
>> 15:38:51	68896328	12878741504	12878741504	0.11
>> 15:39:01	31211120	12878741504	12878741504	0.15
>> 15:39:11	3997617032	12878741504	12878741504	60.77
>>
>> As you can see, every collection results in a >60 second pause time. 
>> The server has been restarted once between the two log snippets and 
>> they are from different dates. The problem lasted indefinitely and all 
>> collections seem to take this long.
>>
>> We have now switched to the CMS collector to "kill" these symptoms, 
>> but we still need to understand what was going on here.
>>
>> Do you have any suggestions on what the cause of this might be?
>>
>> The server is a Solaris 10 SPARC machine with 16 GB of memory and 4 
>> CPUs. The Java version is 1.5.0_11.
>>
>> These are the java parameters:
>> root at fp # pargs 10502
>>
>> 10502:  /usr/jdk/instances/jdk1.5.0/bin/sparcv9/java -Xmx12288m 
>> -Dmx=12288 -Djava.awt.h
>> argv[0]: /usr/jdk/instances/jdk1.5.0/bin/sparcv9/java
>> argv[1]: -Xmx12288m
>> argv[2]: -Dmx=12288
>> argv[3]: -Djava.awt.headless=true
>> argv[4]: -Dfile.encoding=UTF-8
>> argv[5]: -Djava.library.path=/opt/FOCAfp/lib
>> argv[6]: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> argv[7]: -Djava.endorsed.dirs=/opt/FOCAtomcat/common/endorsed
>> argv[8]: -classpath
>> argv[9]: 
>> :/opt/FOCAtomcat/bin/bootstrap.jar:/opt/FOCAtomcat/bin/commons-logging-api.jar
>> argv[10]: -Dcatalina.base=/opt/FOCAtomcat
>> argv[11]: -Dcatalina.home=/opt/FOCAtomcat
>> argv[12]: -Djava.io.tmpdir=/opt/FOCAtomcat/temp
>> argv[13]: org.apache.catalina.startup.Bootstrap
>> argv[14]: start
>>
>>
>> Best regards,
>> Karl Rosenbaum
>> _______________________________________________
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
_______________________________________________
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use


