From charles.nutter at sun.com Tue Sep 2 13:06:05 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Tue, 02 Sep 2008 15:06:05 -0500 Subject: NewRatio: to twiddle or not to twiddle Message-ID: <48BD9CAD.9000100@sun.com> Moving the discussion here on Paul's recommendation. Short version: JRuby achieves better performance using NewRatio=1. Paul recommends also looking at MaxTenuringThreshold and SurvivorRatio. There's a dearth of information on the interwebs about the best GC settings (and HotSpot settings in general) for running object-intensive dynamic languages like Ruby. Time to change that. My original email: > I've ben playing with JRuby on various benchmarks recently and found > that several object-intensive scripts run better if I set NewRatio=1. > Ruby, even more than Java, tends to generate lots and lots of objects, > especially considering that there's no unboxed primitive numeric types > (and no fixnums on the JVM...ahem ahem). So my general theory is that: > > 1. A NewRatio of 1 allows all those extra transient objects to get > collected more quickly. > 2. Too small a "new" generation causes transient objects to get shoved > into older generations, potentially snowballing and forcing more > comprehensive GC runs as time goes on. > > I'm curious whether this theory sounds reasonable, whether there's a > better way I can adapt hotspot to the memory demands of a dynamic > language like Ruby, and what other implications there are in setting > NewRatio to 1. > > Thoughts? > > (And please let me know if there's a better list to post this sort of > question to) Paul Hohensee's response: You probably want hotspot-gc-use at openjdk.java.net. A NewRatio of 1 means that half the heap is young gen. The default for the server vm is 2 (1/3 the heap is young gen) and for the client vm it's 12 on x86 and 8 on sparc. I.e., much smaller young gen for client. 1. A NewRatio of 1 doesn't allow transient objects to get collected more quickly, it allows them time to die in the young gen and thus not get collected at all. The bigger the young gen, the longer the interval between collections, and the more time for transient objects to die. 2. Is correct. You can increase the amount of time transient objects have to die by increasing MaxTenuringThreshold. It can have values between 0 and 15, and is the number of times an object must survive a young gen collection before being promoted to the old gen. The survivor spaces in the young gen must be large enough to contain everything that's live in the young gen, else the excess gets promoted before its time. So check out lower values of SurvivorRatio as well: lower values increase the size of the survivor spaces. Paul From charles.nutter at sun.com Tue Sep 2 13:20:23 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Tue, 02 Sep 2008 15:20:23 -0500 Subject: NewRatio: to twiddle or not to twiddle In-Reply-To: <48BD9D79.9060709@Sun.COM> References: <48BD8ADF.6070302@sun.com> <48BD9D79.9060709@Sun.COM> Message-ID: <48BDA007.6010200@sun.com> Cross-posting to hotspot-gc-use... I guess a certain portion of the challenge for language impls like JRuby is similar to the challenge you all face as JVM implementers: finding typical applications, typical loads, and appropriate defaults when our heads are usually buried two or three levels of abstraction beneath such applications. -Xmn is a good one I had forgotten about, and answers my question about how to increase NewRatio beyond 1. 
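To make that concrete for myself (every number here is invented for illustration, not measured against a real JRuby workload): if the live data for a given script turned out to be around 128m, I might start from something like

  java -server -Xms768m -Xmx768m -Xmn512m \
       -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=8 \
       -cp jruby.jar org.jruby.Main some_bench.rb

i.e. an old gen of a couple of times the live data plus as large a young gen as the total heap allows, with the survivor/tenuring knobs Paul mentioned thrown in to give transient objects more chances to die young. The launcher/classpath part is just schematic; the point is the flag combination, and the right values would have to come from measurement.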
I guess there are really two tasks I'm looking into right now: 1. Discovering appropriate flags to tweak and "more appropriate" defaults dynlangs might want to try 2. Exploring real-world dynlang applications and loads to refine those better/best-guess defaults I'd say this round of emails is focused mostly on 1, since 2 is going to vary more across languages. And I think we can only start to explore 2 iff we know 1. So -Xmn goes on the list alongside NewRatio. Perhaps a brief illustration of the adaptive capabilities of the other collectors would be useful here? Obviously for server applications we'd hope HotSpot "does the right thing" for us already. Then perhaps we can learn from that (and please, teach me how to learn, including better tools and switches I can use to get the necessary GC metrics) and apply what we've learned to command-line/client runs (which will be more transient but not dissimilar to the server runs in a language like Ruby). - Charlie Peter B. Kessler wrote: > Why limit yourself to NewRatio? The best you can get that way is half > the heap for the young generation. If you really want to a big young > generation (to give your temporary objects time to die without even > being looked at by the collector), use -Xmn (or -XX:NewSize= and > -XX:MaxNewSize=) to set it directly. Figure out what your live data > size is and use that as the base size for the old generation. Then > figure out what kinds of pauses the young generation collections impose, > and how much they promote, then amortize the eventual old generation > collection time over as many young generation collections as you can > give space to in the old generation. Then make your total heap (-Xmx) > as big as you can afford to get as big a young generation as that will > allow. > > ... peter > > Charles Oliver Nutter wrote: >> I've ben playing with JRuby on various benchmarks recently and found >> that several object-intensive scripts run better if I set NewRatio=1. >> Ruby, even more than Java, tends to generate lots and lots of objects, >> especially considering that there's no unboxed primitive numeric types >> (and no fixnums on the JVM...ahem ahem). So my general theory is that: >> >> 1. A NewRatio of 1 allows all those extra transient objects to get >> collected more quickly. >> 2. Too small a "new" generation causes transient objects to get shoved >> into older generations, potentially snowballing and forcing more >> comprehensive GC runs as time goes on. >> >> I'm curious whether this theory sounds reasonable, whether there's a >> better way I can adapt hotspot to the memory demands of a dynamic >> language like Ruby, and what other implications there are in setting >> NewRatio to 1. >> >> Thoughts? >> >> (And please let me know if there's a better list to post this sort of >> question to) >> >> - Charlie > From kirk.pepperdine at gmail.com Tue Sep 2 13:48:36 2008 From: kirk.pepperdine at gmail.com (kirk) Date: Tue, 02 Sep 2008 22:48:36 +0200 Subject: NewRatio: to twiddle or not to twiddle In-Reply-To: <48BDA007.6010200@sun.com> References: <48BD8ADF.6070302@sun.com> <48BD9D79.9060709@Sun.COM> <48BDA007.6010200@sun.com> Message-ID: <48BDA6A4.2040007@javaperformancetuning.com> Hi Charles, Peter, awesome explanation. I use hpjmeter right now to do very gross analysis on gc logs.
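(For context, the gc logs I feed it come from the usual logging switches, something along the lines of

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log

though the exact combination people use varies from site to site, which is part of the parsing problem.)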
It doesn't tolerate all the Sun non-standard switches (-XX) as opposed to the standard non-standard switches (-X) ;-) I've also signed up with Tony to work on gchisto, a tool that should be more tolerant of Sun gc logs (is there any way we can settle on a standard format???) though that tool is still very limited. Eventually it should get rolled into visualvm but that is in the future. What I'm finding is that techniques that focus more on evacuation seem to work better than those that focus on collection compaction. This is one of the reasons I'm very interested in seeing G1 released into the wild. I believe that G1 will mix very well with dynamic languages because its primary logic is evacuation. With G1, I don't believe it will matter if old fills up with transient objects that maybe should have died in young. In fact I'd bet money that G1 will excel under these conditions because old spaces will be evacuated based on which regions have the lowest live ratios. The biggest drawback that I can see will be copying costs. Even so, expect a win. > > -Xmn is a good one I had forgotten about, and answers my question > about how to increase NewRatio beyond 1. Use a combo of -Xmx and NewSize. That should pin things down. - Kirk > I guess there's really two tasks I'm looking into right now: > > 1. Discovering appropriate flags to tweak and "more appropriate" > defaults dynlangs might want to try > 2. Exploring real-world dynlang applications and loads to refine those > better/best-guess defaults > > I'd say this round of emails is focused mostly on 1, since 2 is going > to vary more across languages. And I think we can only start to > explore 2 iff we know 1. So -Xmn goes on the list alongside NewRatio. > > Perhaps a brief illustration of the adaptive capabilities of the other > collectors would be useful here? Obviously for server applications > we'd hope HotSpot "does the right thing" for us already. Then perhaps > we can learn from that (and please, teach me how to learn, including > better tools and switches I can use to get the necessary GC metrics) > and apply what we've learned to command-line/client runs (which will > be more transient but not asimilar to the server runs in a language > like Ruby). > > - Charlie > > Peter B. Kessler wrote: >> Why limit yourself to NewRatio? The best you can get that way is >> half the heap for the young generation. If you really want to a big >> young generation (to give your temporary objects time to die without >> even being looked at by the collector), use -Xmn (or -XX:NewSize= and >> -XX:MaxNewSize=) to set it directly. Figure out what your live data >> size is and use that as the base size for the old generation. Then >> figure out what kinds of pauses the young generation collections >> impose, and how much they promote, then amortize the eventual old >> generation collection time over as many young generation collections >> as you can give space to in the old generation. Then make your total >> heap (-Xmx) as big as you can afford to get as big a young generation >> as that will allow. >> >> ... peter >> >> Charles Oliver Nutter wrote: >>> I've ben playing with JRuby on various benchmarks recently and found >>> that several object-intensive scripts run better if I set >>> NewRatio=1. Ruby, even more than Java, tends to generate lots and >>> lots of objects, especially considering that there's no unboxed >>> primitive numeric types (and no fixnums on the JVM...ahem ahem). So >>> my general theory is that: >>> >>> 1.
A NewRatio of 1 allows all those extra transient objects to get >>> collected more quickly. >>> 2. Too small a "new" generation causes transient objects to get >>> shoved into older generations, potentially snowballing and forcing >>> more comprehensive GC runs as time goes on. >>> >>> I'm curious whether this theory sounds reasonable, whether there's a >>> better way I can adapt hotspot to the memory demands of a dynamic >>> language like Ruby, and what other implications there are in setting >>> NewRatio to 1. >>> >>> Thoughts? >>> >>> (And please let me know if there's a better list to post this sort >>> of question to) >>> >>> - Charlie >> > > From charles.nutter at sun.com Tue Sep 2 14:11:42 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Tue, 02 Sep 2008 16:11:42 -0500 Subject: NewRatio: to twiddle or not to twiddle In-Reply-To: <48BDA6A4.2040007@javaperformancetuning.com> References: <48BD8ADF.6070302@sun.com> <48BD9D79.9060709@Sun.COM> <48BDA007.6010200@sun.com> <48BDA6A4.2040007@javaperformancetuning.com> Message-ID: <48BDAC0E.40305@sun.com> kirk wrote: > Hi Charles, > > Peter, awesome explination. I use hpjmeter right now to do very gross > analysis on gc logs. It doesn't tolerate all the Sun non-standard > non-standard switches (-XX and as apposed to the standard non-standard > switches -X) ;-) I've also signed up with Tony to work in gchisto, a > tool that should be more tolerant to Sun gc logs (is there anyway we can > settle on a standard format???) though that tool is still very limited. > Eventually it should get rolled into visualvm but that is a future. If not a standard format, at least a version-frozen format with a header, eh? I'll have to look into hpjmeter. At the moment I think half our "problems" on JRuby come from lack of experience with the right tools to gain visibility into the JVM. Oh, and lack of resources; two guys does not a language implementation team make. > What I'm finding is that techniques that focus more on evacuation seem > to work better than those that focus on collection compaction. This is > one of the reasons I'm very interested in seeing G1 released into the > wild. I believe that G1 will mix very well with dynamic languages > because it primary logic is on evacuation. With G1, I don't believe it > will matter if old fills up with transient objects that maybe should > have died in young. In fact I'd bet money that G1 will excel under these > conditions because old spaces will be evacuated based on who has the > lowest live ratios. The biggest draw back that I can see will be copying > costs. Even so, expect a win. If it's true that G1 (what is G1?) wouldn't care about transient objects getting promoted, I'd be interested in that as well. Overall, though, I think the exercise of exploring memory usage patterns to reduce such promotions will be very important, since there's going to be a lot of users on less helpful JVM versions, hotspot or otherwise. - Charlie From charles.nutter at sun.com Tue Sep 2 14:26:56 2008 From: charles.nutter at sun.com (Charles Oliver Nutter) Date: Tue, 02 Sep 2008 16:26:56 -0500 Subject: NewRatio: to twiddle or not to twiddle In-Reply-To: <48BDAC0E.40305@sun.com> References: <48BD8ADF.6070302@sun.com> <48BD9D79.9060709@Sun.COM> <48BDA007.6010200@sun.com> <48BDA6A4.2040007@javaperformancetuning.com> <48BDAC0E.40305@sun.com> Message-ID: <48BDAFA0.2040900@sun.com> Charles Oliver Nutter wrote: > If it's true that G1 (what is G1?) 
wouldn't care about transient objects Never mind this question...I found that G1 is the Garbage First collector. I'll read the paper. - Charlie From justin at techadvise.com Thu Sep 11 09:51:02 2008 From: justin at techadvise.com (Justin Ellison) Date: Thu, 11 Sep 2008 11:51:02 -0500 Subject: Reducing CMS-remark times Message-ID: Hi all, Running a Weblogic jsp application on 1.4.2_17 on 32bit Sparc under Solaris 9. Here are the args: -Xms2304m -Xmx2304m -XX:NewSize=384m -XX:MaxNewSize=384m -XX:CompileThreshold=3000 -Djava.net.setSoTimeout=20000 -XX:LargePageSizeInBytes=4m -XX:+UseMPSS -Xss128k -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -Xnoclassgc -XX:ParallelGCThreads=4 -XX:MaxTenuringThreshold=8 -XX:SurvivorRatio=6 -XX:+UseCMSCompactAtFullCollection -Xloggc:gc.out -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:MaxPermSize=92m I'm seeing CMS-remark pauses that get longer and longer as time goes on, some getting to be 3 seconds or more. It's apples to oranges, I know, but my Core 2 Duo laptop running Ubuntu 8.04 and 1.4.2_18 with the same arguments under a load generator performs consistently better than production. The production application is on a Sun Fire v490 - it should run circles around my laptop. Here's the output of PrintGCStats against my laptop:

what        count   total        mean       max      stddev
gen0(s)     8129    1337.382     0.16452    0.812    0.0663
gen0t(s)    8129    1341.128     0.16498    1.132    0.0676
cmsIM(s)    4504    896.872      0.19913    0.542    0.1039
cmsRM(s)    4485    2129.909     0.47490    2.121    0.2060
GC(s)       12633   4367.908     0.34575    2.121    0.1184
cmsCM(s)    4503    37508.595    8.32969    45.481   1.5792
cmsCP(s)    4490    1158.141     0.25794    26.034   0.6600
cmsCS(s)    4485    6408.843     1.42895    11.013   0.3527
cmsCR(s)    4480    162.617      0.03630    0.441    0.0128
alloc(MB)   8129    2340385.405  287.90570  288.000  1.6059
promo(MB)   8129    30006.814    3.69133    73.405   4.4811

and against the v490: bmapp2d-gc.out

what        count   total        mean       max      stddev
gen0(s)     4739    917.710      0.19365    0.765    0.0483
gen0t(s)    4739    919.012      0.19393    0.765    0.0483
cmsIM(s)    34      4.421        0.13003    0.510    0.0728
cmsRM(s)    34      52.707       1.55019    2.327    0.4003
GC(s)       4773    976.140      0.20451    2.327    0.1273
cmsCM(s)    34      455.876      13.40812   26.797   2.9626
cmsCP(s)    34      5.279        0.15526    0.285    0.0572
cmsCS(s)    34      265.217      7.80050    8.741    0.6182
cmsCR(s)    34      8.585        0.25250    0.295    0.0137
alloc(MB)   4739    1364784.244  287.98992  288.000  0.0525
promo(MB)   4739    15492.045    3.26905    19.177   3.1992

alloc/elapsed_time = 1364784.244 MB / 77242.704 s = 17.669 MB/s
alloc/tot_cpu_time = 1364784.244 MB / 617941.632 s = 2.209 MB/s
alloc/mut_cpu_time = 1364784.244 MB / 609397.559 s = 2.240 MB/s
promo/elapsed_time = 15492.045 MB / 77242.704 s = 0.201 MB/s
promo/gc0_time = 15492.045 MB / 919.012 s = 16.857 MB/s
gc_seq_load = 7809.116 s / 617941.632 s = 1.264%
gc_conc_load = 734.957 s / 617941.632 s = 0.119%
gc_tot_load = 8544.073 s / 617941.632 s = 1.383%

Especially of note is the difference in mean remark times between the two above. I've placed full log snippets at the bottom of this email. Here's my current trains of thought:

a) My load generator is not very close to real-world load.
b) There are some OS level tunables that need to be set on the v490
c) There is a bug in 1.4.2_17 that's biting me.
d) I'm not getting concurrency in the remark phase (which would explain my dual core laptop keeping up with my 8 core server)
e) I'm running into what Jon is describing here: http://blogs.sun.com/jonthecollector/entry/did_you_know regarding CMSMaxAbortablePrecleanTime. I have no idea how I can resolve this on 1.4.2 if that's the case. Perhaps shrink my New??? Bah!
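(If shrinking New does turn out to be the answer here, I assume the change is simply something like

  -XX:NewSize=192m -XX:MaxNewSize=192m

in place of the current 384m values - the 192m is a number I pulled out of the air, not something I've measured.)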
I've been working on this for too long, and I'm going in circles. Can anyone offer up any insights? Justin Production server: .... 18959.886: [GC 18959.886: [ParNew: 322704K->29504K(344064K), 0.2497164 secs] 1912832K->1624273K(2310144K), 0.2498432 secs] 18968.737: [GC 18968.737: [ParNew: 324416K->22709K(344064K), 0.2113128 secs] 1919185K->1625475K(2310144K), 0.2114685 secs] 18981.220: [GC 18981.220: [ParNew: 317621K->29164K(344064K), 0.1744130 secs] 1920387K->1631931K(2310144K), 0.1746267 secs] 18987.149: [GC 18987.149: [ParNew: 324076K->24086K(344064K), 0.2326857 secs] 1926843K->1634489K(2310144K), 0.2328768 secs] 18987.382: [GC [1 CMS-initial-mark: 1610403K(1966080K)] 1634489K(2310144K), 0.1155761 secs] 18987.498: [CMS-concurrent-mark-start] 18999.599: [CMS-concurrent-mark: 12.101/12.101 secs] 18999.599: [CMS-concurrent-preclean-start] 18999.698: [CMS-concurrent-preclean: 0.094/0.099 secs] 18999.700: [GC18999.700: [Rescan (parallel) , 1.0845970 secs]19000.785: [weak refs processing, 1.0721075 secs] [1 CMS-remark: 1610403K(1966080K)] 1894660K(2310144K), 2.1578777 secs] 19001.858: [CMS-concurrent-sweep-start] 19002.043: [GC 19002.043: [ParNew: 318998K->28072K(344064K), 0.1886607 secs] 1926098K->1635173K(2310144K), 0.1888124 secs] 19009.800: [CMS-concurrent-sweep: 7.751/7.942 secs] 19009.800: [CMS-concurrent-reset-start] 19010.051: [CMS-concurrent-reset: 0.250/0.250 secs] 19015.384: [GC 19015.384: [ParNew: 322984K->30317K(344064K), 0.2178368 secs] 1496676K->1207838K(2310144K), 0.2179773 secs] 19025.841: [GC 19025.842: [ParNew: 325229K->27691K(344064K), 0.2313683 secs] 1502750K->1213013K(2310144K), 0.2315669 secs] .... Laptop: 12436.567: [GC 12436.567: [ParNew: 322821K->28170K(344064K), 0.4482710 secs] 1820614K->1532726K(2310144K), 0.4483510 secs] 12438.456: [GC 12438.456: [ParNew: 323082K->27352K(344064K), 0.3058890 secs] 1827638K->1537746K(2310144K), 0.3059690 secs] 12440.072: [GC 12440.072: [ParNew: 322264K->22917K(344064K), 0.2582750 secs] 1832658K->1543612K(2310144K), 0.2583510 secs] 12441.594: [GC 12441.594: [ParNew: 317829K->31252K(344064K), 0.2623760 secs] 1838524K->1551947K(2310144K), 0.2624510 secs] 12443.136: [GC 12443.137: [ParNew: 326164K->49152K(344064K), 0.5364920 secs] 1846859K->1594211K(2310144K), 0.5365690 secs] 12443.694: [GC [1 CMS-initial-mark: 1545059K(1966080K)] 1597435K(2310144K), 0.2112670 secs] 12443.905: [CMS-concurrent-mark-start] 12445.287: [GC 12445.287: [ParNew: 344063K->12141K(344064K), 0.5005420 secs] 1889123K->1604975K(2310144K), 0.5006200 secs] 12447.348: [GC 12447.348: [ParNew: 307053K->21747K(344064K), 0.2396110 secs] 1899887K->1614581K(2310144K), 0.2396900 secs] 12449.061: [GC 12449.061: [ParNew: 316659K->29249K(344064K), 0.2359550 secs] 1909493K->1622083K(2310144K), 0.2360330 secs] 12450.740: [GC 12450.740: [ParNew: 324161K->30724K(344064K), 0.3210140 secs] 1916995K->1631546K(2310144K), 0.3210850 secs] 12452.539: [GC 12452.539: [ParNew: 325636K->28577K(344064K), 0.3849060 secs] 1926458K->1639437K(2310144K), 0.3849890 secs] 12454.548: [GC 12454.548: [ParNew: 323489K->30308K(344064K), 0.3750850 secs] 1934349K->1650841K(2310144K), 0.3751660 secs] 12455.880: [CMS-concurrent-mark: 9.839/11.975 secs] 12455.880: [CMS-concurrent-preclean-start] 12457.341: [CMS-concurrent-preclean: 1.200/1.461 secs] 12457.363: [GC12457.363: [Rescan (parallel) , 1.9972650 secs]12459.360: [weak refs processing, 0.1239550 secs] [1 CMS-remark: 1620532K(1966080K)] 1941855K(2310144K), 2.1214010 secs] 12459.499: [CMS-concurrent-sweep-start] 12459.509: [GC 12459.509: [ParNew: 
325220K->44523K(344064K), 0.4837280 secs] 1945752K->1673386K(2310144K), 0.4838110 secs] 12461.460: [CMS-concurrent-sweep: 1.455/1.961 secs] 12461.461: [CMS-concurrent-reset-start] 12461.496: [CMS-concurrent-reset: 0.036/0.036 secs] 12462.802: [GC 12462.802: [ParNew: 339435K->49152K(344064K), 0.6503040 secs] 1936692K->1669076K(2310144K), 0.6503840 secs] From Y.S.Ramakrishna at Sun.COM Thu Sep 11 12:25:20 2008 From: Y.S.Ramakrishna at Sun.COM (Y Srinivas Ramakrishna) Date: Thu, 11 Sep 2008 12:25:20 -0700 Subject: Reducing CMS-remark times In-Reply-To: References: Message-ID: Hi Justin --

> Here's my current trains of thought:
> > a) My load generator is not very close to real-world load.

Probably :)

> b) There are some OS level tunables that need set on the v490

I don't think those would account for the diff. I know of no specific OS tunables in this case.

> c) There is a bug in 1.4.2_17 that's biting me.

Unlikely.

> d) I'm not getting concurrency in the remark phase (which would > explain my dual core laptop keeping up with my 8 core server)

The remark phase has never been "concurrent" (at least not in the sense of being concurrent with mutators); instead it's "stop-the-world". The application threads are all stopped while the remark phase proceeds, which is why long remark pauses hurt a lot.

> e) I'm running into what Jon is describing here: > http://blogs.sun.com/jonthecollector/entry/did_you_know regarding > CMSMaxAbortablePrecleanTime. I have no idea how I can resolve this on > 1.4.2 if that's the case. Perhaps shrink my New???

Right. The best way to see evidence of this is to use -XX:PrintCMSStatistics=1. You will, in the remark phase, see details on how much time is spent in which worker thread, and you'll notice that one of the threads takes extremely long. This is the thread scanning a large monolithic Eden, which becomes a serial bottleneck for this phase. The parallelization of Eden scanning was implemented in 5.0, so your best bet is to upgrade to a newer jvm if possible, and failing that to use a smaller young gen. Another optional tuning switch introduced in 5uXX was +CMSScavengeBeforeRemark which does a scavenge which empties Eden immediately before doing a remark. This makes what was the critical task here into a zero-length task, and converts that work into more dirty cards in the old gen which are scanned in parallel, leading to a reduction in the pause time. Unfortunately, that's also only available in 5uXX for the first time. All the extra processors on your v490 cannot help for a large serial task in this case (1.4.2_XX). That said, the remark phases in the two snippets you give also indicate that your production system sees a large "weak reference" processing time, which is not the case on your laptop. I suspect that might be the result of a difference between your production load and your synthetic load on the laptop. This again is a serial phase by default that the extra cpus on the v490 do not help on. This phase was optimized some in 5uXX (by adding a parallel reference processing option), and further in 6uXX (by enhancing precleaning of discovered references). I don't believe either of those optimizations was ever backported to 1.4.2_XX. 1.4.2_XX is really quite old and it shows its age with respect to modern servers with much parallelism.

-- ramki

> Production server: > ....
> > 18999.700: [GC18999.700: [Rescan (parallel) , 1.0845970 > secs]19000.785: [weak refs processing, 1.0721075 secs] [1 CMS-remark: > 1610403K(1966080K)] 1894660K(2310144K), 2.1578777 secs] > Laptop: > 12457.363: [GC12457.363: [Rescan (parallel) , 1.9972650 > secs]12459.360: [weak refs processing, 0.1239550 secs] [1 CMS-remark: > 1620532K(1966080K)] 1941855K(2310144K), 2.1214010 secs] From chansler at yahoo-inc.com Thu Sep 18 11:52:12 2008 From: chansler at yahoo-inc.com (Robert Chansler) Date: Thu, 18 Sep 2008 11:52:12 -0700 Subject: 15 Minute Garbage Collection (promotion failed) In-Reply-To: Message-ID: Yes, I'm the Rob Chansler of the forum item. Thanks to Peter and Ramki for the early analysis. The gc logs are separate from other system and user logs, so I'm comfortable sharing them. One of the questions from the forum was about the many days without a CMS collection. (I think I searched the logs properly!) I do not recall such long intervals when I previously examined the logs. I'll peek to see if the intervals are different on different clusters. On 18 09 08 8:50, "Sanjay Radia" wrote: > > > Yes, > Robert Chansler did post on a GC problem. I am cc'ing him on this. > > sanjay > BTW I have seeded the idea of getting a service contract for Java from Sun a short while ago. > > > > On Sep 12, 2008, at 10:21 AM, Peter B. Kessler wrote: >> >> >> >> This post >> >> http://forums.sun.com/thread.jspa?threadID=5330789 >> >> appeared on our JVM forum. Ramki (our CMS expert) has already answered >> the post (sort of), but would like to get more of the logs, discuss >> issues and solutions, etc. Is that post from Rob Chansler at Yahoo? I >> don't have his email address. Would you be willing to give me his >> address? Thanks. >> >> ... peter >> >> P.S. I can't promise that we won't try to sell him a service contract. >> :-) But in the meantime, we can offer him some support. >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/attachments/20080918/ce260296/attachment.html