Free ratio based heap shrinking in the parallel collector
Hiroshi Yamauchi
yamauchi at google.com
Tue May 18 00:26:45 UTC 2010
On Thu, May 13, 2010 at 10:44 PM, Jon Masamitsu
<jon.masamitsu at oracle.com> wrote:
> On 5/13/10 4:34 PM, Hiroshi Yamauchi wrote:
>>
>> I assume by 'close' you mean between UseSerialGC and UseParallelGC
>> with -UseAdaptiveSizePolicy.
>>
>> I need more elaboration. How would you do the testing described above
>> (eg how/when to measure them and what to run)?
>>
>>
>
> I would start by running some benchmarks with your changes and with
> UseSerialGC, and looking at the number of GCs that are done and what
> the heap looks like at the end of the execution. When PrintGCDetails
> is set, the heap is printed on exit. GCOld would be a good benchmark
> to start with because it comes to steady state quickly and runs a
> specified amount of work. I would expect the number of GCs to be
> within 10% of each other, and I would also hope the generations to
> be within 20% of each other. If not, we should look at the details
> to see why the sizes are not closer. Let's see how that goes before
> deciding what else we should do.
>
> GChisto is a good tool to compare the GC counts. There are also some
> scripts available (PrintGCStats) to do the same if you prefer.
>
>
>
Unfortunately, GCOld wasn't available for download. I instead used the
eclipse benchmark from DaCapo (which seems to be the most GC-intensive
in the suite). The PrintGCStats script was available.
I measured the following four configurations.
1. +UseSerialGC
The serial collector with its (limited) support for resizing by free
ratio. Only the old gen is resized by free ratio, and shrinking is
less aggressive (the 'shrink_factor' damps the shrinkage down to 40%
of the straightforwardly computed amount).
2. +UseParallelGC-UseAdaptiveSizePolicy-1
This is as per the latest patch.
3. +UseParallelGC-UseAdaptiveSizePolicy-2
Same as 2, except that the free ratio on a minor collection is
computed from both the young and old gens rather than just the young
gen. This effectively (and temporarily) reverses the last change we
made.
4. +UseParallelGC+UseAdaptiveSizePolicy
The parallel collector and (the default) UseAdaptiveSizePolicy.
The following are the heap stats at the end of the execution and the
PrintGCStats output for each of the four configurations. (They are
easiest to read in a fixed-width font.) The heap size parameters (-Xmx
and -Xms) are unset. The common VM flags are -server -verbose:gc
-XX:+PrintGCTimeStamps -XX:+PrintGCDetails. Disclaimer: I have run
each configuration only once.
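For reference, the invocation for config 2 would look roughly like this
(the DaCapo jar name/path is my assumption, not taken from the original
runs; only the collector flags vary between configurations):

```shell
# Config 2: parallel collector, adaptive size policy off, no -Xms/-Xmx.
# The DaCapo jar name is an assumption for illustration.
java -server -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
     -XX:+UseParallelGC -XX:-UseAdaptiveSizePolicy \
     -jar dacapo.jar eclipse
```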
1. +UseSerialGC
Heap
def new generation total 19392K, used 4653K [0x70d90000, 0x72290000, 0x85c30000)
eden space 17280K, 22% used [0x70d90000, 0x7115b458, 0x71e70000)
from space 2112K, 36% used [0x72080000, 0x72140220, 0x72290000)
to space 2112K, 0% used [0x71e70000, 0x71e70000, 0x72080000)
tenured generation total 42880K, used 20727K [0x85c30000, 0x88610000, 0xaf990000)
the space 42880K, 48% used [0x85c30000, 0x8706de88, 0x8706e000, 0x88610000)
compacting perm gen total 16640K, used 16407K [0xaf990000, 0xb09d0000, 0xb3990000)
the space 16640K, 98% used [0xaf990000, 0xb0995f08, 0xb0996000, 0xb09d0000)
No shared spaces configured.
what        count     total       mean       max     stddev
gen0(s)       183     0.602    0.00329     0.014     0.0019
gen0t(s)      183     0.000    0.00000     0.000     0.0000
gen1t(s)        4     0.000    0.00000     0.000     0.0000
GC(s)         187     0.000    0.00000     0.000     0.0000
alloc(MB)     183  3078.570   16.82279    16.875     0.1896
promo(MB)     183    97.285    0.53161     4.647     0.7827
alloc/elapsed_time = 3078.570 MB / 29.343 s = 104.917 MB/s
alloc/tot_cpu_time = 3078.570 MB / 117.372 s = 26.229 MB/s
alloc/mut_cpu_time = 3078.570 MB / 117.372 s = 26.229 MB/s
promo/elapsed_time = 97.285 MB / 29.343 s = 3.315 MB/s
promo/gc0_time = 97.285 MB / 0.000 s = 0.000 MB/s
gc_seq_load = 0.000 s / 117.372 s = 0.000%
gc_conc_load = 0.000 s / 117.372 s = 0.000%
gc_tot_load = 0.000 s / 117.372 s = 0.000%
2. +UseParallelGC-UseAdaptiveSizePolicy-1
Heap
PSYoungGen total 1600K, used 905K [0x9eb30000, 0x9fcf0000, 0xb39d0000)
eden space 1536K, 56% used [0x9eb30000,0x9ec0a540,0x9ecb0000)
from space 64K, 50% used [0x9fce0000,0x9fce8000,0x9fcf0000)
to space 512K, 0% used [0x9fbf0000,0x9fbf0000,0x9fc70000)
PSOldGen total 35456K, used 23449K [0x74dd0000, 0x77070000, 0x9eb30000)
object space 35456K, 66% used [0x74dd0000,0x764b65c8,0x77070000)
PSPermGen total 19968K, used 16423K [0x70dd0000, 0x72150000, 0x74dd0000)
object space 19968K, 82% used [0x70dd0000,0x71dd9e70,0x72150000)
what        count     total       mean       max     stddev
gen0(s)      2102     0.000    0.00000     0.000     0.0000
gen1t(s)       15     0.000    0.00000     0.000     0.0000
GC(s)        2117     0.000    0.00000     0.000     0.0000
alloc(MB)    2102  3156.040    1.50145    15.750     0.3145
promo(MB)    2102   338.998    0.16127     1.718     0.2520
alloc/elapsed_time = 3156.040 MB / 35.017 s = 90.129 MB/s
alloc/tot_cpu_time = 3156.040 MB / 140.068 s = 22.532 MB/s
alloc/mut_cpu_time = 3156.040 MB / 140.068 s = 22.532 MB/s
promo/elapsed_time = 338.998 MB / 35.017 s = 9.681 MB/s
promo/gc0_time = 338.998 MB / 0.000 s = 0.000 MB/s
gc_seq_load = 0.000 s / 140.068 s = 0.000%
gc_conc_load = 0.000 s / 140.068 s = 0.000%
gc_tot_load = 0.000 s / 140.068 s = 0.000%
3. +UseParallelGC-UseAdaptiveSizePolicy-2
Heap
PSYoungGen total 10176K, used 7135K [0x9eb00000, 0x9fcc0000, 0xb39a0000)
eden space 10112K, 69% used [0x9eb00000,0x9f1e7c68,0x9f4e0000)
from space 64K, 100% used [0x9fca0000,0x9fcb0000,0x9fcb0000)
to space 64K, 0% used [0x9fcb0000,0x9fcb0000,0x9fcc0000)
PSOldGen total 16640K, used 12978K [0x74da0000, 0x75de0000, 0x9eb00000)
object space 16640K, 77% used [0x74da0000,0x75a4cbd8,0x75de0000)
PSPermGen total 18688K, used 16431K [0x70da0000, 0x71fe0000, 0x74da0000)
object space 18688K, 87% used [0x70da0000,0x71dabe60,0x71fe0000)
what        count     total       mean       max     stddev
gen0(s)       341     0.000    0.00000     0.000     0.0000
gen1t(s)       18     0.000    0.00000     0.000     0.0000
GC(s)         359     0.000    0.00000     0.000     0.0000
alloc(MB)     341  3151.223    9.24112    17.625     2.6183
promo(MB)     341   170.810    0.50091     3.881     0.5584
alloc/elapsed_time = 3151.223 MB / 29.328 s = 107.448 MB/s
alloc/tot_cpu_time = 3151.223 MB / 117.312 s = 26.862 MB/s
alloc/mut_cpu_time = 3151.223 MB / 117.312 s = 26.862 MB/s
promo/elapsed_time = 170.810 MB / 29.328 s = 5.824 MB/s
promo/gc0_time = 170.810 MB / 0.000 s = 0.000 MB/s
gc_seq_load = 0.000 s / 117.312 s = 0.000%
gc_conc_load = 0.000 s / 117.312 s = 0.000%
gc_tot_load = 0.000 s / 117.312 s = 0.000%
4. +UseParallelGC+UseAdaptiveSizePolicy
Heap
PSYoungGen total 288832K, used 47438K [0x9ea50000, 0xb27a0000, 0xb38f0000)
eden space 282176K, 14% used [0x9ea50000,0xa1226bc8,0xafde0000)
from space 6656K, 99% used [0xb1a60000,0xb20dd000,0xb20e0000)
to space 6912K, 0% used [0xb20e0000,0xb20e0000,0xb27a0000)
PSOldGen total 42880K, used 20044K [0x74cf0000, 0x776d0000, 0x9ea50000)
object space 42880K, 46% used [0x74cf0000,0x760831e0,0x776d0000)
PSPermGen total 34560K, used 16446K [0x70cf0000, 0x72eb0000, 0x74cf0000)
object space 34560K, 47% used [0x70cf0000,0x71cffbf8,0x72eb0000)
what        count     total       mean       max     stddev
gen0(s)        29     0.000    0.00000     0.000     0.0000
gen1t(s)        2     0.000    0.00000     0.000     0.0000
GC(s)          31     0.000    0.00000     0.000     0.0000
alloc(MB)      29  3122.054  107.65702   318.750   110.5556
promo(MB)      29    57.154    1.97084     6.962     2.2706
alloc/elapsed_time = 3122.054 MB / 28.569 s = 109.281 MB/s
alloc/tot_cpu_time = 3122.054 MB / 114.276 s = 27.320 MB/s
alloc/mut_cpu_time = 3122.054 MB / 114.276 s = 27.320 MB/s
promo/elapsed_time = 57.154 MB / 28.569 s = 2.001 MB/s
promo/gc0_time = 57.154 MB / 0.000 s = 0.000 MB/s
gc_seq_load = 0.000 s / 114.276 s = 0.000%
gc_conc_load = 0.000 s / 114.276 s = 0.000%
gc_tot_load = 0.000 s / 114.276 s = 0.000%
There may be an issue with the script's parsing of the GC times, since
they are mostly zeros for some reason.
Config 2 caused far more GCs than config 1. This appears to be
because, with the more aggressive shrinking, the young gen stayed
mostly at ~2 MB, compared to ~20 MB in config 1. I can imagine this
happening when the free ratio is based solely on the young gen.
Config 3 caused fewer GCs than config 2 (though still about 2x config
1), and its elapsed time was on par with config 1. Yet config 3 ended
with the smallest heap size (~30 MB). Config 4 caused the fewest GCs.
It was the fastest (by a few percent) but ended with the largest heap
size (> 300 MB).
Anyway, the results of configs 2 and 3 are outside the ranges that Jon mentioned.
Hiroshi
More information about the hotspot-gc-dev
mailing list