Free ratio based heap shrinking in the parallel collector

Hiroshi Yamauchi yamauchi at google.com
Tue May 18 00:26:45 UTC 2010


On Thu, May 13, 2010 at 10:44 PM, Jon Masamitsu
<jon.masamitsu at oracle.com> wrote:
> On 5/13/10 4:34 PM, Hiroshi Yamauchi wrote:
>>
>> I assume by 'close' you mean between UseSerialGC and UseParallelGC
>> with -UseAdaptiveSizePolicy.
>>
>> I need more elaboration. How would you do the testing described above
>> (e.g., how/when to measure them and what to run)?
>>
>>
>
> I would start by running some benchmarks with your changes and with
> UseSerialGC, and looking at the number of GCs that are done and what
> the heap looks like at the end of the execution.  When PrintGCDetails
> is set, the heap is printed on exit.  GCOld would be a good benchmark
> to start with because it comes to steady state quickly and runs a
> specified amount of work.  I would expect that the number of GCs
> would be within 10% of each other.  I would also hope the generations
> to be within 20% of each other.  If not, we should look at the
> details to see why the sizes are not closer.  Let's see how that goes
> before deciding what else we should do.
>
> GChisto is a good tool to compare the GC counts.  There are also some
> scripts available (PrintGCStats) to do the same if you prefer.

Unfortunately, GCOld wasn't available for download. I instead used the
eclipse benchmark from DaCapo (which seems to be the most GC-intensive
one in the suite). The PrintGCStats script was available.

I measured the following four configurations.

1. +UseSerialGC
  The serial collector with its (limited) support for resizing by free
ratio. Only the old gen is resized by free ratio, and shrinking is
less aggressive due to the 'shrink_factor', which damps the shrinkage
down to 40% of the straightforward amount (see the first sketch after
this list).
2. +UseParallelGC-UseAdaptiveSizePolicy-1
  This is as per the latest patch.
3. +UseParallelGC-UseAdaptiveSizePolicy-2
  Same as 2, except that the free ratio at a minor collection is
computed from both the young and old gens rather than just the young
gen (see the second sketch after this list). This temporarily
reverses the last change we made.
4. +UseParallelGC+UseAdaptiveSizePolicy
  The parallel collector and (the default) UseAdaptiveSizePolicy.
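
For context, the serial collector's free-ratio resizing of the old
gen works roughly as in the first sketch below. This is a simplified
sketch of TenuredGeneration::compute_new_size(), not the actual
HotSpot code; alignment and bounds checks are omitted, and the flag
stand-ins just use the default values:

#include <cstddef>

// Stand-ins for the MinHeapFreeRatio/MaxHeapFreeRatio VM flags
// (defaults shown).
static const double MinHeapFreeRatio = 40.0;
static const double MaxHeapFreeRatio = 70.0;

// current_shrink_factor steps 0 -> 10 -> 40 -> 100 (percent) across
// successive full GCs where shrinking is possible, so one GC with a
// lot of free space gives back only a fraction of the excess.
size_t compute_new_capacity(size_t used, size_t capacity,
                            unsigned current_shrink_factor) {
  // Expand so that at least MinHeapFreeRatio percent of the gen is free.
  double max_used_pct = 1.0 - MinHeapFreeRatio / 100.0;
  size_t min_desired = (size_t)((double)used / max_used_pct);
  if (capacity < min_desired) {
    return min_desired;
  }
  // Shrink so that at most MaxHeapFreeRatio percent is free, but damp
  // the shrinkage by current_shrink_factor.
  double min_used_pct = 1.0 - MaxHeapFreeRatio / 100.0;
  size_t max_desired = (size_t)((double)used / min_used_pct);
  if (capacity > max_desired) {
    size_t shrink_bytes = capacity - max_desired;
    shrink_bytes = shrink_bytes / 100 * current_shrink_factor;
    return capacity - shrink_bytes;
  }
  return capacity;
}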
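
The difference between configs 2 and 3, in turn, is just which spaces
feed the ratio at a minor collection. Roughly (the second sketch;
this shows the idea, not the patch itself, and the names are mine):

#include <cstddef>

// Config 2: free ratio computed from the young gen alone.
double free_ratio_young_only(size_t young_free, size_t young_capacity) {
  return (double)young_free / (double)young_capacity;
}

// Config 3: free ratio computed from the young and old gens combined.
double free_ratio_young_and_old(size_t young_free, size_t young_capacity,
                                size_t old_free, size_t old_capacity) {
  return (double)(young_free + old_free) /
         (double)(young_capacity + old_capacity);
}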

The following are the heap stats at the end of the execution and the
PrintGCStats output for each of the four configurations. (They are
easiest to read in a fixed-width font.) The heap size parameters
(-Xmx and -Xms) are unset. The common VM flags are -server
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails. Disclaimer: I
have run each configuration only once.

1. +UseSerialGC

Heap
 def new generation   total 19392K, used 4653K [0x70d90000, 0x72290000, 0x85c30000)
  eden space 17280K,  22% used [0x70d90000, 0x7115b458, 0x71e70000)
  from space 2112K,  36% used [0x72080000, 0x72140220, 0x72290000)
  to   space 2112K,   0% used [0x71e70000, 0x71e70000, 0x72080000)
 tenured generation   total 42880K, used 20727K [0x85c30000, 0x88610000, 0xaf990000)
   the space 42880K,  48% used [0x85c30000, 0x8706de88, 0x8706e000, 0x88610000)
 compacting perm gen  total 16640K, used 16407K [0xaf990000, 0xb09d0000, 0xb3990000)
   the space 16640K,  98% used [0xaf990000, 0xb0995f08, 0xb0996000, 0xb09d0000)
No shared spaces configured.

what         count       total      mean      max   stddev
gen0(s)        183       0.602   0.00329    0.014   0.0019
gen0t(s)       183       0.000   0.00000    0.000   0.0000
gen1t(s)         4       0.000   0.00000    0.000   0.0000
GC(s)          187       0.000   0.00000    0.000   0.0000
alloc(MB)      183    3078.570  16.82279   16.875   0.1896
promo(MB)      183      97.285   0.53161    4.647   0.7827

alloc/elapsed_time    =   3078.570 MB /     29.343 s  = 104.917 MB/s
alloc/tot_cpu_time    =   3078.570 MB /    117.372 s  =  26.229 MB/s
alloc/mut_cpu_time    =   3078.570 MB /    117.372 s  =  26.229 MB/s
promo/elapsed_time    =     97.285 MB /     29.343 s  =   3.315 MB/s
promo/gc0_time        =     97.285 MB /      0.000 s  =   0.000 MB/s
gc_seq_load           =      0.000 s  /    117.372 s  =   0.000%
gc_conc_load          =      0.000 s  /    117.372 s  =   0.000%
gc_tot_load           =      0.000 s  /    117.372 s  =   0.000%

2. +UseParallelGC-UseAdaptiveSizePolicy-1

Heap
 PSYoungGen      total 1600K, used 905K [0x9eb30000, 0x9fcf0000, 0xb39d0000)
  eden space 1536K, 56% used [0x9eb30000,0x9ec0a540,0x9ecb0000)
  from space 64K, 50% used [0x9fce0000,0x9fce8000,0x9fcf0000)
  to   space 512K, 0% used [0x9fbf0000,0x9fbf0000,0x9fc70000)
 PSOldGen        total 35456K, used 23449K [0x74dd0000, 0x77070000, 0x9eb30000)
  object space 35456K, 66% used [0x74dd0000,0x764b65c8,0x77070000)
 PSPermGen       total 19968K, used 16423K [0x70dd0000, 0x72150000, 0x74dd0000)
  object space 19968K, 82% used [0x70dd0000,0x71dd9e70,0x72150000)

what         count       total      mean      max   stddev
gen0(s)       2102       0.000   0.00000    0.000   0.0000
gen1t(s)        15       0.000   0.00000    0.000   0.0000
GC(s)         2117       0.000   0.00000    0.000   0.0000
alloc(MB)     2102    3156.040   1.50145   15.750   0.3145
promo(MB)     2102     338.998   0.16127    1.718   0.2520

alloc/elapsed_time    =   3156.040 MB /     35.017 s  =  90.129 MB/s
alloc/tot_cpu_time    =   3156.040 MB /    140.068 s  =  22.532 MB/s
alloc/mut_cpu_time    =   3156.040 MB /    140.068 s  =  22.532 MB/s
promo/elapsed_time    =    338.998 MB /     35.017 s  =   9.681 MB/s
promo/gc0_time        =    338.998 MB /      0.000 s  =   0.000 MB/s
gc_seq_load           =      0.000 s  /    140.068 s  =   0.000%
gc_conc_load          =      0.000 s  /    140.068 s  =   0.000%
gc_tot_load           =      0.000 s  /    140.068 s  =   0.000%

3. +UseParallelGC-UseAdaptiveSizePolicy-2

Heap
 PSYoungGen      total 10176K, used 7135K [0x9eb00000, 0x9fcc0000, 0xb39a0000)
  eden space 10112K, 69% used [0x9eb00000,0x9f1e7c68,0x9f4e0000)
  from space 64K, 100% used [0x9fca0000,0x9fcb0000,0x9fcb0000)
  to   space 64K, 0% used [0x9fcb0000,0x9fcb0000,0x9fcc0000)
 PSOldGen        total 16640K, used 12978K [0x74da0000, 0x75de0000, 0x9eb00000)
  object space 16640K, 77% used [0x74da0000,0x75a4cbd8,0x75de0000)
 PSPermGen       total 18688K, used 16431K [0x70da0000, 0x71fe0000, 0x74da0000)
  object space 18688K, 87% used [0x70da0000,0x71dabe60,0x71fe0000)

what         count       total      mean      max   stddev
gen0(s)        341       0.000   0.00000    0.000   0.0000
gen1t(s)        18       0.000   0.00000    0.000   0.0000
GC(s)          359       0.000   0.00000    0.000   0.0000
alloc(MB)      341    3151.223   9.24112   17.625   2.6183
promo(MB)      341     170.810   0.50091    3.881   0.5584

alloc/elapsed_time    =   3151.223 MB /     29.328 s  = 107.448 MB/s
alloc/tot_cpu_time    =   3151.223 MB /    117.312 s  =  26.862 MB/s
alloc/mut_cpu_time    =   3151.223 MB /    117.312 s  =  26.862 MB/s
promo/elapsed_time    =    170.810 MB /     29.328 s  =   5.824 MB/s
promo/gc0_time        =    170.810 MB /      0.000 s  =   0.000 MB/s
gc_seq_load           =      0.000 s  /    117.312 s  =   0.000%
gc_conc_load          =      0.000 s  /    117.312 s  =   0.000%
gc_tot_load           =      0.000 s  /    117.312 s  =   0.000%

4. +UseParallelGC+UseAdaptiveSizePolicy

Heap
 PSYoungGen      total 288832K, used 47438K [0x9ea50000, 0xb27a0000, 0xb38f0000)
  eden space 282176K, 14% used [0x9ea50000,0xa1226bc8,0xafde0000)
  from space 6656K, 99% used [0xb1a60000,0xb20dd000,0xb20e0000)
  to   space 6912K, 0% used [0xb20e0000,0xb20e0000,0xb27a0000)
 PSOldGen        total 42880K, used 20044K [0x74cf0000, 0x776d0000, 0x9ea50000)
  object space 42880K, 46% used [0x74cf0000,0x760831e0,0x776d0000)
 PSPermGen       total 34560K, used 16446K [0x70cf0000, 0x72eb0000, 0x74cf0000)
  object space 34560K, 47% used [0x70cf0000,0x71cffbf8,0x72eb0000)

what         count       total      mean      max   stddev
gen0(s)         29       0.000   0.00000    0.000   0.0000
gen1t(s)         2       0.000   0.00000    0.000   0.0000
GC(s)           31       0.000   0.00000    0.000   0.0000
alloc(MB)       29    3122.054  107.65702  318.750  110.5556
promo(MB)       29      57.154   1.97084    6.962   2.2706

alloc/elapsed_time    =   3122.054 MB /     28.569 s  = 109.281 MB/s
alloc/tot_cpu_time    =   3122.054 MB /    114.276 s  =  27.320 MB/s
alloc/mut_cpu_time    =   3122.054 MB /    114.276 s  =  27.320 MB/s
promo/elapsed_time    =     57.154 MB /     28.569 s  =   2.001 MB/s
promo/gc0_time        =     57.154 MB /      0.000 s  =   0.000 MB/s
gc_seq_load           =      0.000 s  /    114.276 s  =   0.000%
gc_conc_load          =      0.000 s  /    114.276 s  =   0.000%
gc_tot_load           =      0.000 s  /    114.276 s  =   0.000%

There may be an issue with how the script parses the GC times, since
they are mostly zeros for some reason.

Config 2 caused many more GCs than config 1. This appears to be
because the young gen stayed mostly at ~2 MB, compared to ~20 MB in
config 1, due to the more aggressive shrinking. I can imagine this
happening when the free ratio is based solely on the young gen: right
after a minor collection the young gen is almost entirely free, so a
target capacity computed from its used size alone collapses to a few
MB. Config 3 caused fewer GCs than config 2 (still about 2x config
1), but its elapsed time was on par with config 1. Yet config 3 ended
with the smallest heap size (~30 MB). Config 4 caused the fewest GCs.
It was the fastest (by a few percent) but ended with the largest heap
size (> 300 MB).

In any case, the results of configs 2 and 3 are outside the ranges that Jon mentioned.

Hiroshi


