RFR (S): 8223162: Increase ergonomics for Sparse PRT entry sizing

Fri May 24 13:24:32 UTC 2019

Hi all,

  can I have reviews for this small tweak for the ergonomic sizing of
Sparse PRT entries?

Instead of increasing the number of entries per sparse PRT linearly,
this change increases it logarithmically, like the region sizes do.
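
To make the difference concrete, here is a minimal sketch of the two
sizing schemes. It is not the HotSpot code: the function names, the
base of 4 and both formulas are assumptions, chosen only so that they
reproduce the one data point given in this mail (32 entries instead of
16 for 8M regions).

```python
def log2_int(region_size_mb: int) -> int:
    """Integer log2 of a power-of-two region size in MB (8 -> 3)."""
    return region_size_mb.bit_length() - 1


def sparse_entries_before(region_size_mb: int, base: int = 4) -> int:
    """Assumed old scheme: grows by a fixed step per region size doubling."""
    return base * (log2_int(region_size_mb) + 1)


def sparse_entries_after(region_size_mb: int, base: int = 4) -> int:
    """Assumed new scheme: doubles whenever the region size doubles."""
    return base * (1 << log2_int(region_size_mb))


if __name__ == "__main__":
    for mb in (1, 2, 4, 8, 16, 32):
        print(mb, sparse_entries_before(mb), sparse_entries_after(mb))
```

Under these assumptions the two schemes agree for the smallest region
sizes and diverge quickly for larger ones, which is where the sparse
PRT previously overflowed into fine PRTs early.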

In effect, it increases the usage of sparse PRTs ("the data structure
holding remembered set entries for a particular region",
"Per-Region-Table"), and so decreases the usage of fine and coarse
PRTs, which is more aligned with the intent of this first-level PRT.

The impact is that we significantly decrease overall memory usage for
heavy users of remembered sets (numbers follow) and decrease the number
of coarsenings (and fine PRTs) that need to be traversed during GC,
which significantly increases performance as well.

In BigRAMTester, which somewhat resembles some large big-data
applications, I have seen the maximum remembered set size drop to a
third of its previous value.

E.g. BigRAMTester 20G heap: 

          Region     Avg Pause  Max Pause  Avg RS size  Max RS size
          Size [MB]  [ms]       [ms]       [kB]         [kB]
baseline  32M        727.8      1077       914029       1966080
changes   32M        694.5      1082       621725       1186816

I.e. with a "tuned" region size of 32M (to get only 640 regions in
total) to decrease remembered set overhead, this change decreases
remembered set size by ~35% (a drop from ~10% of the Java heap to ~6%).
There is no impact on pause times though. There is no remembered set
coarsening.

The situation changes if you let G1 ergonomically determine the number
of regions. In this case there are 2560 regions, more than the
coarsening threshold.

          Region     Avg Pause  Max Pause  Avg RS size  Max RS size
          Size [MB]  [ms]       [ms]       [kB]         [kB]
baseline  default    1045       1714       1532460      3028992
changes   default    731.2      1152       516574       926720

G1 max pause time drops by 33%, and remembered set size by 66% :)
(using 8M regions, i.e. 32 entries instead of 16 in a sparse PRT).
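
For anyone who wants to re-derive the percentages and region counts
quoted above, the following small script recomputes them from the two
tables. The numbers are copied from this mail; the helper names are
made up for this sketch, and it assumes "20G" means 20 * 1024 * 1024 kB.

```python
KB_PER_GB = 1024 * 1024
HEAP_KB = 20 * KB_PER_GB  # 20G heap, assuming binary units

# (avg pause ms, max pause ms, avg RS size kB, max RS size kB)
RUNS = {
    "32M regions": {
        "baseline": (727.8, 1077, 914029, 1966080),
        "changes": (694.5, 1082, 621725, 1186816),
    },
    "default (8M) regions": {
        "baseline": (1045, 1714, 1532460, 3028992),
        "changes": (731.2, 1152, 516574, 926720),
    },
}
METRICS = ("avg pause", "max pause", "avg RS size", "max RS size")


def reduction(before: float, after: float) -> float:
    """Relative reduction in percent, e.g. 100 -> 60 gives 40.0."""
    return 100.0 * (before - after) / before


def region_count(heap_kb: int, region_mb: int) -> int:
    """Number of regions of region_mb megabytes that fit into the heap."""
    return heap_kb // (region_mb * 1024)


def heap_share(rs_kb: int, heap_kb: int = HEAP_KB) -> float:
    """Remembered set size as a percentage of the Java heap."""
    return 100.0 * rs_kb / heap_kb


if __name__ == "__main__":
    print("regions at 32M:", region_count(HEAP_KB, 32))
    print("regions at 8M:", region_count(HEAP_KB, 8))
    for name, run in RUNS.items():
        for metric, before, after in zip(METRICS, run["baseline"], run["changes"]):
            print(f"{name}: {metric} reduced by {reduction(before, after):.1f}%")
        print(f"{name}: max RS share of heap "
              f"{heap_share(run['baseline'][3]):.1f}% -> "
              f"{heap_share(run['changes'][3]):.1f}%")
```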

(These improvements are in addition to the changes in JDK-8213108, but
should also apply without them.)

Note that in both cases the optimal value for the number of sparse
PRT entries for this particular application would probably be a bit
higher. However, I chose this heuristic (and the drop from 16 to 8
for the initial size of the hash table) to keep the impact on
applications that do not use many remembered sets low (i.e. in practice
zero).

This heuristic could certainly be optimized (e.g. by sampling actual
remembered sets, which is easy in conjunction with JDK-8213108, or by
making the number of sparse PRT entries dependent on heap
size/#regions), but given the plan to re-implement the remembered sets
(JDK-8017163), where we would redo that work, I think this heuristic is
a good tradeoff between effort and results.

CR:
https://bugs.openjdk.java.net/browse/JDK-8223162
Webrev:
http://cr.openjdk.java.net/~tschatzl/8223162/webrev/
Testing:
lots of remembered set size measurements on lots of applications, some
hs-tier1-5 runs together with other changes.

Thanks,
  Thomas