RFR (S): Percentile levels in -Xlog:gc+stats

Roman Kennke rkennke at redhat.com
Mon Jan 9 10:48:08 UTC 2017


Sounds good in general.

Maybe instead of multiplying by 1000, measure more precisely?

Can't we have both max/SD and percentile stats? Or does it not make
sense at all to see max/SD?

Roman

Am Freitag, den 06.01.2017, 16:01 +0100 schrieb Aleksey Shipilev:
> Hi,
> 
> The non-normality in phase times makes the average times in our
> gc+stats log confusing. For example, can you trust this line?
> 
>  Concurrent Marking Times  = 18.18 s (avg =   142.02 ms)  (num
> =   128, ...
> 
> You can't, because there were two very different phases in the
> workload lifetime: the initial burst of short concmarks while the app
> is initializing, and then the steady-state concmarks on stable LDS.
> To identify these cases in the stats, we are better off reporting the
> n-quantile levels to get an immediate "feel" for the distribution we
> are looking at.
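A minimal sketch of how such 10%-step levels could be computed from a sample of phase times, using the nearest-rank method (hypothetical helper, not the actual Shenandoah code):

```java
import java.util.Arrays;

public class QuantileLevels {
    // Returns the values at the 10%, 20%, ..., 100% levels of the sample,
    // using nearest-rank selection on a sorted copy.
    static long[] levels(long[] samplesUs) {
        long[] sorted = samplesUs.clone();
        Arrays.sort(sorted);
        long[] lvls = new long[10];
        for (int i = 1; i <= 10; i++) {
            // nearest-rank index for the (i * 10)-th percentile
            int idx = Math.min(sorted.length - 1,
                               (int) Math.ceil(sorted.length * i / 10.0) - 1);
            lvls[i - 1] = sorted[idx];
        }
        return lvls;
    }

    public static void main(String[] args) {
        // The skewed sample from the first log line below
        long[] times = { 787, 858, 960, 2660, 4440, 4830, 5830, 7880, 9600, 2533512 };
        System.out.println(Arrays.toString(levels(times)));
    }
}
```

With exactly ten samples the levels are just the sorted samples themselves; the point is that the last level immediately exposes the outlier that the average would hide.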
> 
> Webrev:
>  http://cr.openjdk.java.net/~shade/shenandoah/stats-percentiles/webrev.01/
> 
> This is a full line in patched version:
> 
>  Concurrent Marking Times  = 18.18 s (avg =   142018 us)
>   (num =   128, lvls (10% step, us) =
>       787, 858, 960, 2660, 4440, 4830, 5830, 7880, 9600, 2533512)
> 
> Notice the distribution skew in levels.
> 
> This is the line that is more trustworthy:
> 
>   Concurrent Marking Times  = 15.16 s (avg =    63693 us)
>     (num =   238, lvls (10% step, us) =
>        291, 524, 615, 772, 1000, 1600, 186000, 197000, 199000, 228671)
> 
> And this looks very solid:
> 
>   Concurrent Marking Times   = 1.80 s (avg =   179735 us)
>     (num =    10, lvls (10% step, us) =
>        174000, 176000, 176000, 176000, 177000, 180000, 180000,
> 181000, ...
> 
> Switching to microseconds instead of milliseconds gives more fidelity
> for sub-ms pause times.
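A minimal sketch of what reporting in microseconds buys (not the actual patch): time a phase with the nanosecond-resolution clock and truncate to whole microseconds, rather than measuring in milliseconds and multiplying by 1000.

```java
// Hypothetical illustration of microsecond-granularity phase timing.
public class PhaseTimer {
    static long measureUs(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        // nanosecond-resolution clock, truncated to whole microseconds
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) {
        long us = measureUs(() -> {
            try { Thread.sleep(3); } catch (InterruptedException e) { }
        });
        System.out.println("phase took " + us + " us");
    }
}
```

A sub-millisecond phase reported this way shows e.g. 787 us instead of rounding to 0 or 1 ms, which is what makes the low percentile levels in the logs above meaningful.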
> 
> Testing: hotspot_gc_shenandoah, selected benchmarks
> 
> Thanks,
> -Aleksey
> 

