Math.round optimization, and round to int

Mon Jun 6 20:44:17 UTC 2016

>As Andrew's other comment notes, the performance details can be
>processor and platform specific so discerning small performance
>differences really needs to be data-driven.

Then here is some data for my configuration (Jdk8u92/Win7/Core i7 4770S):

// arg = 123456.789:
Benchmark                                         Mode  Samples          Score    Score error  Units
n.r.RoundMul.bench_roundJdk_double               thrpt        5  345791733,235   44921158,366  ops/s
n.r.RoundMul.bench_roundJdk_double_DONT_INLINE   thrpt        5  198139328,055   37295625,269  ops/s
n.r.RoundMul.bench_roundJdk_double_EXCLUDE       thrpt        5   28674938,029    2257234,498  ops/s
n.r.RoundMul.bench_roundJdk_double_INLINE        thrpt        5  387090734,860  111385980,336  ops/s
n.r.RoundMul.bench_roundMul_double               thrpt        5  358461230,357    4769420,930  ops/s
n.r.RoundMul.bench_roundMul_double_DONT_INLINE   thrpt        5  230565623,867    6904679,377  ops/s
n.r.RoundMul.bench_roundMul_double_EXCLUDE       thrpt        5   33027875,346     756618,102  ops/s
n.r.RoundMul.bench_roundMul_double_INLINE        thrpt        5  358131402,671    3077717,219  ops/s

// arg = -123456.789:
Benchmark                                         Mode  Samples          Score    Score error  Units
n.r.RoundMul.bench_roundJdk_double               thrpt        5  334887992,224   30791767,263  ops/s
n.r.RoundMul.bench_roundJdk_double_DONT_INLINE   thrpt        5  193664353,776   22112771,184  ops/s
n.r.RoundMul.bench_roundJdk_double_EXCLUDE       thrpt        5   29657900,088   11425756,457  ops/s
n.r.RoundMul.bench_roundJdk_double_INLINE        thrpt        5  391304965,549    9281466,086  ops/s
n.r.RoundMul.bench_roundMul_double               thrpt        5  358014997,332    5162810,933  ops/s
n.r.RoundMul.bench_roundMul_double_DONT_INLINE   thrpt        5  229850524,665    5632201,764  ops/s
n.r.RoundMul.bench_roundMul_double_EXCLUDE       thrpt        5   33221440,252    1037541,018  ops/s
n.r.RoundMul.bench_roundMul_double_INLINE        thrpt        5  358823098,125    6627448,955  ops/s

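===>
With multiply it's faster when not inlined (DONT_INLINE: about 230M vs
194-198M ops/s; EXCLUDE: about 33M vs 29-30M ops/s), but slower when
inlined (INLINE: about 358M vs 387-391M ops/s). Without any compiler
hint the two are within the score error of each other.
For some reason the score error is consistently smaller with multiply.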
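
As a rough sanity check on how much of this exceeds the noise, here is
a minimal sketch that tests whether two "score +/- score error" ranges
overlap. The class and method names are made up for illustration, the
figures are the arg = 123456.789 rows rounded to whole ops/s, and
treating overlap as "no clear difference" is a simplification, not a
proper statistical test.

```java
public class ScoreOverlap {

    /** True if [a - aErr, a + aErr] and [b - bErr, b + bErr] intersect. */
    static boolean overlaps(double a, double aErr, double b, double bErr) {
        return Math.abs(a - b) <= aErr + bErr;
    }

    public static void main(String[] args) {
        // arg = 123456.789, roundJdk vs roundMul (hypothetical helper, data from the post).
        // INLINE: prints true (ranges overlap)
        System.out.println(overlaps(387090735, 111385980, 358131403, 3077717));
        // DONT_INLINE: prints true (ranges overlap)
        System.out.println(overlaps(198139328, 37295625, 230565624, 6904679));
        // EXCLUDE: prints false (ranges are disjoint)
        System.out.println(overlaps(28674938, 2257234, 33027875, 756618));
    }
}
```

Feeding in the arg = -123456.789 rows instead gives false, false, true,
so with only 5 samples which differences clear the error bars depends
on the run.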

-Jeff