8234863: Increase default value of MaxInlineLevel

Laurent Bourgès bourges.laurent at gmail.com
Tue Feb 25 08:16:02 UTC 2020


Hi Claes,

I quickly tested my MapBench tool, which is dedicated to evaluating the
performance of the Marlin renderer (pure Java code):
I confirm that JDK 15 with MaxInlineLevel=15 is on par with MaxInlineLevel=9.
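
(For context: -XX:MaxInlineLevel caps how many nested call frames the JIT
will inline through. The sketch below is purely illustrative and hypothetical,
not taken from Marlin or MapBench: a chain of 12 trivial calls which, roughly
speaking, can only be flattened end-to-end once the level is raised from 9 to 15.)

// InlineDepthDemo.java -- hypothetical example for illustration only
public class InlineDepthDemo {
    static int d12(int x) { return x + 1; }
    static int d11(int x) { return d12(x) + 1; }
    static int d10(int x) { return d11(x) + 1; }
    static int d9(int x)  { return d10(x) + 1; }
    static int d8(int x)  { return d9(x) + 1; }
    static int d7(int x)  { return d8(x) + 1; }
    static int d6(int x)  { return d7(x) + 1; }
    static int d5(int x)  { return d6(x) + 1; }
    static int d4(int x)  { return d5(x) + 1; }
    static int d3(int x)  { return d4(x) + 1; }
    static int d2(int x)  { return d3(x) + 1; }
    static int d1(int x)  { return d2(x) + 1; }

    public static void main(String[] args) {
        long sum = 0;
        // hot loop so the chain gets JIT-compiled and inlining decisions apply
        for (int i = 0; i < 5_000_000; i++) {
            sum += d1(i);
        }
        System.out.println(sum);
        // compare inlining decisions, e.g.:
        //   java -XX:MaxInlineLevel=9  -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineDepthDemo
        //   java -XX:MaxInlineLevel=15 -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineDepthDemo
    }
}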

Java tuning:  -Xms2g  -Xmx2g -XX:+UseParallelGC -XX:+AlwaysPreTouch
-XX:+UseTransparentHugePages
JVM path
/home/bourgesl/libs/graphics-rasterizer/jdk/client/build/linux-x86_64-server-release/jdk//bin//java
Java version
openjdk version "15-internal" 2020-09-15
OpenJDK Runtime Environment (build 15-internal+0-adhoc.bourgesl.client)
OpenJDK 64-Bit Server VM (build 15-internal+0-adhoc.bourgesl.client, mixed
mode)
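
(For reference, the tuning flags above translate into a launch line of roughly
this shape; the jar name and lack of benchmark arguments below are placeholders,
not the exact MapBench invocation:)

$ java -Xms2g -Xmx2g -XX:+UseParallelGC -XX:+AlwaysPreTouch \
       -XX:+UseTransparentHugePages \
       -jar mapbench.jar            # hypothetical jar name
  # for the comparison run, append: -XX:MaxInlineLevel=9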

- MaxInlineLevel=15 (new default value):
Test                                      Threads Ops Med Pct95 Avg StdDev Min Max FPS(med) TotalOps Scale [ms/op]
CircleTests.ser                           1 156 67.098 67.543 67.169 0.242 66.837 68.358 14.903 156 2.000
EllipseTests-fill-false.ser               1 39 268.466 277.439 269.611 4.209 268.093 288.611 3.725 39 0.500
EllipseTests-fill-true.ser                1 27 379.308 381.063 379.777 2.093 378.861 390.130 2.636 27 0.286
dc_boulder_2013-13-30-06-13-17.ser        1 108 96.767 97.764 97.015 1.247 96.308 107.018 10.334 108 1.000
dc_boulder_2013-13-30-06-13-20.ser        1 208 49.774 50.336 49.860 0.254 49.609 51.715 20.091 208 2.000
dc_shp_alllayers_2013-00-30-07-00-43.ser  1 257 40.361 40.736 40.412 0.152 40.247 41.332 24.777 257 2.000
dc_shp_alllayers_2013-00-30-07-00-47.ser  1 10 957.764 961.075 958.274 1.592 957.336 962.871 1.044 10 0.125
dc_spearfish_2013-11-30-06-11-15.ser      1 774 13.456 13.539 13.489 0.565 13.363 27.049 74.318 774 10.000
dc_spearfish_2013-11-30-06-11-19.ser      1 1583 6.580 6.669 6.599 0.053 6.549 8.056 151.973 1583 20.000
dc_topp:states_2013-11-30-06-11-06.ser    1 866 12.120 12.180 12.135 0.558 12.042 28.499 82.509 866 10.000
dc_topp:states_2013-11-30-06-11-07.ser    1 1323 7.882 7.944 7.875 0.038 7.820 8.051 126.870 1323 10.000
spiralTest-dash-false.ser                 1 10 1071.434 1098.847 1076.443 14.178 1070.302 1118.838 0.933 10 0.100
spiralTest-fill.ser                       1 11 885.414 887.847 885.815 1.289 883.984 887.964 1.129 11 0.125
test_z_625k.ser                           1 62 166.864 167.212 166.861 0.305 166.491 168.208 5.993 62 0.667

Pct95 113.219

- MaxInlineLevel=9 (explicitly set):
Java tuning:  -Xms2g  -Xmx2g -XX:+UseParallelGC -XX:+AlwaysPreTouch
-XX:+UseTransparentHugePages -XX:MaxInlineLevel=9

Test                                      Threads Ops Med Pct95 Avg StdDev Min Max FPS(med) TotalOps Scale [ms/op]
CircleTests.ser                           1 159 65.692 66.168 65.760 0.261 65.411 66.979 15.223 159 2.000
EllipseTests-fill-false.ser               1 36 284.335 306.740 286.727 8.266 283.966 321.791 3.517 36 0.500
EllipseTests-fill-true.ser                1 26 403.696 413.285 405.109 5.199 403.054 428.103 2.477 26 0.286
dc_boulder_2013-13-30-06-13-17.ser        1 107 97.526 98.564 97.814 1.148 97.205 105.399 10.254 107 1.000
dc_boulder_2013-13-30-06-13-20.ser        1 207 50.120 50.641 50.212 0.270 49.984 52.039 19.952 207 2.000
dc_shp_alllayers_2013-00-30-07-00-43.ser  1 260 40.049 40.407 40.102 0.149 39.953 40.729 24.969 260 2.000
dc_shp_alllayers_2013-00-30-07-00-47.ser  1 10 961.397 962.212 961.541 0.429 960.971 962.237 1.040 10 0.125
dc_spearfish_2013-11-30-06-11-15.ser      1 773 13.468 13.533 13.502 0.665 13.375 30.025 74.252 773 10.000
dc_spearfish_2013-11-30-06-11-19.ser      1 1574 6.581 6.674 6.601 0.037 6.551 6.751 151.952 1574 20.000
dc_topp:states_2013-11-30-06-11-06.ser    1 858 12.168 12.227 12.166 0.038 12.086 12.297 82.184 858 10.000
dc_topp:states_2013-11-30-06-11-07.ser    1 1305 7.977 8.041 7.971 0.039 7.914 8.159 125.364 1305 10.000
spiralTest-dash-false.ser                 1 10 1027.332 1054.641 1032.259 13.940 1026.278 1073.887 0.973 10 0.100
spiralTest-fill.ser                       1 12 832.926 834.088 832.634 1.041 830.578 834.128 1.201 12 0.125
test_z_625k.ser                           1 63 164.752 165.334 164.816 0.311 164.255 165.805 6.070 63 0.667

Pct95 114.011

Laurent

On Tue, Feb 25, 2020 at 00:06, Claes Redestad <claes.redestad at oracle.com>
wrote:

> Hi,
>
> I was finally able to build the 1.3.9 version of the benchmark, but on
> my local system (Haswell Xeon) I can not reliably reproduce the
> regression.
>
> The run-to-run variation in this microbenchmark seems very high
> regardless of the MaxInlineLevel value.
>
> For size 100000 run over 10 forks, I'm getting anything from a stable
> 170 ops/s in one fork to 330 ops/s in another (MaxInlineLevel=15, jdk/jdk
> build), and exactly the same highs and lows for MaxInlineLevel=9; see the
> attached logs for details[1]. Note how the result per fork settles in
> pretty quickly, but at a more or less random level within that range.
>
> I'm leaning toward "Not an issue". Maybe there's a reason it's not there
> anymore in the 3.x branch.
>
> /Claes
>
> On 2020-02-24 20:25, Jiva, Azeem wrote:
> > Claes,
> >    As we discussed, this is an issue with the RxJava 1.3.x
> microbenchmarks and the delta is quite significant.  Have you had a chance
> to dive into what's happening with the deeper inlining level?
> >
>
>
> [1]
>
> Run logs for MaxInlineLevel=15 followed by =9
>
> $ numactl --cpunodebind=0 $jdk/bin/java -jar
> build/libs/rxjava-1.3.9-SNAPSHOT-benchmarks.jar
> ComputationSchedulerPerf.*observeOn -f 10  -p size=100000 -wi 5 -w 2 -i
> 5 -r 2 -jvmArgs="-XX:MaxInlineLevel=15"
> # JMH 1.11.3 (released 1502 days ago, please consider updating!)
> # VM version: JDK 15-internal, VM
> 15-internal+0-2020-02-24-1146405.clredest...
> # VM invoker: build/linux-x64/images/jdk/bin/java
> # VM options: -XX:MaxInlineLevel=15
> # Warmup: 5 iterations, 2 s each
> # Measurement: 5 iterations, 2 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: rx.schedulers.ComputationSchedulerPerf.observeOn
> # Parameters: (size = 100000)
>
> # Run progress: 0.00% complete, ETA 00:03:20
> # Fork: 1 of 10
> # Warmup Iteration   1: 308.137 ops/s
> # Warmup Iteration   2: 377.802 ops/s
> # Warmup Iteration   3: 381.467 ops/s
> # Warmup Iteration   4: 350.456 ops/s
> # Warmup Iteration   5: 331.369 ops/s
> Iteration   1: 308.591 ops/s
> Iteration   2: 306.630 ops/s
> Iteration   3: 310.932 ops/s
> Iteration   4: 311.423 ops/s
> Iteration   5: 331.754 ops/s
>
> # Run progress: 10.00% complete, ETA 00:03:03
> # Fork: 2 of 10
> # Warmup Iteration   1: 247.713 ops/s
> # Warmup Iteration   2: 312.164 ops/s
> # Warmup Iteration   3: 273.882 ops/s
> # Warmup Iteration   4: 237.406 ops/s
> # Warmup Iteration   5: 234.439 ops/s
> Iteration   1: 231.938 ops/s
> Iteration   2: 225.449 ops/s
> Iteration   3: 232.026 ops/s
> Iteration   4: 221.079 ops/s
> Iteration   5: 241.194 ops/s
>
> # Run progress: 20.00% complete, ETA 00:02:43
> # Fork: 3 of 10
> # Warmup Iteration   1: 292.728 ops/s
> # Warmup Iteration   2: 314.710 ops/s
> # Warmup Iteration   3: 320.282 ops/s
> # Warmup Iteration   4: 322.557 ops/s
> # Warmup Iteration   5: 316.284 ops/s
> Iteration   1: 327.343 ops/s
> Iteration   2: 306.609 ops/s
> Iteration   3: 317.059 ops/s
> Iteration   4: 315.174 ops/s
> Iteration   5: 283.809 ops/s
>
> # Run progress: 30.00% complete, ETA 00:02:22
> # Fork: 4 of 10
> # Warmup Iteration   1: 150.172 ops/s
> # Warmup Iteration   2: 190.462 ops/s
> # Warmup Iteration   3: 197.778 ops/s
> # Warmup Iteration   4: 229.885 ops/s
> # Warmup Iteration   5: 199.934 ops/s
> Iteration   1: 208.933 ops/s
> Iteration   2: 203.446 ops/s
> Iteration   3: 182.926 ops/s
> Iteration   4: 198.219 ops/s
> Iteration   5: 189.674 ops/s
>
> # Run progress: 40.00% complete, ETA 00:02:02
> # Fork: 5 of 10
> # Warmup Iteration   1: 173.136 ops/s
> # Warmup Iteration   2: 197.542 ops/s
> # Warmup Iteration   3: 199.816 ops/s
> # Warmup Iteration   4: 183.061 ops/s
> # Warmup Iteration   5: 181.675 ops/s
> Iteration   1: 182.893 ops/s
> Iteration   2: 182.437 ops/s
> Iteration   3: 170.715 ops/s
> Iteration   4: 185.610 ops/s
> Iteration   5: 186.175 ops/s
>
> # Run progress: 50.00% complete, ETA 00:01:41
> # Fork: 6 of 10
> # Warmup Iteration   1: 253.295 ops/s
> # Warmup Iteration   2: 286.376 ops/s
> # Warmup Iteration   3: 248.205 ops/s
> # Warmup Iteration   4: 217.620 ops/s
> # Warmup Iteration   5: 226.234 ops/s
> Iteration   1: 218.880 ops/s
> Iteration   2: 219.477 ops/s
> Iteration   3: 215.964 ops/s
> Iteration   4: 225.023 ops/s
> Iteration   5: 236.793 ops/s
>
> # Run progress: 60.00% complete, ETA 00:01:21
> # Fork: 7 of 10
> # Warmup Iteration   1: 304.706 ops/s
> # Warmup Iteration   2: 334.557 ops/s
> # Warmup Iteration   3: 306.369 ops/s
> # Warmup Iteration   4: 314.033 ops/s
> # Warmup Iteration   5: 309.778 ops/s
> Iteration   1: 287.821 ops/s
> Iteration   2: 288.081 ops/s
> Iteration   3: 302.696 ops/s
> Iteration   4: 309.526 ops/s
> Iteration   5: 319.038 ops/s
>
> # Run progress: 70.00% complete, ETA 00:01:01
> # Fork: 8 of 10
> # Warmup Iteration   1: 187.162 ops/s
> # Warmup Iteration   2: 228.934 ops/s
> # Warmup Iteration   3: 185.007 ops/s
> # Warmup Iteration   4: 183.127 ops/s
> # Warmup Iteration   5: 183.610 ops/s
> Iteration   1: 174.896 ops/s
> Iteration   2: 177.716 ops/s
> Iteration   3: 162.185 ops/s
> Iteration   4: 175.696 ops/s
> Iteration   5: 175.483 ops/s
>
> # Run progress: 80.00% complete, ETA 00:00:40
> # Fork: 9 of 10
> # Warmup Iteration   1: 247.256 ops/s
> # Warmup Iteration   2: 344.854 ops/s
> # Warmup Iteration   3: 332.317 ops/s
> # Warmup Iteration   4: 326.852 ops/s
> # Warmup Iteration   5: 309.211 ops/s
> Iteration   1: 280.466 ops/s
> Iteration   2: 302.288 ops/s
> Iteration   3: 267.403 ops/s
> Iteration   4: 278.625 ops/s
> Iteration   5: 263.112 ops/s
>
> # Run progress: 90.00% complete, ETA 00:00:20
> # Fork: 10 of 10
> # Warmup Iteration   1: 158.022 ops/s
> # Warmup Iteration   2: 189.021 ops/s
> # Warmup Iteration   3: 188.279 ops/s
> # Warmup Iteration   4: 185.716 ops/s
> # Warmup Iteration   5: 185.880 ops/s
> Iteration   1: 186.384 ops/s
> Iteration   2: 186.226 ops/s
> Iteration   3: 183.449 ops/s
> Iteration   4: 173.699 ops/s
> Iteration   5: 184.808 ops/s
>
>
> Result "observeOn":
>    239.155 ±(99.9%) 27.340 ops/s [Average]
>    (min, avg, max) = (162.185, 239.155, 331.754), stdev = 55.229
>    CI (99.9%): [211.815, 266.496] (assumes normal distribution)
>
>
> # Run complete. Total time: 00:03:23
>
> Benchmark                           (size)   Mode  Cnt    Score    Error
>   Units
> ComputationSchedulerPerf.observeOn  100000  thrpt   50  239.155 ± 27.340
>   ops/s
>
> $ numactl --cpunodebind=0 $jdk/bin/java -jar
> build/libs/rxjava-1.3.9-SNAPSHOT-benchmarks.jar
> ComputationSchedulerPerf.*observeOn -f 10  -p size=100000 -wi 5 -w 2 -i
> 5 -r 2 -jvmArgs="-XX:MaxInlineLevel=9"
> # JMH 1.11.3 (released 1503 days ago, please consider updating!)
> # VM version: JDK 15-internal, VM
> 15-internal+0-2020-02-24-1146405.clredest...
> # VM invoker: build/linux-x64/images/jdk/bin/java
> # VM options:
> --add-exports=jdk.management/com.sun.management.internal=ALL-UNNAMED
> -XX:MaxInlineLevel=9
> # Warmup: 5 iterations, 2 s each
> # Measurement: 5 iterations, 2 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: rx.schedulers.ComputationSchedulerPerf.observeOn
> # Parameters: (size = 100000)
>
> # Run progress: 0.00% complete, ETA 00:03:20
> # Fork: 1 of 10
> # Warmup Iteration   1: 218.720 ops/s
> # Warmup Iteration   2: 259.219 ops/s
> # Warmup Iteration   3: 245.145 ops/s
> # Warmup Iteration   4: 244.944 ops/s
> # Warmup Iteration   5: 186.248 ops/s
> Iteration   1: 180.263 ops/s
> Iteration   2: 165.831 ops/s
> Iteration   3: 178.660 ops/s
> Iteration   4: 180.241 ops/s
> Iteration   5: 176.193 ops/s
>
> # Run progress: 10.00% complete, ETA 00:03:04
> # Fork: 2 of 10
> # Warmup Iteration   1: 267.636 ops/s
> # Warmup Iteration   2: 332.165 ops/s
> # Warmup Iteration   3: 325.828 ops/s
> # Warmup Iteration   4: 325.195 ops/s
> # Warmup Iteration   5: 333.456 ops/s
> Iteration   1: 331.260 ops/s
> Iteration   2: 310.459 ops/s
> Iteration   3: 335.419 ops/s
> Iteration   4: 328.600 ops/s
> Iteration   5: 303.829 ops/s
>
> # Run progress: 20.00% complete, ETA 00:02:43
> # Fork: 3 of 10
> # Warmup Iteration   1: 259.850 ops/s
> # Warmup Iteration   2: 255.872 ops/s
> # Warmup Iteration   3: 268.157 ops/s
> # Warmup Iteration   4: 254.214 ops/s
> # Warmup Iteration   5: 270.159 ops/s
> Iteration   1: 240.874 ops/s
> Iteration   2: 225.527 ops/s
> Iteration   3: 254.826 ops/s
> Iteration   4: 238.400 ops/s
> Iteration   5: 243.383 ops/s
>
> # Run progress: 30.00% complete, ETA 00:02:22
> # Fork: 4 of 10
> # Warmup Iteration   1: 188.073 ops/s
> # Warmup Iteration   2: 258.390 ops/s
> # Warmup Iteration   3: 224.278 ops/s
> # Warmup Iteration   4: 228.948 ops/s
> # Warmup Iteration   5: 218.226 ops/s
> Iteration   1: 191.827 ops/s
> Iteration   2: 181.651 ops/s
> Iteration   3: 173.203 ops/s
> Iteration   4: 183.152 ops/s
> Iteration   5: 176.889 ops/s
>
> # Run progress: 40.00% complete, ETA 00:02:02
> # Fork: 5 of 10
> # Warmup Iteration   1: 232.270 ops/s
> # Warmup Iteration   2: 258.410 ops/s
> # Warmup Iteration   3: 244.721 ops/s
> # Warmup Iteration   4: 238.275 ops/s
> # Warmup Iteration   5: 227.297 ops/s
> Iteration   1: 217.916 ops/s
> Iteration   2: 203.226 ops/s
> Iteration   3: 198.320 ops/s
> Iteration   4: 211.672 ops/s
> Iteration   5: 231.153 ops/s
>
> # Run progress: 50.00% complete, ETA 00:01:42
> # Fork: 6 of 10
> # Warmup Iteration   1: 154.199 ops/s
> # Warmup Iteration   2: 188.968 ops/s
> # Warmup Iteration   3: 200.437 ops/s
> # Warmup Iteration   4: 205.786 ops/s
> # Warmup Iteration   5: 193.097 ops/s
> Iteration   1: 191.951 ops/s
> Iteration   2: 190.814 ops/s
> Iteration   3: 182.418 ops/s
> Iteration   4: 183.914 ops/s
> Iteration   5: 189.912 ops/s
>
> # Run progress: 60.00% complete, ETA 00:01:21
> # Fork: 7 of 10
> # Warmup Iteration   1: 151.756 ops/s
> # Warmup Iteration   2: 184.493 ops/s
> # Warmup Iteration   3: 180.740 ops/s
> # Warmup Iteration   4: 181.102 ops/s
> # Warmup Iteration   5: 181.172 ops/s
> Iteration   1: 181.011 ops/s
> Iteration   2: 181.592 ops/s
> Iteration   3: 187.456 ops/s
> Iteration   4: 174.353 ops/s
> Iteration   5: 186.561 ops/s
>
> # Run progress: 70.00% complete, ETA 00:01:01
> # Fork: 8 of 10
> # Warmup Iteration   1: 187.211 ops/s
> # Warmup Iteration   2: 194.340 ops/s
> # Warmup Iteration   3: 210.136 ops/s
> # Warmup Iteration   4: 202.079 ops/s
> # Warmup Iteration   5: 198.685 ops/s
> Iteration   1: 207.574 ops/s
> Iteration   2: 221.047 ops/s
> Iteration   3: 199.842 ops/s
> Iteration   4: 184.712 ops/s
> Iteration   5: 192.552 ops/s
>
> # Run progress: 80.00% complete, ETA 00:00:40
> # Fork: 9 of 10
> # Warmup Iteration   1: 149.395 ops/s
> # Warmup Iteration   2: 179.125 ops/s
> # Warmup Iteration   3: 175.259 ops/s
> # Warmup Iteration   4: 177.971 ops/s
> # Warmup Iteration   5: 169.131 ops/s
> Iteration   1: 174.573 ops/s
> Iteration   2: 174.098 ops/s
> Iteration   3: 171.107 ops/s
> Iteration   4: 163.116 ops/s
> Iteration   5: 174.914 ops/s
>
> # Run progress: 90.00% complete, ETA 00:00:20
> # Fork: 10 of 10
> # Warmup Iteration   1: 284.148 ops/s
> # Warmup Iteration   2: 352.798 ops/s
> # Warmup Iteration   3: 306.848 ops/s
> # Warmup Iteration   4: 250.787 ops/s
> # Warmup Iteration   5: 266.872 ops/s
> Iteration   1: 270.604 ops/s
> Iteration   2: 272.181 ops/s
> Iteration   3: 271.800 ops/s
> Iteration   4: 269.214 ops/s
> Iteration   5: 271.461 ops/s
>
>
> Result "observeOn":
>    214.631 ±(99.9%) 23.679 ops/s [Average]
>    (min, avg, max) = (163.116, 214.631, 335.419), stdev = 47.833
>    CI (99.9%): [190.952, 238.310] (assumes normal distribution)
>
>
> # Run complete. Total time: 00:03:23
>
> Benchmark                           (size)   Mode  Cnt    Score    Error
>   Units
> ComputationSchedulerPerf.observeOn  100000  thrpt   50  214.631 ± 23.679
>   ops/s
>


-- 
Laurent Bourgès

