8234863: Increase default value of MaxInlineLevel
Claes Redestad
claes.redestad at oracle.com
Wed Feb 26 15:35:15 UTC 2020
Hi Laurent,
thanks for the added data point!
It would be interesting to chat some day about whether this MapBench of
yours could be integrated into OpenJDK (either via code-tools, or
directly into the main repository). Regardless, it does seem like
something we should track for regressions now that Marlin has been
integrated.
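
To make that concrete: below is a minimal sketch of what a
MapBench-style workload might look like as a JMH microbenchmark in the
OpenJDK micro suite. The package, class name and the synthetic spiral
geometry are illustrative placeholders only (MapBench itself replays
recorded map geometries from *.ser files, as in the tables below):

package org.openjdk.bench.java2d;

import java.awt.BasicStroke;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Path2D;
import java.awt.image.BufferedImage;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

// Exercises the pure-Java Marlin stroker/rasterizer via Graphics2D.draw().
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Fork(5)
public class MarlinRenderBench {

    private BufferedImage image;
    private Path2D.Float path;

    @Setup
    public void setup() {
        image = new BufferedImage(1024, 1024, BufferedImage.TYPE_INT_ARGB);
        // A synthetic spiral stands in for MapBench's recorded geometries.
        path = new Path2D.Float();
        path.moveTo(512, 512);
        for (int i = 0; i < 5000; i++) {
            double a = i * 0.05, r = i * 0.1;
            path.lineTo(512 + r * Math.cos(a), 512 + r * Math.sin(a));
        }
    }

    @Benchmark
    public BufferedImage strokeSpiral() {
        Graphics2D g2d = image.createGraphics();
        try {
            // Antialiased shape rendering is what goes through Marlin.
            g2d.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                                 RenderingHints.VALUE_ANTIALIAS_ON);
            g2d.setStroke(new BasicStroke(2f));
            g2d.draw(path);
            return image; // returned so JMH keeps the work observable
        } finally {
            g2d.dispose();
        }
    }
}
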
/Claes
On 2020-02-25 09:16, Laurent Bourgès wrote:
> Hi Claes,
>
> I quickly tested my MapBench tool, which is dedicated to evaluating the
> performance of the Marlin renderer (pure Java code):
> *I confirm that JDK 15 + MaxInlineLevel=15 is on par with MaxInlineLevel=9.*
>
> Java tuning: -Xms2g -Xmx2g -XX:+UseParallelGC -XX:+AlwaysPreTouch
> -XX:+UseTransparentHugePages
> JVM path
> /home/bourgesl/libs/graphics-rasterizer/jdk/client/build/linux-x86_64-server-release/jdk//bin//java
> Java version
> openjdk version "15-internal" 2020-09-15
> OpenJDK Runtime Environment (build 15-internal+0-adhoc.bourgesl.client)
> OpenJDK 64-Bit Server VM (build 15-internal+0-adhoc.bourgesl.client,
> mixed mode)
>
> - MaxInlineLevel=15 (new default value):
> Test                                       Threads   Ops       Med     Pct95       Avg   StdDev       Min       Max  FPS(med)  TotalOps   Scale  [ms/op]
> CircleTests.ser                                  1   156    67.098    67.543    67.169    0.242    66.837    68.358    14.903       156   2.000
> EllipseTests-fill-false.ser                      1    39   268.466   277.439   269.611    4.209   268.093   288.611     3.725        39   0.500
> EllipseTests-fill-true.ser                       1    27   379.308   381.063   379.777    2.093   378.861   390.130     2.636        27   0.286
> dc_boulder_2013-13-30-06-13-17.ser               1   108    96.767    97.764    97.015    1.247    96.308   107.018    10.334       108   1.000
> dc_boulder_2013-13-30-06-13-20.ser               1   208    49.774    50.336    49.860    0.254    49.609    51.715    20.091       208   2.000
> dc_shp_alllayers_2013-00-30-07-00-43.ser         1   257    40.361    40.736    40.412    0.152    40.247    41.332    24.777       257   2.000
> dc_shp_alllayers_2013-00-30-07-00-47.ser         1    10   957.764   961.075   958.274    1.592   957.336   962.871     1.044        10   0.125
> dc_spearfish_2013-11-30-06-11-15.ser             1   774    13.456    13.539    13.489    0.565    13.363    27.049    74.318       774  10.000
> dc_spearfish_2013-11-30-06-11-19.ser             1  1583     6.580     6.669     6.599    0.053     6.549     8.056   151.973      1583  20.000
> dc_topp:states_2013-11-30-06-11-06.ser           1   866    12.120    12.180    12.135    0.558    12.042    28.499    82.509       866  10.000
> dc_topp:states_2013-11-30-06-11-07.ser           1  1323     7.882     7.944     7.875    0.038     7.820     8.051   126.870      1323  10.000
> spiralTest-dash-false.ser                        1    10  1071.434  1098.847  1076.443   14.178  1070.302  1118.838     0.933        10   0.100
> spiralTest-fill.ser                              1    11   885.414   887.847   885.815    1.289   883.984   887.964     1.129        11   0.125
> test_z_625k.ser                                  1    62   166.864   167.212   166.861    0.305   166.491   168.208     5.993        62   0.667
>
> Pct95 113.219
>
> - MaxInlineLevel=9 (explicitly set):
> Java tuning: -Xms2g -Xmx2g -XX:+UseParallelGC -XX:+AlwaysPreTouch
> -XX:+UseTransparentHugePages -XX:MaxInlineLevel=9
>
> Test                                       Threads   Ops       Med     Pct95       Avg   StdDev       Min       Max  FPS(med)  TotalOps   Scale  [ms/op]
> CircleTests.ser                                  1   159    65.692    66.168    65.760    0.261    65.411    66.979    15.223       159   2.000
> EllipseTests-fill-false.ser                      1    36   284.335   306.740   286.727    8.266   283.966   321.791     3.517        36   0.500
> EllipseTests-fill-true.ser                       1    26   403.696   413.285   405.109    5.199   403.054   428.103     2.477        26   0.286
> dc_boulder_2013-13-30-06-13-17.ser               1   107    97.526    98.564    97.814    1.148    97.205   105.399    10.254       107   1.000
> dc_boulder_2013-13-30-06-13-20.ser               1   207    50.120    50.641    50.212    0.270    49.984    52.039    19.952       207   2.000
> dc_shp_alllayers_2013-00-30-07-00-43.ser         1   260    40.049    40.407    40.102    0.149    39.953    40.729    24.969       260   2.000
> dc_shp_alllayers_2013-00-30-07-00-47.ser         1    10   961.397   962.212   961.541    0.429   960.971   962.237     1.040        10   0.125
> dc_spearfish_2013-11-30-06-11-15.ser             1   773    13.468    13.533    13.502    0.665    13.375    30.025    74.252       773  10.000
> dc_spearfish_2013-11-30-06-11-19.ser             1  1574     6.581     6.674     6.601    0.037     6.551     6.751   151.952      1574  20.000
> dc_topp:states_2013-11-30-06-11-06.ser           1   858    12.168    12.227    12.166    0.038    12.086    12.297    82.184       858  10.000
> dc_topp:states_2013-11-30-06-11-07.ser           1  1305     7.977     8.041     7.971    0.039     7.914     8.159   125.364      1305  10.000
> spiralTest-dash-false.ser                        1    10  1027.332  1054.641  1032.259   13.940  1026.278  1073.887     0.973        10   0.100
> spiralTest-fill.ser                              1    12   832.926   834.088   832.634    1.041   830.578   834.128     1.201        12   0.125
> test_z_625k.ser                                  1    63   164.752   165.334   164.816    0.311   164.255   165.805     6.070        63   0.667
>
> Pct95 114.011
>
> Laurent
>
> On Tue, 25 Feb 2020 at 00:06, Claes Redestad
> <claes.redestad at oracle.com> wrote:
>
> Hi,
>
> I was finally able to build the 1.3.9 version of the benchmark, but on
> my local system (Haswell Xeon) I cannot reliably reproduce the
> regression.
>
> The run-to-run variation in this microbenchmark seems very high
> regardless of the MaxInlineLevel value.
>
> For size 100000 run over 10 forks, I'm getting anything from a stable
> 170 ops/s in one fork to 330 ops/s in another (MaxInlineLevel=15, jdk/jdk
> build), and exactly the same highs and lows for MaxInlineLevel=9; see
> the attached logs for details [1]. Note how the result per fork settles
> pretty quickly, but at a more or less random level within the above
> range.
>
> I'm leaning toward "Not an issue". Maybe there's a reason it's not there
> anymore in the 3.x branch.
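>
> If someone wants to dig further into what the deeper inlining actually
> changes here, one option would be to dump and diff the inlining
> decisions for the two settings. A sketch reusing the invocation from
> [1] below (-XX:+PrintInlining is a diagnostic flag and needs
> -XX:+UnlockDiagnosticVMOptions; the log file name is just an example):
>
> $ numactl --cpunodebind=0 $jdk/bin/java -jar \
>     build/libs/rxjava-1.3.9-SNAPSHOT-benchmarks.jar \
>     ComputationSchedulerPerf.*observeOn -f 1 -p size=100000 \
>     -jvmArgs="-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
>       -XX:MaxInlineLevel=15" > inline15.log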
>
> /Claes
>
> On 2020-02-24 20:25, Jiva, Azeem wrote:
> > Claes,
> > As we discussed, this is an issue with the RxJava 1.3.x
> microbenchmarks and the delta is quite significant. Have you had a
> chance to dive into what's happening with the deeper inlining level?
> >
>
>
> [1]
>
> Run logs for MaxInlineLevel=15 followed by =9
>
> $ numactl --cpunodebind=0 $jdk/bin/java -jar \
>     build/libs/rxjava-1.3.9-SNAPSHOT-benchmarks.jar \
>     ComputationSchedulerPerf.*observeOn -f 10 -p size=100000 \
>     -wi 5 -w 2 -i 5 -r 2 -jvmArgs="-XX:MaxInlineLevel=15"
> # JMH 1.11.3 (released 1502 days ago, please consider updating!)
> # VM version: JDK 15-internal, VM
> 15-internal+0-2020-02-24-1146405.clredest...
> # VM invoker: build/linux-x64/images/jdk/bin/java
> # VM options: -XX:MaxInlineLevel=15
> # Warmup: 5 iterations, 2 s each
> # Measurement: 5 iterations, 2 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: rx.schedulers.ComputationSchedulerPerf.observeOn
> # Parameters: (size = 100000)
>
> # Run progress: 0.00% complete, ETA 00:03:20
> # Fork: 1 of 10
> # Warmup Iteration 1: 308.137 ops/s
> # Warmup Iteration 2: 377.802 ops/s
> # Warmup Iteration 3: 381.467 ops/s
> # Warmup Iteration 4: 350.456 ops/s
> # Warmup Iteration 5: 331.369 ops/s
> Iteration 1: 308.591 ops/s
> Iteration 2: 306.630 ops/s
> Iteration 3: 310.932 ops/s
> Iteration 4: 311.423 ops/s
> Iteration 5: 331.754 ops/s
>
> # Run progress: 10.00% complete, ETA 00:03:03
> # Fork: 2 of 10
> # Warmup Iteration 1: 247.713 ops/s
> # Warmup Iteration 2: 312.164 ops/s
> # Warmup Iteration 3: 273.882 ops/s
> # Warmup Iteration 4: 237.406 ops/s
> # Warmup Iteration 5: 234.439 ops/s
> Iteration 1: 231.938 ops/s
> Iteration 2: 225.449 ops/s
> Iteration 3: 232.026 ops/s
> Iteration 4: 221.079 ops/s
> Iteration 5: 241.194 ops/s
>
> # Run progress: 20.00% complete, ETA 00:02:43
> # Fork: 3 of 10
> # Warmup Iteration 1: 292.728 ops/s
> # Warmup Iteration 2: 314.710 ops/s
> # Warmup Iteration 3: 320.282 ops/s
> # Warmup Iteration 4: 322.557 ops/s
> # Warmup Iteration 5: 316.284 ops/s
> Iteration 1: 327.343 ops/s
> Iteration 2: 306.609 ops/s
> Iteration 3: 317.059 ops/s
> Iteration 4: 315.174 ops/s
> Iteration 5: 283.809 ops/s
>
> # Run progress: 30.00% complete, ETA 00:02:22
> # Fork: 4 of 10
> # Warmup Iteration 1: 150.172 ops/s
> # Warmup Iteration 2: 190.462 ops/s
> # Warmup Iteration 3: 197.778 ops/s
> # Warmup Iteration 4: 229.885 ops/s
> # Warmup Iteration 5: 199.934 ops/s
> Iteration 1: 208.933 ops/s
> Iteration 2: 203.446 ops/s
> Iteration 3: 182.926 ops/s
> Iteration 4: 198.219 ops/s
> Iteration 5: 189.674 ops/s
>
> # Run progress: 40.00% complete, ETA 00:02:02
> # Fork: 5 of 10
> # Warmup Iteration 1: 173.136 ops/s
> # Warmup Iteration 2: 197.542 ops/s
> # Warmup Iteration 3: 199.816 ops/s
> # Warmup Iteration 4: 183.061 ops/s
> # Warmup Iteration 5: 181.675 ops/s
> Iteration 1: 182.893 ops/s
> Iteration 2: 182.437 ops/s
> Iteration 3: 170.715 ops/s
> Iteration 4: 185.610 ops/s
> Iteration 5: 186.175 ops/s
>
> # Run progress: 50.00% complete, ETA 00:01:41
> # Fork: 6 of 10
> # Warmup Iteration 1: 253.295 ops/s
> # Warmup Iteration 2: 286.376 ops/s
> # Warmup Iteration 3: 248.205 ops/s
> # Warmup Iteration 4: 217.620 ops/s
> # Warmup Iteration 5: 226.234 ops/s
> Iteration 1: 218.880 ops/s
> Iteration 2: 219.477 ops/s
> Iteration 3: 215.964 ops/s
> Iteration 4: 225.023 ops/s
> Iteration 5: 236.793 ops/s
>
> # Run progress: 60.00% complete, ETA 00:01:21
> # Fork: 7 of 10
> # Warmup Iteration 1: 304.706 ops/s
> # Warmup Iteration 2: 334.557 ops/s
> # Warmup Iteration 3: 306.369 ops/s
> # Warmup Iteration 4: 314.033 ops/s
> # Warmup Iteration 5: 309.778 ops/s
> Iteration 1: 287.821 ops/s
> Iteration 2: 288.081 ops/s
> Iteration 3: 302.696 ops/s
> Iteration 4: 309.526 ops/s
> Iteration 5: 319.038 ops/s
>
> # Run progress: 70.00% complete, ETA 00:01:01
> # Fork: 8 of 10
> # Warmup Iteration 1: 187.162 ops/s
> # Warmup Iteration 2: 228.934 ops/s
> # Warmup Iteration 3: 185.007 ops/s
> # Warmup Iteration 4: 183.127 ops/s
> # Warmup Iteration 5: 183.610 ops/s
> Iteration 1: 174.896 ops/s
> Iteration 2: 177.716 ops/s
> Iteration 3: 162.185 ops/s
> Iteration 4: 175.696 ops/s
> Iteration 5: 175.483 ops/s
>
> # Run progress: 80.00% complete, ETA 00:00:40
> # Fork: 9 of 10
> # Warmup Iteration 1: 247.256 ops/s
> # Warmup Iteration 2: 344.854 ops/s
> # Warmup Iteration 3: 332.317 ops/s
> # Warmup Iteration 4: 326.852 ops/s
> # Warmup Iteration 5: 309.211 ops/s
> Iteration 1: 280.466 ops/s
> Iteration 2: 302.288 ops/s
> Iteration 3: 267.403 ops/s
> Iteration 4: 278.625 ops/s
> Iteration 5: 263.112 ops/s
>
> # Run progress: 90.00% complete, ETA 00:00:20
> # Fork: 10 of 10
> # Warmup Iteration 1: 158.022 ops/s
> # Warmup Iteration 2: 189.021 ops/s
> # Warmup Iteration 3: 188.279 ops/s
> # Warmup Iteration 4: 185.716 ops/s
> # Warmup Iteration 5: 185.880 ops/s
> Iteration 1: 186.384 ops/s
> Iteration 2: 186.226 ops/s
> Iteration 3: 183.449 ops/s
> Iteration 4: 173.699 ops/s
> Iteration 5: 184.808 ops/s
>
>
> Result "observeOn":
> 239.155 ±(99.9%) 27.340 ops/s [Average]
> (min, avg, max) = (162.185, 239.155, 331.754), stdev = 55.229
> CI (99.9%): [211.815, 266.496] (assumes normal distribution)
>
>
> # Run complete. Total time: 00:03:23
>
> Benchmark                           (size)   Mode  Cnt    Score    Error  Units
> ComputationSchedulerPerf.observeOn  100000  thrpt   50  239.155 ± 27.340  ops/s
>
> $ numactl --cpunodebind=0 $jdk/bin/java -jar \
>     build/libs/rxjava-1.3.9-SNAPSHOT-benchmarks.jar \
>     ComputationSchedulerPerf.*observeOn -f 10 -p size=100000 \
>     -wi 5 -w 2 -i 5 -r 2 -jvmArgs="-XX:MaxInlineLevel=9"
> # JMH 1.11.3 (released 1503 days ago, please consider updating!)
> # VM version: JDK 15-internal, VM
> 15-internal+0-2020-02-24-1146405.clredest...
> # VM invoker: build/linux-x64/images/jdk/bin/java
> # VM options: --add-exports=jdk.management/com.sun.management.internal=ALL-UNNAMED -XX:MaxInlineLevel=9
> # Warmup: 5 iterations, 2 s each
> # Measurement: 5 iterations, 2 s each
> # Timeout: 10 min per iteration
> # Threads: 1 thread, will synchronize iterations
> # Benchmark mode: Throughput, ops/time
> # Benchmark: rx.schedulers.ComputationSchedulerPerf.observeOn
> # Parameters: (size = 100000)
>
> # Run progress: 0.00% complete, ETA 00:03:20
> # Fork: 1 of 10
> # Warmup Iteration 1: 218.720 ops/s
> # Warmup Iteration 2: 259.219 ops/s
> # Warmup Iteration 3: 245.145 ops/s
> # Warmup Iteration 4: 244.944 ops/s
> # Warmup Iteration 5: 186.248 ops/s
> Iteration 1: 180.263 ops/s
> Iteration 2: 165.831 ops/s
> Iteration 3: 178.660 ops/s
> Iteration 4: 180.241 ops/s
> Iteration 5: 176.193 ops/s
>
> # Run progress: 10.00% complete, ETA 00:03:04
> # Fork: 2 of 10
> # Warmup Iteration 1: 267.636 ops/s
> # Warmup Iteration 2: 332.165 ops/s
> # Warmup Iteration 3: 325.828 ops/s
> # Warmup Iteration 4: 325.195 ops/s
> # Warmup Iteration 5: 333.456 ops/s
> Iteration 1: 331.260 ops/s
> Iteration 2: 310.459 ops/s
> Iteration 3: 335.419 ops/s
> Iteration 4: 328.600 ops/s
> Iteration 5: 303.829 ops/s
>
> # Run progress: 20.00% complete, ETA 00:02:43
> # Fork: 3 of 10
> # Warmup Iteration 1: 259.850 ops/s
> # Warmup Iteration 2: 255.872 ops/s
> # Warmup Iteration 3: 268.157 ops/s
> # Warmup Iteration 4: 254.214 ops/s
> # Warmup Iteration 5: 270.159 ops/s
> Iteration 1: 240.874 ops/s
> Iteration 2: 225.527 ops/s
> Iteration 3: 254.826 ops/s
> Iteration 4: 238.400 ops/s
> Iteration 5: 243.383 ops/s
>
> # Run progress: 30.00% complete, ETA 00:02:22
> # Fork: 4 of 10
> # Warmup Iteration 1: 188.073 ops/s
> # Warmup Iteration 2: 258.390 ops/s
> # Warmup Iteration 3: 224.278 ops/s
> # Warmup Iteration 4: 228.948 ops/s
> # Warmup Iteration 5: 218.226 ops/s
> Iteration 1: 191.827 ops/s
> Iteration 2: 181.651 ops/s
> Iteration 3: 173.203 ops/s
> Iteration 4: 183.152 ops/s
> Iteration 5: 176.889 ops/s
>
> # Run progress: 40.00% complete, ETA 00:02:02
> # Fork: 5 of 10
> # Warmup Iteration 1: 232.270 ops/s
> # Warmup Iteration 2: 258.410 ops/s
> # Warmup Iteration 3: 244.721 ops/s
> # Warmup Iteration 4: 238.275 ops/s
> # Warmup Iteration 5: 227.297 ops/s
> Iteration 1: 217.916 ops/s
> Iteration 2: 203.226 ops/s
> Iteration 3: 198.320 ops/s
> Iteration 4: 211.672 ops/s
> Iteration 5: 231.153 ops/s
>
> # Run progress: 50.00% complete, ETA 00:01:42
> # Fork: 6 of 10
> # Warmup Iteration 1: 154.199 ops/s
> # Warmup Iteration 2: 188.968 ops/s
> # Warmup Iteration 3: 200.437 ops/s
> # Warmup Iteration 4: 205.786 ops/s
> # Warmup Iteration 5: 193.097 ops/s
> Iteration 1: 191.951 ops/s
> Iteration 2: 190.814 ops/s
> Iteration 3: 182.418 ops/s
> Iteration 4: 183.914 ops/s
> Iteration 5: 189.912 ops/s
>
> # Run progress: 60.00% complete, ETA 00:01:21
> # Fork: 7 of 10
> # Warmup Iteration 1: 151.756 ops/s
> # Warmup Iteration 2: 184.493 ops/s
> # Warmup Iteration 3: 180.740 ops/s
> # Warmup Iteration 4: 181.102 ops/s
> # Warmup Iteration 5: 181.172 ops/s
> Iteration 1: 181.011 ops/s
> Iteration 2: 181.592 ops/s
> Iteration 3: 187.456 ops/s
> Iteration 4: 174.353 ops/s
> Iteration 5: 186.561 ops/s
>
> # Run progress: 70.00% complete, ETA 00:01:01
> # Fork: 8 of 10
> # Warmup Iteration 1: 187.211 ops/s
> # Warmup Iteration 2: 194.340 ops/s
> # Warmup Iteration 3: 210.136 ops/s
> # Warmup Iteration 4: 202.079 ops/s
> # Warmup Iteration 5: 198.685 ops/s
> Iteration 1: 207.574 ops/s
> Iteration 2: 221.047 ops/s
> Iteration 3: 199.842 ops/s
> Iteration 4: 184.712 ops/s
> Iteration 5: 192.552 ops/s
>
> # Run progress: 80.00% complete, ETA 00:00:40
> # Fork: 9 of 10
> # Warmup Iteration 1: 149.395 ops/s
> # Warmup Iteration 2: 179.125 ops/s
> # Warmup Iteration 3: 175.259 ops/s
> # Warmup Iteration 4: 177.971 ops/s
> # Warmup Iteration 5: 169.131 ops/s
> Iteration 1: 174.573 ops/s
> Iteration 2: 174.098 ops/s
> Iteration 3: 171.107 ops/s
> Iteration 4: 163.116 ops/s
> Iteration 5: 174.914 ops/s
>
> # Run progress: 90.00% complete, ETA 00:00:20
> # Fork: 10 of 10
> # Warmup Iteration 1: 284.148 ops/s
> # Warmup Iteration 2: 352.798 ops/s
> # Warmup Iteration 3: 306.848 ops/s
> # Warmup Iteration 4: 250.787 ops/s
> # Warmup Iteration 5: 266.872 ops/s
> Iteration 1: 270.604 ops/s
> Iteration 2: 272.181 ops/s
> Iteration 3: 271.800 ops/s
> Iteration 4: 269.214 ops/s
> Iteration 5: 271.461 ops/s
>
>
> Result "observeOn":
> 214.631 ±(99.9%) 23.679 ops/s [Average]
> (min, avg, max) = (163.116, 214.631, 335.419), stdev = 47.833
> CI (99.9%): [190.952, 238.310] (assumes normal distribution)
>
>
> # Run complete. Total time: 00:03:23
>
> Benchmark                           (size)   Mode  Cnt    Score    Error  Units
> ComputationSchedulerPerf.observeOn  100000  thrpt   50  214.631 ± 23.679  ops/s
>
>
>
> --
> Laurent Bourgès