RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions [v4]

Jatin Bhateja jbhateja at openjdk.java.net
Tue May 18 05:21:06 UTC 2021


> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine that uses vector instructions to compare the inputs.
> 
> For small comparisons whose size fits in one vector register (i.e. < 32 bytes or <= 64 bytes), this patch employs a partial in-lining technique: it emits fast-path code at the call site that performs the vector comparison under a predicate register/mask computed as a function of the comparison length.
> 
> If the comparison length is greater than the vector register size, the slow path, consisting of the stub call, is emitted.
> 
> This avoids the stub call overhead, which for small comparisons is significant relative to the comparison itself.
> 
> Partial in-lining is controlled by the run-time flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
> 
> The following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArrayMismatch.java):
> 
> Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
> 
> BENCHMARK | SIZE | Baseline (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatchPartialInlining.testByteMatch | 3 | 209915.663 | 209126.291 | 0.996239576 | 209073.888 | 0.995989937
> ArraysMismatchPartialInlining.testByteMatch | 4 | 157757.866 | 157763.787 | 1.000037532 | 157766.023 | 1.000051706
> ArraysMismatchPartialInlining.testByteMatch | 5 | 181182.854 | 180450.433 | 0.995957559 | 180465.978 | 0.996043356
> ArraysMismatchPartialInlining.testByteMatch | 6 | 146279.651 | 146276.69 | 0.999979758 | 146274.73 | 0.999966359
> ArraysMismatchPartialInlining.testByteMatch | 7 | 139099.287 | 137887.433 | 0.991287849 | 139159.131 | 1.000430225
> ArraysMismatchPartialInlining.testByteMatch | 15 | 127720.176 | 175732.078 | 1.375914781 | 169252.948 | 1.325185678
> ArraysMismatchPartialInlining.testByteMatch | 31 | 116472.861 | 176768.126 | 1.517676517 | 169773.326 | 1.457621325
> ArraysMismatchPartialInlining.testByteMatch | 63 | 104636.064 | 91564.893 | 0.875079676 | 160845.908 | 1.537193792
> ArraysMismatchPartialInlining.testByteMatch | 95 | 101099.48 | 89657.806 | 0.886827568 | 87334.192 | 0.863844127
> ArraysMismatchPartialInlining.testByteMatch | 800 | 45022.411 | 47905.179 | 1.064029623 | 47969.355 | 1.065455046
> ArraysMismatchPartialInlining.testCharMatch | 3 | 219405.496 | 219710.643 | 1.00139079 | 219242.048 | 0.999255041
> ArraysMismatchPartialInlining.testCharMatch | 4 | 170629.006 | 193121.02 | 1.131818233 | 182593.776 | 1.070121548
> ArraysMismatchPartialInlining.testCharMatch | 5 | 155518.733 | 169650.324 | 1.090867452 | 159963.097 | 1.028577676
> ArraysMismatchPartialInlining.testCharMatch | 6 | 154395.07 | 175616.979 | 1.137451986 | 147860.366 | 0.957675436
> ArraysMismatchPartialInlining.testCharMatch | 7 | 147630.171 | 168639.547 | 1.142310856 | 112467.214 | 0.761817271
> ArraysMismatchPartialInlining.testCharMatch | 15 | 130251.837 | 171755.645 | 1.318642784 | 159656.911 | 1.225755542
> ArraysMismatchPartialInlining.testCharMatch | 31 | 115510.532 | 106310.328 | 0.920351817 | 159957.379 | 1.384786099
> ArraysMismatchPartialInlining.testCharMatch | 63 | 96443.648 | 92545.364 | 0.959579671 | 92850.782 | 0.962746473
> ArraysMismatchPartialInlining.testCharMatch | 95 | 90001.485 | 81753.152 | 0.908353368 | 83890.742 | 0.932103976
> ArraysMismatchPartialInlining.testCharMatch | 800 | 22929.764 | 20699.791 | 0.902747669 | 22017.534 | 0.960216337
> ArraysMismatchPartialInlining.testDoubleMatch | 3 | 137422.911 | 134792.332 | 0.980857784 | 137047.846 | 0.997270724
> ArraysMismatchPartialInlining.testDoubleMatch | 4 | 140124.192 | 128321.199 | 0.915767628 | 128573.012 | 0.917564699
> ArraysMismatchPartialInlining.testDoubleMatch | 5 | 132385.81 | 132099.177 | 0.997834866 | 132337.729 | 0.999636812
> ArraysMismatchPartialInlining.testDoubleMatch | 6 | 122472.829 | 122301.343 | 0.998599804 | 122235.558 | 0.998062664
> ArraysMismatchPartialInlining.testDoubleMatch | 7 | 123867.736 | 123042.597 | 0.993338548 | 123060.617 | 0.993484026
> ArraysMismatchPartialInlining.testDoubleMatch | 15 | 102561.684 | 102697.933 | 1.001328459 | 100258.701 | 0.977545386
> ArraysMismatchPartialInlining.testDoubleMatch | 31 | 87019.261 | 87292.743 | 1.003142775 | 85003.323 | 0.976833428
> ArraysMismatchPartialInlining.testDoubleMatch | 63 | 62251.609 | 57261.214 | 0.919835084 | 62732.816 | 1.007730033
> ArraysMismatchPartialInlining.testDoubleMatch | 95 | 50885.381 | 48282.534 | 0.948848826 | 48533.009 | 0.953771163
> ArraysMismatchPartialInlining.testDoubleMatch | 800 | 7160.957 | 8209.345 | 1.146403337 | 7158.649 | 0.999677697
> ArraysMismatchPartialInlining.testFloatMatch | 3 | 144215.295 | 141572.656 | 0.981675737 | 117351.089 | 0.81372152
> ArraysMismatchPartialInlining.testFloatMatch | 4 | 149935.526 | 140116.547 | 0.934511992 | 138351.846 | 0.922742259
> ArraysMismatchPartialInlining.testFloatMatch | 5 | 134682.06 | 133892.853 | 0.994140222 | 139040.985 | 1.032364555
> ArraysMismatchPartialInlining.testFloatMatch | 6 | 139176.866 | 139452.984 | 1.001983936 | 158309.784 | 1.13747197
> ArraysMismatchPartialInlining.testFloatMatch | 7 | 127274.07 | 126137.824 | 0.991072447 | 146418.871 | 1.150421849
> ArraysMismatchPartialInlining.testFloatMatch | 15 | 115897.616 | 101808.969 | 0.878438854 | 108451.212 | 0.935750154
> ArraysMismatchPartialInlining.testFloatMatch | 31 | 96568.619 | 101492.986 | 1.05099345 | 88662.187 | 0.918126281
> ArraysMismatchPartialInlining.testFloatMatch | 63 | 75565.484 | 85526.546 | 1.131820263 | 74575.198 | 0.986894996
> ArraysMismatchPartialInlining.testFloatMatch | 95 | 69535.621 | 71823.072 | 1.032896104 | 64910.105 | 0.933479907
> ArraysMismatchPartialInlining.testFloatMatch | 800 | 13959.085 | 12768.069 | 0.914678075 | 12698.311 | 0.909680756
> ArraysMismatchPartialInlining.testIntMatch | 3 | 151925.753 | 152001.543 | 1.000498862 | 150351.321 | 0.989636833
> ArraysMismatchPartialInlining.testIntMatch | 4 | 151411.152 | 161021.852 | 1.063474188 | 152115.869 | 1.004654327
> ArraysMismatchPartialInlining.testIntMatch | 5 | 142305.114 | 134841.275 | 0.947550451 | 122718.584 | 0.862362431
> ArraysMismatchPartialInlining.testIntMatch | 6 | 144870.73 | 144186.562 | 0.99527739 | 166569.418 | 1.149779655
> ArraysMismatchPartialInlining.testIntMatch | 7 | 135132.736 | 131937.154 | 0.976352273 | 150670.855 | 1.114984122
> ArraysMismatchPartialInlining.testIntMatch | 15 | 118831.765 | 119947.806 | 1.009391773 | 161039.149 | 1.35518604
> ArraysMismatchPartialInlining.testIntMatch | 31 | 97247.157 | 95123.241 | 0.978159608 | 92586.255 | 0.952071586
> ArraysMismatchPartialInlining.testIntMatch | 63 | 78537.993 | 72904.05 | 0.928264744 | 72075.128 | 0.917710337
> ArraysMismatchPartialInlining.testIntMatch | 95 | 69356.234 | 69021.893 | 0.995179366 | 67435.202 | 0.972301956
> ArraysMismatchPartialInlining.testIntMatch | 800 | 14410.374 | 12715.733 | 0.882401317 | 12527.15 | 0.869314703
> ArraysMismatchPartialInlining.testLongMatch | 3 | 145434.777 | 147236.142 | 1.012386068 | 144269.34 | 0.991986532
> ArraysMismatchPartialInlining.testLongMatch | 4 | 149850.908 | 117182.939 | 0.781996857 | 116983.308 | 0.780664659
> ArraysMismatchPartialInlining.testLongMatch | 5 | 140694.62 | 141039.138 | 1.002448693 | 140721.407 | 1.000190391
> ArraysMismatchPartialInlining.testLongMatch | 6 | 136901.515 | 136215.609 | 0.994989785 | 136216.591 | 0.994996958
> ArraysMismatchPartialInlining.testLongMatch | 7 | 132233.847 | 131289.142 | 0.9928558 | 131315.326 | 0.993053813
> ArraysMismatchPartialInlining.testLongMatch | 15 | 108677.77 | 105050.548 | 0.966624067 | 108574.143 | 0.999046475
> ArraysMismatchPartialInlining.testLongMatch | 31 | 79476.103 | 79391.426 | 0.99893456 | 79519.006 | 1.000539823
> ArraysMismatchPartialInlining.testLongMatch | 63 | 58949.181 | 59102.766 | 1.00260538 | 59095.306 | 1.00247883
> ArraysMismatchPartialInlining.testLongMatch | 95 | 49438.419 | 49422.93 | 0.999686701 | 49390.033 | 0.999021287
> ArraysMismatchPartialInlining.testLongMatch | 800 | 7195.783 | 7201.554 | 1.000801998 | 7186.757 | 0.998745654
> ArraysMismatchPartialInlining.testShortMatch | 3 | 219642.309 | 219414.684 | 0.998963656 | 219760.127 | 1.000536408
> ArraysMismatchPartialInlining.testShortMatch | 4 | 169235.371 | 193907.437 | 1.145785517 | 170667.561 | 1.008462711
> ArraysMismatchPartialInlining.testShortMatch | 5 | 155537.852 | 147014.758 | 0.945202445 | 116770.798 | 0.750754858
> ArraysMismatchPartialInlining.testShortMatch | 6 | 155059.272 | 173756.546 | 1.120581464 | 152323.759 | 0.982358275
> ArraysMismatchPartialInlining.testShortMatch | 7 | 147370.359 | 154934.348 | 1.051326393 | 138398.19 | 0.939118225
> ArraysMismatchPartialInlining.testShortMatch | 15 | 130353.196 | 171653.208 | 1.316831603 | 160047.047 | 1.227795343
> ArraysMismatchPartialInlining.testShortMatch | 31 | 118458.443 | 106239.301 | 0.896848703 | 159726.936 | 1.348379499
> ArraysMismatchPartialInlining.testShortMatch | 63 | 97519.691 | 91591.145 | 0.939206678 | 91847.817 | 0.94183868
> ArraysMismatchPartialInlining.testShortMatch | 95 | 90818.111 | 77626.093 | 0.854742431 | 77653.086 | 0.855039652
> ArraysMismatchPartialInlining.testShortMatch | 800 | 21382.8 | 22841.791 | 1.06823199 | 22683.388 | 1.060824027
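
The fast-path/slow-path split described in the quoted text can be modelled in plain Java. This is only a scalar sketch of the idea, not the code C2 emits: the class name, the PARTIAL_INLINE_SIZE constant and the helper methods are hypothetical, the "mask" is an ordinary long with one bit per byte lane, and the return value follows the public Arrays.mismatch convention (first mismatching index, or -1) rather than the internal vectorizedMismatch convention.

```java
import java.util.Arrays;

class PartialInlineSketch {
    // Hypothetical stand-in for the partial in-line size in bytes (32 or 64 in the description).
    static final int PARTIAL_INLINE_SIZE = 32;

    // Fast path: models one masked vector compare.
    // A lane takes part in the comparison only if its mask bit is set, i.e. lane < length.
    static int maskedMismatch(byte[] a, byte[] b, int length) {
        // Mask computed as a function of the comparison length: the low `length` bits are set.
        long mask = (length >= 64) ? -1L : (1L << length) - 1;
        long notEqual = 0;
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
            boolean active = ((mask >>> lane) & 1L) != 0;
            // Inactive lanes are never loaded, so short arrays are not read out of bounds.
            if (active && a[lane] != b[lane]) {
                notEqual |= 1L << lane;
            }
        }
        // The lowest set bit is the first mismatching lane.
        return notEqual == 0 ? -1 : Long.numberOfTrailingZeros(notEqual);
    }

    // Dispatch: small lengths use the in-lined fast path, larger ones the "stub call".
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            return maskedMismatch(a, b, length);
        }
        // Slow path: the library routine stands in for the stub call here.
        return Arrays.mismatch(a, 0, length, b, 0, length);
    }
}
```

In this model, PartialInlineSketch.mismatch(a, b, 5) on two 5-byte arrays never leaves the fast path, while a 100-byte comparison falls through to the library call, which mirrors the size threshold the flag is described as controlling.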

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8266951: Removing the changes to existing benchmark since a separate benchmark has been added to partial in-lining.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3999/files
  - new: https://git.openjdk.java.net/jdk/pull/3999/files/1070ab55..946e997a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3999&range=02-03

  Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3999.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3999/head:pull/3999

PR: https://git.openjdk.java.net/jdk/pull/3999


More information about the hotspot-compiler-dev mailing list