RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
Paul Sandoz
psandoz at openjdk.java.net
Wed May 12 19:21:53 UTC 2021
On Wed, 12 May 2021 17:02:25 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which is called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>
> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit fast-path code at the call site, which performs the vector comparison under a predicate register/mask computed as a function of the comparison length.
>
> If the comparison length is greater than the vector register size, the slow path, consisting of the stub call, is emitted.
>
> This avoids the call overhead of the stub, which is significant relative to the actual comparison work for small comparisons.
>
> Partial in-lining is controlled by a runtime flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>
> Following are performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArraysMismatch.java):
>
> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>
> JMH Benchmark | Size | Baseline (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
> -- | -- | -- | -- | -- | -- | --
> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694
> ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217
> ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655
> ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427
> ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574
> ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586
> ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989
> ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055
> ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675
> ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846
> ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003
> ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904
> ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201
> ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545
> ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 47289.082 | 0.994659716
> ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504
> ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351
> ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825
> ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765
> ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451
> ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684
> ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993
> ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601
> ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864
> ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769
> ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088
> ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079
> ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655
> ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783
> ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767
> ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637
> ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014
> ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453
> ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121
> ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458
> ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004
> ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647
> ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794
> ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233
> ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061
> ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712
> ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992
> ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568
> ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937
> ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819
> ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078
> ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459
> ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318
> ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961
> ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831
> ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491
> ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111
> ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332
> ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648
> ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192
> ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122
> ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079
> ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232
> ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272
> ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574
> ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144
> ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023
> ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391
> ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916
> ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987
> ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903
> ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558
> ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049
> ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116
> ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362
> ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036
> ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753
> ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875
> ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162
> ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777
> ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885
> ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874
> ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809
> ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587
> ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508
> ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709
> ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321
> ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112
> ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343
> ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897
> ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602
> ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955
> ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514
> ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658
> ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289
> ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284
> ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716
> ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226
> ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545
> ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576
> ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657
> ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014
> ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682
> ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448
> ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539
> ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771
> ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021
> ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293
> ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319
> ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334
> ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468
> ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291
> ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806
> ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622
> ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475
> ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002
> ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613
> ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726
> ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381
> ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847
> ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407
> ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543
> ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222
> ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728
> ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831
> ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002
> ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382
> ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417
> ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065
> ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698
> ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232
> ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243
> ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012
> ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689
> ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966
> ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703
> ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778
> ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646
> ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511
> ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829
> ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583
> ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938
> ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815
> ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761
> ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547
> ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319
> ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397
> ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946
> ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334
> ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046
> ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895
> ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377
> ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941
> ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575
> ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208
> ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526
> ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464
> ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894
> ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768
> ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635
> ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911
> ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242
> ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858
> ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817
> ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324
> ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957
> ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143
> ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685
> ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645
> ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504
> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355
> ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773
> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059
> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884
> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306
> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689
> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911
> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771
> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464
> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287
Is this optimization general for x86 platforms, i.e. is it applicable to AVX2/SSE in addition to AVX-512?
I notice there are some performance regressions in the data you presented. Do you know why?
The specification of `ArraysSupport.vectorizedMismatch` has changed to no longer return the bitwise complement of the number of remaining tail elements to check. Did you encounter any performance issues that motivated the change?
I would prefer to leave the specification of `ArraysSupport.vectorizedMismatch` unchanged, even though the x86 implementation always returns a non-negative value. That gives other platforms the flexibility to choose whether or not to add more complex optimizations like the one you are proposing, i.e. I think the approach you are taking is biased too much towards one implementation.
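To make the contract under discussion concrete, here is a minimal sketch in plain Java. `vectorizedMismatchSketch` is a hypothetical stand-in, not the real intrinsic: it avoids Unsafe and only handles `byte[]`. It returns either a non-negative mismatch index or the bitwise complement of the number of tail elements it left unchecked, and the caller finishes the job with a tail loop.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class MismatchContractSketch {

    // Hypothetical stand-in for ArraysSupport.vectorizedMismatch on byte[] data:
    // compares 8 bytes per step and never looks at a trailing partial word.
    static int vectorizedMismatchSketch(byte[] a, byte[] b, int length) {
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
        int words = length >> 3;
        for (int w = 0; w < words; w++) {
            long x = ba.getLong(w << 3) ^ bb.getLong(w << 3);
            if (x != 0) {
                // Little-endian load: the lowest set bit lies in the first differing byte.
                return (w << 3) + (Long.numberOfTrailingZeros(x) >> 3);
            }
        }
        // No mismatch in the whole words: report the unchecked tail as ~remaining.
        return ~(length - (words << 3));
    }

    // Caller side: a negative result means "no mismatch so far, tail still unchecked".
    static int mismatch(byte[] a, byte[] b, int length) {
        int r = vectorizedMismatchSketch(a, b, length);
        if (r >= 0) {
            return r;
        }
        for (int i = length - ~r; i < length; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1; // no mismatch within the first 'length' bytes
    }
}
```

An implementation that always consumes the tail itself still satisfies this contract, because it can simply return a non-negative index or `~0`; the reverse is not true once the specification forbids a non-empty remainder.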
Removal of the threshold check could result in performance regressions on various platforms, and so potentially could removal of the tail loop (and the modification of the Unsafe implementation to check bytes).
I think we need to performance test small sizes, just below and above the current threshold, with the intrinsic enabled and disabled. Note that the Java code as written attempts to strike a delicate cross-platform balance in combination with an intrinsic, when enabled.
My general preference is to retain the existing specification and tail loops. To do that it may be necessary to add platform-specific threshold values. Can we investigate whether you can achieve similar performance when the threshold values are set to zero on platforms that support partial inlining of vectorizedMismatch?
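As a rough illustration of that suggestion, the sketch below keeps the threshold check and the tail loop but makes the threshold a parameter. The `threshold` argument is a hypothetical per-platform knob, not an existing JDK field, and `vectorizedStandIn` is a scalar stand-in for the intrinsic; the value 7 used in the usage note is illustrative only.

```java
final class ThresholdSketch {
    static int stubCalls = 0; // counts how often the stand-in "intrinsic" runs

    // Stand-in for the intrinsic: checks whole 8-byte groups only and returns
    // either a mismatch index or ~(number of unchecked tail bytes).
    static int vectorizedStandIn(byte[] a, byte[] b, int length) {
        stubCalls++;
        int bulk = length & ~7;
        for (int i = 0; i < bulk; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return ~(length - bulk);
    }

    // 'threshold' is a hypothetical platform-specific value; 0 means
    // "always try the intrinsic first", as proposed for platforms with partial inlining.
    static int mismatch(byte[] a, byte[] b, int length, int threshold) {
        int i = 0;
        if (length > threshold) {
            int r = vectorizedStandIn(a, b, length);
            if (r >= 0) {
                return r;
            }
            i = length - ~r; // resume at the first unchecked element
        }
        for (; i < length; i++) { // tail loop, retained unchanged
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

With a threshold of 7, a 4-byte comparison goes straight to the scalar loop; with a threshold of 0 the same comparison first goes through the stand-in, which is where a partially inlined fast path would apply. The result is identical in both cases, so only the performance trade-off differs per platform.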
-------------
PR: https://git.openjdk.java.net/jdk/pull/3999