RFR: 8266951: Partial in-lining for vectorized mismatch operation using AVX512 masked instructions
Jatin Bhateja
jbhateja at openjdk.java.net
Fri May 14 11:37:08 UTC 2021
On Thu, 13 May 2021 16:18:06 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:
>> ArraysSupport.vectorizedMismatch is a leaf-level comparison routine which gets called by various public Java APIs (Arrays.equals, Arrays.mismatch). The HotSpot C2 compiler intrinsifies the vectorizedMismatch routine and emits a call to a stub routine which uses vector instructions to compare the inputs.
>>
>> For small compare operations whose size fits in one vector register, i.e. < 32 bytes or <= 64 bytes, this patch employs a partial in-lining technique to emit the fast path code at the call site, which does the vector comparison under the influence of a predicate register/mask computed as a function of the comparison length.
>>
>> If the length of the comparison is greater than the vector register size, then the slow path comprising the stub call is emitted.
>>
>> This avoids the call overhead associated with the stub call, which is significant compared to the actual comparison operation for small-sized comparisons.
>>
>> Partial in-lining is controlled by the runtime flag -XX:UsePartialInlineSize=32/64 (default 32 bytes).
>>
>> Following are the performance numbers for an existing JMH benchmark (test/micro/org/openjdk/bench/java/util/ArraysMismatch.java):
>>
>> Machine : Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>>
>> JMH Benchmark | Size | Baseline (ops/ms) | PI32 (ops/ms) | Gain (PI32/Baseline) | PI64 (ops/ms) | Gain (PI64/Baseline)
>> -- | -- | -- | -- | -- | -- | --
>> ArraysMismatch.Byte.differentSubrangeMatches | 16 | 129196.612 | 165376.715 | 1.2800391 | 157553.42 | 1.219485694
>> ArraysMismatch.Byte.differentSubrangeMatches | 32 | 125583.404 | 163645.759 | 1.303084275 | 157645.879 | 1.255308217
>> ArraysMismatch.Byte.differentSubrangeMatches | 64 | 121969.731 | 170648.152 | 1.399102471 | 157993.449 | 1.295349655
>> ArraysMismatch.Byte.differentSubrangeMatches | 90 | 91819.571 | 96154.479 | 1.047211155 | 157983.324 | 1.720584427
>> ArraysMismatch.Byte.differentSubrangeMatches | 800 | 65236.047 | 67243.131 | 1.030766487 | 67759.48 | 1.038681574
>> ArraysMismatch.Byte.matches | 16 | 151805.68 | 203802.717 | 1.342523659 | 188334.618 | 1.240629586
>> ArraysMismatch.Byte.matches | 32 | 151624.747 | 203731.315 | 1.343654773 | 185719.086 | 1.224859989
>> ArraysMismatch.Byte.matches | 64 | 138350.648 | 124158.139 | 0.897416389 | 188935.388 | 1.365627055
>> ArraysMismatch.Byte.matches | 90 | 102366.983 | 101474.688 | 0.991283371 | 100674.414 | 0.983465675
>> ArraysMismatch.Byte.matches | 800 | 46319.352 | 49585.514 | 1.070513983 | 49594.262 | 1.070702846
>> ArraysMismatch.Byte.mismatchEnd | 16 | 162382.057 | 191602.366 | 1.179947893 | 182425.362 | 1.123433003
>> ArraysMismatch.Byte.mismatchEnd | 32 | 146656.702 | 193510.637 | 1.319480354 | 182571.741 | 1.244891904
>> ArraysMismatch.Byte.mismatchEnd | 64 | 140799.385 | 122505.816 | 0.870073516 | 182360.435 | 1.295179201
>> ArraysMismatch.Byte.mismatchEnd | 90 | 117439.002 | 107296.27 | 0.913634041 | 108081.174 | 0.920317545
>> ArraysMismatch.Byte.mismatchEnd | 800 | 47542.975 | 47456.106 | 0.998172832 | 47289.082 | 0.994659716
>> ArraysMismatch.Byte.mismatchMid | 16 | 143112.591 | 189653.41 | 1.325204223 | 182411.81 | 1.274603504
>> ArraysMismatch.Byte.mismatchMid | 32 | 151759.608 | 193712.64 | 1.276443993 | 182689.18 | 1.203806351
>> ArraysMismatch.Byte.mismatchMid | 64 | 140756.035 | 122017.013 | 0.866868785 | 182508.473 | 1.296629825
>> ArraysMismatch.Byte.mismatchMid | 90 | 134230.235 | 122213.804 | 0.910478954 | 122566.133 | 0.913103765
>> ArraysMismatch.Byte.mismatchMid | 800 | 75512.985 | 64861.716 | 0.858947849 | 71607.794 | 0.94828451
>> ArraysMismatch.Byte.mismatchStart | 16 | 160628.501 | 193722.299 | 1.206026937 | 183190.972 | 1.140463684
>> ArraysMismatch.Byte.mismatchStart | 32 | 151629.56 | 193633.36 | 1.277015906 | 183230.666 | 1.20840993
>> ArraysMismatch.Byte.mismatchStart | 64 | 143345.272 | 130754.305 | 0.91216336 | 181837.864 | 1.268530601
>> ArraysMismatch.Byte.mismatchStart | 90 | 151557.205 | 130724.926 | 0.86254511 | 130962.682 | 0.864113864
>> ArraysMismatch.Byte.mismatchStart | 800 | 149416.06 | 130847.301 | 0.875724477 | 130952.683 | 0.876429769
>> ArraysMismatch.Char.differentSubrangeMatches | 16 | 124936.905 | 152375.103 | 1.219616438 | 146062.997 | 1.169094088
>> ArraysMismatch.Char.differentSubrangeMatches | 32 | 118878.291 | 158770.285 | 1.33557005 | 146561.488 | 1.232870079
>> ArraysMismatch.Char.differentSubrangeMatches | 64 | 110296.975 | 104885.041 | 0.95093307 | 146102.313 | 1.324626655
>> ArraysMismatch.Char.differentSubrangeMatches | 90 | 88056.395 | 90133.489 | 1.023588224 | 87883.169 | 0.998032783
>> ArraysMismatch.Char.differentSubrangeMatches | 800 | 41319.787 | 46257.464 | 1.119499091 | 46090.56 | 1.115459767
>> ArraysMismatch.Char.matches | 16 | 150428.182 | 197311.356 | 1.311664832 | 187199.805 | 1.24444637
>> ArraysMismatch.Char.matches | 32 | 132718.181 | 126373.231 | 0.952192307 | 187008.811 | 1.409067014
>> ArraysMismatch.Char.matches | 64 | 111659.84 | 107182.982 | 0.959906283 | 109772.951 | 0.983101453
>> ArraysMismatch.Char.matches | 90 | 86184.209 | 91977.05 | 1.067214645 | 90389.147 | 1.048790121
>> ArraysMismatch.Char.matches | 800 | 26332.084 | 25284.001 | 0.960197491 | 25855.38 | 0.981896458
>> ArraysMismatch.Char.mismatchEnd | 16 | 148547.251 | 189151.018 | 1.273339067 | 179675.328 | 1.209550004
>> ArraysMismatch.Char.mismatchEnd | 32 | 138219.785 | 119017.203 | 0.861072118 | 178701.685 | 1.292880647
>> ArraysMismatch.Char.mismatchEnd | 64 | 110435.452 | 103940.023 | 0.94118348 | 102078.889 | 0.924330794
>> ArraysMismatch.Char.mismatchEnd | 90 | 89375.63 | 87698.736 | 0.981237682 | 88037.787 | 0.985031233
>> ArraysMismatch.Char.mismatchEnd | 800 | 23632.584 | 22963.757 | 0.971698948 | 20497.605 | 0.867345061
>> ArraysMismatch.Char.mismatchMid | 16 | 148666.26 | 189258.721 | 1.273044207 | 178820.938 | 1.202834712
>> ArraysMismatch.Char.mismatchMid | 32 | 131949.59 | 119320.489 | 0.904288441 | 178579.245 | 1.35338992
>> ArraysMismatch.Char.mismatchMid | 64 | 122148.315 | 111033.597 | 0.909006375 | 109455.953 | 0.896090568
>> ArraysMismatch.Char.mismatchMid | 90 | 125032.714 | 109837.581 | 0.878470742 | 110283.097 | 0.882033937
>> ArraysMismatch.Char.mismatchMid | 800 | 42255.059 | 48153.688 | 1.139595806 | 43087.476 | 1.019699819
>> ArraysMismatch.Char.mismatchStart | 16 | 148493.976 | 189247.176 | 1.274443456 | 178915.503 | 1.204867078
>> ArraysMismatch.Char.mismatchStart | 32 | 148724.462 | 126724.721 | 0.852077186 | 178887.041 | 1.202808459
>> ArraysMismatch.Char.mismatchStart | 64 | 148635.338 | 126716.274 | 0.852531274 | 126747.94 | 0.852744318
>> ArraysMismatch.Char.mismatchStart | 90 | 140359.351 | 126708.588 | 0.902744186 | 125618.245 | 0.894975961
>> ArraysMismatch.Char.mismatchStart | 800 | 144649.46 | 125727.381 | 0.86918666 | 126664.011 | 0.875661831
>> ArraysMismatch.Double.differentSubrangeMatches | 16 | 116255.827 | 116156.952 | 0.999149505 | 116557.568 | 1.002595491
>> ArraysMismatch.Double.differentSubrangeMatches | 32 | 91940.498 | 97299.205 | 1.058284511 | 97466.224 | 1.06010111
>> ArraysMismatch.Double.differentSubrangeMatches | 64 | 78205.807 | 78189.378 | 0.999789926 | 78133.649 | 0.999077332
>> ArraysMismatch.Double.differentSubrangeMatches | 90 | 61330.454 | 68798.235 | 1.121763015 | 68524.188 | 1.117294648
>> ArraysMismatch.Double.differentSubrangeMatches | 800 | 14996.315 | 14979.647 | 0.998888527 | 15072.825 | 1.00510192
>> ArraysMismatch.Double.matches | 16 | 119342.024 | 120322.671 | 1.008217114 | 119531.315 | 1.001586122
>> ArraysMismatch.Double.matches | 32 | 88179.448 | 89069.505 | 1.010093701 | 88141.626 | 0.999571079
>> ArraysMismatch.Double.matches | 64 | 62622.253 | 62433.512 | 0.996986039 | 63041.774 | 1.006699232
>> ArraysMismatch.Double.matches | 90 | 49579.305 | 50632.739 | 1.021247454 | 46548.486 | 0.938869272
>> ArraysMismatch.Double.matches | 800 | 8850.013 | 8505.296 | 0.961048984 | 8490.327 | 0.959357574
>> ArraysMismatch.Double.mismatchEnd | 16 | 116594.224 | 119025.382 | 1.020851445 | 116310.567 | 0.997567144
>> ArraysMismatch.Double.mismatchEnd | 32 | 86183.542 | 86814.706 | 1.007323486 | 86258.696 | 1.000872023
>> ArraysMismatch.Double.mismatchEnd | 64 | 62695.058 | 62794.552 | 1.001586951 | 62769 | 1.001179391
>> ArraysMismatch.Double.mismatchEnd | 90 | 46899.021 | 47692.984 | 1.016929202 | 47598.715 | 1.01491916
>> ArraysMismatch.Double.mismatchEnd | 800 | 8132.64 | 8141.465 | 1.001085133 | 7176.583 | 0.882441987
>> ArraysMismatch.Double.mismatchMid | 16 | 110505.284 | 113732.521 | 1.029204368 | 113249.451 | 1.024832903
>> ArraysMismatch.Double.mismatchMid | 32 | 94259.439 | 93242.776 | 0.989214205 | 94420.206 | 1.00170558
>> ArraysMismatch.Double.mismatchMid | 64 | 76392.603 | 76344.962 | 0.999376366 | 76369.689 | 0.999700049
>> ArraysMismatch.Double.mismatchMid | 90 | 71578.538 | 71637.235 | 1.000820036 | 71582.34 | 1.000053116
>> ArraysMismatch.Double.mismatchMid | 800 | 14993.414 | 12701.251 | 0.84712201 | 14998.937 | 1.000368362
>> ArraysMismatch.Double.mismatchStart | 16 | 141188.616 | 141430.91 | 1.001716102 | 141517.873 | 1.002332036
>> ArraysMismatch.Double.mismatchStart | 32 | 141489.906 | 139633.297 | 0.986878152 | 141729.555 | 1.001693753
>> ArraysMismatch.Double.mismatchStart | 64 | 141502.44 | 139656.902 | 0.986957554 | 141488.272 | 0.999899875
>> ArraysMismatch.Double.mismatchStart | 90 | 141782.57 | 141508.142 | 0.998064445 | 141579.135 | 0.998565162
>> ArraysMismatch.Double.mismatchStart | 800 | 144565.191 | 139525.413 | 0.965138371 | 144607.95 | 1.000295777
>> ArraysMismatch.Float.differentSubrangeMatches | 16 | 120041.868 | 119986.512 | 0.999538861 | 120009.683 | 0.999731885
>> ArraysMismatch.Float.differentSubrangeMatches | 32 | 111402.873 | 111414.633 | 1.000105563 | 111442.964 | 1.000359874
>> ArraysMismatch.Float.differentSubrangeMatches | 64 | 85388.728 | 93884.13 | 1.099490907 | 95120.892 | 1.113974809
>> ArraysMismatch.Float.differentSubrangeMatches | 90 | 67617.865 | 75865.226 | 1.121970148 | 76179.814 | 1.126622587
>> ArraysMismatch.Float.differentSubrangeMatches | 800 | 24994.376 | 25011.775 | 1.000696117 | 24944.2 | 0.997992508
>> ArraysMismatch.Float.matches | 16 | 133159.39 | 137937.688 | 1.035884048 | 139461.652 | 1.047328709
>> ArraysMismatch.Float.matches | 32 | 111959.987 | 115420.6 | 1.030909373 | 117002.141 | 1.045035321
>> ArraysMismatch.Float.matches | 64 | 86892.65 | 87395.62 | 1.005788407 | 87345.458 | 1.00521112
>> ArraysMismatch.Float.matches | 90 | 67690.279 | 69156.772 | 1.02166475 | 69082.962 | 1.020574343
>> ArraysMismatch.Float.matches | 800 | 14894.94 | 15341.034 | 1.029949365 | 15779.117 | 1.059360897
>> ArraysMismatch.Float.mismatchEnd | 16 | 128854.048 | 128925.913 | 1.000557724 | 128985.299 | 1.001018602
>> ArraysMismatch.Float.mismatchEnd | 32 | 99825.842 | 104613.873 | 1.047963843 | 103876.271 | 1.040574955
>> ArraysMismatch.Float.mismatchEnd | 64 | 80190.706 | 84665.053 | 1.055796329 | 84582.712 | 1.054769514
>> ArraysMismatch.Float.mismatchEnd | 90 | 71406.594 | 76730.083 | 1.074551784 | 76596.258 | 1.072677658
>> ArraysMismatch.Float.mismatchEnd | 800 | 14348.159 | 14306.535 | 0.997099001 | 14360.603 | 1.000867289
>> ArraysMismatch.Float.mismatchMid | 16 | 123753.791 | 124291.601 | 1.004345806 | 123649.378 | 0.999156284
>> ArraysMismatch.Float.mismatchMid | 32 | 109105.215 | 111447.183 | 1.021465225 | 111494.37 | 1.021897716
>> ArraysMismatch.Float.mismatchMid | 64 | 93600.363 | 93741.993 | 1.001513135 | 93658.042 | 1.000616226
>> ArraysMismatch.Float.mismatchMid | 90 | 89991.128 | 89712.471 | 0.996903506 | 90031.763 | 1.000451545
>> ArraysMismatch.Float.mismatchMid | 800 | 23974.331 | 24301.075 | 1.01362891 | 24354.29 | 1.015848576
>> ArraysMismatch.Float.mismatchStart | 16 | 140889.393 | 140535.617 | 0.997488981 | 140222.656 | 0.995267657
>> ArraysMismatch.Float.mismatchStart | 32 | 140871.915 | 140318.765 | 0.996073383 | 140242.783 | 0.995534014
>> ArraysMismatch.Float.mismatchStart | 64 | 141197.313 | 140413.639 | 0.994449795 | 140792.879 | 0.997135682
>> ArraysMismatch.Float.mismatchStart | 90 | 139663.079 | 139775.065 | 1.00080183 | 143880.133 | 1.03019448
>> ArraysMismatch.Float.mismatchStart | 800 | 143930.882 | 143878.412 | 0.99963545 | 143923.022 | 0.99994539
>> ArraysMismatch.Int.differentSubrangeMatches | 16 | 110820.026 | 130943.67 | 1.181588515 | 131076.904 | 1.182790771
>> ArraysMismatch.Int.differentSubrangeMatches | 32 | 111706.868 | 121119.544 | 1.084262285 | 122049.921 | 1.092591021
>> ArraysMismatch.Int.differentSubrangeMatches | 64 | 93916.026 | 101624.789 | 1.082081444 | 100103.617 | 1.065884293
>> ArraysMismatch.Int.differentSubrangeMatches | 90 | 67478.955 | 83517.957 | 1.237688951 | 83549.562 | 1.238157319
>> ArraysMismatch.Int.differentSubrangeMatches | 800 | 24920.868 | 25100.838 | 1.007221659 | 25376.679 | 1.018290334
>> ArraysMismatch.Int.matches | 16 | 138004.078 | 142579.711 | 1.033155781 | 143465.516 | 1.039574468
>> ArraysMismatch.Int.matches | 32 | 111790.949 | 119018.169 | 1.06464942 | 119864.971 | 1.072224291
>> ArraysMismatch.Int.matches | 64 | 86997.004 | 88476.088 | 1.017001551 | 87755.688 | 1.008720806
>> ArraysMismatch.Int.matches | 90 | 69366.581 | 71427.315 | 1.029707879 | 71203.035 | 1.026474622
>> ArraysMismatch.Int.matches | 800 | 15119.02 | 15529.095 | 1.02712312 | 15828.336 | 1.046915475
>> ArraysMismatch.Int.mismatchEnd | 16 | 139862.143 | 135639.435 | 0.96980807 | 135661.244 | 0.969964002
>> ArraysMismatch.Int.mismatchEnd | 32 | 114870.328 | 115455.901 | 1.005097687 | 114992.965 | 1.001067613
>> ArraysMismatch.Int.mismatchEnd | 64 | 85291.637 | 85115.665 | 0.99793682 | 85179.114 | 0.998680726
>> ArraysMismatch.Int.mismatchEnd | 90 | 73049.868 | 78798.949 | 1.078700772 | 73365.106 | 1.004315381
>> ArraysMismatch.Int.mismatchEnd | 800 | 14597.509 | 12861.87 | 0.88110033 | 12845.178 | 0.879956847
>> ArraysMismatch.Int.mismatchMid | 16 | 131615.489 | 134691.219 | 1.023369058 | 134503.225 | 1.0219407
>> ArraysMismatch.Int.mismatchMid | 32 | 119291.19 | 121970.431 | 1.022459672 | 120647.357 | 1.011368543
>> ArraysMismatch.Int.mismatchMid | 64 | 100133.019 | 99827.03 | 0.996944175 | 98327.743 | 0.981971222
>> ArraysMismatch.Int.mismatchMid | 90 | 93062.689 | 95269.725 | 1.023715584 | 95457.632 | 1.025734728
>> ArraysMismatch.Int.mismatchMid | 800 | 24614.985 | 20853.102 | 0.847171022 | 20857.528 | 0.847350831
>> ArraysMismatch.Int.mismatchStart | 16 | 140229.222 | 147607.561 | 1.052616273 | 146278.15 | 1.043136002
>> ArraysMismatch.Int.mismatchStart | 32 | 140354.53 | 147448.421 | 1.050542658 | 146287.931 | 1.042274382
>> ArraysMismatch.Int.mismatchStart | 64 | 140256.12 | 147353.466 | 1.050602754 | 146094.059 | 1.041623417
>> ArraysMismatch.Int.mismatchStart | 90 | 135753.229 | 151205.439 | 1.113825727 | 152070.776 | 1.120200065
>> ArraysMismatch.Int.mismatchStart | 800 | 151565.887 | 145991.819 | 0.963223466 | 152020.842 | 1.003001698
>> ArraysMismatch.Long.differentSubrangeMatches | 16 | 125569.009 | 121469.175 | 0.967349953 | 121319.155 | 0.966155232
>> ArraysMismatch.Long.differentSubrangeMatches | 32 | 100126.557 | 103303.047 | 1.03172475 | 101476.788 | 1.013485243
>> ArraysMismatch.Long.differentSubrangeMatches | 64 | 80870.342 | 82334.336 | 1.018102978 | 82395.962 | 1.018865012
>> ArraysMismatch.Long.differentSubrangeMatches | 90 | 70673.831 | 72440.193 | 1.024993155 | 72067.497 | 1.019719689
>> ArraysMismatch.Long.differentSubrangeMatches | 800 | 15224.864 | 15077.429 | 0.99031617 | 15163.827 | 0.995990966
>> ArraysMismatch.Long.matches | 16 | 119857.871 | 123784.673 | 1.032762154 | 122968.267 | 1.025950703
>> ArraysMismatch.Long.matches | 32 | 88284.162 | 90825.719 | 1.028788369 | 91303.549 | 1.034200778
>> ArraysMismatch.Long.matches | 64 | 62827.102 | 63614.876 | 1.012538761 | 64469.82 | 1.026146646
>> ArraysMismatch.Long.matches | 90 | 49351.299 | 51199.947 | 1.037458953 | 51103.813 | 1.035511
>> ArraysMismatch.Long.matches | 800 | 8822.867 | 8512.064 | 0.964773015 | 8848.35 | 1.00288829
>> ArraysMismatch.Long.mismatchEnd | 16 | 124902.804 | 128237.911 | 1.026701618 | 128410.897 | 1.028086583
>> ArraysMismatch.Long.mismatchEnd | 32 | 86728.545 | 90519.608 | 1.043711825 | 88782.445 | 1.023681938
>> ArraysMismatch.Long.mismatchEnd | 64 | 64431.36 | 62735.702 | 0.973682722 | 64766.52 | 1.005201815
>> ArraysMismatch.Long.mismatchEnd | 90 | 47764.996 | 47635.982 | 0.997298984 | 47562.461 | 0.995759761
>> ArraysMismatch.Long.mismatchEnd | 800 | 8124.901 | 7194.444 | 0.88548082 | 7197.163 | 0.88581547
>> ArraysMismatch.Long.mismatchMid | 16 | 122857.442 | 121708.317 | 0.99064668 | 121071.994 | 0.985467319
>> ArraysMismatch.Long.mismatchMid | 32 | 99406.603 | 99376.972 | 0.999701921 | 97379.046 | 0.979603397
>> ArraysMismatch.Long.mismatchMid | 64 | 78596.148 | 76559.205 | 0.974083425 | 76538.811 | 0.973823946
>> ArraysMismatch.Long.mismatchMid | 90 | 74253.699 | 73267.252 | 0.98671518 | 74874.856 | 1.008365334
>> ArraysMismatch.Long.mismatchMid | 800 | 12739.526 | 12773.563 | 1.002671763 | 15215.721 | 1.194371046
>> ArraysMismatch.Long.mismatchStart | 16 | 143429.003 | 147610.51 | 1.029153846 | 146953.182 | 1.024570895
>> ArraysMismatch.Long.mismatchStart | 32 | 149771.413 | 149898.955 | 1.000851578 | 147743.864 | 0.986462377
>> ArraysMismatch.Long.mismatchStart | 64 | 149812.094 | 147738.977 | 0.986161885 | 147818.236 | 0.986690941
>> ArraysMismatch.Long.mismatchStart | 90 | 149834.855 | 147878.978 | 0.986946448 | 149768.864 | 0.999559575
>> ArraysMismatch.Long.mismatchStart | 800 | 150266.332 | 147175.353 | 0.979429996 | 153305.049 | 1.020222208
>> ArraysMismatch.Short.differentSubrangeMatches | 16 | 124956.808 | 152398.079 | 1.21960605 | 146222.898 | 1.170187526
>> ArraysMismatch.Short.differentSubrangeMatches | 32 | 118644.114 | 158832.405 | 1.338729749 | 146589.485 | 1.235539464
>> ArraysMismatch.Short.differentSubrangeMatches | 64 | 111036.197 | 106078.375 | 0.955349497 | 146122.18 | 1.315986894
>> ArraysMismatch.Short.differentSubrangeMatches | 90 | 79114.347 | 90244.347 | 1.140682448 | 91059.171 | 1.150981768
>> ArraysMismatch.Short.differentSubrangeMatches | 800 | 44794.065 | 46302.944 | 1.033684797 | 46086.671 | 1.028856635
>> ArraysMismatch.Short.matches | 16 | 150201.123 | 193264.21 | 1.28670283 | 185129.029 | 1.232540911
>> ArraysMismatch.Short.matches | 32 | 137672.122 | 126543.04 | 0.919162414 | 187187.586 | 1.359662242
>> ArraysMismatch.Short.matches | 64 | 113952.11 | 110124.025 | 0.966406195 | 109228.551 | 0.958547858
>> ArraysMismatch.Short.matches | 90 | 89491.351 | 91045.251 | 1.017363689 | 90362.175 | 1.009730817
>> ArraysMismatch.Short.matches | 800 | 25941.449 | 25887.28 | 0.997911875 | 25191.983 | 0.971109324
>> ArraysMismatch.Short.mismatchEnd | 16 | 142494.648 | 189203.368 | 1.327792802 | 176318.454 | 1.237368957
>> ArraysMismatch.Short.mismatchEnd | 32 | 139928.97 | 119098.052 | 0.851132199 | 178840.438 | 1.278080143
>> ArraysMismatch.Short.mismatchEnd | 64 | 115583.3 | 104264.811 | 0.902075049 | 102376.369 | 0.885736685
>> ArraysMismatch.Short.mismatchEnd | 90 | 86641.922 | 87669.462 | 1.011859617 | 87745.796 | 1.012740645
>> ArraysMismatch.Short.mismatchEnd | 800 | 23741.295 | 22911.558 | 0.965050895 | 22937.297 | 0.96613504
>> ArraysMismatch.Short.mismatchMid | 16 | 148684.747 | 189160.851 | 1.272227682 | 178776.065 | 1.202383355
>> ArraysMismatch.Short.mismatchMid | 32 | 133281.625 | 118690.88 | 0.890526957 | 178478.46 | 1.339107773
>> ArraysMismatch.Short.mismatchMid | 64 | 122399.072 | 110333.504 | 0.901424351 | 111504.705 | 0.910993059
>> ArraysMismatch.Short.mismatchMid | 90 | 119317.633 | 110483.29 | 0.925959451 | 111346.724 | 0.933195884
>> ArraysMismatch.Short.mismatchMid | 800 | 50742.831 | 43058.305 | 0.848559376 | 47917.118 | 0.94431306
>> ArraysMismatch.Short.mismatchStart | 16 | 148861.935 | 191984.933 | 1.289684519 | 178706.176 | 1.200482689
>> ArraysMismatch.Short.mismatchStart | 32 | 148701.043 | 126690.118 | 0.851978678 | 178702.06 | 1.201753911
>> ArraysMismatch.Short.mismatchStart | 64 | 148560.877 | 126747.337 | 0.853167668 | 126657.473 | 0.852562771
>> ArraysMismatch.Short.mismatchStart | 90 | 149824.411 | 126605.818 | 0.845027971 | 125719.231 | 0.839110464
>> ArraysMismatch.Short.mismatchStart | 800 | 152583.036 | 126437.329 | 0.828646043 | 126698.741 | 0.830359287
>
> Thanks for the explanations on why partial inlining can be beneficial. Ideally it would be great if the only changes we made to the Java code were to the threshold values.
>
> For example:
>
>     public static int mismatch(byte[] a,
>                                byte[] b,
>                                int length) {
>         // ISSUE: defer to index receiving methods if performance is good
>         // assert length <= a.length
>         // assert length <= b.length
>
>         int i = 0;
>         if (length > BYTE_THRESHOLD) {
>             if (a[0] != b[0])
>                 return 0;
>             i = vectorizedMismatch(
>                     a, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>                     b, Unsafe.ARRAY_BYTE_BASE_OFFSET,
>                     length, LOG2_ARRAY_BYTE_INDEX_SCALE);
>             if (i >= 0)
>                 return i;
>             // Align to tail
>             i = length - ~i;
>             // assert i >= 0 && i <= 7;
>         }
>         // Tail < 8 bytes
>         for (; i < length; i++) {
>             if (a[i] != b[i])
>                 return i;
>         }
>         return -1;
>     }
>
>
> Where `BYTE_THRESHOLD` is initialized to 7 or 0, based on querying some HotSpot runtime property. When `BYTE_THRESHOLD == 0` I hope the `length > BYTE_THRESHOLD` check is strength reduced in many cases.
>
> That does leave the `i >= 0` check of the result from `vectorizedMismatch`; perhaps that also has some minor impact? However, since you are doing partial inlining and you know that your `vectorizedMismatch` intrinsic never returns a negative value, maybe you could elide that check?
>
> A quick experiment would be to apply your HotSpot changes and use the existing Java code, replacing the constant threshold values with 0. Then we can carefully look at the code gen and perf results.
Hi @PaulSandoz, I have reinstated the tail handling in Java to avoid any impact on other targets. Updated performance numbers still show gains for small comparison sizes up to -XX:UsePartialInlineSize. Thus the patch no longer changes the existing Java implementation of vectorizedMismatch.
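
For readers following the thread, the fast-path/slow-path split from the original description can be sketched in plain Java. This is only an illustration of the idea, not the C2 or stub code from the patch: the class name `PartialInlineSketch`, the constant `PARTIAL_INLINE_SIZE`, and the helpers `maskedCompare` and `stubCompare` are all hypothetical, the mask is emulated with a `long` rather than an AVX512 opmask register, and the return value is simplified to "first mismatching index or -1", which is not the return contract of the real `vectorizedMismatch`.

```java
import java.util.Arrays;

final class PartialInlineSketch {
    // Hypothetical stand-in for -XX:UsePartialInlineSize (assumed default of 32 bytes).
    static final int PARTIAL_INLINE_SIZE = 32;

    // Returns the index of the first differing byte among the first `length`
    // bytes of a and b, or -1 if they are equal. Assumes length <= a.length
    // and length <= b.length.
    static int mismatch(byte[] a, byte[] b, int length) {
        if (length <= PARTIAL_INLINE_SIZE) {
            // Fast path: models the code emitted inline at the call site.
            return maskedCompare(a, b, length);
        }
        // Slow path: models the call to the out-of-line stub.
        return stubCompare(a, b, length);
    }

    // Emulates a single masked vector compare; one mask bit per byte lane.
    static int maskedCompare(byte[] a, byte[] b, int length) {
        // Predicate mask computed from the length: low `length` bits set.
        long mask = (1L << length) - 1;
        long notEqual = 0;
        for (int lane = 0; lane < PARTIAL_INLINE_SIZE; lane++) {
            boolean active = ((mask >>> lane) & 1L) != 0;
            // Inactive lanes are never loaded, like a masked vector load.
            if (active && a[lane] != b[lane]) {
                notEqual |= 1L << lane;
            }
        }
        // The lowest set bit identifies the first mismatching lane.
        return notEqual == 0 ? -1 : Long.numberOfTrailingZeros(notEqual);
    }

    // Stand-in for the stub routine, using the public range-based API.
    static int stubCompare(byte[] a, byte[] b, int length) {
        return Arrays.mismatch(a, 0, length, b, 0, length);
    }
}
```

In this sketch, a call such as `PartialInlineSketch.mismatch(a, b, 16)` never leaves the fast path, while a length of 90 falls through to `stubCompare`. That split is the same one the PI32 and PI64 columns in the table are measuring against the baseline.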
-------------
PR: https://git.openjdk.java.net/jdk/pull/3999