RFR: 8256973: Intrinsic creation for VectorMask query (lastTrue, firstTrue, trueCount) APIs
Jatin Bhateja
jbhateja at openjdk.java.net
Fri May 7 14:38:39 UTC 2021
This patch intrinsifies the following mask query APIs using an optimal instruction sequence for the x86 target:
1) VectorMask.firstTrue.
2) VectorMask.lastTrue.
3) VectorMask.trueCount.
The current implementations of the above APIs iterate over the underlying boolean array encapsulated in a mask instance to compute the count or the position index of the true elements.
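For reference, the scalar behaviour described above can be modelled in plain Java. This is a minimal sketch, not the JDK's actual implementation: the class and method names are hypothetical, and the mask is assumed to be a `boolean[]` with one entry per lane. It follows the documented `VectorMask` contracts: `firstTrue` returns the lane count when no lane is set, and `lastTrue` returns -1.

```java
/** Hypothetical scalar model of the non-intrinsified mask queries. */
public final class ScalarMaskQueries {
    private ScalarMaskQueries() {}

    /** Index of the first set lane, or lanes.length if no lane is set. */
    public static int firstTrue(boolean[] lanes) {
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                return i;
            }
        }
        return lanes.length;
    }

    /** Index of the last set lane, or -1 if no lane is set. */
    public static int lastTrue(boolean[] lanes) {
        for (int i = lanes.length - 1; i >= 0; i--) {
            if (lanes[i]) {
                return i;
            }
        }
        return -1;
    }

    /** Number of set lanes; always visits every element. */
    public static int trueCount(boolean[] lanes) {
        int count = 0;
        for (boolean b : lanes) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] m = {false, true, false, true, true, false, false, false};
        System.out.println(firstTrue(m)); // 1
        System.out.println(lastTrue(m));  // 4
        System.out.println(trueCount(m)); // 3
    }
}
```

Each query is a loop whose cost grows with the number of lanes (and, for `firstTrue`/`lastTrue`, with the position of the first or last set lane), which is consistent with the ALGO-dependent baseline numbers below.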
x86 AVX2 and AVX-512 targets offer direct instructions to move the mask held in a byte vector into a general-purpose (GP) or opmask register, thereby accelerating subsequent queries.
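The idea can be modelled in Java: once the per-lane mask bits are packed into a single integer, as a movemask-style instruction (for example AVX2 `vpmovmskb`) would do into a GP register, each query reduces to a single bit-manipulation operation. In the sketch below, `packMask` is a hypothetical scalar stand-in for that instruction, assuming each lane is stored as one byte whose sign bit marks the lane as set (0 or -1). It is not the patch's code, which works at the level of generated x86 instructions.

```java
/** Hypothetical model of bitmask-based mask queries (not the patch's code). */
public final class BitmaskQueries {
    private BitmaskQueries() {}

    /**
     * Scalar stand-in for a movemask-style instruction: copies the sign
     * bit of lane byte i (0 or -1) into bit i of the result.
     * Assumes at most 64 lanes (a 512-bit byte vector).
     */
    public static long packMask(byte[] lanes) {
        long bits = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i] < 0) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    /** Lowest set bit, clamped to vlen because numberOfTrailingZeros(0) == 64. */
    public static int firstTrue(long bits, int vlen) {
        return Math.min(Long.numberOfTrailingZeros(bits), vlen);
    }

    /** Highest set bit; evaluates to 63 - 64 == -1 when no bit is set. */
    public static int lastTrue(long bits) {
        return 63 - Long.numberOfLeadingZeros(bits);
    }

    /** Population count of the packed mask. */
    public static int trueCount(long bits) {
        return Long.bitCount(bits);
    }

    public static void main(String[] args) {
        byte[] lanes = {0, -1, 0, -1, -1, 0, 0, 0};
        long bits = packMask(lanes); // 0b00011010
        System.out.println(firstTrue(bits, lanes.length)); // 1
        System.out.println(lastTrue(bits));                 // 4
        System.out.println(trueCount(bits));                // 3
    }
}
```

Unlike the element-by-element loop, every query here is constant-time with respect to the lane count once the mask is packed. On x86, HotSpot can compile these `Long` methods to single instructions such as `tzcnt`, `lzcnt` and `popcnt` when the CPU supports them.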
Intrinsification is not performed for vector species with fewer than two lanes.
Please find below the performance numbers for the benchmark included in the patch:
Machine: Cascade Lake server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz 28C)
Benchmark | VECTOR SIZE (bits) | ALGO | BASELINE AVX3 | WITH OPT AVX3 | GAIN
-- | -- | -- | -- | -- | --
MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 1 | 338396.436 | 362711.622 | 1.071854143
MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 2 | 205477.472 | 362668.035 | 1.765001445
MaskQueryOperationsBenchmark.testFirstTrueByte | 128 | 3 | 185613.377 | 362518.206 | 1.953082326
MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 1 | 338522.114 | 328751.231 | 0.971136648
MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 2 | 148825.341 | 328783.35 | 2.209189294
MaskQueryOperationsBenchmark.testFirstTrueByte | 256 | 3 | 200854.856 | 328784.24 | 1.636924526
MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 1 | 338551.089 | 319908.361 | 0.944933782
MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 2 | 116338.756 | 320026.839 | 2.750818816
MaskQueryOperationsBenchmark.testFirstTrueByte | 512 | 3 | 200871.692 | 320008.208 | 1.593097588
MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 1 | 338489.157 | 190221.57 | 0.561972418
MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 2 | 205140.903 | 362387.766 | 1.766531007
MaskQueryOperationsBenchmark.testFirstTrueInt | 128 | 3 | 185508.994 | 362566.265 | 1.95444036
MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 1 | 338403.999 | 328829.751 | 0.971707639
MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 2 | 148988.857 | 328835.479 | 2.207114583
MaskQueryOperationsBenchmark.testFirstTrueInt | 256 | 3 | 200815.907 | 328778.266 | 1.637212265
MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 1 | 338462.403 | 328796.84 | 0.971442728
MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 2 | 116355.623 | 328811.386 | 2.825917455
MaskQueryOperationsBenchmark.testFirstTrueInt | 512 | 3 | 200856.08 | 328773.859 | 1.636862867
MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 1 | 338451.783 | 204432.394 | 0.60402221
MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 2 | 204443.049 | 155670.633 | 0.761437641
MaskQueryOperationsBenchmark.testFirstTrueLong | 128 | 3 | 207254.769 | 155672.842 | 0.751118263
MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 1 | 338520.255 | 328789.176 | 0.971254072
MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 2 | 205883.123 | 328742.103 | 1.596741385
MaskQueryOperationsBenchmark.testFirstTrueLong | 256 | 3 | 185519.176 | 328733.537 | 1.771965271
MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 1 | 338605.11 | 328694.935 | 0.970732353
MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 2 | 148444.7 | 328352.346 | 2.211950619
MaskQueryOperationsBenchmark.testFirstTrueLong | 512 | 3 | 200884.874 | 328814.376 | 1.636829939
MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 1 | 338529.326 | 362293.877 | 1.070199387
MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 2 | 204676.583 | 362428.992 | 1.770739899
MaskQueryOperationsBenchmark.testFirstTrueShort | 128 | 3 | 185495.663 | 362422.835 | 1.953807594
MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 1 | 338533.82 | 328635.479 | 0.970761146
MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 2 | 148822.446 | 328803.55 | 2.209368001
MaskQueryOperationsBenchmark.testFirstTrueShort | 256 | 3 | 200752.028 | 328805.974 | 1.637871245
MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 1 | 338464.548 | 320054.91 | 0.945608371
MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 2 | 116329.063 | 328763.508 | 2.826151088
MaskQueryOperationsBenchmark.testFirstTrueShort | 512 | 3 | 199971.049 | 328819.066 | 1.644333355
MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 1 | 325618.244 | 337629.441 | 1.036887359
MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 2 | 197655.729 | 337544.012 | 1.707737052
MaskQueryOperationsBenchmark.testLastTrueByte | 128 | 3 | 325600.645 | 337256.796 | 1.035798919
MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 1 | 325677.144 | 308312.588 | 0.946681687
MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 2 | 138177.514 | 308293.997 | 2.231144476
MaskQueryOperationsBenchmark.testLastTrueByte | 256 | 3 | 201281.142 | 308353.239 | 1.531952949
MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 1 | 325499.635 | 305103.491 | 0.937338965
MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 2 | 98267.327 | 304803.64 | 3.101780106
MaskQueryOperationsBenchmark.testLastTrueByte | 512 | 3 | 201072.661 | 304969.972 | 1.516715253
MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 1 | 325286.171 | 337337.209 | 1.037047496
MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 2 | 197351.915 | 331432.723 | 1.679399579
MaskQueryOperationsBenchmark.testLastTrueInt | 128 | 3 | 325173.097 | 337518.586 | 1.037965899
MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 1 | 325199.786 | 308436.805 | 0.948453284
MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 2 | 138200.527 | 308405.442 | 2.231579348
MaskQueryOperationsBenchmark.testLastTrueInt | 256 | 3 | 201240.625 | 308234.527 | 1.531671485
MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 1 | 325590.639 | 308381.757 | 0.947145649
MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 2 | 98334.197 | 308440.373 | 3.13665421
MaskQueryOperationsBenchmark.testLastTrueInt | 512 | 3 | 200832.953 | 308431.355 | 1.535760693
MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 1 | 325564.887 | 193981.861 | 0.595831641
MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 2 | 214005.351 | 153667.869 | 0.718056199
MaskQueryOperationsBenchmark.testLastTrueLong | 128 | 3 | 214061.493 | 156337.24 | 0.730337988
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 1 | 325601.502 | 308291.032 | 0.946835411
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 2 | 197911.182 | 308292.149 | 1.557729815
MaskQueryOperationsBenchmark.testLastTrueLong | 256 | 3 | 325608.187 | 308405.393 | 0.947167195
MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 1 | 325734.897 | 308321.619 | 0.946541564
MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 2 | 137974.465 | 308131.475 | 2.233250008
MaskQueryOperationsBenchmark.testLastTrueLong | 512 | 3 | 205479.182 | 308311.636 | 1.500451934
MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 1 | 325681.411 | 337663.377 | 1.036790451
MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 2 | 198127.51 | 337287.453 | 1.702375672
MaskQueryOperationsBenchmark.testLastTrueShort | 128 | 3 | 325519.01 | 337453.387 | 1.036662612
MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 1 | 325647.378 | 308266.5 | 0.946626691
MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 2 | 138287.837 | 308402.656 | 2.230150263
MaskQueryOperationsBenchmark.testLastTrueShort | 256 | 3 | 205375.864 | 308418.101 | 1.501725154
MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 1 | 325548.631 | 308137.064 | 0.946516233
MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 2 | 98424.074 | 308145.17 | 3.130790644
MaskQueryOperationsBenchmark.testLastTrueShort | 512 | 3 | 205381.622 | 308345.763 | 1.50133084
MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 1 | 197488.249 | 340490.471 | 1.724104967
MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 2 | 191307.785 | 354400.26 | 1.852513529
MaskQueryOperationsBenchmark.testTrueCountByte | 128 | 3 | 181206.7 | 354512.75 | 1.956399791
MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 1 | 144485.784 | 328347.7 | 2.272525995
MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 2 | 136709.938 | 328318.229 | 2.401568122
MaskQueryOperationsBenchmark.testTrueCountByte | 256 | 3 | 141501.903 | 328274.337 | 2.319928779
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 1 | 108395.25 | 318599.11 | 2.939234976
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 2 | 98731.287 | 318651.791 | 3.22746518
MaskQueryOperationsBenchmark.testTrueCountByte | 512 | 3 | 106344.335 | 318657.098 | 2.99646519
MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 1 | 124691.716 | 354457.62 | 2.842671762
MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 2 | 191325.138 | 354360.523 | 1.852137815
MaskQueryOperationsBenchmark.testTrueCountInt | 128 | 3 | 181480.334 | 353746.697 | 1.949228818
MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 1 | 144513.076 | 328404.916 | 2.27249274
MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 2 | 136710.717 | 328516.92 | 2.403007805
MaskQueryOperationsBenchmark.testTrueCountInt | 256 | 3 | 141631.832 | 328432.841 | 2.318919669
MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 1 | 108479.473 | 328405.877 | 3.027355019
MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 2 | 98747.682 | 328300.378 | 3.324638831
MaskQueryOperationsBenchmark.testTrueCountInt | 512 | 3 | 106378.04 | 328384.537 | 3.086957957
MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 1 | 213646.579 | 159098.437 | 0.74468048
MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 2 | 212671.379 | 162528.924 | 0.764225655
MaskQueryOperationsBenchmark.testTrueCountLong | 128 | 3 | 212649.052 | 162530.898 | 0.764315178
MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 1 | 197350.819 | 328365.924 | 1.663869072
MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 2 | 191473.127 | 328501.883 | 1.715655289
MaskQueryOperationsBenchmark.testTrueCountLong | 256 | 3 | 185529.513 | 328428.64 | 1.770223156
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 1 | 144516.188 | 328334.76 | 2.27195835
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 2 | 136752.367 | 328505.571 | 2.402192943
MaskQueryOperationsBenchmark.testTrueCountLong | 512 | 3 | 141445.742 | 328392.887 | 2.321688036
MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 1 | 197863.202 | 354533.342 | 1.791810394
MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 2 | 191802.914 | 354377.939 | 1.84761499
MaskQueryOperationsBenchmark.testTrueCountShort | 128 | 3 | 181773.298 | 354374.525 | 1.949541153
MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 1 | 144414.679 | 328435.088 | 2.27425003
MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 2 | 136923.991 | 328267.898 | 2.397446171
MaskQueryOperationsBenchmark.testTrueCountShort | 256 | 3 | 141545.957 | 328308.681 | 2.319449371
MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 1 | 108420.143 | 328282.998 | 3.027878297
MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 2 | 98736.441 | 328420.616 | 3.326235103
MaskQueryOperationsBenchmark.testTrueCountShort | 512 | 3 | 106432.386 | 328245.585 | 3.084076166
ALGO: 1 = best case, 2 = worst case, 3 = average case
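The GAIN column in the table is the ratio of the optimized score to the baseline score (WITH OPT AVX3 / BASELINE AVX3), so, assuming higher scores are better as the column name suggests, values above 1 indicate a speedup. The sketch below recomputes it for the first row of the table; the class and helper names are hypothetical.

```java
/** Recomputes the GAIN column from one benchmark row. */
public final class GainCheck {
    private GainCheck() {}

    /** Ratio of the optimized score to the baseline score. */
    public static double gain(double baseline, double optimized) {
        return optimized / baseline;
    }

    public static void main(String[] args) {
        // Row: testFirstTrueByte, 128-bit vectors, ALGO 1
        double g = gain(338396.436, 362711.622);
        System.out.printf("%.6f%n", g); // 1.071854
    }
}
```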
-------------
Commit messages:
- 8256973: Removing white spaces to satisfy jcheck.
- 8256973: Intrinsic creation for VectorMask query (lastTrue,firstTrue,trueCount) APIs
Changes: https://git.openjdk.java.net/jdk/pull/3916/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3916&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8256973
Stats: 1279 lines in 49 files changed: 1246 ins; 30 del; 3 mod
Patch: https://git.openjdk.java.net/jdk/pull/3916.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3916/head:pull/3916
PR: https://git.openjdk.java.net/jdk/pull/3916
More information about the hotspot-compiler-dev mailing list