[vectorIntrinsics+mask] RFR: 8272971: Intrinsification of VectorMask.cast operation for all compatible vector species

Jatin Bhateja jbhateja at openjdk.java.net
Thu Aug 26 06:23:51 UTC 2021


- The patch intrinsifies the VectorMask.cast operation when the source and destination mask species are compatible, i.e. have the same vector length (lane count).
- It handles casting for both predicated and non-predicated targets.
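
To make the compatibility condition concrete, here is a minimal sketch in plain Java that models it as equal lane counts (vector size in bits divided by element size in bits). The class and method names are hypothetical and for illustration only; they are not part of the patch or of the Vector API, where the actual operation is `VectorMask.cast(VectorSpecies)` in `jdk.incubator.vector`.

```java
// Hypothetical helper, not part of the patch: models mask species
// compatibility as "same number of lanes".
public class MaskCastCompat {
    // Lane count of a vector shape: total bits divided by bits per element.
    static int laneCount(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    // Two mask species are treated as compatible when their lane counts match.
    static boolean compatible(int srcVectorBits, int srcElementBits,
                              int dstVectorBits, int dstElementBits) {
        return laneCount(srcVectorBits, srcElementBits)
                == laneCount(dstVectorBits, dstElementBits);
    }

    public static void main(String[] args) {
        // Byte128 -> Integer512: 16 lanes on both sides.
        System.out.println(compatible(128, Byte.SIZE, 512, Integer.SIZE));
        // Integer64 -> Long128: 2 lanes on both sides.
        System.out.println(compatible(64, Integer.SIZE, 128, Long.SIZE));
        // Byte128 -> Integer128: 16 lanes vs 4 lanes, not compatible.
        System.out.println(compatible(128, Byte.SIZE, 128, Integer.SIZE));
    }
}
```

Every benchmark name in the table that follows pairs two species that satisfy this check, for example Short256 (16 lanes) with Byte128 (16 lanes).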

The following is the performance data for the new JMH benchmark included with the patch.

System: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (28C 2S Cascade Lake server)

Benchmark | Baseline AVX512 (ops/ms) | With opt AVX512 (ops/ms) | Gain ratio (AVX512) | Baseline AVX2 (ops/ms) | With opt AVX2 (ops/ms) | Gain ratio (AVX2)
-- | -- | -- | -- | -- | -- | --
microMaskCastByte128ToInteger512 | 54516.035 | 112778.756 | 2.068726311 | 56353.873 | 48970.14 | 0.868975589
microMaskCastByte128ToShort256 | 55216.805 | 114020.66 | 2.064963013 | 56834.785 | 114543.015 | 2.015368141
microMaskCastByte256ToShort512 | 47392.839 | 90946.115 | 1.918984322 | 47412.246 | 43539.56 | 0.918318866
microMaskCastByte64ToInteger256 | 62578.981 | 128643.386 | 2.055696401 | 68103.798 | 131857.429 | 1.936124458
microMaskCastByte64ToLong512 | 65725.522 | 123135.03 | 1.873473595 | 68663.899 | 58686.52 | 0.854692507
microMaskCastByte64ToShort128 | 62440.621 | 121789.41 | 1.950483644 | 68775.626 | 130610.463 | 1.899080686
microMaskCastInteger128ToLong256 | 68458.06 | 130204.293 | 1.901957096 | 72769.986 | 132547.949 | 1.821464539
microMaskCastInteger128ToShort64 | 67889.419 | 126591.52 | 1.864672314 | 72177.696 | 128137.316 | 1.775303495
microMaskCastInteger256ToByte64 | 60895.223 | 130321.893 | 2.140100431 | 67920.658 | 130120.344 | 1.915769779
microMaskCastInteger256ToLong512 | 65975.311 | 129705.935 | 1.965976864 | 69022.136 | 58334.489 | 0.845156241
microMaskCastInteger256ToShort128 | 67545.659 | 125688.394 | 1.860791587 | 63210.734 | 130065.762 | 2.057653088
microMaskCastInteger512ToByte128 | 51766.31 | 115913.374 | 2.239166245 | 56546.461 | 49007.128 | 0.866670118
microMaskCastInteger512ToShort256 | 52156.663 | 109821.213 | 2.105602749 | 53074.581 | 48768.727 | 0.918871635
microMaskCastInteger64ToLong128 | 73578.517 | 63373.966 | 0.861310727 | 74943.574 | 65043.773 | 0.867903271
microMaskCastLong128ToInteger64 | 74027.908 | 63708.687 | 0.860603639 | 75332.964 | 65311.575 | 0.86697206
microMaskCastLong256ToInteger128 | 71876.726 | 123125.286 | 1.713006321 | 73417.365 | 132380.982 | 1.803129028
microMaskCastLong256ToShort64 | 72947.678 | 127544.459 | 1.748437545 | 73740.351 | 131155.599 | 1.778613706
microMaskCastLong512ToByte64 | 66746.009 | 126422.173 | 1.894078386 | 68695.16 | 58410.73 | 0.85028887
microMaskCastLong512ToInteger256 | 66989.512 | 120517.044 | 1.799043468 | 64162.04 | 58806.579 | 0.916532252
microMaskCastLong512ToShort128 | 66560.838 | 126906.819 | 1.906628925 | 68702.838 | 58956.922 | 0.85814391
microMaskCastShort128ToByte64 | 62698.789 | 126292.593 | 2.014274837 | 67675.889 | 128556.324 | 1.899588257
microMaskCastShort128ToInteger256 | 62545.978 | 130594.425 | 2.087974786 | 67611.643 | 126309.927 | 1.868168283
microMaskCastShort128ToLong512 | 65828.219 | 125557.859 | 1.90735616 | 69019.63 | 57951.985 | 0.839644968
microMaskCastShort256ToByte128 | 51423.139 | 116624.494 | 2.267938058 | 56031.712 | 116504.228 | 2.07925519
microMaskCastShort256ToInteger512 | 51563.845 | 110798.412 | 2.148761637 | 56541.831 | 49175.688 | 0.869722242
microMaskCastShort512ToByte256 | 47761.772 | 91753.708 | 1.921070014 | 45410.684 | 42683.147 | 0.939936227
microMaskCastShort64ToInteger128 | 69075.232 | 129302.738 | 1.871911744 | 72453.087 | 126654.897 | 1.748095247
microMaskCastShort64ToLong256 | 68596.655 | 130142.777 | 1.897217539 | 72278.575 | 127633.658 | 1.76585742
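
The gain ratio columns appear to be the optimized throughput divided by the baseline throughput, so a value above 1 is a speedup and a value below 1 is a slowdown. A minimal sketch of that arithmetic, using the figures from the microMaskCastByte128ToInteger512 row (the class and method names are hypothetical, for illustration only):

```java
// Hypothetical helper, not part of the patch or the benchmark:
// recomputes a gain ratio from two throughput figures in ops/ms.
public class GainRatio {
    // Ratio of optimized throughput to baseline throughput.
    static double gainRatio(double baselineOpsPerMs, double withOptOpsPerMs) {
        return withOptOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // microMaskCastByte128ToInteger512, AVX512 columns.
        System.out.printf("AVX512: %.6f%n", gainRatio(54516.035, 112778.756));
        // microMaskCastByte128ToInteger512, AVX2 columns.
        System.out.printf("AVX2:   %.6f%n", gainRatio(56353.873, 48970.14));
    }
}
```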

PS: Gains of around 2x are seen in all cases on the fast path (C2 inline expansion), and a slight degradation is seen with AVX2 on the slow path (interpreted) in cases where the target does not support 512-bit vectors, due to additional call overhead.

-------------

Commit messages:
 - 8272971: Intrinsification of VectorMask.cast operation for all compatible vector species

Changes: https://git.openjdk.java.net/panama-vector/pull/113/files
 Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=113&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8272971
  Stats: 552 lines in 34 files changed: 233 ins; 129 del; 190 mod
  Patch: https://git.openjdk.java.net/panama-vector/pull/113.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/113/head:pull/113

PR: https://git.openjdk.java.net/panama-vector/pull/113

