[vectorIntrinsics+mask] Integrated: 8272971: Intrinsification of VectorMask.cast operation for all compatible vector species

Jatin Bhateja jbhateja at openjdk.java.net
Mon Aug 30 14:25:45 UTC 2021


On Thu, 26 Aug 2021 06:18:45 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> - The patch intrinsifies the VectorMask.cast operation when the source and destination mask species are compatible, i.e. have the same vector length (lane count).
> - It handles casting for both predicated and non-predicated targets.
> 
> The following is the performance data for the new JMH benchmark included with the patch.
> 
> System: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (28C 2S Cascadelake Server)
> Benchmark | Baseline AVX512 (ops/ms) | Withopt AVX512 (ops/ms) | Gain ratio AVX512 | Baseline AVX2 (ops/ms) | Withopt AVX2 (ops/ms) | Gain ratio AVX2
> -- | -- | -- | -- | -- | -- | --
> microMaskCastByte128ToInteger512 | 54516.035 | 112778.756 | 2.068726311 | 56144.479 | 48677.988 | 0.867012908
> microMaskCastByte128ToShort256 | 55216.805 | 114020.66 | 2.064963013 | 52357.222 | 113713.843 | 2.171884578
> microMaskCastByte256ToShort512 | 47392.839 | 90946.115 | 1.918984322 | 46976.122 | 44040.585 | 0.937510018
> microMaskCastByte64ToInteger256 | 62578.981 | 128643.386 | 2.055696401 | 64291.206 | 125241.322 | 1.948031928
> microMaskCastByte64ToLong512 | 65725.522 | 123135.03 | 1.873473595 | 63500.39 | 57353.881 | 0.903205177
> microMaskCastByte64ToShort128 | 62440.621 | 121789.41 | 1.950483644 | 68406.484 | 129829.223 | 1.897908143
> microMaskCastInteger128ToLong256 | 68458.06 | 130204.293 | 1.901957096 | 73194.15 | 129671.204 | 1.771606119
> microMaskCastInteger128ToShort64 | 67889.419 | 126591.52 | 1.864672314 | 72413.82 | 129555.214 | 1.789095148
> microMaskCastInteger256ToByte64 | 60895.223 | 130321.893 | 2.140100431 | 64238.202 | 126321.452 | 1.966453731
> microMaskCastInteger256ToLong512 | 65975.311 | 129705.935 | 1.965976864 | 68179.69 | 57691.751 | 0.846172093
> microMaskCastInteger256ToShort128 | 67545.659 | 125688.394 | 1.860791587 | 63548.106 | 122347.947 | 1.925280779
> microMaskCastInteger512ToByte128 | 51766.31 | 115913.374 | 2.239166245 | 55993.494 | 49020.628 | 0.875470068
> microMaskCastInteger512ToShort256 | 52156.663 | 109821.213 | 2.105602749 | 56366.012 | 48907.786 | 0.867682212
> microMaskCastInteger64ToLong128 | 73578.517 | 63373.966 | 0.861310727 | 74174.816 | 63532.575 | 0.856524875
> microMaskCastLong128ToInteger64 | 74027.908 | 63708.687 | 0.860603639 | 68350.908 | 64608.882 | 0.945252724
> microMaskCastLong256ToInteger128 | 71876.726 | 123125.286 | 1.713006321 | 69808.173 | 129450.203 | 1.854370304
> microMaskCastLong256ToShort64 | 72947.678 | 127544.459 | 1.748437545 | 72577.142 | 129282.92 | 1.781317319
> microMaskCastLong512ToByte64 | 66746.009 | 126422.173 | 1.894078386 | 68758.915 | 58392.958 | 0.849241993
> microMaskCastLong512ToInteger256 | 66989.512 | 120517.044 | 1.799043468 | 62663.689 | 58091.934 | 0.927042996
> microMaskCastLong512ToShort128 | 66560.838 | 126906.819 | 1.906628925 | 64319.673 | 58479.2 | 0.909196165
> microMaskCastShort128ToByte64 | 62698.789 | 126292.593 | 2.014274837 | 68764.768 | 131602.165 | 1.91380221
> microMaskCastShort128ToInteger256 | 62545.978 | 130594.425 | 2.087974786 | 63122.811 | 131626.603 | 2.085246219
> microMaskCastShort128ToLong512 | 65828.219 | 125557.859 | 1.90735616 | 68457.314 | 58924.963 | 0.86075482
> microMaskCastShort256ToByte128 | 51423.139 | 116624.494 | 2.267938058 | 55950.597 | 111919.098 | 2.000319997
> microMaskCastShort256ToInteger512 | 51563.845 | 110798.412 | 2.148761637 | 54465.523 | 48667.357 | 0.893544289
> microMaskCastShort512ToByte256 | 47761.772 | 91753.708 | 1.921070014 | 47341.838 | 44144.299 | 0.932458495
> microMaskCastShort64ToInteger128 | 69075.232 | 129302.738 | 1.871911744 | 71861.612 | 125784.021 | 1.75036459
> microMaskCastShort64ToLong256 | 68596.655 | 130142.777 | 1.897217539 | 72313.591 | 130789.753 | 1.808646911
> 
> 
> PS: Gains of around 2x are seen in all cases on the fast path (C2 inline expansion). A slight degradation is seen with AVX2 on the slow path (interpreted) in cases where the target does not support 512-bit vectors, due to additional call overhead.
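
The compatibility condition in the quoted description (same vector length) can be checked by hand against the benchmark names, on the reading that "vector length" means lane count, which is the vector size in bits divided by the element size in bits. The sketch below is illustrative only and is not taken from the patch: the class `MaskCastCompat` and its methods are hypothetical helpers, and it deliberately avoids the `jdk.incubator.vector` API so that it runs without extra module flags.

```java
import java.util.Map;

public class MaskCastCompat {
    // Element sizes in bits for the lane types used in the benchmark names.
    private static final Map<String, Integer> ELEMENT_BITS = Map.of(
            "Byte", 8, "Short", 16, "Integer", 32, "Long", 64);

    // Lane count = vector size in bits / element size in bits.
    public static int laneCount(String elementType, int vectorBits) {
        Integer bits = ELEMENT_BITS.get(elementType);
        if (bits == null) {
            throw new IllegalArgumentException("Unknown element type: " + elementType);
        }
        return vectorBits / bits;
    }

    // Assumption: two mask species are cast-compatible when their lane counts match.
    public static boolean castCompatible(String srcType, int srcBits,
                                         String dstType, int dstBits) {
        return laneCount(srcType, srcBits) == laneCount(dstType, dstBits);
    }

    public static void main(String[] args) {
        // microMaskCastByte128ToInteger512: 128/8 lanes versus 512/32 lanes
        System.out.println(laneCount("Byte", 128));                          // 16
        System.out.println(laneCount("Integer", 512));                       // 16
        System.out.println(castCompatible("Byte", 128, "Integer", 512));     // true
        // A pairing that does not appear in the table: 16 lanes versus 8 lanes
        System.out.println(castCompatible("Byte", 128, "Integer", 256));     // false
    }
}
```

Every benchmark in the table pairs species with equal lane counts under this reading, for example Long256 and Short64 both have 4 lanes.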

This pull request has now been integrated.

Changeset: d5eb1297
Author:    Jatin Bhateja <jbhateja at openjdk.org>
URL:       https://git.openjdk.java.net/panama-vector/commit/d5eb1297b9181b77a9d06952186e59cae7a3cd79
Stats:     572 lines in 35 files changed: 251 ins; 129 del; 192 mod

8272971: Intrinsification of VectorMask.cast operation for all compatible vector species

Reviewed-by: sviswanathan, psandoz

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/113

