[vectorIntrinsics+mask] RFR: 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets.

Jatin Bhateja jbhateja at openjdk.java.net
Thu Jul 22 06:40:16 UTC 2021


Intel targets supporting the AVX512 feature offer predicated vector instructions. These perform a vector operation only on the lanes selected by an opmask register. On non-AVX512 targets, masked vector operations are supported by executing the main vector operation on all lanes and then applying an explicit vector blend that selects the required lanes.
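A scalar Java model of the two strategies may help. This is only a sketch of the lane semantics, not of the generated x86 code. `predicatedAdd` mimics a merge-masked add, where unselected lanes keep the value of the first operand (this matches the Vector API rule that, for a masked lane-wise operation, lanes unset in the mask retain the original value of `this` vector). `addThenBlend` mimics the non-AVX512 path of computing every lane and then blending. The class and method names are hypothetical, and the 16-lane width assumes a 512-bit vector of ints.

```java
import java.util.Arrays;

public class MaskedAddModel {
    // Assumed lane count: a 512-bit vector holds 16 32-bit ints.
    static final int LANES = 16;

    // Models a predicated (merge-masked) add: only lanes whose mask bit is set
    // are computed; the other lanes keep the value of the first operand.
    static int[] predicatedAdd(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            r[i] = mask[i] ? a[i] + b[i] : a[i];
        }
        return r;
    }

    // Models the non-AVX512 lowering: compute every lane first, then blend
    // the full result with the first operand under the mask.
    static int[] addThenBlend(int[] a, int[] b, boolean[] mask) {
        int[] sum = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            sum[i] = a[i] + b[i];           // unmasked main operation
        }
        int[] r = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            r[i] = mask[i] ? sum[i] : a[i]; // explicit blend step
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = new int[LANES];
        int[] b = new int[LANES];
        boolean[] mask = new boolean[LANES];
        for (int i = 0; i < LANES; i++) {
            a[i] = i;
            b[i] = 100;
            mask[i] = (i % 2 == 0);         // select even lanes only
        }
        int[] p = predicatedAdd(a, b, mask);
        int[] q = addThenBlend(a, b, mask);
        System.out.println(Arrays.equals(p, q));                  // true
        System.out.println(Arrays.toString(Arrays.copyOf(p, 4))); // [100, 1, 102, 3]
    }
}
```

Both paths produce identical lanes; the difference is cost. The blend path spends an extra instruction (and computes unused lanes), which a predicated instruction avoids.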

This patch adds initial X86 backend support for predicated vector operations.

The following is performance data for the existing Vector API JMH benchmarks with the patch applied:
Test system: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Icelake server 40C 2S)

Benchmark | SIZE | Baseline (ops/ms) | WithOpts (ops/ms) | Gain (WithOpts / Baseline)
-- | -- | -- | -- | --
Int512Vector.ABSMasked | 1024 | 10132.664 | 10394.942 | 1.025884407
Int512Vector.ADDMasked | 1024 | 7785.805 | 8980.133 | 1.153398139
Int512Vector.ADDMaskedLanes | 1024 | 5809.455 | 6350.628 | 1.093153833
Int512Vector.ANDMasked | 1024 | 7776.729 | 8965.988 | 1.152925349
Int512Vector.ANDMaskedLanes | 1024 | 6717.202 | 7426.217 | 1.105552133
Int512Vector.AND_NOTMasked | 1024 | 7688.835 | 8988.659 | 1.169053439
Int512Vector.ASHRMasked | 1024 | 6808.185 | 7883.755 | 1.1579819
Int512Vector.ASHRMaskedShift | 1024 | 9523.164 | 12166.72 | 1.277592195
Int512Vector.BITWISE_BLENDMasked | 1024 | 5919.647 | 6864.988 | 1.159695502
Int512Vector.DIVMasked | 1024 | 237.174 | 236.014 | 0.995109076
Int512Vector.FIRST_NONZEROMasked | 1024 | 5387.315 | 7890.42 | 1.464629412
Int512Vector.LSHLMasked | 1024 | 6806.898 | 7881.315 | 1.157842383
Int512Vector.LSHLMaskedShift | 1024 | 9552.257 | 12153.769 | 1.272345269
Int512Vector.LSHRMasked | 1024 | 6776.605 | 7897.786 | 1.165448776
Int512Vector.LSHRMaskedShift | 1024 | 9500.087 | 12134.962 | 1.277352723
Int512Vector.MAXMaskedLanes | 1024 | 6993.149 | 7580.399 | 1.083975045
Int512Vector.MINMaskedLanes | 1024 | 6925.363 | 7450.814 | 1.075873424
Int512Vector.MULMasked | 1024 | 7732.753 | 8956.02 | 1.158192949
Int512Vector.MULMaskedLanes | 1024 | 4066.384 | 4152.375 | 1.021146798
Int512Vector.NEGMasked | 1024 | 8760.797 | 9255.063 | 1.056417926
Int512Vector.NOTMasked | 1024 | 8981.123 | 9229.573 | 1.027663578
Int512Vector.ORMasked | 1024 | 7786.787 | 8967.057 | 1.151573428
Int512Vector.ORMaskedLanes | 1024 | 6694.36 | 7450.106 | 1.112892943
Int512Vector.SUBMasked | 1024 | 7782.939 | 9001.692 | 1.156592901
Int512Vector.XORMasked | 1024 | 7785.031 | 9070.342 | 1.165100306
Int512Vector.XORMaskedLanes | 1024 | 6700.689 | 7454.73 | 1.112531861
Int512Vector.ZOMOMasked | 1024 | 6982.297 | 8313.51 | 1.190655453
Int512Vector.gatherMasked | 1024 | 361.497 | 1494.876 | 4.135237637
Int512Vector.scatterMasked | 1024 | 490.05 | 3120.425 | 6.367564534
Int512Vector.sliceMasked | 1024 | 1436.248 | 1597.805 | 1.112485448
Int512Vector.unsliceMasked | 1024 | 296.721 | 346.434 | 1.167541226
Float512Vector.ADDMasked | 1024 | 7645.873 | 9123.386 | 1.193243205
Float512Vector.ADDMaskedLanes | 1024 | 2404.371 | 2529.284 | 1.051952465
Float512Vector.DIVMasked | 1024 | 5134.602 | 5129.085 | 0.998925525
Float512Vector.FIRST_NONZEROMasked | 1024 | 5040.567 | 7078.828 | 1.404371373
Float512Vector.FMAMasked | 1024 | 5996.419 | 6902.626 | 1.151124696
Float512Vector.MAXMaskedLanes | 1024 | 1681.249 | 1727.444 | 1.027476596
Float512Vector.MINMaskedLanes | 1024 | 1610.115 | 1667.143 | 1.035418588
Float512Vector.MULMasked | 1024 | 7812.317 | 9054.137 | 1.158956683
Float512Vector.MULMaskedLanes | 1024 | 2406.81 | 2514.018 | 1.044543608
Float512Vector.NEGMasked | 1024 | 8248.933 | 9834.607 | 1.192227771
Float512Vector.SQRTMasked | 1024 | 4278.046 | 4281.009 | 1.000692606
Float512Vector.SUBMasked | 1024 | 7697.582 | 9044.305 | 1.174954031
Float512Vector.gatherMasked | 1024 | 428.428 | 1491.441 | 3.48119404
Float512Vector.scatterMasked | 1024 | 416.169 | 3216.628 | 7.729138883
Float512Vector.sliceMasked | 1024 | 1431.07 | 1609.12 | 1.124417394
Float512Vector.unsliceMasked | 1024 | 292.513 | 331.366 | 1.132824866
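The Gain column is the throughput ratio WithOpts / Baseline, so values above 1 mean the patched build is faster and values slightly below 1 (such as the DIVMasked rows) are effectively unchanged. As a quick check of that reading, the sketch below recomputes the ratio for three rows copied from the table; the class name `GainCheck` and the three-decimal rounding are illustrative choices.

```java
import java.util.Locale;

public class GainCheck {
    // Gain is the throughput ratio WithOpts / Baseline (both in ops/ms).
    static double gain(double baselineOpsPerMs, double withOptsOpsPerMs) {
        return withOptsOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // Rows copied from the table above: gain(Baseline, WithOpts).
        // Locale.ROOT keeps '.' as the decimal separator.
        System.out.printf(Locale.ROOT, "Int512Vector.ADDMasked       %.3f%n", gain(7785.805, 8980.133));
        System.out.printf(Locale.ROOT, "Float512Vector.scatterMasked %.3f%n", gain(416.169, 3216.628));
        System.out.printf(Locale.ROOT, "Int512Vector.DIVMasked       %.3f%n", gain(237.174, 236.014));
    }
}
```

This prints 1.153, 7.729 and 0.995, matching the Gain column to three decimals.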



PS: The data above shows performance gains for two vector species, Int512 and Float512. In general, across all species we see 1.2-2.x gains on the various masking operations supported so far.
The new matcher routine `Matcher::match_rule_supported_vector_masked` lists the masking operations supported by this patch.

-------------

Commit messages:
 - 8270349: Fix for 32-bit build failure.
 - 8270349: Initial X86 backend support for optimizing masking operations on AVX512 targets.

Changes: https://git.openjdk.java.net/panama-vector/pull/99/files
 Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=99&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8270349
  Stats: 2932 lines in 19 files changed: 2793 ins; 39 del; 100 mod
  Patch: https://git.openjdk.java.net/panama-vector/pull/99.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/99/head:pull/99

PR: https://git.openjdk.java.net/panama-vector/pull/99

