RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms

Fri Aug 5 02:32:35 UTC 2022

Hi All,

This patch extends conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to optimize following floating point to integral conversions for X86 AVX2 targets:-
 * D2I , D2S,  D2B, F2I ,  F2S,  F2B

In addition, it also optimizes following wide vector (64 bytes) double to integer and sub-type conversions for AVX512 targets which do not support AVX512DQ feature. 
 * D2I,  D2S, D2B

Following are the JMH micro performance results with and without patch.

System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz) 

BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR
-- | -- | -- | -- | --
VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534
VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359
VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325
VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138
VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068
VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213
VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583
VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565
VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962
VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592
VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964
VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237
VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434
VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118
VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909
VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916
VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388
VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841
VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 23.82413437
VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399
VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962
VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461
VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638
VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219
VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596
VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279
VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649
VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527
VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 133.154 | 1.015210546
VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 4150.039 | 28.4315457
VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 33.90116775
VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338
VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627
VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766
VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951
VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019
VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989
VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379
VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984
VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 0.992234429
VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584
VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748
VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881
VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8288043: Changing file permission.
 - 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms

Changes: https://git.openjdk.org/jdk/pull/9748/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9748&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8288043
  Stats: 847 lines in 8 files changed: 734 ins; 47 del; 66 mod
  Patch: https://git.openjdk.org/jdk/pull/9748.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9748/head:pull/9748

PR: https://git.openjdk.org/jdk/pull/9748