RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms

Mon Aug 29 07:02:10 UTC 2022

On Thu, 4 Aug 2022 16:20:10 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

> Hi All,
> 
> This patch extends the conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to the following floating-point to integral conversions on X86 AVX2 targets:
>  * D2I, D2S, D2B, F2I, F2S, F2B
>  
> In addition, it optimizes the following wide-vector (64-byte) double to integer and sub-word conversions on AVX512 targets that do not support the AVX512DQ feature:
>  * D2I, D2S, D2B
>   
> The following are the JMH micro-benchmark performance results with and without the patch.
> 
> System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz) 
> 
> BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR
> -- | -- | -- | -- | --
> VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534
> VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359
> VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325
> VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138
> VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068
> VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213
> VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583
> VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565
> VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962
> VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592
> VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964
> VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237
> VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434
> VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118
> VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909
> VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916
> VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388
> VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841
> VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 23.82413437
> VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399
> VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962
> VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461
> VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638
> VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219
> VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596
> VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279
> VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649
> VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527
> VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 133.154 | 1.015210546
> VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 4150.039 | 28.4315457
> VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 33.90116775
> VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338
> VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627
> VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766
> VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951
> VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019
> VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989
> VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379
> VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984
> VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 0.992234429
> VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584
> VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748
> VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881
> VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin
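
For anyone reading the table: the last column appears to be WITHOPT divided by BASELINE, so values above 1.0 are speedups and values below 1.0 are slowdowns. A minimal sketch of that arithmetic, checked against two rows of the table (the class and method names here are mine, purely for illustration, and are not part of the patch):

```java
public class GainFactorSketch {
    // Ratio of optimized to baseline throughput (both in ops/ms).
    // A result above 1.0 means the patched build is faster.
    static double gainFactor(double baselineOpsPerMs, double withOptOpsPerMs) {
        return withOptOpsPerMs / baselineOpsPerMs;
    }

    public static void main(String[] args) {
        // microDouble128ToInteger128 row: large speedup
        System.out.printf("%.6f%n", gainFactor(90.74, 2537.958));
        // microFloat128ToInteger128 row: slight slowdown
        System.out.printf("%.6f%n", gainFactor(5933.565, 5361.472));
    }
}
```

Both rows agree with the PERF GAIN FACTOR column (about 27.97 and about 0.90 respectively).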

I can run some testing in our system once you have resolved the merge conflicts.
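
As a reference point for that testing, here is a minimal scalar sketch of the semantics the D2I, D2S and D2B conversions listed in the description are expected to preserve, assuming the vectorized paths have to match Java's narrowing primitive conversion (NaN becomes 0, out-of-range values saturate to int first, and the sub-word types then keep only the low bits). The class and method names are hypothetical, and this is plain Java, not the code from the patch:

```java
import java.util.Arrays;

public class FpToIntCastSketch {
    // D2I: NaN -> 0, values outside the int range saturate,
    // everything else truncates toward zero.
    static int[] d2i(double[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (int) src[i];
        }
        return dst;
    }

    // D2S: the double is first converted to int (as above),
    // then the low 16 bits of that int are kept.
    static short[] d2s(double[] src) {
        short[] dst = new short[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
        return dst;
    }

    // D2B: same two-step conversion, keeping the low 8 bits.
    static byte[] d2b(double[] src) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        // Special values that a vectorized conversion must also handle.
        double[] in = {Double.NaN, 1e10, -1e10, -3.9, 70000.0, 300.0};
        System.out.println(Arrays.toString(d2i(in)));
        System.out.println(Arrays.toString(d2s(in)));
        System.out.println(Arrays.toString(d2b(in)));
    }
}
```

Comparing the output of loops like these against the vectorized results for NaN, infinities and out-of-range inputs is one way to spot a mismatch in the special-case handling.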

-------------

PR: https://git.openjdk.org/jdk/pull/9748