RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms [v7]

Tue Sep 20 17:24:53 UTC 2022

On Tue, 20 Sep 2022 10:54:47 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> This patch extends the conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to optimize the following floating-point to integral conversions for X86 AVX2 targets:
>>  * D2I, D2S, D2B, F2I, F2S, F2B
>>  
>> In addition, it optimizes the following wide-vector (64-byte) double to integer and sub-word conversions for AVX512 targets that do not support the AVX512DQ feature:
>>  * D2I, D2S, D2B
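For reference, the scalar Java semantics these vectorized conversions have to preserve come from JLS 5.1.3: a float or double is first converted to int (NaN becomes 0, out-of-range values saturate to Integer.MIN_VALUE or Integer.MAX_VALUE), and for short and byte that int is then truncated to its low 16 or 8 bits. The sketch below assumes the Vector API D2I/D2S/D2B/F2I/F2S/F2B lane conversions match these scalar casts; the class and method names are hypothetical and only illustrate the expected per-lane results.

```java
// Hypothetical illustration of JLS 5.1.3 narrowing, one method per conversion kind.
final class FpNarrowingDemo {
    // D2I: NaN -> 0, out-of-range values saturate to Integer.MIN_VALUE / MAX_VALUE.
    static int d2i(double d) { return (int) d; }
    // D2S: narrow double -> int first, then keep the low 16 bits.
    static short d2s(double d) { return (short) d; }
    // D2B: same two-step narrowing, keeping the low 8 bits.
    static byte d2b(double d) { return (byte) d; }
    // F2I / F2S / F2B follow the same rules starting from a float.
    static int f2i(float f) { return (int) f; }
    static short f2s(float f) { return (short) f; }
    static byte f2b(float f) { return (byte) f; }

    public static void main(String[] args) {
        System.out.println(d2i(Double.NaN));               // 0
        System.out.println(d2i(1e10));                     // 2147483647 (saturated)
        System.out.println(d2s(1e10));                     // -1 (low 16 bits of 0x7FFFFFFF)
        System.out.println(d2b(300.7));                    // 44 (300 = 0x12C, low byte 0x2C)
        System.out.println(f2s(-40000.5f));                // 25536 (low 16 bits of -40000)
        System.out.println(f2b(Float.NEGATIVE_INFINITY));  // 0 (low byte of Integer.MIN_VALUE)
    }
}
```

This two-step narrowing is why a sub-word result such as `(short) 1e10` is -1 rather than Short.MAX_VALUE, so a vector implementation cannot simply saturate directly to the 16-bit or 8-bit range.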
>>   
>> Following are the JMH micro-benchmark results with and without the patch.
>> 
>> System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz) 
>> 
>> BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR (WITHOPT / BASELINE)
>> -- | -- | -- | -- | --
>> VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534
>> VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359
>> VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325
>> VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138
>> VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068
>> VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213
>> VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583
>> VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565
>> VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962
>> VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592
>> VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964
>> VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237
>> VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434
>> VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118
>> VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909
>> VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916
>> VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388
>> VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841
>> VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 23.82413437
>> VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399
>> VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962
>> VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461
>> VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638
>> VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219
>> VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596
>> VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279
>> VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649
>> VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527
>> VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 133.154 | 1.015210546
>> VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 4150.039 | 28.4315457
>> VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 33.90116775
>> VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338
>> VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627
>> VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766
>> VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951
>> VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019
>> VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989
>> VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379
>> VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984
>> VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 0.992234429
>> VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584
>> VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748
>> VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881
>> VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573
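The gain column is consistent with WITHOPT divided by BASELINE (for example 92.797 / 90.603 ≈ 1.0242). A minimal sketch for recomputing it and flagging rows where the factor is below 1.0, which may indicate a slowdown (e.g. microFloat256ToInteger128), is shown below; `GainFactor` and its methods are hypothetical names used only for illustration.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the gain column from the two throughput columns.
final class GainFactor {
    // Gain factor = optimized throughput / baseline throughput (both in ops/ms).
    static double gain(double baselineOpsPerMs, double withOptOpsPerMs) {
        return withOptOpsPerMs / baselineOpsPerMs;
    }

    // A factor below 1.0 means the patched build measured slower on that benchmark.
    static boolean isRegression(double baselineOpsPerMs, double withOptOpsPerMs) {
        return gain(baselineOpsPerMs, withOptOpsPerMs) < 1.0;
    }

    public static void main(String[] args) {
        // microDouble128ToInteger128: 90.74 -> 2537.958 ops/ms
        System.out.println(String.format(Locale.ROOT, "%.4f", gain(90.74, 2537.958))); // 27.9696
        // microFloat256ToInteger128: 11471.121 -> 10269.636 ops/ms
        System.out.println(isRegression(11471.121, 10269.636)); // true
    }
}
```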
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   8288043: Adding descriptive comments.

Good. I will test it.

You need a second review.

-------------

PR: https://git.openjdk.org/jdk/pull/9748