RFR: 8287835: Add support for additional float/double to integral conversion for x86

Mon Jun 6 14:36:44 UTC 2022

On Sun, 5 Jun 2022 01:42:40 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Currently the C2 JIT only supports float -> int and double -> long conversion for x86.
>> This PR adds support for the following conversions in the C2 JIT:
>>   float -> long, short, byte
>>   double -> int, short, byte
>> 
>> The performance gain is as follows.
>> Before the patch:
>> Benchmark                                       Mode  Cnt      Score       Error   Units
>> VectorFPtoIntCastOperations.microDouble2Byte   thrpt    3  32367.971 ±  6161.118  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Int    thrpt    3  25825.251 ±  5417.104  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Long   thrpt    3  59641.958 ± 17307.177  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Short  thrpt    3  29641.505 ± 12023.015  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Byte    thrpt    3  16271.224 ±  1523.083  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Int     thrpt    3  59199.994 ± 14357.959  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Long    thrpt    3  17169.197 ±  1738.273  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Short   thrpt    3  14934.139 ±  2329.253  ops/ms
>> 
>> After the patch:
>> Benchmark                                       Mode  Cnt       Score       Error   Units
>> VectorFPtoIntCastOperations.microDouble2Byte   thrpt    3  115436.659 ± 21282.364  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Int    thrpt    3   87194.395 ±  9443.106  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Long   thrpt    3   59652.356 ±  7240.721  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Short  thrpt    3  110570.719 ± 10401.620  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Byte    thrpt    3  110028.539 ± 11113.137  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Int     thrpt    3   59469.193 ± 18272.495  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Long    thrpt    3   59897.101 ±  7249.268  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Short   thrpt    3   86167.554 ±  8253.232  ops/ms
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> src/hotspot/cpu/x86/x86.ad line 1877:
> 
>> 1875:       if (is_integral_type(bt) && !VM_Version::supports_avx512dq()) {
>> 1876:         return false;
>> 1877:       }
> 
> Overlapping conditions for the same types are confusing.

I will add comments and rephrase the checks to make them clearer.

> src/hotspot/cpu/x86/x86.ad line 1889:
> 
>> 1887:         return false;
>> 1888:       }
>> 1889:       if ((bt == T_LONG) && !VM_Version::supports_avx512dq()) {
> 
> Again, overlapping conditions. So T_LONG requires all of AVX512, avx512vl and avx512dq?
> 
> What about T_INT?

T_INT doesn't need AVX512dq. Float to long conversion (T_LONG) uses evcvttps2qq, which needs AVX512dq.
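
As a rough illustration of how the rephrased checks might separate the two cases, here is a minimal standalone C++ sketch. It is not the x86.ad code: ElemType, CpuFeatures, float_cast_supported and double_cast_supported are hypothetical names, the vector-size and AVX512VL checks are left out, and the double case assumes that every integral target needs AVX512DQ, as the quoted check at line 1875 suggests.

```cpp
// Hypothetical stand-ins for HotSpot's BasicType and VM_Version queries.
enum class ElemType { Byte, Short, Int, Long };

struct CpuFeatures {
  bool avx512dq;  // models VM_Version::supports_avx512dq()
};

// Float source: only the long target goes through evcvttps2qq,
// so only T_LONG depends on AVX512DQ. T_INT and the subword
// targets do not need it.
bool float_cast_supported(ElemType to, const CpuFeatures& cpu) {
  if (to == ElemType::Long && !cpu.avx512dq) {
    return false;
  }
  return true;
}

// Double source: ASSUMPTION based on the quoted check at line 1875,
// every integral target is rejected when AVX512DQ is missing.
bool double_cast_supported(ElemType to, const CpuFeatures& cpu) {
  (void)to;  // the target type does not change the answer in this sketch
  return cpu.avx512dq;
}
```

Keeping one predicate per source type puts each AVX512DQ requirement next to the conversion that causes it, which may make the overlap between the conditions easier to follow.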

-------------

PR: https://git.openjdk.java.net/jdk/pull/9032