RFR: 8287835: Add support for additional float/double to integral conversion for x86 [v5]

Wed Jun 8 13:25:40 UTC 2022

On Mon, 6 Jun 2022 23:27:23 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Currently the C2 JIT only supports float -> int and double -> long conversion for x86.
>> This PR adds support for the following conversions in the C2 JIT:
>>   float -> long, short, byte
>>   double -> int, short, byte
>> 
>> The performance gain is as follows.
>> Before the patch:
>> Benchmark                                       Mode  Cnt      Score       Error   Units
>> VectorFPtoIntCastOperations.microDouble2Byte   thrpt    3  32367.971 ±  6161.118  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Int    thrpt    3  25825.251 ±  5417.104  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Long   thrpt    3  59641.958 ± 17307.177  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Short  thrpt    3  29641.505 ± 12023.015  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Byte    thrpt    3  16271.224 ±  1523.083  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Int     thrpt    3  59199.994 ± 14357.959  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Long    thrpt    3  17169.197 ±  1738.273  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Short   thrpt    3  14934.139 ±  2329.253  ops/ms
>> 
>> After the patch:
>> Benchmark                                       Mode  Cnt       Score       Error   Units
>> VectorFPtoIntCastOperations.microDouble2Byte   thrpt    3  115436.659 ± 21282.364  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Int    thrpt    3   87194.395 ±  9443.106  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Long   thrpt    3   59652.356 ±  7240.721  ops/ms
>> VectorFPtoIntCastOperations.microDouble2Short  thrpt    3  110570.719 ± 10401.620  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Byte    thrpt    3  110028.539 ± 11113.137  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Int     thrpt    3   59469.193 ± 18272.495  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Long    thrpt    3   59897.101 ±  7249.268  ops/ms
>> VectorFPtoIntCastOperations.microFloat2Short   thrpt    3   86167.554 ±  8253.232  ops/ms
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix extra space

src/hotspot/cpu/x86/x86.ad line 1892:

> 1890:       //     Conversion to long in addition needs avx512dq
> 1891:       //     Need avx512vl for size_in_bits < 512
> 1892:       if (is_integral_type(bt) && (bt != T_INT)) {

Why is there a special check for bt != T_INT?

src/hotspot/cpu/x86/x86.ad line 7349:

> 7347:         assert(to_elem_bt == T_BYTE, "required");
> 7348:         __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
> 7349:       }

We already support the F2I cast on AVX2, and that can be extended to sub-word types using
signed saturating lane packing instructions (PACKSSDW and PACKSSWB).
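As a scalar illustration of what such a packing step does per lane, here is a minimal sketch. It is a plain Java model, not HotSpot code; the class and method names are hypothetical. It compares Java's own float -> short cast (saturating float -> int, then keeping the low 16 bits) with a signed-saturating pack of the intermediate int, which is how PACKSSDW treats each lane.

```java
final class NarrowingModel {
    // Java's scalar rule (JLS 5.1.3): float -> int saturates (NaN becomes 0),
    // then int -> short keeps only the low 16 bits.
    static short javaFloatToShort(float f) {
        return (short) f;
    }

    // Hypothetical scalar model of one PACKSSDW lane:
    // clamp a signed int into the signed short range.
    static short packSignedSaturate(int v) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
    }

    public static void main(String[] args) {
        float[] inputs = {1.5f, -3.7f, 32767.0f, 70000.0f, Float.NaN};
        for (float f : inputs) {
            short viaCast = javaFloatToShort(f);
            short viaPack = packSignedSaturate((int) f);
            System.out.println(f + " -> cast=" + viaCast + " pack=" + viaPack);
        }
    }
}
```

The two agree whenever the intermediate int fits in the short range. Outside it they differ: 70000.0f casts to 4464 but clamps to 32767, so an AVX2 sequence built on saturating packs would presumably need to account for out-of-range lanes.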

src/hotspot/cpu/x86/x86.ad line 7388:

> 7386:         case T_BYTE:
> 7387:           __ evpmovsqd($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);
> 7388:           __ evpmovdb($dst$$XMMRegister, $dst$$XMMRegister, vlen_enc);

Sub-word handling can be extended to AVX2 using a packing instruction sequence similar to the one VectorStoreMask uses for quadword lanes.
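For reference, the quoted T_BYTE sequence can be modelled per lane in plain Java. This is only a sketch under stated assumptions: the names are hypothetical, and it assumes the preceding conversion has already produced a long with the same value as Java's (long) cast of the double. evpmovsqd is modelled as a signed-saturating long -> int narrowing and evpmovdb as a truncating int -> byte narrowing.

```java
final class D2BModel {
    // Hypothetical scalar model of one evpmovsqd lane:
    // long -> int with signed saturation.
    static int saturateLongToInt(long v) {
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, v));
    }

    // Hypothetical scalar model of one evpmovdb lane:
    // int -> byte keeping only the low 8 bits.
    static byte truncateIntToByte(int v) {
        return (byte) v;
    }

    // Assumption: the earlier double -> long step behaves like Java's (long) cast.
    static byte modelDoubleToByte(double d) {
        long asLong = (long) d;
        return truncateIntToByte(saturateLongToInt(asLong));
    }

    public static void main(String[] args) {
        double[] inputs = {300.7, -129.9, 1e10, -1e10, Double.NaN};
        for (double d : inputs) {
            System.out.println(d + " -> model=" + modelDoubleToByte(d)
                    + " cast=" + (byte) d);
        }
    }
}
```

Under that assumption the model matches the scalar (byte) cast for these inputs, because saturating to long and then to int gives the same value as saturating straight to int. Any AVX2 replacement for this sequence would need to preserve the same per-lane result.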

src/hotspot/cpu/x86/x86.ad line 7391:

> 7389:           break;
> 7390:         default: assert(false, "%s", type2name(to_elem_bt));
> 7391:       }

Please move this to a macro assembler routine named vector_castD2X_evex.

test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java line 45:

> 43:     private static final int COUNT = 16;
> 44:     private static final VectorSpecies<Float> fspec512 = FloatVector.SPECIES_512;
> 45:     private static final VectorSpecies<Double> dspec512 = DoubleVector.SPECIES_512;

Unused declarations.

test/micro/org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java line 59:

> 57:     @Benchmark
> 58:     public IntVector microFloat2Int() {
> 59:         return (IntVector)fvec512.convertShape(VectorOperators.F2I, IntVector.SPECIES_512, 0);

We can remove the explicit cast by setting the return type to Vector<Integer>.

This applies to all the benchmark methods.

-------------

PR: https://git.openjdk.java.net/jdk/pull/9032