RFR: 8294588: Auto vectorize half precision floating point conversion APIs [v5]

Tue Dec 6 23:30:52 UTC 2022

On Tue, 6 Dec 2022 19:58:20 GMT, Smita Kamath <svkamath at openjdk.org> wrote:

>> Hi All, 
>> 
>> I have added changes to auto-vectorize the Float.float16ToFloat and Float.floatToFloat16 APIs.
>> The performance numbers for the JMH microbenchmark Fp16ConversionBenchmark are as follows:
>> Before code changes:
>> Benchmark                                    | (size) | Mode  | Cnt | Score     | Error       | Units
>> Fp16ConversionBenchmark.float16ToFloat       | 2048   | thrpt | 3   | 1044.653  | ± 0.041     | ops/ms
>> Fp16ConversionBenchmark.float16ToFloatMemory | 2048   | thrpt | 3   | 2341529.9 | ± 11765.453 | ops/ms
>> Fp16ConversionBenchmark.floatToFloat16       | 2048   | thrpt | 3   | 2156.662  | ± 0.653     | ops/ms
>> Fp16ConversionBenchmark.floatToFloat16Memory | 2048   | thrpt | 3   | 2007988.1 | ± 361.696   | ops/ms
>> 
>> After:
>> Benchmark                                    | (size) | Mode  | Cnt | Score       | Error      | Units
>> Fp16ConversionBenchmark.float16ToFloat       | 2048   | thrpt | 3   | 20460.349   | ± 372.327  | ops/ms
>> Fp16ConversionBenchmark.float16ToFloatMemory | 2048   | thrpt | 3   | 2342125.200 | ± 9250.899 | ops/ms
>> Fp16ConversionBenchmark.floatToFloat16       | 2048   | thrpt | 3   | 22553.977   | ± 483.034  | ops/ms
>> Fp16ConversionBenchmark.floatToFloat16Memory | 2048   | thrpt | 3   | 2007899.797 | ± 150.296  | ops/ms
>> 
>> Kindly review and share your feedback.
>> 
>> Thanks.
>> Smita
>
> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Addressed review comment

The changes are straightforward, but I have a few comments.

We also need to test it again.
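
For anyone rerunning checks by hand, here is a minimal scalar sketch of what a half-to-float conversion computes per element, written against the IEEE 754 binary16 layout. It is not the JDK implementation and not part of this PR; the names `Fp16Reference`, `halfToFloat` and `convert` are hypothetical and chosen only for illustration. The loop in `convert` has the simple array-to-array shape that the benchmark exercises.

```java
// Hypothetical reference code, not taken from the PR or from the JDK sources.
class Fp16Reference {

    // Decode one IEEE 754 binary16 value (stored in a short) to a float.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;              // treat the short as unsigned
        int sign = (bits & 0x8000) << 16;   // move the sign to bit 31
        int exp  = (bits >>> 10) & 0x1F;    // 5-bit exponent, bias 15
        int mant = bits & 0x3FF;            // 10-bit significand

        if (exp == 0x1F) {
            // Infinity or NaN: all-ones exponent, payload kept in the top bits.
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        if (exp == 0) {
            // Zero or subnormal: the value is mant * 2^-24, exact in float.
            float magnitude = mant * 0x1.0p-24f;
            return sign != 0 ? -magnitude : magnitude;
        }
        // Normal number: rebias the exponent from 15 to 127 (add 112).
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    // Element-wise conversion over arrays of equal length.
    static void convert(short[] src, float[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = halfToFloat(src[i]);
        }
    }
}
```

On a JDK that provides `Float.float16ToFloat`, the output of `convert` could be compared element by element with the intrinsic result (using `Float.floatToIntBits` so that NaN and signed zero are compared reliably), with the vectorized path both enabled and disabled.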

src/hotspot/cpu/x86/assembler_x86.cpp line 1958:

> 1956:   InstructionMark im(this);
> 1957:   InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /*uses_vl */ true);
> 1958:   attributes.set_address_attributes(/* tuple_type */ EVEX_HVM, /* input_size_in_bits */ EVEX_NObit);

Is it correct to set `EVEX_*` attributes when EVEX is switched off (by the `UseAVX` flag)?

src/hotspot/cpu/x86/vm_version_x86.cpp line 959:

> 957:     _features &= ~CPU_AVX;
> 958:     _features &= ~CPU_VZEROUPPER;
> 959:     _features &= ~CPU_F16C;

Do `is_knights_family()` CPUs support `f16c`? We switch off some avx512 features for them, but it looks like `f16c` is not connected to `avx512`.

-------------

PR: https://git.openjdk.org/jdk/pull/11471