RFR: 8294588: Auto vectorize half precision floating point conversion APIs
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Dec 2 21:40:17 UTC 2022
On Fri, 2 Dec 2022 04:22:39 GMT, Smita Kamath <svkamath at openjdk.org> wrote:
> Hi All,
>
> I have added changes for autovectorizing the Float.float16ToFloat and Float.floatToFloat16 APIs.
> Following are the performance numbers of JMH micro Fp16ConversionBenchmark:
> Before code changes:
> Benchmark | (size) | Mode | Cnt | Score | Error | Units
> Fp16ConversionBenchmark.float16ToFloat | 2048 | thrpt | 3 | 1044.653 | ± 0.041 | ops/ms
> Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | thrpt | 3 | 2341529.9 | ± 11765.453 | ops/ms
> Fp16ConversionBenchmark.floatToFloat16 | 2048 | thrpt | 3 | 2156.662 | ± 0.653 | ops/ms
> Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | thrpt | 3 | 2007988.1 | ± 361.696 | ops/ms
>
> After:
> Benchmark | (size) | Mode | Cnt | Score | Error | Units
> Fp16ConversionBenchmark.float16ToFloat | 2048 | thrpt | 3 | 20460.349 | ± 372.327 | ops/ms
> Fp16ConversionBenchmark.float16ToFloatMemory | 2048 | thrpt | 3 | 2342125.200 | ± 9250.899 | ops/ms
> Fp16ConversionBenchmark.floatToFloat16 | 2048 | thrpt | 3 | 22553.977 | ± 483.034 | ops/ms
> Fp16ConversionBenchmark.floatToFloat16Memory | 2048 | thrpt | 3 | 2007899.797 | ± 150.296 | ops/ms
>
> Kindly review and share your feedback.
>
> Thanks.
> Smita
src/hotspot/cpu/x86/x86.ad line 1688:
> 1686: case Op_HF2FV:
> 1687: case Op_F2HFV:
> 1688: if (!VM_Version::supports_f16c() && !VM_Version::supports_avx512vl()) {
We need a different check for the vector flavors (HF2FV/F2HFV) than for the scalar flavors (ConvF2HF/ConvHF2F).
The check needed for vector flavors is:
if (!VM_Version::supports_f16c() && !VM_Version::supports_avx512()) { return false; }
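A standalone sketch of how the two conditions differ. The CpuFeatures struct and both function names are hypothetical stand-ins for the VM_Version queries (none of them exist in HotSpot); only the boolean conditions are taken from the patch and from the suggestion above:

```cpp
// Hypothetical stand-in for the VM_Version feature queries; not HotSpot code.
struct CpuFeatures {
  bool f16c;      // models VM_Version::supports_f16c()
  bool avx512;    // models VM_Version::supports_avx512()
  bool avx512vl;  // models VM_Version::supports_avx512vl()
};

// Condition as written at x86.ad line 1688 in the patch:
// reject when neither F16C nor AVX512VL is available.
constexpr bool vector_conv_supported_in_patch(CpuFeatures f) {
  return !(!f.f16c && !f.avx512vl);
}

// Condition suggested in this review for the vector flavors:
// reject when neither F16C nor AVX-512 is available.
constexpr bool vector_conv_supported_suggested(CpuFeatures f) {
  return !(!f.f16c && !f.avx512);
}
```

For example, with avx512 set and both f16c and avx512vl clear, the condition in the patch rejects the vector conversion in this model, while the suggested condition accepts it.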
Also, in vm_version_x86.cpp, the F16C feature should be disabled when UseAVX is set to 0, i.e. the following
  if (UseAVX < 1) {
    _features &= ~CPU_AVX;
    _features &= ~CPU_VZEROUPPER;
  }
should be updated to:
  if (UseAVX < 1) {
    _features &= ~CPU_AVX;
    _features &= ~CPU_VZEROUPPER;
    _features &= ~CPU_F16C;
  }
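The F16C conversion instructions are VEX-encoded, which is presumably why they should go away together with AVX. A minimal standalone model of the effect of the extra line follows; the bit positions and the apply_use_avx/supports_f16c helpers are hypothetical, and the real CPU_* values in HotSpot differ:

```cpp
#include <cstdint>

// Hypothetical bit positions, for illustration only; the real CPU_* values
// are defined by HotSpot and are different from these.
constexpr std::uint64_t CPU_AVX        = 1ULL << 0;
constexpr std::uint64_t CPU_VZEROUPPER = 1ULL << 1;
constexpr std::uint64_t CPU_F16C       = 1ULL << 2;

// Models the suggested masking: with UseAVX < 1, clear AVX, VZEROUPPER
// and F16C; otherwise leave the feature word untouched.
constexpr std::uint64_t apply_use_avx(std::uint64_t features, int use_avx) {
  return use_avx < 1
      ? (features & ~(CPU_AVX | CPU_VZEROUPPER | CPU_F16C))
      : features;
}

// Models a supports_f16c() style query on the feature word.
constexpr bool supports_f16c(std::uint64_t features) {
  return (features & CPU_F16C) != 0;
}
```

With this masking, a later supports_f16c() style query returns false whenever UseAVX is 0, so the match rule checks do not need a separate UseAVX test.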
src/hotspot/cpu/x86/x86.ad line 2002:
> 2000: return false;
> 2001: }
> 2002: break;
This check can be removed, since match_rule_supported() has already been called by this point.
src/hotspot/cpu/x86/x86.ad line 3710:
> 3708: int src_size = Matcher::vector_length_in_bytes(this, $src);
> 3709: int dst_size = src_size * 2;
> 3710: int vlen_enc = vector_length_encoding(dst_size);
This could now be changed to:
int vlen_enc = Matcher::vector_length_encoding(this);
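The reasoning, sketched standalone: for a half-float to float conversion the result has the same lane count at twice the element size, so src_size * 2 is simply the size of the result node, and the encoding can be derived from the node itself. The enum, the size-to-encoding mapping, and all function names below are a hypothetical model, not the HotSpot definitions:

```cpp
// Hypothetical model of a byte-size -> vector-length-encoding mapping.
enum VecEnc { ENC_128 = 0, ENC_256 = 1, ENC_512 = 2 };

constexpr VecEnc encoding_for_bytes(int bytes) {
  return bytes <= 16 ? ENC_128 : (bytes <= 32 ? ENC_256 : ENC_512);
}

// Half-float source: 2 bytes per lane. Float result: 4 bytes per lane.
constexpr int src_bytes(int lanes)  { return lanes * 2; }
constexpr int node_bytes(int lanes) { return lanes * 4; }

// Encoding computed as in the patch: double the source size.
constexpr VecEnc enc_from_src(int lanes) {
  return encoding_for_bytes(src_bytes(lanes) * 2);
}

// Encoding computed from the result node directly.
constexpr VecEnc enc_from_node(int lanes) {
  return encoding_for_bytes(node_bytes(lanes));
}
```

In this model both routes agree for every lane count, which is why the two intermediate size variables become unnecessary.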
-------------
PR: https://git.openjdk.org/jdk/pull/11471
More information about the hotspot-compiler-dev
mailing list