RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11]

Thu Oct 6 06:28:06 UTC 2022

On Tue, 4 Oct 2022 09:07:42 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>>> You can use `kmovwl` instead which will relax the avx512bw constraint, however, you will need avx512vl for `evcvtps2ph`. Thanks.
>> 
>> Yes, in general all AVX512VL targets support AVX512BW, but cloud instances give the freedom to enable custom feature sets. Regarding K0, as per section 15.6.1.1 of the SDM, the expectation is that K0 can appear as a source or destination in a regular non-predication context, but k0 should always contain an all-true mask, so it should remain unmodified for subsequent uses, i.e. it should not be the destination of a mask-manipulating instruction. Your suggestion is to use it as a source, but that may not work either. Changing the existing sequence to use kmovw and replacing AVX512BW with AVX512VL would again mean introducing an additional predication check for this pattern.
>
> Ah, I get it: the encoding of k0 is treated specially in predicated instructions to refer to an all-set mask, but the register itself may not actually contain that value, so using it in `kshiftrw` may fail. In that case I think we can generate an all-set mask on the fly using `kxnorw(ktmp, ktmp, ktmp)` to save a GPR on this occasion. Thanks.

Hi @merykitty, I am seeing a performance regression with the `kxnorw` instruction, so I have updated the PR to use `kmovwl` instead. Thanks.
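
For readers following the thread, here is a minimal scalar sketch of the two ways of producing an all-set 16-bit mask discussed above. It is a plain Java model of the opmask semantics, not HotSpot assembler code: the class `KMaskModel`, its method names, the stale value `0x1234`, and the shift count of 8 are all hypothetical and chosen only for illustration. It shows the resulting values only and says nothing about the performance difference reported above.

```java
// Scalar model of 16-bit AVX-512 opmask operations (illustrative only,
// not the code in the PR). Masks are held in the low 16 bits of an int.
final class KMaskModel {
    static final int WORD_MASK = 0xFFFF;

    // kxnorw dst, a, b: dst = NOT (a XOR b) on the low 16 bits.
    // With a == b the XOR is zero, so the result is all ones
    // regardless of what the register held before.
    static int kxnorw(int a, int b) {
        return ~(a ^ b) & WORD_MASK;
    }

    // kmovw k, r32: copy the low 16 bits of a general-purpose
    // register into the mask register. This needs a GPR holding -1.
    static int kmovw(int gpr) {
        return gpr & WORD_MASK;
    }

    // kshiftrw dst, src, imm8: logical right shift of the 16-bit mask;
    // shift counts above 15 produce zero.
    static int kshiftrw(int k, int count) {
        return count > 15 ? 0 : (k & WORD_MASK) >>> count;
    }

    public static void main(String[] args) {
        int stale = 0x1234;                  // arbitrary prior content of ktmp
        int viaXnor = kxnorw(stale, stale);  // no GPR needed
        int viaMove = kmovw(-1);             // uses a GPR set to all ones
        System.out.println(Integer.toHexString(viaXnor));              // ffff
        System.out.println(Integer.toHexString(viaMove));              // ffff
        System.out.println(Integer.toHexString(kshiftrw(viaXnor, 8))); // ff
    }
}
```

Both routes yield the same mask value; the trade-off raised in the thread is that the `kxnorw` form avoids a temporary GPR, while the `kmovwl` form avoids the regression observed here.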

-------------

PR: https://git.openjdk.org/jdk/pull/9781