RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v11]
Quan Anh Mai
qamai at openjdk.org
Tue Oct 4 09:11:28 UTC 2022
On Tue, 4 Oct 2022 06:49:53 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> @merykitty Thanks for the suggestion. I will update the instruct to use kmovwl. I will also experiment with kshiftrw and let you know.
>
>> You can use `kmovwl` instead which will relax the avx512bw constraint, however, you will need avx512vl for `evcvtps2ph`. Thanks.
>
> Yes, in general all AVX512VL targets support AVX512BW, but cloud instances give freedom to enable custom features. Regarding K0, as per section 15.6.1.1 of the SDM, the expectation is that K0 can appear as a source or destination in a regular, non-predicated context. K0 should always contain an all-true mask, so it should remain unmodified for subsequent uses, i.e. it should not be the destination of a mask-manipulating instruction. Your suggestion is to use it as a source, but that may not work either. Changing the existing sequence to use kmovw and replacing AVX512BW with AVX512VL would again mean introducing an additional predicate check for this pattern.
Ah, I get it: the encoding of k0 is treated specially in predicated instructions to mean an all-set mask, but the register itself may not actually contain that value, so using it as a source of `kshiftrw` may fail. In that case, I think we can generate an all-set mask on the fly with `kxnorw(ktmp, ktmp)`, which saves a GPR here. Thanks.
-------------
PR: https://git.openjdk.org/jdk/pull/9781
More information about the core-libs-dev mailing list