RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2]
Emanuel Peter
epeter at openjdk.org
Tue Oct 15 07:00:11 UTC 2024
On Mon, 14 Oct 2024 18:35:52 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> src/hotspot/cpu/x86/x86.ad line 3679:
>>
>>> 3677:
>>> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{
>>> 3679: predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);
>>
>> Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`?
>
> @eme64 It now loads from memory into a register, uses the register/register version for the conversion, and then stores the result to memory. Please see the before and after code snippets below.
>
> Generated code for a 2-element float vector to float16 vector conversion:
> Before:
> vmovq 0x10(%rdx,%rbx,4),%xmm2 ; load 8 bytes from memory into xmm2 (correct)
> vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2) ; convert to float16 and store 8 bytes to memory (incorrect)
>
> After:
> vmovq 0x10(%rdx,%rbx,4),%xmm15 ; load 8 bytes from memory into xmm15 (correct)
> vcvtps2ph $0x4,%xmm15,%xmm0 ; convert to float16 into register (correct)
> vmovd %xmm0,0x10(%rsi,%rbx,2) ; store 4 bytes to memory (correct)
Ah, I see. You are using the 4-element register-only `vcvtps2ph` instruction, but only use its first 2 elements. Great :)
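
To make the difference in memory footprint concrete, here is a minimal sketch in Python. It is only a model of the two store sequences quoted above, not HotSpot code: the function names `store_memory_form` and `store_register_form` are hypothetical, and `struct`'s half-precision format `e` stands in for the hardware conversion.

```python
import struct

def _convert_4_lanes(src_floats):
    # vmovq loads 8 bytes (2 floats) and zeroes the upper lanes of the xmm
    # register, so the conversion always sees 4 float lanes.
    lanes = (list(src_floats) + [0.0] * 4)[:4]
    return struct.pack("<4e", *lanes)  # 4 half-floats = 8 bytes

def store_memory_form(src_floats, dst, offset):
    # Model of `vcvtps2ph $imm, %xmm, mem`: writes all 4 converted lanes,
    # i.e. 8 bytes, even when only 2 elements are wanted.
    dst[offset:offset + 8] = _convert_4_lanes(src_floats)

def store_register_form(src_floats, dst, offset):
    # Model of reg/reg `vcvtps2ph` followed by `vmovd`: only the low
    # 4 bytes (2 half-floats) reach memory.
    dst[offset:offset + 4] = _convert_4_lanes(src_floats)[:4]

before = bytearray(b"\xAA" * 12)
after = bytearray(b"\xAA" * 12)
store_memory_form([1.0, 2.0], before, 0)
store_register_form([1.0, 2.0], after, 0)
print(before.hex())
print(after.hex())
```

In this model both variants write the same two float16 values, but the memory form also overwrites the 4 bytes that follow them, which is the out-of-bounds store the new predicate avoids for vectors shorter than 16 bytes.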
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800564054