RFR: 8338126 : C2 SuperWord: VectorCastF2HF / vcvtps2ph produces wrong results for vector length 2 [v2]

Tue Oct 15 07:00:11 UTC 2024

On Mon, 14 Oct 2024 18:35:52 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> src/hotspot/cpu/x86/x86.ad line 3679:
>> 
>>> 3677: 
>>> 3678: instruct vconvF2HF_mem_reg(memory mem, vec src) %{
>>> 3679:   predicate(Matcher::vector_length_in_bytes(n->in(3)->in(1)) >= 16);
>> 
>> Ok, and so what alternative path is the matcher now going to take if we have `vector_length = 8`?
>
> @eme64 It is now going to load from memory into a register, use the register/register version for the conversion, and then store the result to memory. Please see the before and after code snippets below.
> 
> Generated code snippet for a 2-element float vector to float16 vector conversion:
> Before:
>       vmovq  0x10(%rdx,%rbx,4),%xmm2          ; load 8 bytes (2 floats) from memory into xmm2 (correct)
>       vcvtps2ph $0x4,%xmm2,0x10(%rsi,%rbx,2)  ; convert to float16 and store 8 bytes (4 half-floats) to memory (incorrect: only the low 4 bytes are valid)
> 
> After:
>       vmovq  0x10(%rdx,%rbx,4),%xmm15         ; load 8 bytes (2 floats) from memory into xmm15 (correct)
>       vcvtps2ph $0x4,%xmm15,%xmm0             ; convert to float16 into a register (correct)
>       vmovd  %xmm0,0x10(%rsi,%rbx,2)          ; store 4 bytes (2 half-floats) to memory (correct)

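The difference between the two sequences above can be modeled in plain Java. This is a simplified sketch, not HotSpot code. The class and method names are hypothetical and XMM lanes are modeled as arrays. It assumes, as `vmovq` guarantees, that the upper two float lanes are zero after the load. The old memory-form `vcvtps2ph` writes all four converted half-floats, so the two extra (zero) lanes overwrite the 4 bytes that follow the valid results. The new sequence stores only the low two lanes. `Float.floatToFloat16` requires JDK 20 or later.

```java
import java.util.Arrays;

// Hypothetical model (not HotSpot code) of the before/after sequences for a
// 2-element float -> float16 conversion. Requires JDK 20+ for Float.floatToFloat16.
public class F2HFStoreModel {
    // Marks destination slots that a 2-element store must not touch.
    static final short SENTINEL = 0x1234;

    // Models vmovq: 2 floats go into the low 64 bits, the upper 64 bits are zeroed.
    static float[] loadLow64(float[] src, int i) {
        return new float[] { src[i], src[i + 1], 0.0f, 0.0f };
    }

    // Models vcvtps2ph on a 128-bit source: all 4 float lanes are converted.
    static short[] convert4(float[] xmm) {
        short[] halfs = new short[4];
        for (int lane = 0; lane < 4; lane++) {
            halfs[lane] = Float.floatToFloat16(xmm[lane]);
        }
        return halfs;
    }

    // Before: the memory form of vcvtps2ph stores all 4 half-floats (8 bytes).
    static void storeBefore(short[] dst, int i, short[] halfs) {
        for (int lane = 0; lane < 4; lane++) {
            dst[i + lane] = halfs[lane];
        }
    }

    // After: vmovd stores only the low 32 bits, i.e. the 2 valid half-floats (4 bytes).
    static void storeAfter(short[] dst, int i, short[] halfs) {
        for (int lane = 0; lane < 2; lane++) {
            dst[i + lane] = halfs[lane];
        }
    }

    public static void main(String[] args) {
        float[] src = { 1.0f, 2.0f };
        short[] halfs = convert4(loadLow64(src, 0));

        short[] before = new short[4];
        short[] after = new short[4];
        Arrays.fill(before, SENTINEL);
        Arrays.fill(after, SENTINEL);

        storeBefore(before, 0, halfs);
        storeAfter(after, 0, halfs);

        System.out.println("before: " + Arrays.toString(before)); // [15360, 16384, 0, 0]
        System.out.println("after:  " + Arrays.toString(after));  // [15360, 16384, 4660, 4660]
    }
}
```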
Ah, I see. You are using a 4-element register-only `vcvtps2ph` instruction, but only use the first 2 elements of it. Great :)

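To exercise this path from Java, one can run a loop of the following shape. C2 SuperWord can turn this kind of loop into `VectorCastF2HF`. This is a hypothetical sanity check, not the regression test from the PR. The class and method names are illustrative. It verifies each result with an exact round trip through `Float.float16ToFloat`, which works because integers up to 2048 are exactly representable in float16. Whether C2 actually selects the 2-element vector form depends on the platform and VM flags, which this sketch does not set. The sketch is therefore not guaranteed to expose the original bug. It requires JDK 20 or later.

```java
// Hypothetical check (not the PR's regression test). Requires JDK 20+.
public class F2HFLoop {
    // A loop of this shape is a candidate for C2 SuperWord to vectorize into
    // VectorCastF2HF; the interpreter converts one element at a time.
    static void convert(float[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.floatToFloat16(src[i]);
        }
    }

    // Runs convert() repeatedly so C2 has a chance to compile it, and verifies every
    // result. Inputs are the integers 0..n-1, which are exact in float16 for n <= 2048.
    static boolean check(int n, int iterations) {
        if (n < 0 || n > 2048) {
            throw new IllegalArgumentException("n must be in [0, 2048]");
        }
        float[] src = new float[n];
        short[] dst = new short[n];
        for (int i = 0; i < n; i++) {
            src[i] = i;
        }
        for (int iter = 0; iter < iterations; iter++) {
            convert(src, dst);
            for (int i = 0; i < n; i++) {
                if (Float.float16ToFloat(dst[i]) != src[i]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Enough iterations for the JIT to warm up; the result must not change once compiled.
        System.out.println(check(1001, 20_000) ? "PASS" : "FAIL");
    }
}
```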
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21480#discussion_r1800564054