[jdk16] RFR: 8259775: [Vector API] Incorrect code-gen for VectorReinterpret operation

Jie Fu jiefu at openjdk.java.net
Wed Jan 20 11:26:47 UTC 2021


On Thu, 14 Jan 2021 12:32:41 GMT, Jie Fu <jiefu at openjdk.org> wrote:

> Hi all,
> 
> The code-gen for VectorReinterpret may be wrong on x86.
> 
> Let's look at the opto-assembly for the reproducer in JBS, which is based on @XiaohongGong's example in JDK-8259353 (many thanks to her):
> 066     B7: #   out( N1 ) <- in( B6 )  Freq: 0.999994
> 066     vector_reinterpret_expand XMM0,XMM0     !
> 066     store_vector [R12 + R11 << 3 + #16] (compressed oop addressing),XMM0
>  
> Please note that dst and src [1] share the same XMM0 register, and movdqu [2] should be generated for this case.
> But when dst == src, movdqu actually generates nothing [3], which leads to an incorrect result.
> 
> For this case, movdqu should not be empty, since the upper bits of dst must be zeroed.
> A similar error exists for vmovdqu [4].
> 
> I think we should also change movflt [5] to movss, but I don't understand why we have 4-byte vectors.
> Aren't the shortest vectors 8 bytes on x86?
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3354
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3364
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L2490
> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L2515
> [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L3379

Hi all,

The reason for the wrong execution is that the upper bits of the vector registers fail to be zeroed.
This is because movdqu(XMMRegister dst, XMMRegister src) and vmovdqu(XMMRegister dst, XMMRegister src) were incorrectly optimized to emit nothing when dst == src after JDK-8223347 (Integration of Vector API, Oct 14 20:02:46 2020).
So this seems to be a regression introduced by JDK-8223347.
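
To illustrate why eliding the move is wrong, here is a toy C++ model. It is not HotSpot code: VecReg and the function names are made up for illustration, and a 256-bit register is modelled as 32 bytes. It assumes the VEX-encoded form of the 128-bit register move, which zeroes the destination bits above bit 127.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 256-bit vector register (not a HotSpot type).
using VecReg = std::array<std::uint8_t, 32>;

// Models a VEX-encoded 128-bit register-to-register move (assumption):
// the low 16 bytes are copied, the upper 16 bytes of dst are zeroed.
// A temporary is used so the function is correct even when dst aliases src.
void move128_zero_upper(VecReg& dst, const VecReg& src) {
    VecReg tmp{};  // value-initialized: all bytes are zero
    std::copy(src.begin(), src.begin() + 16, tmp.begin());
    dst = tmp;
}

// Models the problematic helper: nothing is emitted when dst and src
// are the same register, so stale upper bytes survive.
void move128_skip_if_same(VecReg& dst, const VecReg& src) {
    if (&dst == &src) {
        return;  // the move is elided
    }
    move128_zero_upper(dst, src);
}

// True if the upper 128 bits (bytes 16..31) are all zero.
bool upper_is_zero(const VecReg& r) {
    return std::all_of(r.begin() + 16, r.end(),
                       [](std::uint8_t b) { return b == 0; });
}

// Returns true when the elided move leaves stale upper bytes and the
// real move clears them while keeping the low bytes intact.
bool demo_elided_move_is_wrong() {
    VecReg reg;
    reg.fill(0xAB);                  // register holds leftover data
    move128_skip_if_same(reg, reg);  // dst == src: nothing happens
    const bool stale = !upper_is_zero(reg);
    move128_zero_upper(reg, reg);    // the move that should be generated
    return stale && upper_is_zero(reg) && reg[0] == 0xAB;
}
```

In this model, an expanding reinterpret from 128 bits to 256 bits that relies on move128_skip_if_same would see the leftover 0xAB bytes in the upper half instead of zeros when dst and src are the same register.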

The 4-byte vector case is also fixed by using movfltz, since using movss directly is not recommended [1].
A jtreg test has also been added that reproduces this bug on both AVX256 and AVX512 machines.

Could you please review it?

Thanks.
Best regards,
Jie

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.hpp#L1048

-------------

PR: https://git.openjdk.java.net/jdk16/pull/122

