RFR: 8271078: jdk/incubator/vector/Float128VectorTests.java failed a subtest [v4]

Wed May 18 03:07:53 UTC 2022

On Tue, 17 May 2022 03:34:58 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Just do full 512-bit memory accesses when -XX:+UseKNLSetting is set
>
> Regarding @sviswa7's question.
> 
> The comment in [sharedRuntime_x86_64.cpp#L458](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp#L458) says:
> `16 bytes XMM registers are saved by default using fxsave/fxrstor instructions.`
> That is why we did not care about saving 128-bit xmm registers before AVX512. Unfortunately, `fxsave` saves only `xmm0-xmm15`, so we save `xmm16-xmm31` manually in the code Dean is fixing. But until now we saved only 64 bits of each.
> 
> What surprised me is that there is no EVEX instruction to save only 128 bits of the `xmm16-xmm31` registers if `avx512vl` is not supported. I see specific asserts regarding that: [macroAssembler_x86.cpp#L2561](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L2561)

> @vnkozlov vec_spill_helper in x86.ad shows how to save 128 bits or 256 bits on platforms where avx512vl is not supported.

Thanks, I could change the code to use vextractf32x4/vinsertf32x4, but if it's really important to optimize memory bandwidth for this case, then we should probably go with the C2-specific solution #2.

@sviswa7, the problem happens when this C2 register class

`reg_class_dynamic vectorx_reg_vlbwdq(vectorx_reg_evex, vectorx_reg_legacy, %{ VM_Version::supports_avx512vlbwdq() %} );`

selects `vectorx_reg_evex` based on `supports_avx512vlbwdq()`. The register save code will then still take the "else" path, because 16-byte vectors are not considered "wide" by `is_wide_vector()`.
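
To make the failure mode concrete, here is a small standalone model. It is not HotSpot code: `Reg`, `save_clobber_restore` and `low_bytes_equal` are hypothetical helpers, and `is_wide_vector` is written here only on the assumption, taken from the description above, that 16-byte vectors do not count as wide. It shows that a save slot holding only 64 bits cannot preserve a 16-byte vector living in an upper-bank register, while a 128-bit save can.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of one 512-bit register; the low 16 bytes are its xmm view.
using Reg = std::array<std::uint8_t, 64>;

// Assumption for this sketch: only vectors larger than 16 bytes are "wide".
static bool is_wide_vector(int size_in_bytes) {
  return size_in_bytes > 16;
}

// Save the low `bytes` bytes of `reg` to a stack slot, clobber the whole
// register (as a callee might), then restore only the bytes that were saved.
static Reg save_clobber_restore(const Reg& reg, std::size_t bytes) {
  assert(bytes <= reg.size());
  std::array<std::uint8_t, 64> slot{};
  std::memcpy(slot.data(), reg.data(), bytes);   // the "save"
  Reg result;
  result.fill(0xFF);                             // the "clobber"
  std::memcpy(result.data(), slot.data(), bytes); // the "restore"
  return result;
}

// True if the first `n` bytes of both registers match.
static bool low_bytes_equal(const Reg& a, const Reg& b, std::size_t n) {
  return std::memcmp(a.data(), b.data(), n) == 0;
}
```

In this model, `save_clobber_restore(r, 8)` keeps the low 8 bytes of `r` but loses bytes 8 to 15, which is what a 16-byte vector in `xmm16-xmm31` would see on the non-wide path. With `save_clobber_restore(r, 16)` all 16 bytes survive.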

-------------

PR: https://git.openjdk.java.net/jdk/pull/8690