RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF)
Emanuel Peter
epeter at openjdk.org
Fri Jun 9 10:03:47 UTC 2023
On Fri, 9 Jun 2023 00:59:21 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Context: `Float.floatToFloat16` -> `vcvtps2ph`.
>>
>> **Problem**
>>
>> vcvtps2ph
>> pre=Assembler::VEX_SIMD_66
>> opc=Assembler::VEX_OPCODE_0F_3A
>> VEX.128.66.0F3A
>> requires F16C
>>
>> https://www.felixcloutier.com/x86/vcvtps2ph
>>
>> So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`.
>>
>> There is also a AVX512 version, but it requires `AVX512VL and AVX512F`.
>>
>> So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`.
>>
>> **Suggested Solution**
>> As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments.
>>
>> **Testing**
>> I simulated the patch on intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31).
>>
>> Running: tier1-6 + stress testing.
>
> vcvtps2ph is correct. The problem is that the register from XMM16-XMM31 is being passed for KNL. The higher bank register is and AVX512 feature. The fix would be:
> --- a/src/hotspot/cpu/x86/x86.ad
> +++ b/src/hotspot/cpu/x86/x86.ad
> @@ -3638,7 +3638,7 @@ instruct sqrtD_reg(regD dst) %{
> ins_pipe(pipe_slow);
> %}
>
> -instruct convF2HF_reg_reg(rRegI dst, regF src, regF tmp) %{
> +instruct convF2HF_reg_reg(rRegI dst, vlRegF src, vlRegF tmp) %{
> effect(TEMP tmp);
> match(Set dst (ConvF2HF src));
> ins_cost(125);
> @@ -3682,7 +3682,7 @@ instruct vconvF2HF_mem_reg(memory mem, vec src) %{
> ins_pipe( pipe_slow );
> %}
>
> -instruct convHF2F_reg_reg(regF dst, rRegI src) %{
> +instruct convHF2F_reg_reg(vlRegF dst, rRegI src) %{
> match(Set dst (ConvHF2F src));
> format %{ "vcvtph2ps $dst,$src" %}
> ins_encode %{
@sviswa7 Thanks for the quick response! Yes, I was finally able to test it with `sde`, and got an error:
$ /oracle-work/sde-external-9.21.1-2023-04-24-lin/sde -knl -- ./java -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting -XX:CompileCommand=compileonly,Test2::test -XX:CompileCommand=printcompilation,Test2::* -XX:-TieredCompilation -Xbatch Test2.java
CompileCommand: compileonly Test2.test bool compileonly = true
CompileCommand: PrintCompilation Test2.* bool PrintCompilation = true
57913 82 % b Test2::test @ 2 (30 bytes)
61114 83 b Test2::test (30 bytes)
TID 1 SDE-ERROR: Executed instruction not valid for specified chip (KNL): 0x7fa0c11a5d75: vcvtps2ph xmm0, xmm16, 0x4
Instruction bytes are: 62 e3 7d 08 1d c0 04
Looks like I would have been indeed using `xmm16` which is not allowed on KNL.
Ok, I think I also understand why your fix works: `vlRegF` is defined with `constraint(ALLOC_IN_RC(float_reg_vl))`, and `float_reg_vl` is defined as
reg_class_dynamic float_reg_vl(float_reg_evex, float_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512vl() %} );
`reg_class_dynamic` evaluates the condition (checks if we have `evex` and `avx512vl`), and if we have it it picks the larger `float_reg_evex` (XMM0-XMM31), else it picks `float_reg_legacy` (XMM0-XMM15).
@sviswa7 thanks for the patch!
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1584103090
More information about the hotspot-compiler-dev
mailing list