RFR: 8279651: [vectorapi] Implement the missing intrinsics for casting from integrals on x64 [v8]

Mon Jan 17 19:32:26 UTC 2022

On Fri, 14 Jan 2022 04:08:03 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

>> Hi,
>> 
>> This patch implements the remaining casts from integral types on x64.
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix

Overall,

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1496:

> 1494: }
> 1495: 
> 1496: void C2_MacroAssembler::load_vector(XMMRegister dst, Address src, int vlen_in_bytes) {

`loadV` and `storeV` in `x86.ad` can be migrated to these new methods as well.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4150:

> 4148:   jmp(done);
> 4149: 
> 4150:   bind(slow_path);

How common the slow path is expected to be?
Depending on the frequency, it may be sensible to extract the code into a stub shared across all the vector casts in generated code.

src/hotspot/cpu/x86/x86.ad line 1787:

> 1785:     case Op_VectorCastB2X:
> 1786:     case Op_VectorCastS2X:
> 1787:     case Op_VectorCastI2X:

I don't see any new code added for `VectorCastI2X`. Why is it so?

src/hotspot/cpu/x86/x86.ad line 1793:

> 1791:       break;
> 1792:     case Op_VectorCastL2X:
> 1793:       if (is_integral_type(bt) && size_in_bits == 256 && UseAVX < 2) {

No new code for `VectorCastL2X` to integral types? Why?

src/hotspot/cpu/x86/x86.ad line 6881:

> 6879: instruct vcastBtoX(vec dst, vec src) %{
> 6880:   predicate(UseAVX > 1 ||
> 6881:             Matcher::vector_element_basic_type(n) != T_FLOAT ||

Before the change, the implicit predicate was `bt != T_DOUBLE && size_in_bits == 256 && UseAVX < 2`. 

Why do you check `T_FLOAT` then?
Why `Matcher::vector_length_in_bytes(n) < 32` and not `Matcher::vector_length_in_bytes(n) != 32`?

src/hotspot/cpu/x86/x86.ad line 6951:

> 6949: 
> 6950: instruct vcastStoX_evex(vec dst, vec src) %{
> 6951:   predicate((Matcher::vector_element_basic_type(n) == T_BYTE && UseAVX > 2 && VM_Version::supports_avx512vlbw()) ||

The following code is redundant:

  switch (to_elem_bt) {
      case T_BYTE:
        if (!VM_Version::supports_avx512vl()) {
          vlen_enc = Assembler::AVX_512bit;
        }

src/hotspot/cpu/x86/x86.ad line 6952:

> 6950: instruct vcastStoX_evex(vec dst, vec src) %{
> 6951:   predicate((Matcher::vector_element_basic_type(n) == T_BYTE && UseAVX > 2 && VM_Version::supports_avx512vlbw()) ||
> 6952:             (Matcher::vector_element_basic_type(n) == T_FLOAT && (UseAVX > 1 || Matcher::vector_length_in_bytes(n) < 32)) ||

Same question here (as for `VectorCastB2X`): why doesn't it match the condition removed from `Matcher::match_rule_supported_vector()` (e.g., `T_FLOAT` vs `T_DOUBLE`)?

src/hotspot/cpu/x86/x86.ad line 6992:

> 6990: %}
> 6991: 
> 6992: instruct vcastBStoF(vec dst, vec src, vec xtmp) %{

Maybe `vcastBStoF_avx`?

src/hotspot/cpu/x86/x86.ad line 7150:

> 7148: 
> 7149: instruct vcastLtoFD_avx(vec dst, vec src, vec xtmp1, vec xtmp2, rRegI tmp, rFlagsReg cr) %{
> 7150:   predicate((UseAVX <= 2 || !VM_Version::supports_avx512dq()) &&

`!VM_Version::supports_avx512dq()` is equivalent to `UseAVX <= 2 || !VM_Version::supports_avx512dq()` here, isn't it?

src/hotspot/cpu/x86/x86.ad line 7167:

> 7165: 
> 7166: instruct vcastLtoFD_evex(vec dst, vec src, vec xtmp1, vec xtmp2, rRegI tmp, kReg ktmp, rFlagsReg cr) %{
> 7167:   predicate((UseAVX <= 2 || !VM_Version::supports_avx512dq()) &&

Same here. Moreover, `UseAVX <= 2` doesn't make sense for `_evex` case.

-------------

Changes requested by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7002