RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

Thu May 12 23:59:46 UTC 2022

On Tue, 10 May 2022 12:48:25 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Hi All,
>> 
>> Patch adds the planned support for new vector operations and APIs targeted for [JEP 426: Vector API (Fourth Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173)
>> 
>> Following is the brief summary of changes:-
>> 
>> 1)  Extends the scope of existing lanewise API for following new vector operations.
>>    -  VectorOperations.BIT_COUNT: counts the number of one-bits
>>    - VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero bits
>>    - VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing zero bits
>>    - VectorOperations.REVERSE: reversing the order of bits
>>    - VectorOperations.REVERSE_BYTES: reversing the order of bytes
>>    - compress and expand bits: Semantics are based on Hacker's Delight section 7-4 Compress, or Generalized Extract.
>> 
>> 2)  Adds following new APIs to perform cross lane vector compress and expansion operations under the influence of a mask.
>>    - Vector.compress
>>    - Vector.expand 
>>    - VectorMask.compress
>> 
>> 3) Adds predicated and non-predicated versions of following new APIs to load and store the contents of vector from foreign MemorySegments. 
>>   - Vector.fromMemorySegment
>>   - Vector.intoMemorySegment
>> 
>> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support for each newly added operation.
>> 
>> 
>>  Patch has been regressed over AARCH64 and X86 targets different AVX levels. 
>> 
>>  Kindly review and share your feedback.
>> 
>>  Best Regards,
>>  Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
> 
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Correcting a typo.
>  - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: AARCH64 backend changes.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - ... and 1 more: https://git.openjdk.java.net/jdk/compare/3fa1c404...b021e082

Overall, looks good.

Some minor questions/suggestions follow.

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5700:

> 5698:                 as_FloatRegister($dst$$reg));
> 5699:     }
> 5700:     if (bt == T_INT) {

I find it hard to reason about the code in its current form.

Maybe make the second `if` (`bt == T_INT`) nested and move it under  `if (bt == T_SHORT || bt == T_INT)`?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2587:

> 2585: 
> 2586: void MacroAssembler::vmovdqu(XMMRegister dst, AddressLiteral src, Register scratch_reg, int vector_len) {
> 2587:   assert(vector_len <= AVX_512bit, "unexpected vector length");

The assert becomes redundant.

src/hotspot/cpu/x86/matcher_x86.hpp line 195:

> 193:       case Op_PopCountVI:
> 194:         return ((ety == T_INT && VM_Version::supports_avx512_vpopcntdq()) ||
> 195:            (is_subword_type(ety) && VM_Version::supports_avx512_bitalg())) ? 0 : 50;

Should be easier to read when the condition is split. E.g.:

if (is_subword_type(ety)) {
  return VM_Version::supports_avx512_bitalg())) ? 0 : 50;
} else {
  assert(ety == T_INT, "sanity"); // for documentation purposes
  return VM_Version::supports_avx512_vpopcntdq() ? 0 : 50;
}

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7953:

> 7951:     StubRoutines::x86::_vector_iota_indices = generate_iota_indices("iota_indices");
> 7952: 
> 7953:     if (UsePopCountInstruction && VM_Version::supports_avx2() && !VM_Version::supports_avx512_vpopcntdq()) {

Why is the LUT unconditionally generated? `UsePopCountInstruction` still guides the usages.

src/hotspot/cpu/x86/vm_version_x86.hpp line 375:

> 373:     decl(RDTSCP,            "rdtscp",            48) /* RDTSCP instruction */ \
> 374:     decl(RDPID,             "rdpid",             49) /* RDPID instruction */ \
> 375:     decl(FSRM,              "fsrm",              50) /* Fast Short REP MOV */ \

`test/lib-test/jdk/test/whitebox/CPUInfoTest.java` should be adjusted as well, shouldn't it?

src/hotspot/cpu/x86/x86.ad line 2113:

> 2111: 
> 2112:     case Op_CountLeadingZerosV:
> 2113:       if ((bt == T_INT || bt == T_LONG) && VM_Version::supports_avx512cd()) {

Newly introduced `is_non_subword_integral_type(bt)` can be used here instead of `bt == T_INT || bt == T_LONG`.

src/hotspot/share/classfile/vmIntrinsics.hpp line 1152:

> 1150:                                       "Ljdk/internal/vm/vector/VectorSupport$ComExpOperation;)"                                                \
> 1151:                                       "Ljdk/internal/vm/vector/VectorSupport$VectorPayload;")                                                  \
> 1152:    do_name(vector_comexp_op_name,     "comExpOp")                                                                                              \

I don't see much value in trying to shorten the name by abbreviating it. I find it easier to read in an expanded form:
` compressExpandOp`, `vector_compress_expand_op_name`, `_VectorCompressExpand`, etc.

src/hotspot/share/opto/c2compiler.cpp line 521:

> 519:     if (!Matcher::match_rule_supported(Op_SignumF)) return false;
> 520:     break;
> 521:   case vmIntrinsics::_VectorComExp:

Why `_VectorComExp` intrinsic is special? Other vector intrinsics are handled later and in a different manner.

What about `ExpandV` case?

src/hotspot/share/opto/compile.cpp line 3416:

> 3414: 
> 3415:   case Op_ReverseBytesV:
> 3416:   case Op_ReverseV: {

Can you elaborate, please, why it is performed so late in the optimization phase (at the very end during graph reshaping) and not during GVN?

-------------

PR: https://git.openjdk.java.net/jdk/pull/8425