RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]
Vladimir Ivanov
vlivanov at openjdk.java.net
Thu May 12 23:59:46 UTC 2022
On Tue, 10 May 2022 12:48:25 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Hi All,
>>
>> Patch adds the planned support for new vector operations and APIs targeted for [JEP 426: Vector API (Fourth Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173)
>>
>> Following is the brief summary of changes:-
>>
>> 1) Extends the scope of existing lanewise API for following new vector operations.
>> - VectorOperations.BIT_COUNT: counts the number of one-bits
>> - VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero bits
>> - VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing zero bits
>> - VectorOperations.REVERSE: reversing the order of bits
>> - VectorOperations.REVERSE_BYTES: reversing the order of bytes
>> - compress and expand bits: Semantics are based on Hacker's Delight section 7-4 Compress, or Generalized Extract.
>>
>> 2) Adds following new APIs to perform cross lane vector compress and expansion operations under the influence of a mask.
>> - Vector.compress
>> - Vector.expand
>> - VectorMask.compress
>>
>> 3) Adds predicated and non-predicated versions of following new APIs to load and store the contents of vector from foreign MemorySegments.
>> - Vector.fromMemorySegment
>> - Vector.intoMemorySegment
>>
>> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support for each newly added operation.
>>
>>
>> Patch has been regressed over AARCH64 and X86 targets different AVX levels.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
> - 8284960: Correcting a typo.
> - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
> - 8284960: AARCH64 backend changes.
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
> - ... and 1 more: https://git.openjdk.java.net/jdk/compare/3fa1c404...b021e082
Overall, looks good.
Some minor questions/suggestions follow.
src/hotspot/cpu/aarch64/aarch64_neon.ad line 5700:
> 5698: as_FloatRegister($dst$$reg));
> 5699: }
> 5700: if (bt == T_INT) {
I find it hard to reason about the code in its current form.
Maybe make the second `if` (`bt == T_INT`) nested and move it under `if (bt == T_SHORT || bt == T_INT)`?
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2587:
> 2585:
> 2586: void MacroAssembler::vmovdqu(XMMRegister dst, AddressLiteral src, Register scratch_reg, int vector_len) {
> 2587: assert(vector_len <= AVX_512bit, "unexpected vector length");
The assert becomes redundant.
src/hotspot/cpu/x86/matcher_x86.hpp line 195:
> 193: case Op_PopCountVI:
> 194: return ((ety == T_INT && VM_Version::supports_avx512_vpopcntdq()) ||
> 195: (is_subword_type(ety) && VM_Version::supports_avx512_bitalg())) ? 0 : 50;
Should be easier to read when the condition is split. E.g.:
if (is_subword_type(ety)) {
return VM_Version::supports_avx512_bitalg())) ? 0 : 50;
} else {
assert(ety == T_INT, "sanity"); // for documentation purposes
return VM_Version::supports_avx512_vpopcntdq() ? 0 : 50;
}
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7953:
> 7951: StubRoutines::x86::_vector_iota_indices = generate_iota_indices("iota_indices");
> 7952:
> 7953: if (UsePopCountInstruction && VM_Version::supports_avx2() && !VM_Version::supports_avx512_vpopcntdq()) {
Why is the LUT unconditionally generated? `UsePopCountInstruction` still guides the usages.
src/hotspot/cpu/x86/vm_version_x86.hpp line 375:
> 373: decl(RDTSCP, "rdtscp", 48) /* RDTSCP instruction */ \
> 374: decl(RDPID, "rdpid", 49) /* RDPID instruction */ \
> 375: decl(FSRM, "fsrm", 50) /* Fast Short REP MOV */ \
`test/lib-test/jdk/test/whitebox/CPUInfoTest.java` should be adjusted as well, shouldn't it?
src/hotspot/cpu/x86/x86.ad line 2113:
> 2111:
> 2112: case Op_CountLeadingZerosV:
> 2113: if ((bt == T_INT || bt == T_LONG) && VM_Version::supports_avx512cd()) {
Newly introduced `is_non_subword_integral_type(bt)` can be used here instead of `bt == T_INT || bt == T_LONG`.
src/hotspot/share/classfile/vmIntrinsics.hpp line 1152:
> 1150: "Ljdk/internal/vm/vector/VectorSupport$ComExpOperation;)" \
> 1151: "Ljdk/internal/vm/vector/VectorSupport$VectorPayload;") \
> 1152: do_name(vector_comexp_op_name, "comExpOp") \
I don't see much value in trying to shorten the name by abbreviating it. I find it easier to read in an expanded form:
` compressExpandOp`, `vector_compress_expand_op_name`, `_VectorCompressExpand`, etc.
src/hotspot/share/opto/c2compiler.cpp line 521:
> 519: if (!Matcher::match_rule_supported(Op_SignumF)) return false;
> 520: break;
> 521: case vmIntrinsics::_VectorComExp:
Why `_VectorComExp` intrinsic is special? Other vector intrinsics are handled later and in a different manner.
What about `ExpandV` case?
src/hotspot/share/opto/compile.cpp line 3416:
> 3414:
> 3415: case Op_ReverseBytesV:
> 3416: case Op_ReverseV: {
Can you elaborate, please, why it is performed so late in the optimization phase (at the very end during graph reshaping) and not during GVN?
-------------
PR: https://git.openjdk.java.net/jdk/pull/8425
More information about the hotspot-compiler-dev
mailing list