RFR: 8318650: Optimized subword gather for x86 targets. [v3]
Jatin Bhateja
jbhateja at openjdk.org
Sun Nov 5 13:05:23 UTC 2023
On Fri, 3 Nov 2023 23:07:44 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Restricting masked sub-word gather to AVX512 target to align with integral gather support.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606:
>
>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt, XMMRegister dst, Register base, Register idx_base,
>> 1605: Register offset, Register rtmp, int vlen_enc) {
>> 1606: vpxor(dst, dst, dst, vlen_enc);
>
> Why the vpxor here and in vpgather8b. These are not masked gathers.
This is to clear the vector gathering 8 subwords with each iteration.
> src/hotspot/cpu/x86/x86.ad line 1921:
>
>> 1919: case Op_LoadVectorGatherMasked:
>> 1920: if (!is_subword_type(bt) && size_in_bits < 512 && !VM_Version::supports_avx512vl()) {
>> 1921: return false;
>
> Also need to return false when size_in_bits == 64 for non subword types.
match_rule_supported_vector called in the beginning will enforce these checks.
> src/hotspot/cpu/x86/x86.ad line 1939:
>
>> 1937: // fallthrough
>> 1938: case Op_LoadVectorGather:
>> 1939: if (!is_subword_type(bt) && size_in_bits == 64 ) {
>
> Since this is a fallthrough case, the change also applies to StoreVectorScatter*. The !is_subword_type() is needed only for LoadVectorGather.
We do not intrinsify sub-word scatter operations currently.
> src/hotspot/cpu/x86/x86.ad line 4074:
>
>> 4072: BasicType elem_bt = Matcher::vector_element_basic_type(this);
>> 4073: assert(!is_subword_type(elem_bt), "sanity"); // T_INT, T_LONG, T_FLOAT, T_DOUBLE
>> 4074: __ vpcmpeqd($mask$$XMMRegister, $mask$$XMMRegister, $mask$$XMMRegister, vlen_enc);
>
> vpcmpeqd is expensive instruction as compared to movdqu and also unrelated to subword type support.
compare instruction here does not access a memory operand, hence its cheaper compared to memory loads.
> src/hotspot/share/opto/vectorIntrinsics.cpp line 1579:
>
>> 1577: Node* index = argument(11);
>> 1578: Node* indexMap = argument(12);
>> 1579: Node* indexM = argument(13);
>
> Could be renamed as follows for better understanding: index -> offset, indexM -> indexOffset. Also this should be moved under else part of if (is_scatter).
Naming scheme is based on the actual names used in intrinsic entry point for these arguments.
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 3830:
>
>> 3828: // Check indices are within array bounds.
>> 3829: // FIXME: Check index under mask controlling.
>> 3830: for (int i = 0; i < vsp.length(); i += lsp.length()) {
>
> The check (i < vsp.length() ) could instead be (i + lsp.length() <= vsp.length()).
Bit sizes of vector shape is multiple of 128 and we do not support NPOT sizes.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571225
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571157
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571152
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571178
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571683
PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1382571546
More information about the hotspot-compiler-dev
mailing list