[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"
Joshua Zhu
jzhu at openjdk.java.net
Wed Oct 6 12:41:27 UTC 2021
On Fri, 27 Aug 2021 09:47:10 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:
> Hi,
>
> I want to propose a new VectorAPI "Selective Store/Load" and share my
> implementation. Alibaba's internal databases are currently in the
> process of adopting the VectorAPI, and they need "Selective Store"
> for acceleration.
>
> The proposed API is declared as follows [1]:
>
> int selectiveIntoArray($type$[] a, int offset, VectorMask<$Boxtype$> m);
>
> The active elements (those whose bit is set in the mask) are stored
> contiguously into the array "a". If N is the true count of the mask,
> the elements from a[offset+N] up to a[offset+laneCount-1] are left
> unchanged. The return value is the number of elements stored into the
> array, so "offset + return value" is the offset for the next
> iteration.
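
To make these store semantics concrete, here is a minimal scalar model in plain Java. It is a sketch under stated assumptions, not the intrinsic: the class `SelectiveStoreModel`, its `selectiveIntoArray` helper, passing lane values as an `int[]`, and using a `boolean[]` in place of a `VectorMask` are all hypothetical stand-ins for illustration.

```java
import java.util.Arrays;

final class SelectiveStoreModel {

    /**
     * Hypothetical scalar model of the proposed selectiveIntoArray():
     * lanes whose mask bit is set are packed contiguously into a[offset...]
     * in ascending lane order; all other elements of a are left unchanged.
     * Returns the number of elements stored (the mask's true count).
     */
    static int selectiveIntoArray(int[] lanes, int[] a, int offset, boolean[] mask) {
        if (lanes.length != mask.length) {
            throw new IllegalArgumentException("lane count and mask length differ");
        }
        int n = 0;
        for (int lane = 0; lane < lanes.length; lane++) {
            if (mask[lane]) {
                a[offset + n] = lanes[lane]; // next free slot after the previous active lane
                n++;
            }
        }
        // a[offset + n] .. a[offset + lanes.length - 1] were not written.
        return n;
    }

    public static void main(String[] args) {
        int[] lanes = {10, 20, 30, 40};
        boolean[] mask = {false, true, false, true};
        int[] a = {-1, -1, -1, -1, -1};
        int stored = selectiveIntoArray(lanes, a, 1, mask);
        System.out.println(stored + " " + Arrays.toString(a)); // prints "2 [-1, 20, 40, -1, -1]"
    }
}
```

The caller advances its write position by the returned count, which is what makes the next call append right after the previous one.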
> 
> This API would be used in the following manner [2]:
>
> tld.conflict_cnt = 0;
> for (int i = 0; i < ARRAY_LENGTH; i += INT_PREFERRED_SPECIES.length()) {
>     IntVector av = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input1, i);
>     IntVector bv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input2, i);
>     IntVector cv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_index, i);
>     VectorMask<Integer> mask = av.compare(VectorOperators.NE, bv);
>     tld.conflict_cnt += cv.selectiveIntoArray(tld.conflict_array, tld.conflict_cnt, mask);
> }
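
For comparison, the vector loop above should compute the same result as the scalar code below: for every i where int_input1[i] != int_input2[i], it appends int_index[i] to conflict_array, and conflict_cnt ends up as the number of such positions. This is a sketch; `ConflictScan` and `collectConflicts` are hypothetical names, and the plain arrays stand in for the fields of the benchmark's `tld` state object.

```java
final class ConflictScan {

    /**
     * Hypothetical scalar equivalent of the vector loop: for each position
     * where input1 and input2 differ, append the matching index value to
     * conflicts. Returns the number of conflicts written.
     */
    static int collectConflicts(int[] input1, int[] input2, int[] index, int[] conflicts) {
        int conflictCnt = 0;
        for (int i = 0; i < input1.length; i++) {
            // Same predicate as av.compare(VectorOperators.NE, bv) for lane i.
            if (input1[i] != input2[i]) {
                conflicts[conflictCnt++] = index[i];
            }
        }
        return conflictCnt;
    }

    public static void main(String[] args) {
        int[] input1 = {1, 2, 3, 4, 5, 6};
        int[] input2 = {1, 0, 3, 0, 0, 6};
        int[] index  = {100, 101, 102, 103, 104, 105};
        int[] conflicts = new int[input1.length];
        int cnt = collectConflicts(input1, input2, index, conflicts);
        System.out.println(cnt); // prints 3 (positions 1, 3 and 4 differ)
    }
}
```

The data-dependent write position (conflictCnt++) is what keeps this loop from being auto-vectorized as a plain masked store, which is the gap a compressing store fills.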
>
> My patch includes the following changes:
> * Selective Store VectorAPI for Long & Int
> * Assembler: add x86 instruction "VPCOMPRESSD" and "VPCOMPRESSQ"
> * Instruction selection: vselective_store; kmask_truecount (true count of a k register)
> * Add node "StoreVectorSelective"
> * Add a new parameter "is_selective" to inline_vector_mem_masked_operation()
>   to distinguish the masked version from the selective version
> * jtreg cases
> * JMH benchmark
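
On AVX-512 the mask lives in a k register, so the count returned to the caller is the population count of the mask bits, and VPCOMPRESSD/VPCOMPRESSQ write the selected lanes in ascending lane order. The sketch below models that with a `long` bit pattern (bit i set means lane i is active). `CompressBits`, `trueCount` and `compressStore` are hypothetical names, and the model says nothing about how the x86.ad rules are actually written.

```java
final class CompressBits {

    /** Model of kmask_truecount: the number of active lanes in the mask bits. */
    static int trueCount(long maskBits) {
        return Long.bitCount(maskBits);
    }

    /**
     * Model of a compressing store driven by mask bits. Visits set bits from
     * the lowest lane to the highest and packs those lanes into a[offset...].
     * Assumes no bits are set at or above lanes.length.
     */
    static int compressStore(int[] lanes, long maskBits, int[] a, int offset) {
        int n = 0;
        long bits = maskBits;
        while (bits != 0) {
            int lane = Long.numberOfTrailingZeros(bits); // lowest remaining active lane
            a[offset + n++] = lanes[lane];
            bits &= bits - 1;                             // clear that lowest set bit
        }
        return n;
    }

    public static void main(String[] args) {
        int[] lanes = {7, 8, 9, 10, 11, 12, 13, 14};
        long mask = 0b1010_0110L; // lanes 1, 2, 5 and 7 active
        int[] out = new int[8];
        int n = compressStore(lanes, mask, out, 0);
        System.out.println(n + " " + trueCount(mask)); // prints "4 4"
    }
}
```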
>
> TODO parts I will implement:
> * Selective Store for other types
> * Selective Load
> * Potential optimizations, such as lowering selectiveIntoArray() to intoArray() when the mask is allTrue
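
The allTrue optimization is sound because compressing with an all-true mask moves nothing: lane i lands in a[offset + i], which is exactly what a plain contiguous store writes. A scalar sketch of that shortcut follows; `AllTrueFastPath` and its methods are hypothetical, a `boolean[]` stands in for the mask, and `System.arraycopy` plays the role of the unmasked store.

```java
import java.util.Arrays;

final class AllTrueFastPath {

    /** General compressing store: pack active lanes into a[offset...]. */
    static int compress(int[] lanes, int[] a, int offset, boolean[] mask) {
        int n = 0;
        for (int lane = 0; lane < lanes.length; lane++) {
            if (mask[lane]) {
                a[offset + n++] = lanes[lane];
            }
        }
        return n;
    }

    static boolean allTrue(boolean[] mask) {
        for (boolean m : mask) {
            if (!m) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical selective store that takes the plain-store path for an all-true mask. */
    static int selectiveStore(int[] lanes, int[] a, int offset, boolean[] mask) {
        if (allTrue(mask)) {
            // Every lane is active, so lane i goes to a[offset + i]:
            // identical to an unmasked contiguous store.
            System.arraycopy(lanes, 0, a, offset, lanes.length);
            return lanes.length;
        }
        return compress(lanes, a, offset, mask);
    }

    public static void main(String[] args) {
        int[] lanes = {3, 1, 4, 1};
        boolean[] all = {true, true, true, true};
        int[] viaFastPath = new int[6];
        int[] viaCompress = new int[6];
        selectiveStore(lanes, viaFastPath, 2, all);
        compress(lanes, viaCompress, 2, all);
        System.out.println(Arrays.equals(viaFastPath, viaCompress)); // prints true
    }
}
```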
>
> Test:
> * Passed VectorAPI jtreg cases.
> * JMH benchmark results evaluating the API's performance in Alibaba's real-world scenario.
>   UseAVX=3; 8 threads; conflict data percentage: 20% (i.e., 20% of the mask bits are true)
> http://cr.openjdk.java.net/~jzhu/8273057/jmh_benchmark_result.pdf
>
> [1] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-13cc2d6ec18e487ddae05cda671bdb6bb7ffd42ff7bc51a2e00c8c5e622bd55dR4667
> [2] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-951d02bd72a931ac34bc85d1d4e656a14f8943e143fc9282b36b9c76c1893c0cR144
> [3] failed to inline (intrinsic) by https://github.com/openjdk/panama-vector/blob/60aa8ca6dc0b3f1a3ee517db167f9660012858cd/src/hotspot/cpu/x86/x86.ad#L1769
>
> Best Regards,
> Joshua

> I started off the discussion last week as part of https://github.com/openjdk/panama-vector/pull/143. The email thread, with additional input, is at: https://mail.openjdk.java.net/pipermail/panama-dev/2021-October/015223.html
>
> Once you are back from vacation, please do join the discussion and development.

Okay. I will send out what I have already implemented to avoid duplicated work.
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/115