[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"

Joshua Zhu jzhu at openjdk.java.net
Wed Oct 6 12:41:27 UTC 2021


On Fri, 27 Aug 2021 09:47:10 GMT, Joshua Zhu <jzhu at openjdk.org> wrote:

> Hi,
> 
> I want to propose a new VectorAPI "Selective Store/Load" and share my
> implementation. Alibaba's internal databases are currently adopting
> the VectorAPI, and they need "Selective Store" for acceleration.
> 
> My proposed VectorAPI is declared as below [1]:
> 
>     int selectiveIntoArray($type$[] a, int offset, VectorMask<$Boxtype$> m);
> 
> The active elements (those whose bit is set in the mask) are stored
> contiguously into the array "a". If N is the true count of the mask,
> the elements from a[offset+N] till a[offset+laneCount] are left
> unchanged. The return value is the number of elements stored into
> the array, and "offset + return value" is the new offset for the
> next iteration.
> ![image](https://user-images.githubusercontent.com/70769035/131108509-3dcb61f3-e8d0-4b4e-9b49-a72c077aaba6.png)
> This API would be used in the following manner [2]:
> 
>     tld.conflict_cnt = 0;
>     for (int i = 0; i < ARRAY_LENGTH; i += INT_PREFERRED_SPECIES.length()) {
>       IntVector av = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input1, i);
>       IntVector bv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input2, i);
>       IntVector cv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_index, i);
>       VectorMask<Integer> mask = av.compare(VectorOperators.NE, bv);
>       tld.conflict_cnt += cv.selectiveIntoArray(tld.conflict_array, tld.conflict_cnt, mask);
>     }
> 
> My patch includes the following changes:
>   * Selective Store VectorAPI for Long & Int
>   * Assembler: add x86 instruction "VPCOMPRESSD" and "VPCOMPRESSQ"
>   * Instruction selection: vselective_store; kmask_truecount (true count of kregister)
>   * Add node "StoreVectorSelective"
>   * Add a new parameter "is_selective" to inline_vector_mem_masked_operation()
>     to distinguish the Masked version from the Selective version
>   * jtreg cases
>   * JMH benchmark
>       
> TODO parts I will implement:
>   * Selective Store for other types
>   * Selective Load
>   * Some potential optimizations, such as: when the mask is allTrue, selectiveIntoArray() -> intoArray()
> 
> Test:
>   * Passed VectorAPI jtreg cases.
>   * Result of JMH benchmark to evaluate API's performance in Alibaba's real scenario.
>       UseAVX=3; thread number = 8; conflict data percentage: 20% (that means 20% of mask bits are true)
>       http://cr.openjdk.java.net/~jzhu/8273057/jmh_benchmark_result.pdf
> 
> [1] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-13cc2d6ec18e487ddae05cda671bdb6bb7ffd42ff7bc51a2e00c8c5e622bd55dR4667
> [2] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-951d02bd72a931ac34bc85d1d4e656a14f8943e143fc9282b36b9c76c1893c0cR144
> [3] failed to inline (intrinsic) by https://github.com/openjdk/panama-vector/blob/60aa8ca6dc0b3f1a3ee517db167f9660012858cd/src/hotspot/cpu/x86/x86.ad#L1769
> 
> Best Regards,
> Joshua

> I started off the discussion last week as part of https://github.com/openjdk/panama-vector/pull/143. The email thread, with additional inputs, is at: https://mail.openjdk.java.net/pipermail/panama-dev/2021-October/015223.html
> 
> Once you are back from vacation, please do join the discussion and development.

Okay. I will send out what I have already implemented to avoid possible duplicated work.
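
For readers following the thread, the selective-store semantics quoted above can be modelled in plain scalar Java. This is only a sketch under stated assumptions: it does not use jdk.incubator.vector, the class name `SelectiveStoreSketch` and the array-based helper signature are hypothetical, and a vector is represented as an `int[]` of lanes with a `boolean[]` mask. It illustrates the described behavior (active lanes packed contiguously, the rest of the destination untouched, true count returned), not the patch itself.

```java
import java.util.Arrays;

class SelectiveStoreSketch {
    // Hypothetical scalar model of the proposed selectiveIntoArray():
    // lanes whose mask bit is set are written contiguously starting at
    // a[offset]; all other elements of "a" are left unchanged.
    // Returns the number of elements stored (the true count of the mask).
    static int selectiveIntoArray(int[] lanes, boolean[] mask, int[] a, int offset) {
        int stored = 0;
        for (int lane = 0; lane < lanes.length; lane++) {
            if (mask[lane]) {
                a[offset + stored] = lanes[lane];
                stored++;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        // Mirrors the conflict-collection loop from the quoted mail,
        // with an assumed lane count of 4 and made-up sample data.
        int[] input1 = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] input2 = {1, 9, 3, 9, 5, 6, 9, 8};
        int[] index  = {0, 1, 2, 3, 4, 5, 6, 7};
        int laneCount = 4;

        int[] conflictArray = new int[index.length];
        Arrays.fill(conflictArray, -1); // sentinel to show untouched slots
        int conflictCnt = 0;

        for (int i = 0; i < index.length; i += laneCount) {
            // Scalar stand-in for av.compare(VectorOperators.NE, bv).
            boolean[] mask = new boolean[laneCount];
            for (int lane = 0; lane < laneCount; lane++) {
                mask[lane] = input1[i + lane] != input2[i + lane];
            }
            int[] cv = Arrays.copyOfRange(index, i, i + laneCount);
            // The return value advances the output offset for the next iteration.
            conflictCnt += selectiveIntoArray(cv, mask, conflictArray, conflictCnt);
        }

        // prints "3 [1, 3, 6, -1, -1, -1, -1, -1]"
        System.out.println(conflictCnt + " " + Arrays.toString(conflictArray));
    }
}
```

Because the return value is accumulated into the offset, the output array ends up densely packed across iterations, which is the property the quoted usage loop relies on.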

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/115

