[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"

John Rose john.r.rose at oracle.com
Fri Aug 27 19:29:45 UTC 2021


I think I would rather see a vector-to-vector compress operation than a vector-to-memory operation that also includes compression. Isn’t that the real underlying primitive?
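
To make the distinction concrete, here is a minimal scalar sketch of how a vector-to-vector compress could serve as the primitive, with the selective store from the quoted proposal derived from it. The class and method names (CompressModel, compress, trueCount, storeCompressed) are hypothetical and are not part of the Vector API or of the patch; zero-filling the inactive tail lanes is also an assumption made only for illustration.

```java
import java.util.Arrays;

// Hypothetical scalar model; plain arrays stand in for vector lanes and mask bits.
final class CompressModel {
    // Vector-to-vector compress: active lanes are packed toward lane 0.
    // Assumption: the remaining tail lanes are filled with zero.
    static int[] compress(int[] lanes, boolean[] mask) {
        int[] out = new int[lanes.length];
        int n = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                out[n++] = lanes[i];
            }
        }
        return out;
    }

    // Number of set mask bits.
    static int trueCount(boolean[] mask) {
        int n = 0;
        for (boolean b : mask) {
            if (b) {
                n++;
            }
        }
        return n;
    }

    // Selective store expressed as compress followed by a store of the
    // first trueCount lanes; elements past offset + trueCount are untouched.
    static int storeCompressed(int[] lanes, boolean[] mask, int[] a, int offset) {
        int[] packed = compress(lanes, mask);
        int n = trueCount(mask);
        System.arraycopy(packed, 0, a, offset, n);
        return n;
    }

    public static void main(String[] args) {
        int[] lanes = {10, 20, 30, 40};
        boolean[] mask = {false, true, false, true};
        int[] a = new int[6];
        Arrays.fill(a, -1);
        int stored = storeCompressed(lanes, mask, a, 1);
        System.out.println(Arrays.toString(compress(lanes, mask)));
        System.out.println(stored + " " + Arrays.toString(a));
    }
}
```

In this model the memory operation is only a partial store of an already compressed vector, so the compress step could be reused without going through memory.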

> On Aug 27, 2021, at 2:54 AM, Joshua Zhu <jzhu at openjdk.java.net> wrote:
> 
> Hi,
> 
> I want to propose a new VectorAPI "Selective Store/Load" and share my
> implementation. Currently Alibaba's internal databases are in the
> process of applying VectorAPI and they have requirements on "Selective
> Store" for acceleration.
> 
> My proposed VectorAPI is declared as below [1]:
> 
>    int selectiveIntoArray($type$[] a, int offset, VectorMask<$Boxtype$> m);
> 
> The active elements (those whose mask bit is set) are stored
> contiguously into the array "a", starting at a[offset]. If N is the
> true count of the mask, the elements from a[offset+N] up to
> a[offset+laneCount] are left unchanged. The return value is the number
> of elements stored into the array, and "offset + return value" is the
> offset for the next iteration.
> 
> ![image](https://user-images.githubusercontent.com/70769035/131108509-3dcb61f3-e8d0-4b4e-9b49-a72c077aaba6.png)
> 
> This API will be used in the following manner [2]:
> 
>    tld.conflict_cnt = 0;
>    for (int i = 0; i < ARRAY_LENGTH; i += INT_PREFERRED_SPECIES.length()) {
>      IntVector av = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input1, i);
>      IntVector bv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input2, i);
>      IntVector cv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_index, i);
>      VectorMask<Integer> mask = av.compare(VectorOperators.NE, bv);
>      tld.conflict_cnt += cv.selectiveIntoArray(tld.conflict_array, tld.conflict_cnt, mask);
>    }
> 
> My patch includes the following changes:
>   * Selective Store VectorAPI for Long & Int
>   * Assembler: add x86 instructions "VPCOMPRESSD" and "VPCOMPRESSQ"
>   * Instruction selection: vselective_store; kmask_truecount (true count of kregister)
>   * Add node "StoreVectorSelective"
>   * Add a new parameter "is_selective" to inline_vector_mem_masked_operation()
>     to distinguish the masked version from the selective version
>   * jtreg cases
>   * JMH benchmark
> 
> TODO parts I will implement:
>   * Selective Store for other types
>   * Selective Load
>   * Some potential optimizations, such as: when the mask is allTrue, selectiveIntoArray() -> intoArray()
> 
> Test:
>   * Passed VectorAPI jtreg cases.
>   * JMH benchmark results evaluating the API's performance in a real Alibaba scenario:
>       UseAVX=3; thread number = 8; conflict data percentage: 20% (i.e., 20% of mask bits are true)
>       http://cr.openjdk.java.net/~jzhu/8273057/jmh_benchmark_result.pdf
> 
> [1] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-13cc2d6ec18e487ddae05cda671bdb6bb7ffd42ff7bc51a2e00c8c5e622bd55dR4667
> [2] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-951d02bd72a931ac34bc85d1d4e656a14f8943e143fc9282b36b9c76c1893c0cR144
> [3] failed to inline (intrinsic) by https://github.com/openjdk/panama-vector/blob/60aa8ca6dc0b3f1a3ee517db167f9660012858cd/src/hotspot/cpu/x86/x86.ad#L1769
> 
> Best Regards,
> Joshua
> 
> -------------
> 
> Commit messages:
> - 8273057: [vector] New VectorAPI "SelectiveStore"
> 
> Changes: https://git.openjdk.java.net/panama-vector/pull/115/files
> Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=115&range=00
>  Issue: https://bugs.openjdk.java.net/browse/JDK-8273057
>  Stats: 1432 lines in 90 files changed: 1423 ins; 0 del; 9 mod
>  Patch: https://git.openjdk.java.net/panama-vector/pull/115.diff
>  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/115/head:pull/115
> 
> PR: https://git.openjdk.java.net/panama-vector/pull/115
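
For reference, the loop quoted above under [2] appears to compute the following, shown here as a minimal scalar sketch. The class and method names (ConflictScan, collectConflicts) are hypothetical; the parameters stand in for the tld.int_input1, tld.int_input2, tld.int_index and tld.conflict_array fields of the quoted benchmark.

```java
import java.util.Arrays;

// Hypothetical scalar equivalent of the quoted vector loop.
final class ConflictScan {
    // Wherever the two inputs differ, append the matching index value to
    // conflictArray; the return value is the number of conflicts found.
    static int collectConflicts(int[] input1, int[] input2, int[] index, int[] conflictArray) {
        int count = 0;
        for (int i = 0; i < input1.length; i++) {
            if (input1[i] != input2[i]) {
                conflictArray[count++] = index[i];
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] input1 = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] input2 = {1, 0, 3, 0, 5, 6, 0, 8};
        int[] index = {100, 101, 102, 103, 104, 105, 106, 107};
        int[] conflicts = new int[input1.length];
        int count = collectConflicts(input1, input2, index, conflicts);
        System.out.println(count + " " + Arrays.toString(Arrays.copyOf(conflicts, count)));
    }
}
```

The vector version handles one species-length chunk per iteration, and the running count plays the role of the offset passed to selectiveIntoArray.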


