[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"
John Rose
john.r.rose at oracle.com
Fri Aug 27 19:29:45 UTC 2021
I think I would rather see a vector-to-vector compress operation, than a vector-to-memory operation that also includes compression. Isn’t that the real underlying primitive?
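
To make the distinction concrete, here is a minimal scalar sketch in plain Java of what a vector-to-vector compress would compute, with arrays standing in for vector lanes. The names (CompressSketch, compress) are hypothetical, and zero-filling the unused tail lanes is an assumption for illustration, not something specified in this thread.

```java
import java.util.Arrays;

class CompressSketch {
    // Hypothetical scalar model of a vector-to-vector compress:
    // lanes whose mask bit is set are packed into the low lanes of the result.
    static int[] compress(int[] lanes, boolean[] mask) {
        int[] out = new int[lanes.length]; // unused tail lanes stay 0 (assumption)
        int n = 0;                         // next free lane in the result
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                out[n++] = lanes[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40};
        boolean[] m = {false, true, false, true};
        System.out.println(Arrays.toString(compress(v, m)));
    }
}
```

With such a primitive, a selective store would reduce to a compress followed by an ordinary store of the first N lanes, where N is the true count of the mask.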
> On Aug 27, 2021, at 2:54 AM, Joshua Zhu <jzhu at openjdk.java.net> wrote:
>
> Hi,
>
> I want to propose a new VectorAPI "Selective Store/Load" and share my
> implementation. Alibaba's internal databases are currently adopting
> the VectorAPI and need "Selective Store" for acceleration.
>
> My proposed VectorAPI is declared as below [1]:
>
> int selectiveIntoArray($type$[] a, int offset, VectorMask<$Boxtype$> m);
>
> The active elements (those whose bit is set in the mask) are
> stored contiguously into the array "a". If N is the true count of the
> mask, the elements from a[offset+N] up to a[offset+laneCount]
> are left unchanged. The return value is the number of elements
> stored into the array, and "offset + return value" is the offset for
> the next iteration.
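
The semantics described above can be modeled with a scalar loop. This is only a sketch in plain Java, with an int[] and a boolean[] standing in for the vector and the mask; the class name SelectiveStoreSketch and the four-argument signature are hypothetical and are not the API in the patch.

```java
class SelectiveStoreSketch {
    // Hypothetical scalar model of the proposed selective store:
    // writes the active lanes contiguously at a[offset..offset+N-1]
    // and returns N, the true count of the mask.
    static int selectiveIntoArray(int[] lanes, boolean[] mask, int[] a, int offset) {
        int n = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                a[offset + n] = lanes[i];
                n++;
            }
        }
        return n; // elements of a past offset+n-1 are not touched
    }

    public static void main(String[] args) {
        int[] a = {-1, -1, -1, -1, -1, -1};
        int n = selectiveIntoArray(new int[]{1, 2, 3, 4},
                                   new boolean[]{true, false, true, false}, a, 1);
        System.out.println(n + " " + java.util.Arrays.toString(a));
    }
}
```

The caller adds the return value to its running offset, as in the loop quoted below.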
> 
> This API would be used in the following manner [2]:
>
> tld.conflict_cnt = 0;
> for (int i = 0; i < ARRAY_LENGTH; i += INT_PREFERRED_SPECIES.length()) {
>     IntVector av = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input1, i);
>     IntVector bv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input2, i);
>     IntVector cv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_index, i);
>     VectorMask<Integer> mask = av.compare(VectorOperators.NE, bv);
>     tld.conflict_cnt += cv.selectiveIntoArray(tld.conflict_array, tld.conflict_cnt, mask);
> }
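
For comparison, the loop above appears to compute the same result as the following scalar code. This is a sketch only: the class and method names are hypothetical, and plain arrays replace the fields of tld. The quoted loop has no tail handling, so it presumably assumes ARRAY_LENGTH is a multiple of the species length; the scalar version has no such restriction.

```java
class ConflictScalarSketch {
    // Hypothetical scalar equivalent: for every position where the two inputs
    // differ, append the corresponding index value to the conflict array.
    static int collectConflicts(int[] in1, int[] in2, int[] index, int[] conflicts) {
        int cnt = 0;
        for (int i = 0; i < in1.length; i++) {
            if (in1[i] != in2[i]) {
                conflicts[cnt++] = index[i];
            }
        }
        return cnt; // number of conflicts recorded
    }

    public static void main(String[] args) {
        int[] conflicts = new int[4];
        int cnt = collectConflicts(new int[]{1, 2, 3, 4}, new int[]{1, 0, 3, 0},
                                   new int[]{100, 101, 102, 103}, conflicts);
        System.out.println(cnt + " " + java.util.Arrays.toString(conflicts));
    }
}
```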
>
> My patch includes the following changes:
> * Selective Store VectorAPI for Long & Int
> * Assembler: add x86 instruction "VPCOMPRESSD" and "VPCOMPRESSQ"
> * Instruction selection: vselective_store; kmask_truecount (true count of kregister)
> * Add node "StoreVectorSelective"
> * Add a new parameter "is_selective" to inline_vector_mem_masked_operation()
> to distinguish the masked version from the selective version
> * jtreg cases
> * JMH benchmark
>
> TODO parts I will implement:
> * Selective Store for other types
> * Selective Load
> * Some potential optimizations, such as: when the mask is allTrue, selectiveIntoArray() -> intoArray()
>
> Test:
> * Passed VectorAPI jtreg cases.
> * JMH benchmark results evaluating the API's performance in a real Alibaba scenario:
> UseAVX=3; thread number = 8; conflict data percentage: 20% (i.e. 20% of mask bits are true)
> http://cr.openjdk.java.net/~jzhu/8273057/jmh_benchmark_result.pdf
>
> [1] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-13cc2d6ec18e487ddae05cda671bdb6bb7ffd42ff7bc51a2e00c8c5e622bd55dR4667
> [2] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-951d02bd72a931ac34bc85d1d4e656a14f8943e143fc9282b36b9c76c1893c0cR144
> [3] failed to inline (intrinsic) by https://github.com/openjdk/panama-vector/blob/60aa8ca6dc0b3f1a3ee517db167f9660012858cd/src/hotspot/cpu/x86/x86.ad#L1769
>
> Best Regards,
> Joshua
>
> -------------
>
> Commit messages:
> - 8273057: [vector] New VectorAPI "SelectiveStore"
>
> Changes: https://git.openjdk.java.net/panama-vector/pull/115/files
> Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=115&range=00
> Issue: https://bugs.openjdk.java.net/browse/JDK-8273057
> Stats: 1432 lines in 90 files changed: 1423 ins; 0 del; 9 mod
> Patch: https://git.openjdk.java.net/panama-vector/pull/115.diff
> Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/115/head:pull/115
>
> PR: https://git.openjdk.java.net/panama-vector/pull/115